SUMMARY of the 260 random web page URLs sampled:
================================================
* Only NZ and US had genuine pages in MRI
* 225 pages were from NZ (.nz and NZ origin) and the remaining 35 were from the US
* 2 NZ pages were not in NZ MRI (a Rarotongan/Cook Islands Maori page and a Tokelauan page);
  a 3rd had a single sentence in MRI, but the rest of it was links with repeated English anchor text with digit suffixes (File###)

So 222 NZ pages and 35 US web pages were largely in MRI.

11 unique domains from US (10 if mi.wikipedia and mi.m.wikipedia are counted as one)
34 unique domains from NZ (35 if admin.teara is counted as distinct from teara),
33 unique domains from NZ after further skipping the site whose only sampled page is in Cook Islands Maori.

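The domain counts above depend on whether a mobile host such as mi.m.wikipedia is folded into its desktop counterpart. A minimal sketch of that counting choice follows, assuming the sampled URLs are available as a plain list of strings. The function name `unique_domains`, the `merge_mobile` flag and the URL paths in `sample_urls` are hypothetical placeholders and are not taken from the actual sample.

```python
from urllib.parse import urlsplit


def unique_domains(urls, merge_mobile=False):
    """Return the set of host names found in urls.

    With merge_mobile=True, an inner ".m." label is dropped, so that
    mi.m.wikipedia.org is counted together with mi.wikipedia.org.
    """
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname or ""
        if merge_mobile:
            # Only an inner ".m." is removed; a leading "m." (as in
            # m.biblepub.com) is left untouched.
            host = host.replace(".m.", ".", 1)
        hosts.add(host)
    return hosts


# Placeholder URLs for illustration only (paths are invented).
sample_urls = [
    "https://mi.wikipedia.org/wiki/A",
    "https://mi.m.wikipedia.org/wiki/B",
    "https://m.biblepub.com/x",
]

distinct = unique_domains(sample_urls)
merged = unique_domains(sample_urls, merge_mobile=True)
print(len(distinct), len(merged))
```

The same idea would apply to deciding whether admin.teara counts separately from teara, although that needs its own rule, since "admin." is a leading label rather than an inner ".m.".
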
NZ sites with many (>=6) sampled pages in MRI are:
    tmoa.tki.org.nz (83)
    tetaurawhiri.govt.nz (31)
    tiritiowaitangi.govt.nz (17)
    pukoro.co.nz (15)
    waiata.maori.nz (9)
    twtop.school.nz (7)
    paekupu.co.nz (6)

Among the US sites, those with >=6 sampled pages in MRI are:
m.biblepub.com (11 pages) and mi.m.wikipedia.org (8), though mi.m.wikipedia.org pages usually have
individual words or short phrases in MRI rather than several contiguous sentences or paragraphs.

123 pages' contents are SIGNIFICANTLY_MAORI
35 pages contain MRI, but it's in NAV (navigation menus) or pictures of non-OCR-ed text, with practically no other text on the page
31 pages have one or more MAORI_PARAGRAPHS, with one or more other paragraphs in other languages
18 pages contain noticeably MIXED_TEXT in MRI and one or more other languages within a single paragraph, a set of sentences or a single sentence
15 pages contain POEMS_OR_SONGS
15 pages have a SINGLE_MRI_SENTENCE
13 pages have a set of singleton WORDS in MRI (often MRI language learning sites)
4 pages contain LITTLE, if any, non-navigation TEXT
3 LINK_TEXT
3 pages contain non-nav text in OTHER_LANGUAGES (English, Tokelauan, Cook Islands or Rarotongan Maori)
= 260 sampled web pages
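
As a quick arithmetic check on the tallies above, the per-category counts and the per-country counts can each be summed and compared against the 260 sampled pages. This is only a sketch: the dictionary keys reuse the category labels from the list, except "LITTLE_TEXT", which is shorthand assumed here for the "LITTLE ... TEXT" category.

```python
# Per-category page counts, copied from the summary list.
category_counts = {
    "SIGNIFICANTLY_MAORI": 123,
    "NAV": 35,
    "MAORI_PARAGRAPHS": 31,
    "MIXED_TEXT": 18,
    "POEMS_OR_SONGS": 15,
    "SINGLE_MRI_SENTENCE": 15,
    "WORDS": 13,
    "LITTLE_TEXT": 4,  # assumed shorthand key, not a label from the notes
    "LINK_TEXT": 3,
    "OTHER_LANGUAGES": 3,
}

# Per-country page counts from the top of the summary.
country_counts = {"NZ": 225, "US": 35}

# NZ pages excluded from the "largely in MRI" figure: the Cook Islands
# Maori page, the Tokelauan page, and the page with one MRI sentence.
nz_excluded = 3

category_total = sum(category_counts.values())
country_total = sum(country_counts.values())
nz_largely_mri = country_counts["NZ"] - nz_excluded

print(category_total, country_total, nz_largely_mri)
```

Both totals come to 260, and the NZ figure comes to 222, matching the numbers stated in the summary.
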