Changeset 33849
Timestamp: 2020-01-17T22:22:18+13:00
Location: other-projects/maori-lang-detection
Files: 2 edited
other-projects/maori-lang-detection/MoreReading/mongodb.txt
r33847 r33849
1154  1154  !! https://www.kiwiproperty.com - e.g. https://www.kiwiproperty.com/the-base/mi/he-paepaki/ has some actual MRI sentences. [Not autotranslated]
1155  1155  ? http://fionajack.net - Wellington gallery of artist. A few occurrences of Kia Ora in a title like context (e.g. "Street Party Kia Ora! Kia Ora!")
1156        !! https://infogram.com/te-marautanga-o-aotearoa-moe-pld-allocations-2012-1go502ygvn562jd - site of individual pages (like docs.google.com). This one has a relevant infogram image.
      1156  X!! https://infogram.com/te-marautanga-o-aotearoa-moe-pld-allocations-2012-1go502ygvn562jd - site of individual pages (like docs.google.com). This one has a relevant infogram image. But it's English with MRI in the image legend and captions.
1157  1157  !! https://koreromaori.com - some actual Maori language sentences
1158  1158  http://theunderwaterworld.com/Galleries/Roimata/roimata-frame.html - placenames
… …
1310  1310  + http://www.unicode.org, [Universal declaration of Human Rights]
1311  1311  + https://static-promote.weebly.com,
1312        + http://aclhokiangarocks.blogspot.com, [often English, but COMMUNITY ]
      1312  + http://aclhokiangarocks.blogspot.com, [often English, but COMMUNITY. At least short or partial MRI sentences.]
1313  1313
1314  1314  BIBLE/MOHAMMED/BAHAI TRANSLATIONS probably not auto translations:
… …
1329  1329  !! https://maorinews.com,
1330  1330  !! http://maaori.com,
1331        !!+ http://kiaorahola.blogspot.com/
      1331  !!+ http://kiaorahola.blogspot.com,
1332  1332  + https://kjohnsonnz.blogspot.com,
1333  1333  + http://pumanawawhangara.blogspot.com,
1334  1334  + http://dannykahei.tripod.com,
1335        + http://burkekm001.tripod.com
      1335  + http://burkekm001.tripod.com,
1336  1336  + http://tkkpipipaopao.blogspot.com,
1337  1337  + http://manateina.blogspot.com,
… …
1472  1472
1473  1473  ---------------
      1474  All sites except NZ or .nz TLD where containingMRI=true manually inspected. Includes overseas sites with mi in URL path. All NZ sites passed through without inspection.
1474  1475
1475  1476  MANUAL - TOTAL NUM SITES WITH SOME MRI CONTENT BY COUNTRY
… …
1489  1490  NZ: 176
1490  1491  US: 25+4 from US with mi in URL path = 29
1491        AU: 3
      1492  AU: 2
1492  1493  DE: 2
1493  1494  DK: 2
… …
1497  1498  FR: 1
1498  1499  IE: 1
1499        TOTAL: 213+4 from US with mi in URL path = 217
      1500  TOTAL: 213+4 from US with mi in URL path = 216
1500  1501
1501  1502
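The tally lines in mongodb.txt state a sum in the form "A+B ... = C". In this revision the TOTAL line keeps the addends 213+4 while the stated result changes from 217 to 216, so the stated sum no longer matches its addends. The per-country list is partly elided in this view, so the full total cannot be recomputed here. The following is only a minimal sketch, not part of the changeset, of how such lines could be checked mechanically; the function name `check_tally` and the regular expression are illustrative assumptions.

```python
import re

# Matches tally lines such as "US: 25+4 from US with mi in URL path = 29".
# The pattern is an assumption based on the lines visible in this changeset.
TALLY = re.compile(
    r"^(?P<label>\w+):\s*(?P<a>\d+)\+(?P<b>\d+)\b.*=\s*(?P<total>\d+)\s*$"
)


def check_tally(line):
    """Return (label, computed, stated) for an 'A+B ... = C' line, else None."""
    m = TALLY.match(line.strip())
    if m is None:
        return None
    computed = int(m["a"]) + int(m["b"])
    return m["label"], computed, int(m["total"])


# Lines copied from the diff above (old and new TOTAL, plus the US line).
lines = [
    "US: 25+4 from US with mi in URL path = 29",
    "TOTAL: 213+4 from US with mi in URL path = 217",
    "TOTAL: 213+4 from US with mi in URL path = 216",
]

for line in lines:
    label, computed, stated = check_tally(line)
    status = "ok" if computed == stated else "MISMATCH"
    print(f"{label}: computed {computed}, stated {stated} -> {status}")
```

Lines without an "A+B" part, such as "NZ: 176", are skipped by the pattern and return None.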
other-projects/maori-lang-detection/journal-paper/writeup
r33842 r33849
34    34    mri 0.0014 0.0017 0.0012
35    35
36          Over 1400 sites were detected and CommonCrawl returned over 1400 unique site domain containing pages it detected as Maori in the twelve-month period from Sep 2018 to Aug 2019. The above percentages are for the 3 final crawls (June to Aug 2019). Of these 1400 sites, 213+3 = 216 sites appeared to contain actual Maori language sentences composed by humans when manually inspected. The percentage of the high-quality web content that is in Maori may therefore be almost an order of magnitude less.
      36    Over 1400 sites were detected and CommonCrawl returned over 1400 unique site domain containing pages it detected as Maori in the twelve-month period from Sep 2018 to Aug 2019. The above percentages are for the 3 final crawls (June to Aug 2019). Of these 1400 sites, 216 sites appeared to contain actual Maori language sentences composed by humans when manually inspected. The percentage of the high-quality web content that is in Maori may therefore be almost an order of magnitude less.
37    37
38    38