- Timestamp: 2020-02-13T17:09:07+13:00
- File: 1 edited
other-projects/maori-lang-detection/mongodb-data/ManualShortlisting.txt
r33891  r33914
1762    1762      "http://teaohou.natlib.govt.nz", 4/4, 2/4
1763    1763      "http://www.tuwharetoa.iwi.nz", 2/3 0/3
1764            - +"http://auturoa.nz", 0/4 0/3 [lots of MRI terms among English] - COMMUNITY (But there are pages inMRI to be found by non-random sampling, e.g. http://auturoa.nz/KarakiaMoKuaToRangiTeRaa.html)
        1764    + X "http://auturoa.nz", 0/4 0/3 [lots of MRI terms among English] - COMMUNITY (But there are pages inMRI to be found by non-random sampling, e.g. http://auturoa.nz/KarakiaMoKuaToRangiTeRaa.html)
1765    1765      "https://www.terito.school.nz", 3/3, 0/2 total
1766    1766      "https://ttw1.cwp.govt.nz", 3/3 3/3
…       …
1991    1991      3. GRAND TOTALS
1992    1992
1993            - Count per country of web SITES that contain at least 1 web page containing at least 1 genuine MRI sentence:
1994            -
        1993    + Count per country of web SITES that contain at least 1 web page containing at least 1 genuine MRI sentence. (Number in brackets for overseas is number of sites of that geolocation if nz TLDs were NOT grouped with NZ geolocation under "NZ". Number in brackets for NZ indicates the number of sites that are only of NZ geolocation ignoring nz TLDs hosted overseas.)
        1994    +
        1995    + OLD
1995    1996      countryCode, num manually inspected sites as having pages containing MRI, num sites openNLP detected as having pages containing MRI
1996            - NZ: 126 actual sites out of 176 detected sites
1997            - US: 29 actual out of 486 detected sites
1998            - AU: 2 actual out of 21 detected sites
        1997    + NZ: 126 actual sites out of 176 (89) detected sites
        1998    + US: 29 actual out of 422 (486) detected sites
        1999    + AU: 2 actual out of 5 (21) detected sites
1999    2000      DE, Germany: 2 actual out of 27 detected sites
2000    2001      DK, Denmark: 2 out of 8
2001    2002      BG, Bulgaria: 1 out of 1
2002    2003      CZ, Czech Republic: 1 out of 4
2003            - ES, Spain: 1 out of 7
2004            - FR, France: 1 out of 36
        2004    + ES, Spain: 1 out of 5 (7)
        2005    + FR, France: 1 out of 35 (36)
2005    2006      IE, Ireland: 1 out of 2
        2007    +
2006    2008
2007    2009      TOTAL: 166 sites of all the crawled sites where the crawled set of pages per site actually contained at least one sentence in Māori based on manual inspection.
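The per-country figures on the r33914 side can be cross-checked against the stated TOTAL of 166 with a short script. This is only a sketch: the dictionary layout and the helper names total_confirmed and confirmed_ratio are illustrative choices, not part of ManualShortlisting.txt or the project's code. The numbers are copied from the new lines above, using the ungrouped "detected" value outside the brackets.

```python
# Per-country counts copied from the r33914 lines of the changeset:
# (sites confirmed by manual inspection, sites flagged by openNLP).
# The dict layout and function names are illustrative assumptions.
counts = {
    "NZ": (126, 176),
    "US": (29, 422),
    "AU": (2, 5),
    "DE": (2, 27),
    "DK": (2, 8),
    "BG": (1, 1),
    "CZ": (1, 4),
    "ES": (1, 5),
    "FR": (1, 35),
    "IE": (1, 2),
}


def total_confirmed(table):
    """Sum the manually confirmed site counts across all countries."""
    return sum(actual for actual, _detected in table.values())


def confirmed_ratio(table, country):
    """Fraction of detected sites that manual inspection confirmed."""
    actual, detected = table[country]
    return actual / detected


if __name__ == "__main__":
    print("confirmed sites:", total_confirmed(counts))
    for code in counts:
        print(f"{code}: {confirmed_ratio(counts, code):.2f}")
```

The confirmed counts sum to 166, which agrees with the TOTAL line in the file.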