Changeset 33914

Timestamp:
13.02.2020 17:09:07 (5 days ago)
Author:
ak19
Message:

Shortlisted just the domain sites by country into ManualShortlist2.txt after taking the reingest into MongoDB into account. And then put all these shortlisted domains for which containsMRI=true as per manual inspection into a separate new file.

Location:
other-projects/maori-lang-detection
Files:
1 added
3 modified

  • other-projects/maori-lang-detection/MoreReading/mongodb.txt

    r33913 r33914  
    11031103- RUSSIA: https://www.gismeteo.lv - misidentification of an email address 
    11041104- JAPAN: http://yutaka.it-n.jp - many pages of scientific names of (plants?) which are often misdetected as MRI 
    1105 !! - Ireland, ie: https://coggle.it 
     1105!! - IRELAND, IE: https://coggle.it 
    11061106- IRAN: https://www.dideo.ir/v/yt/d6cgya0ze-E - video title from MaoriTelevision website 
    11071107- CZECH republic: 
     
    13711371X https://docs.google.com, timetable with occasional Maori language word 
    13721372+ https://drive.google.com, https://drive.google.com/file/d/1NwuzafjddaP8gxI7O_Zapts5bM7mrtwn/preview is an image of Maori number names. But other page on drive.google.com is a NZ certificate or ID (in English) of a person's position. 
    1373 http://ritusehji.blogspot.com - no page with more than 1 sentence detected. But short string of actual MRI content. Educator blog with pictures and English language content. 
     1373~+ http://ritusehji.blogspot.com - no page with more than 1 sentence detected. But short string of actual MRI content. Educator blog with pictures and English language content. 
    13741374 
    13751375 
     
    15411541X https://mi.lawyers.cafe - autotranslated 
    15421542    X https://mi.centr-zashity.ru - same as lawyers.cafe above: autotranslated 
    1543 ! https://policies.oclc.org - not completely translated. Copyright page, privacy statement and cookie statement pages appear to be in Maori. Not sure if autotranslated since other pages aren't available in MI. Dutch equivalent pages seem human translated. 
     1543~! https://policies.oclc.org - not completely translated. Copyright page, privacy statement and cookie statement pages appear to be in Maori. Not sure if autotranslated since other pages aren't available in MI. Dutch equivalent pages seem human translated. 
    15441544X http://jobdescriptionsample.org - autotranslated 
    15451545X http://mi.broadcastbeat.com - autotranslated product site 
     
    16191619   IT, AT, RO, CH, RU, BG, MX, JP, CN, IE, IR, FI same 
    16201620 
    1621 US gained 3: 
    1622 anglican.org (NEW) 
    1623 articles.imperialtometric.com (from CA) 
    1624 daandehn.com (CA) 
     1621US gained 3 + 1 from mi in URL path: 
     1622+ anglican.org (NEW) 
     1623X articles.imperialtometric.com (from CA) 
     1624X daandehn.com (from CA) 
     1625+ kiwiproperty.com (from AU) 
    16251626 
    16261627CA lost 2: 
    1627 articles.imperialtometric.com (to US) 
    1628 daandehn.com (to US) 
     1628X articles.imperialtometric.com (to US) 
     1629X daandehn.com (to US) 
    16291630 
    16301631AU: 
    1631 lost kiwiproperty.com (to US - mi in URL path version file!) 
     1632! lost kiwiproperty.com (to US - mi in URL path version file!) 
    16321633 
    16331634 
    16341635CZ: 
    1635 gained viveipcl.com (from UNKNOWN) 
     1636X gained viveipcl.com (from UNKNOWN) 
    16361637 
    16371638UNKNOWN: 
    1638 gained hitiaotera.com from IL 
     1639X gained hitiaotera.com from IL 
    16391640 
    16401641IL: 
    1641 lost one to (UNKNOWN) 
    1642  
     1642X lost one (hitiaotera.com to UNKNOWN) 
     1643 
     1644 
     1645FINAL SITE COUNT (contain >= 1 page with >= 1 MRI sentence) 
     1646 
     1647DK: 
     1648http://ngapuhiradio.com 
     1649http://ngapuhitelevision.com 
     1650    [http://akona.ngapuhitelevision.com 
     1651    http://waiatarangatiratanga.ngapuhitelevision.com  
     1652    http://jazz.ngapuhitelevision.com 
     1653    http://powhiri.ngapuhitelevision.com 
     1654    http://komisch.ngapuhitelevision.com] 
     1655 
     1656DE 
     1657http://www.udhr.de 
     1658https://www.cartogiraffe.com/ 
     1659 
     1660AU 
     1661https://koreromaori.com 
     1662(https://infogram.com/) 
     1663 
     1664FR 
     1665http://chantsdeluttes.free.fr/ 
     1666 
     1667ES 
     1668https://www.uv.es/ 
     1669 
     1670IE 
     1671https://coggle.it 
     1672 
     1673CZ: 
     1674http://www.henryklahola.nazory.cz 
     1675 
     1676BG: 
     1677http://anitra.net/ 
     1678 
     1679US finals: 
     1680http://anglican.org 
     1681http://anglicanhistory.org 
     1682http://www.unicode.org 
     1683https://static-promote.weebly.com 
     1684http://aclhokiangarocks.blogspot.com 
     1685http://bahaiprayers.net 
     1686https://biblehub.com 
     1687http://www.muhammad.com 
     1688http://www.godrules.net 
     1689http://m.biblepub.com 
     1690http://www.krassotkin.ru 
     1691http://www.gotquestions.org 
     1692https://maorinews.com 
     1693http://maaori.com 
     1694http://kiaorahola.blogspot.com 
     1695https://kjohnsonnz.blogspot.com 
     1696http://pumanawawhangara.blogspot.com 
     1697http://dannykahei.tripod.com 
     1698http://burkekm001.tripod.com 
     1699http://tkkpipipaopao.blogspot.com 
     1700http://manateina.blogspot.com 
     1701http://tatai09.blogspot.com  
     1702http://www.twttoa.com 
     1703http://tuhua2010.blogspot.com 
     1704http://piripi.blogspot.com 
     1705https://www.breaker.audio 
     1706https://drive.google.com 
     1707http://ritusehji.blogspot.com 
     1708https://in.pinterest.com 
     1709 
     171029 
     1711 
     1712https://www.kiwiproperty.com 
     1713http://indigenousblogs.com 
     1714https://mi.m.wikipedia.org, https://mi.wikipedia.org 
     1715http://csunplugged.org, https://www.csunplugged.org 
     1716(https://policies.oclc.org) 
     1717 
     171834 incl with MI in URL Path 
     1719 
     1720 
     1721--------------------- 
     1722NZ: 
     1723    http://www.teipukarea.maori.nz 
     1724        http://ngatipahauwera.co.nz 
     1725        http://www.oag.govt.nz 
     1726        https://sexualviolence.victimsinfo.govt.nz 
     1727        http://tmoa.tki.org.nz 
     1728        http://www.tewhanake.maori.nz 
     1729        http://www.matarikifestival.org.nz 
     1730        http://www.otepoti.school.nz 
     1731        https://www.maoritelevision.com 
     1732        http://pukapuka.nz 
     1733        http://community.nzdl.org 
     1734        http://maori.livingheritage.org.nz [http://www.livingheritage.org.nz] 
     1735        http://pukoro.co.nz 
     1736    https://cdn.tehiku.nz [DOMAIN: tehiku.nz] 
     1737        http://www.runanga.co.nz 
     1738        http://kuraaiwi.maori.nz 
     1739        http://kurataiao.tki.org.nz 
     1740        http://satellites.co.nz 
     1741        http://teaohou.natlib.govt.nz 
     1742        http://www.tuwharetoa.iwi.nz 
     1743        https://www.terito.school.nz 
     1744        https://ttw1.cwp.govt.nz 
     1745        https://www.whanau-tahi.school.nz 
     1746        https://e-ako-pangarau.nzmaths.co.nz 
     1747        https://teaomaori.news 
     1748        http://tetaurawhiri.govt.nz 
     1749        https://www.tuiatematangi.ac.nz 
     1750        http://animations.tewhanake.maori.nz 
     1751        https://www.dnc.org.nz 
     1752        http://firstworldwar.tki.org.nz [http://www.firstworldwar.tki.org.nz] 
     1753        http://www.28maoribattalion.org.nz 
     1754        http://www.tewikiotereomaori.co.nz 
     1755        http://www.brettgraham.co.nz 
     1756        https://hepatakakupu.nz 
     1757    http://anglicanprayerbook.nz 
     1758        http://arataua.nz 
     1759        http://maori.tki.org.nz 
     1760        https://paekupu.co.nz 
     1761        https://haereheikaiako.co.nz 
     1762        https://curriculumtool.education.govt.nz 
     1763        http://kurakokiri.maori.nz [includes: http://www.kurakokiri.maori.nz] 
     1764        http://www.kkmmaungarongo.co.nz 
     1765        http://www.heartland.co.nz 
     1766        http://oilcrash.com 
     1767        http://www.kura-porirua.school.nz 
     1768        https://www.sporty.co.nz 
     1769        https://www.tematawai.maori.nz 
     1770        https://www.terakipaewhenua.school.nz 
     1771        http://www.tetaurawhiri.govt.nz 
     1772        http://archive.stats.govt.nz 
     1773        http://tiritiowaitangi.govt.nz 
     1774        http://www.waiata.maori.nz [includes: http://waiata.maori.nz] 
     1775        http://hana.co.nz 
     1776        http://kaupare.co.nz 
     1777        http://www.tereowrap.nz 
     1778        http://www.hrc.co.nz 
     1779        http://ngatiporoukiponeke.org.nz 
     1780        http://rurued.school.nz 
     1781        http://www.twtop.school.nz 
     1782        http://www.huri-translations.pf 
     1783        https://teara.govt.nz/ [https://admin.teara.govt.nz, http://blog.teara.govt.nz] 
     1784        https://tiritiowaitangi.govt.nz  
     1785        http://www.tmoa.tki.org.nz 
     1786        https://www.komako.org.nz 
     1787        http://www.wcl.govt.nz [included: http://kete.wcl.govt.nz]         
     1788        http://punareo.co.nz 
     1789        https://rapuatearatika.education.govt.nz 
     1790        http://tmmkkm.school.nz 
     1791        http://www.cs.waikato.ac.nz 
     1792        http://www.kupengahao.co.nz 
     1793        https://www.hapuhauora.health.nz 
     1794        http://cms.sunsmartschools.co.nz [http://sunsmartschools.co.nz/] 
     1795        http://kuraproductions.co.nz 
     1796        https://keepourmoneyclean.govt.nz 
     1797        http://www.tekura.school.nz 
     1798        http://www.tkkmmokopuna.school.nz 
     1799        http://hangaraumatihiko.tki.org.nz 
     1800        http://www.pakanae.maori.nz 
     1801 
     1802 
     1803    http://holyspirit.nz 
     1804    https://www.ngamanawainc.co.nz, [includes http://www.ngamanawainc.co.nz] 
     1805    http://www.finlaysonpark.school.nz 
     1806    http://www.w3vietnam.org.nz [includes http://w3vietnam.org.nz] 
     1807    https://www.takitimu.ac.nz 
     1808        https://kotahimiriona.co.nz 
     1809        https://rehuamarae.co.nz 
     1810        http://reoora.co.nz 
     1811 
     1812        https://manawatuheritage.pncc.govt.nz 
     1813        http://rsnz.natlib.govt.nz 
     1814        https://www.taitokerautrust.org.nz 
     1815        http://tewikiotereomaori.nz 
     1816        https://www.korokikahukura.co.nz 
     1817        https://www.pinterest.nz 
     1818        https://www.rereahu.maori.nz 
     1819        http://givealittle.co.nz 
     1820        https://kaiiwicamp.nz [includes http://kaiiwicamp.nz] 
     1821        http://ngarauhuia.ngatiapakiterato.iwi.nz 
     1822        https://m.wairarapatv.co.nz 
     1823 
     1824        http://avonside.net 
     1825        http://www.maoriinvestments.co.nz 
     1826        http://conference.tpwt.maori.nz 
     1827        https://www.puau.school.nz 
     1828        http://tehauora.org.nz 
     1829 
     1830        http://temahurehure.maori.nz 
     1831        http://www.temarareo.org 
     1832        http://www.tetaumuturunanga.iwi.nz 
     1833        http://www.writersfestival.co.nz 
     1834        http://www.kmk.maori.nz 
     1835        https://www.stats.govt.nz [includes http://archive.stats.govt.nz] 
     1836 
     1837+?       http://ngatiwhakaue.iwi.nz 
     1838+?       https://interactives.stuff.co.nz 
     1839+?       http://whatonga.school.nz 
     1840+?       https://player.vimeo.com 
     1841+?       http://southerntribes.co.nz 
     1842 
     1843?X      https://www.e-agent.nz [includes: https://office.e-agent.nz, http://videos.e-agent.nz] 
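
The geolocation changes recorded in this file (US gained, CA lost, and so on) amount to comparing each domain's country code before and after the reingest. A minimal sketch of that comparison follows. The two dictionaries and the function name `geolocation_moves` are hypothetical and not part of the project's code; only the domains named in the notes are included, and anglican.org has no "before" entry because it is marked NEW.

```python
def geolocation_moves(before, after):
    """Return {domain: (old_country, new_country)} for domains that moved."""
    moves = {}
    for domain, new_country in after.items():
        old_country = before.get(domain)  # None means a newly introduced domain
        if old_country != new_country:
            moves[domain] = (old_country, new_country)
    return moves


# Country codes copied from the notes; the dict layout is an assumption.
before = {
    "articles.imperialtometric.com": "CA",
    "daandehn.com": "CA",
    "kiwiproperty.com": "AU",
    "viveipcl.com": "UNKNOWN",
    "hitiaotera.com": "IL",
}
after = {
    "anglican.org": "US",
    "articles.imperialtometric.com": "US",
    "daandehn.com": "US",
    "kiwiproperty.com": "US",
    "viveipcl.com": "CZ",
    "hitiaotera.com": "UNKNOWN",
}

moves = geolocation_moves(before, after)
for domain, (old, new) in sorted(moves.items()):
    print(f"{domain}: {old or 'NEW'} -> {new}")
```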
  • other-projects/maori-lang-detection/mongodb-data/ManualShortlisting.txt

    r33891 r33914  
    17621762        "http://teaohou.natlib.govt.nz", 4/4, 2/4 
    17631763        "http://www.tuwharetoa.iwi.nz", 2/3 0/3 
    1764 +        "http://auturoa.nz", 0/4 0/3 [lots of MRI terms among English] - COMMUNITY (But there are pages inMRI to be found by non-random sampling, e.g. http://auturoa.nz/KarakiaMoKuaToRangiTeRaa.html) 
     1764X        "http://auturoa.nz", 0/4 0/3 [lots of MRI terms among English] - COMMUNITY (But there are pages inMRI to be found by non-random sampling, e.g. http://auturoa.nz/KarakiaMoKuaToRangiTeRaa.html) 
    17651765        "https://www.terito.school.nz", 3/3, 0/2 total 
    17661766        "https://ttw1.cwp.govt.nz", 3/3 3/3 
     
    199119913. GRAND TOTALS 
    19921992 
    1993 Count per country of web SITES that contain at least 1 web page containing at least 1 genuine MRI sentence: 
    1994  
     1993Count per country of web SITES that contain at least 1 web page containing at least 1 genuine MRI sentence. (Number in brackets for overseas is number of sites of that geolocation if nz TLDs were NOT grouped with NZ geolocation under "NZ". Number in brackets for NZ indicates the number of sites that are only of NZ geolocation ignoring nz TLDs hosted overseas.) 
     1994 
     1995OLD 
    19951996countryCode, num manually inspected sites as having pages containing MRI, num sites openNLP detected as having pages containing MRI 
    1996 NZ: 126 actual sites out of 176 detected sites 
    1997 US: 29 actual out of 486 detected sites 
    1998 AU: 2 actual out of 21 detected sites 
     1997NZ: 126 actual sites out of 176 (89) detected sites 
     1998US: 29 actual out of 422 (486) detected sites 
     1999AU: 2 actual out of 5 (21) detected sites 
    19992000DE, Germany: 2 actual out of 27 detected sites 
    20002001DK, Denmark: 2 out of 8 
    20012002BG, Bulgaria: 1 out of 1 
    20022003CZ, Czech Republic: 1 out of 4 
    2003 ES, Spain: 1 out of 7 
    2004 FR, France: 1 out of 36 
     2004ES, Spain: 1 out of 5 (7) 
     2005FR, France: 1 out of 35 (36) 
    20052006IE, Ireland: 1 out of 2 
     2007 
    20062008 
    20072009TOTAL: 166 sites of all the crawled sites where the crawled set of pages per site actually contained at least one sentence in Māori based on manual inspection. 
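
As a quick consistency check, the per-country "actual" counts in the revised lines above can be summed and compared with the stated total of 166. The sketch below is illustrative only: the dictionary layout and names are assumptions, and the numbers are the counts outside brackets from the r33914 side of the diff.

```python
# (actual, detected) site counts per country, copied from the revised
# grand totals above. The data structure itself is hypothetical.
totals = {
    "NZ": (126, 176),
    "US": (29, 422),
    "AU": (2, 5),
    "DE": (2, 27),
    "DK": (2, 8),
    "BG": (1, 1),
    "CZ": (1, 4),
    "ES": (1, 5),
    "FR": (1, 35),
    "IE": (1, 2),
}

# Sum of manually confirmed sites across all countries.
total_actual = sum(actual for actual, _ in totals.values())


def confirmed_share(code):
    """Fraction of detected sites that manual inspection confirmed."""
    actual, detected = totals[code]
    return actual / detected


print(total_actual)
for code in totals:
    print(f"{code}: {confirmed_share(code):.1%}")
```

The sum agrees with the TOTAL line, and the per-country share gives a rough view of how often a detected site was confirmed by manual inspection.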
  • other-projects/maori-lang-detection/mongodb-data/ManualShortlisting2.txt

    r33907 r33914  
    200820083. GRAND TOTALS 
    20092009 
    2010 Count per country of web SITES that contain at least 1 web page containing at least 1 genuine MRI sentence: 
    2011  
     2010Count per country of web SITES that contain at least 1 web page containing at least 1 genuine MRI sentence. (Number in brackets for overseas is number of sites of that geolocation if nz TLDs were NOT grouped with NZ geolocation under "NZ". Number in brackets for NZ indicates the number of sites that are only of NZ geolocation ignoring nz TLDs hosted overseas. Numbers only present where different from counts of site by geolocation, which is the number indicated out of brackets.) 
     2011 
     2012OLD 
    20122013countryCode, num manually inspected sites as having pages containing MRI, num sites openNLP detected as having pages containing MRI 
    2013 NZ: 126 actual sites out of 176 detected sites 
    2014 US: 29 actual out of 486 detected sites 
    2015 AU: 2 actual out of 21 detected sites 
     2014NZ: 126 actual sites out of 176 (89) detected sites 
     2015US: 29 actual out of 422 (486) detected sites 
     2016AU: 2 actual out of 5 (21) detected sites 
    20162017DE, Germany: 2 actual out of 27 detected sites 
    20172018DK, Denmark: 2 out of 8 
    20182019BG, Bulgaria: 1 out of 1 
    20192020CZ, Czech Republic: 1 out of 4 
    2020 ES, Spain: 1 out of 7 
    2021 FR, France: 1 out of 36 
     2021ES, Spain: 1 out of 5 (7) 
     2022FR, France: 1 out of 35 (36) 
     2023IE, Ireland: 1 out of 2 
     2024 
     2025NEW - Adjusted grand totals above with changes to values after reingesting into mongodb (the adjusted values are from section C below). The number in brackets here are the UNIQUE domain names/sites that OpenNLP detected as having pages containing MRI, where different. 
     2026 
     2027countryCode, num manually inspected sites as having pages containing MRI, num sites openNLP detected as having pages containing MRI 
     2028NZ: 124 (113 + 11 non-unique) actual sites out of 176 (159) detected sites 
     2029US: 32 actual out of 422 (405) detected sites 
     2030AU: 1 actual out of 5 detected sites 
     2031DE, Germany: 2 actual out of 26 (24) detected sites 
     2032DK, Denmark: 2 out of 8 
     2033BG, Bulgaria: 1 out of 1 
     2034CZ, Czech Republic: 1 out of 5 (4) 
     2035ES, Spain: 1 out of 5 
     2036FR, France: 1 out of 35 (34) 
    20222037IE, Ireland: 1 out of 2 
    20232038 
     
    20262041 
    20272042======================================== 
     2043Adjusted grand totals in manualShortlisting.txt with the following. 
     2044 
     2045---------------------------------------------------------------------- 
     2046C GEOLOCATION CHANGES AFTER REINGESTING UPON INTRODUCING ANGLICAN.ORG: 
     2047---------------------------------------------------------------------- 
     2048NZ the same as before 
     2049   NL, DE, FR, DK, ES, GB same 
     2050   IT, AT, RO, CH, RU, BG, MX, JP, CN, IE, IR, FI same 
     2051 
     2052US gained 3: 
     2053+ anglican.org (NEW) 
     2054X articles.imperialtometric.com (from CA) 
     2055X daandehn.com (CA) 
     2056 
     2057CA lost 2: 
     2058X articles.imperialtometric.com (to US) 
     2059X daandehn.com (to US) 
     2060 
     2061AU: 
     2062+ ! lost kiwiproperty.com (to US - mi in URL path version file!) 
     2063 
     2064 
     2065CZ: 
     2066X gained viveipcl.com (from UNKNOWN) 
     2067 
     2068UNKNOWN: 
     2069X gained hitiaotera.com from IL 
     2070 
     2071IL: 
     2072X lost one (hitiaotera.com to UNKNOWN) 
     2073 
     2074----------------- 
     2075FINAL SITE COUNT (contain >= 1 page with >= 1 MRI sentence) 
     2076----------------- 
     2077DK (2): 
     2078http://ngapuhiradio.com 
     2079http://ngapuhitelevision.com 
     2080    [http://akona.ngapuhitelevision.com 
     2081    http://waiatarangatiratanga.ngapuhitelevision.com  
     2082    http://jazz.ngapuhitelevision.com 
     2083    http://powhiri.ngapuhitelevision.com 
     2084    http://komisch.ngapuhitelevision.com] 
     2085 
     2086DE (2) 
     2087http://www.udhr.de 
     2088https://www.cartogiraffe.com 
     2089 
     2090AU (1) 
     2091https://koreromaori.com 
     2092 
     2093FR (1) 
     2094http://chantsdeluttes.free.fr 
     2095 
     2096ES (1) 
     2097https://www.uv.es 
     2098 
     2099IE (1) 
     2100https://coggle.it 
     2101 
     2102CZ: (1) 
     2103http://www.henryklahola.nazory.cz 
     2104 
     2105BG: (1) 
     2106http://anitra.net 
     2107 
     2108US finals 31 (33): 
     2109http://anglican.org 
     2110http://anglicanhistory.org 
     2111http://www.unicode.org 
     2112https://static-promote.weebly.com 
     2113http://aclhokiangarocks.blogspot.com 
     2114http://bahaiprayers.net 
     2115https://biblehub.com 
     2116http://www.muhammad.com 
     2117http://www.godrules.net 
     2118http://m.biblepub.com 
     2119http://www.krassotkin.ru 
     2120http://www.gotquestions.org 
     2121https://maorinews.com 
     2122http://maaori.com 
     2123http://kiaorahola.blogspot.com 
     2124https://kjohnsonnz.blogspot.com 
     2125http://pumanawawhangara.blogspot.com 
     2126http://dannykahei.tripod.com 
     2127http://burkekm001.tripod.com 
     2128http://tkkpipipaopao.blogspot.com 
     2129http://manateina.blogspot.com 
     2130http://tatai09.blogspot.com  
     2131http://www.twttoa.com 
     2132http://tuhua2010.blogspot.com 
     2133http://piripi.blogspot.com 
     2134https://drive.google.com 
     2135https://in.pinterest.com 
     2136+? https://www.breaker.audio [AUDIO] 
     2137+X http://ritusehji.blogspot.com 
     213827 (28) 
     2139 
     2140https://www.kiwiproperty.com 
     2141http://indigenousblogs.com 
     2142https://mi.m.wikipedia.org [https://mi.wikipedia.org] 
     2143http://csunplugged.org [includes https://www.csunplugged.org] 
     2144?~ https://policies.oclc.org 
     2145 
     2146+ 4 (5) = 31 (33) incl with MI in URL Path 
     2147 
     2148 
     2149NZ: 113 unique + 11 non-unique 
     2150http://www.teipukarea.maori.nz 
     2151http://ngatipahauwera.co.nz 
     2152http://www.oag.govt.nz 
     2153https://sexualviolence.victimsinfo.govt.nz 
     2154http://tmoa.tki.org.nz 
     2155http://www.tewhanake.maori.nz 
     2156http://www.matarikifestival.org.nz 
     2157http://www.otepoti.school.nz 
     2158https://www.maoritelevision.com 
     2159http://pukapuka.nz 
     2160http://community.nzdl.org 
     2161http://maori.livingheritage.org.nz [http://www.livingheritage.org.nz] 
     2162http://pukoro.co.nz 
     2163https://cdn.tehiku.nz [DOMAIN: tehiku.nz] 
     2164http://www.runanga.co.nz 
     2165http://kuraaiwi.maori.nz 
     2166http://kurataiao.tki.org.nz 
     2167http://satellites.co.nz 
     2168http://teaohou.natlib.govt.nz 
     2169http://www.tuwharetoa.iwi.nz 
     2170https://www.terito.school.nz 
     2171https://ttw1.cwp.govt.nz 
     2172https://www.whanau-tahi.school.nz 
     2173https://e-ako-pangarau.nzmaths.co.nz 
     2174https://teaomaori.news 
     2175http://tetaurawhiri.govt.nz 
     2176https://www.tuiatematangi.ac.nz 
     2177http://animations.tewhanake.maori.nz 
     2178https://www.dnc.org.nz 
     2179http://firstworldwar.tki.org.nz [http://www.firstworldwar.tki.org.nz] 
     2180http://www.28maoribattalion.org.nz 
     2181http://www.tewikiotereomaori.co.nz 
     2182http://www.brettgraham.co.nz 
     2183https://hepatakakupu.nz 
     2184http://anglicanprayerbook.nz 
     2185http://arataua.nz 
     2186http://maori.tki.org.nz 
     2187https://paekupu.co.nz 
     2188https://haereheikaiako.co.nz 
     2189https://curriculumtool.education.govt.nz 
     2190http://kurakokiri.maori.nz [includes: http://www.kurakokiri.maori.nz] 
     2191http://www.kkmmaungarongo.co.nz 
     2192http://www.heartland.co.nz 
     2193http://oilcrash.com 
     2194http://www.kura-porirua.school.nz 
     2195https://www.sporty.co.nz 
     2196https://www.tematawai.maori.nz 
     2197https://www.terakipaewhenua.school.nz 
     2198http://www.tetaurawhiri.govt.nz 
     2199http://archive.stats.govt.nz 
     2200http://tiritiowaitangi.govt.nz 
     2201http://www.waiata.maori.nz [includes: http://waiata.maori.nz] 
     2202http://hana.co.nz 
     2203http://kaupare.co.nz 
     2204http://www.tereowrap.nz 
     2205http://www.hrc.co.nz 
     2206http://ngatiporoukiponeke.org.nz 
     2207http://rurued.school.nz 
     2208http://www.twtop.school.nz 
     2209http://www.huri-translations.pf 
     2210https://teara.govt.nz [https://admin.teara.govt.nz, http://blog.teara.govt.nz] 
     2211https://tiritiowaitangi.govt.nz 
     2212http://www.tmoa.tki.org.nz 
     2213https://www.komako.org.nz 
     2214http://www.wcl.govt.nz [included:http://kete.wcl.govt.nz] 
     2215http://punareo.co.nz 
     2216https://rapuatearatika.education.govt.nz 
     2217http://tmmkkm.school.nz 
     2218http://www.cs.waikato.ac.nz 
     2219http://www.kupengahao.co.nz 
     2220https://www.hapuhauora.health.nz 
     2221http://cms.sunsmartschools.co.nz [http://sunsmartschools.co.nz/] 
     2222http://kuraproductions.co.nz 
     2223https://keepourmoneyclean.govt.nz 
     2224http://www.tekura.school.nz 
     2225http://www.tkkmmokopuna.school.nz 
     2226http://hangaraumatihiko.tki.org.nz 
     2227http://www.pakanae.maori.nz 
     2228--- 78+9 
     2229http://holyspirit.nz 
     2230https://www.ngamanawainc.co.nz [includes http://www.ngamanawainc.co.nz] 
     2231http://www.finlaysonpark.school.nz 
     2232http://www.w3vietnam.org.nz [includes http://w3vietnam.org.nz] 
     2233https://www.takitimu.ac.nz 
     2234https://kotahimiriona.co.nz 
     2235https://rehuamarae.co.nz 
     2236http://reoora.co.nz 
     2237https://manawatuheritage.pncc.govt.nz 
     2238http://rsnz.natlib.govt.nz 
     2239https://www.taitokerautrust.org.nz 
     2240http://tewikiotereomaori.nz 
     2241https://www.korokikahukura.co.nz 
     2242https://www.pinterest.nz 
     2243https://www.rereahu.maori.nz 
     2244http://givealittle.co.nz 
     2245https://kaiiwicamp.nz [includes http://kaiiwicamp.nz] 
     2246http://ngarauhuia.ngatiapakiterato.iwi.nz 
     2247https://m.wairarapatv.co.nz 
     2248http://avonside.net 
     2249http://www.maoriinvestments.co.nz 
     2250http://conference.tpwt.maori.nz 
     2251https://www.puau.school.nz 
     2252http://tehauora.org.nz 
     2253http://temahurehure.maori.nz 
     2254http://www.temarareo.org 
     2255http://www.tetaumuturunanga.iwi.nz 
     2256http://www.writersfestival.co.nz 
     2257http://www.kmk.maori.nz 
     2258https://www.stats.govt.nz [includes http://archive.stats.govt.nz] 
     2259---30+4 
     2260+? http://ngatiwhakaue.iwi.nz 
     2261+? https://interactives.stuff.co.nz 
     2262+? http://whatonga.school.nz 
     2263+? https://player.vimeo.com 
     2264+? http://southerntribes.co.nz 
     2265---78+30+(5)=113 unique + 11 non-unique 
     2266?X https://www.e-agent.nz [includes: https://office.e-agent.nz,http://videos.e-agent.nz]
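
The "unique" versus "non-unique" split above depends on treating URL variants of one site as a single entry. The notes do not state the exact rule used, so the following is only one possible rule, sketched under that assumption: ignore the scheme and a leading `www.` and group by the remaining host name. The helper name `site_key` is hypothetical; the URLs are taken from the NZ list.

```python
from urllib.parse import urlsplit


def site_key(url):
    """Reduce a site URL to a comparison key: lowercase host without 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


urls = [
    "http://tetaurawhiri.govt.nz",
    "http://www.tetaurawhiri.govt.nz",
    "http://tiritiowaitangi.govt.nz",
    "https://tiritiowaitangi.govt.nz",
    "http://tmoa.tki.org.nz",
    "http://www.tmoa.tki.org.nz",
    "https://www.stats.govt.nz",
    "http://archive.stats.govt.nz",
]

# Group URL variants under their shared key.
groups = {}
for url in urls:
    groups.setdefault(site_key(url), []).append(url)

unique = len(groups)
non_unique = len(urls) - unique
print(unique, non_unique)
```

This rule does not merge other subdomains, so archive.stats.govt.nz stays separate from www.stats.govt.nz even though the list marks it as included. Matching that would need a rule based on the registered domain.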