1 | /*
|
---|
2 | For sites originating in NZ or with nz TLD, none of the URLs are manually inspected and all URLs are accepted.
|
---|
3 |
|
---|
4 | For all but NZ, get final column results with:
|
---|
5 | db.getCollection('Websites').find({domain:/coggle\.it/})
|
---|
6 | And can check for URLs with:
|
---|
7 | db.getCollection('Webpages').find({URL: /coggle\.it/, isMRI: true})
|
---|
8 |
|
---|
9 |
|
---|
10 | NOTES:
|
---|
11 | 1. DE:
|
---|
12 |
|
---|
13 | "de","2.0","0+1","9+35 misdetected", http://www.cartogiraffe.com, https://www.cartogiraffe.com,
|
---|
14 | Ought to be 2+2 numPagesInMRICount and 9+2 numPagesContainingMRICount:
|
---|
15 | - both cartogiraffe.com pages were identical and had mostly MRI sentences with one name not being MRI. So isMRI should have been true for both pages.
|
---|
16 | - Only one of the 2 MRI translations of the universal declaration of human rights at http://www.udhr.de got downloaded. A total of 75 pages were downloaded, but more translated pages appeared to be on the webpage. Not sure why the crawl had a _SUCCESS file to indicate completed download.
|
---|
17 | - Then http://www.udhr.de had 35-1 non-MRI language translations of the universal declaration of human rights where one or more sentences were misdetected as MRI. With the additional MRI page that didn't get downloaded, should have 9+2 = 11 pages containing MRI.
|
---|
18 |
|
---|
19 | So instead of
|
---|
20 | "de","2.0","1","44", http://www.cartogiraffe.com, https://www.cartogiraffe.com, http://www.udhr.de
|
---|
21 | "de","2.0","4","11", http://www.cartogiraffe.com, https://www.cartogiraffe.com, http://www.udhr.de
|
---|
22 |
|
---|
23 |
|
---|
24 | "au","3.0",7+0+1,83+1+3,https://www.kiwiproperty.com, https://infogram.com/te-marautanga-o-aotearoa-moe-pld-allocations-2012-1go502ygvn562jd,https://koreromaori.com
|
---|
25 |
|
---|
26 | 2. US:
|
---|
27 | aclhokiangarocks.blogspot.com contains at least a page with MRI paragraphs. See http://aclhokiangarocks.blogspot.com/feeds/posts/default under section "Nga Tuhinga o tatou Tupuna"
|
---|
28 | Although this page has been crawled by Nutch, the contents were presented in the blog in a complex way and therefore the text wasn't retrieved here. See also the dedicated page this text should have been in http://aclhokiangarocks.blogspot.com/2012/05/nga-tuhinga-o-tatou-tupuna.html
|
---|
29 |
|
---|
30 | "_id","siteCount","numPagesInMRICount","numPagesContainingMRICount","URLs of pages detected as inMRI"
|
---|
31 | "nz","176.0" containsMRI vs 96 pages inMRI,"4360","9641" in 176 containsMRI pages vs 7968 in isMRI pages
|
---|
32 | "us","29.0",
|
---|
33 | 1+2+0+0+4+166+0+39 +257+2+21+12+25+13+53+0+1+0+1+11 +32+37+4 +0+0+0 = 681,
|
---|
34 | 31+2+2+20+58+166+3+91 +258+2+25+12+66+22+53+6+1+1+2+10 +58+54+6 +1+2+1 = 953,
|
---|
35 | anglicanhistory.org,unicode.org,static-promote.weebly.com,aclhokiangarocks.blogspot.com,bahaiprayers.net,biblehub.com,muhammad.com,godrules.net,m.biblepub.com, krassotkin.ru,gotquestions.org,
|
---|
36 | maorinews.com,maaori.com,kiaorahola.blogspot.com,kjohnsonnz.blogspot.com,pumanawawhangara.blogspot.com,dannykahei.tripod.com,burkekm001.tripod.com,tkkpipipaopao.blogspot.com, manateina.blogspot.com,
|
---|
37 | tatai09.blogspot.com,twttoa.com,tuhua2010.blogspot.com,
|
---|
38 | breaker.audio,drive.google.com/file/d/1NwuzafjddaP8gxI7O_Zapts5bM7mrtwn/preview,in.pinterest.com/pin/317363104978423418/
|
---|
39 | "au","2.0","8","86", https://www.kiwiproperty.com, https://koreromaori.com
|
---|
40 | "de","2.0","4","11", http://www.cartogiraffe.com, https://www.cartogiraffe.com, http://www.udhr.de
|
---|
41 | "dk","2.0","4","7", *.ngapuhitelevision.com, *.ngapuhiradio.com
|
---|
42 | "bg","1.0","2","2", http://anitra.net/activism/humanrights/UDHR/mbf_print.htm, http://anitra.net/activism/humanrights/UDHR/rrt_print.htm
|
---|
43 | "cz","1.0","0","1", http://www.henryklahola.nazory.cz/094.Maori.htm, http://henryklahola.nazory.cz/094.Maori.htm
|
---|
44 | "es","1.0","1","1", https://www.uv.es/~pla/red.net/intmaori.html
|
---|
45 | "fr","1.0","1","1", http://chantsdeluttes.free.fr/versionsinter/page%20maori.html
|
---|
46 | "ie","1.0","1","3", https://coggle.it/diagram/WSYB0mLA2QABD5BH/t/ko-au-ko-koe
|
---|
47 |
|
---|
48 |
|
---|
49 |
|
---|
50 |
|
---|
51 |
|
---|
52 | --------------
|
---|
53 |
|
---|
54 | https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/find-sample-size/#CI1
|
---|
55 | https://stats.stackexchange.com/questions/207584/sample-size-choice-with-binary-outcome
|
---|
56 | https://www.statisticshowto.datasciencecentral.com/z-alpha2-za2/
|
---|
57 |
|
---|
58 | N (NZ pages where isMRI comes out true) = 4360
|
---|
59 | solving for n, the sample size
|
---|
60 | confidence level = 90%
|
---|
61 | m, margin of error = 5%
|
---|
62 |
|
---|
63 | From the "z alpha/2" table, for 90% confidence, we get a z alpha/2 value of 1.6449 (or 1.645).
|
---|
64 |
|
---|
65 | Then the sample size, n, we need is = 1.6449^2 * 4360 / ( 1.6449^2 + (4 * 4359) * 0.05^2) = 255 (rounded up)
|
---|
66 |
|
---|
67 |
|
---|
68 | For N = 681,
|
---|
69 | sample size n is = 1.6449^2 * 681 / ( 1.6449^2 + (4 * 680) * 0.05^2) = 194 (rounded up)
|
---|
70 |
|
---|
71 |
|
---|
72 | sample size for NZ: 255 (90% confidence with 5% margine of error, Including a finite correction factor)
|
---|
73 | sample size for US: 194
|
---|
74 |
|
---|
75 | */
|
---|
76 |
|
---|
77 |
|
---|
78 |
|
---|
79 | "_id","siteCount","numPagesInMRICount","numPagesContainingMRICount","URLs of pages detected as inMRI"
|
---|
80 | "nz","176.0","4360","9641"
|
---|
81 | "us","29.0","681","953"
|
---|
82 | "au","2.0","8","86"
|
---|
83 | "de","2.0","4","11"
|
---|
84 | "dk","2.0","4","7"
|
---|
85 | "bg","1.0","2","2"
|
---|
86 | "cz","1.0","0","1"
|
---|
87 | "es","1.0","1","1"
|
---|
88 | "fr","1.0","1","1"
|
---|
89 | "ie","1.0","1","3"
|
---|
90 |
|
---|
91 | Total sites containing MRI: 216
|
---|
92 | Total pages detected as being in MRI: 5062
|
---|
93 | Total pages detected as containing MRI sentences: 10706
|
---|
94 |
|
---|
95 |
|
---|
96 |
|
---|
97 | NZ - sample 255 pages from:
|
---|
98 | /*
|
---|
99 | db.Websites.aggregate([
|
---|
100 | {
|
---|
101 | $match: {
|
---|
102 | $and: [
|
---|
103 | {numPagesContainingMRI: {$gt: 0}},
|
---|
104 | {$or: [{geoLocationCountryCode:"NZ"},{domain: /\.nz/}]}
|
---|
105 | ]
|
---|
106 | }
|
---|
107 | },
|
---|
108 | { $unwind: "$geoLocationCountryCode" },
|
---|
109 | {
|
---|
110 | $group: {
|
---|
111 | _id: "nz",
|
---|
112 | count: { $sum: 1 },
|
---|
113 | domain: { $addToSet: '$domain' },
|
---|
114 | numPagesInMRICount: { $sum: '$numPagesInMRI' },
|
---|
115 | numPagesContainingMRICount: { $sum: '$numPagesContainingMRI' }
|
---|
116 | }
|
---|
117 | },
|
---|
118 | { $sort : { count : -1} }
|
---|
119 | ]);
|
---|
120 |
|
---|
121 |
|
---|
122 | OR is this better:
|
---|
123 |
|
---|
124 | db.Websites.aggregate([
|
---|
125 | {
|
---|
126 | $match: {
|
---|
127 | $and: [
|
---|
128 | {numPagesInMRI: {$gt: 0}},
|
---|
129 | {$or: [{geoLocationCountryCode:"NZ"},{domain: /\.nz/}]}
|
---|
130 | ]
|
---|
131 | }
|
---|
132 | },
|
---|
133 | { $unwind: "$geoLocationCountryCode" },
|
---|
134 | {
|
---|
135 | $group: {
|
---|
136 | _id: "nz",
|
---|
137 | count: { $sum: 1 },
|
---|
138 | domain: { $addToSet: '$domain' },
|
---|
139 | numPagesInMRICount: { $sum: '$numPagesInMRI' },
|
---|
140 | numPagesContainingMRICount: { $sum: '$numPagesContainingMRI' }
|
---|
141 | }
|
---|
142 | },
|
---|
143 | { $sort : { count : -1} }
|
---|
144 | ]);
|
---|
145 | */
|
---|
146 |
|
---|
147 | num NZ sites with > 0 isMRI pages = 96
|
---|
148 | Total numPagesInMRI in NZ sites = 4360
|
---|
149 | Total numPagesContainingMRI in NZ sites = 7968
|
---|
150 |
|
---|
151 | Using the results you get a list of domains that matched. 171 nz domains, though it should be 176? -1
|
---|
152 |
|
---|
153 | Copy each domain (up to 255 of them) and look for the first 1 or 2 max that matches isMRI:
|
---|
154 |
|
---|
155 | 1. db.getCollection('Webpages').find({URL:/pukekohe.directorybusiness.co.nz/, isMRI: true}) - check it contains a positive number of pages in MRI and check the first 1-2 pages to make sure they are indeed in MRI. Note down the ratio of MRI finds. e.g. 2/2.
|
---|
156 |
|
---|
157 | 2. Find those pages that containsMRI but not isMRI and check if there are indeed sentences in MRI. Note down the ratio for the first 2 pages.
|
---|
158 | db.getCollection('Webpages').find({URL:/maori.livingheritage.org.nz/, isMRI: false, containsMRI: true})
|
---|
159 |
|
---|
160 |
|
---|
161 |
|
---|
162 | /* 1 */
|
---|
163 | {
|
---|
164 | "_id" : "nz",
|
---|
165 | "count" : 96.0,
|
---|
166 | "domain" : [
|
---|
167 | "http://www.teipukarea.maori.nz", 3/3 1/3
|
---|
168 | "http://ngatipahauwera.co.nz", 2/2, 2/2
|
---|
169 | "http://www.oag.govt.nz", 2/2 0/2
|
---|
170 | "https://sexualviolence.victimsinfo.govt.nz", 3/3 0/3
|
---|
171 | "http://tmoa.tki.org.nz", 3/3 3/3
|
---|
172 | "http://www.tewhanake.maori.nz", 3/3 2/3
|
---|
173 | "http://www.matarikifestival.org.nz", 4/4 0/3
|
---|
174 | "http://www.otepoti.school.nz", 3/3 0/4
|
---|
175 | !! "https://www.maoritelevision.com", 3/4, 0 [no containsMRI outside isMRI pages]
|
---|
176 | "http://pukapuka.nz", 3/3 1/4 [lorem ipsum used on first 3 pages]
|
---|
177 | "http://community.nzdl.org", 3/3 0/3 [containsMRI has detected Te Taka Keegan as MRI sentence]
|
---|
178 | !! "http://kmpmusic.co.nz", 0-4/4? [but CD listing of some MRI song titles] 0 [no other pages containsMRI]
|
---|
179 | "http://maori.livingheritage.org.nz", 2/2 2/2
|
---|
180 | "http://pukoro.co.nz", 2/2 0/2
|
---|
181 | "https://register.tpota.org.nz", 0/1 [form] 0/2
|
---|
182 | X "https://cdn.tehiku.nz" => DOMAIN: "tehiku.nz", 0/4, 1/3 [but audio content may be in MRI]
|
---|
183 | !! "http://www.runanga.co.nz", 3/3 0 [no containsMRI outside isMRI pages]
|
---|
184 | ! "http://kuraaiwi.maori.nz", 2/4 [navigation only downloaded. But site content checked] 2/3
|
---|
185 | "http://kurataiao.tki.org.nz", 3/3, 1/total 3
|
---|
186 |
|
---|
187 | !! "http://satellites.co.nz", 3/3 [kpop], 0 [no containsMRI outside isMRI pages]
|
---|
188 | "http://teaohou.natlib.govt.nz", 4/4, 2/4
|
---|
189 | "http://www.tuwharetoa.iwi.nz", 2/3 0/3
|
---|
190 | X "http://auturoa.nz", 0/4 0/3 [lots of MRI terms among English] - COMMUNITY
|
---|
191 | "https://www.terito.school.nz", 3/3, 0/2 total
|
---|
192 | "https://ttw1.cwp.govt.nz", 3/3 3/3
|
---|
193 | "https://www.whanau-tahi.school.nz", 4/4, 1/2 total
|
---|
194 | "https://e-ako-pangarau.nzmaths.co.nz", 3/3 total, 1/1 total
|
---|
195 | "https://teaomaori.news", 3/3, 0/1 total
|
---|
196 | "http://tetaurawhiri.govt.nz", 3/3 /3/3 [MÄori Language Commission site]
|
---|
197 | "https://www.tuiatematangi.ac.nz", 4/4 3/3
|
---|
198 | "http://animations.tewhanake.maori.nz", 3/3 3/3
|
---|
199 | !! "https://www.dnc.org.nz", 1/1 total, 0 [no containsMRI outside isMRI pages]
|
---|
200 | !! "http://firstworldwar.tki.org.nz", 3/3, 0 [no containsMRI outside isMRI pages]
|
---|
201 | "http://www.28maoribattalion.org.nz", 3/3, 1/3
|
---|
202 | "http://www.tewikiotereomaori.co.nz", 1/1 total, 3/3
|
---|
203 | "http://www.brettgraham.co.nz", 1/1 total, 0/3
|
---|
204 | !! "https://hepatakakupu.nz", 3/3, 0 [no containsMRI outside isMRI pages]
|
---|
205 |
|
---|
206 | "http://anglicanprayerbook.nz", 3/3 3/3
|
---|
207 | "http://arataua.nz", 4/4, 2/3
|
---|
208 | "http://blog.teara.govt.nz", 3/3, 0/3 [AS: teara.govt.nz]
|
---|
209 | "http://maori.tki.org.nz", 3/3 3/3
|
---|
210 | DONE (with/out www): "http://www.firstworldwar.tki.org.nz",
|
---|
211 | X "http://www.topomap.co.nz", 0/2 [all placenames], 0 [no containsMRI outside isMRI pages]
|
---|
212 | "https://paekupu.co.nz", 4/4, 0 [no containsMRI outside isMRI pages]
|
---|
213 | "https://haereheikaiako.co.nz", 1/1, 0 [no containsMRI outside isMRI pages]
|
---|
214 | "https://curriculumtool.education.govt.nz", 4/4, 3/3
|
---|
215 | "http://kurakokiri.maori.nz", 3/3, 3/3 [same nav menus on each page]
|
---|
216 | "http://kete.wcl.govt.nz", 2/5 [first 3 misdetected: Tokelauan (American Samoa), Kiribati, Tongan], 0/3
|
---|
217 | "http://www.kkmmaungarongo.co.nz", 3/3, 3/3
|
---|
218 | "http://www.heartland.co.nz", 3/3, 1/1 total
|
---|
219 | "http://oilcrash.com", 2/2 total, 0/3
|
---|
220 | "http://www.kura-porirua.school.nz", 4/4, 2/3
|
---|
221 | "http://videos.e-agent.nz", [AT: e-agent.nz] 3/3, 3/3 [repeated nav]
|
---|
222 | "https://www.sporty.co.nz", 3/3, 0 [no containsMRI outside isMRI pages]
|
---|
223 | "https://www.tematawai.maori.nz", 3/3, 3/3
|
---|
224 |
|
---|
225 | "https://www.terakipaewhenua.school.nz",
|
---|
226 | "http://www.tetaurawhiri.govt.nz",
|
---|
227 | "http://archive.stats.govt.nz",
|
---|
228 | "http://tiritiowaitangi.govt.nz",
|
---|
229 | "http://www.waiata.maori.nz",
|
---|
230 | "http://hana.co.nz",
|
---|
231 | "http://kaupare.co.nz",
|
---|
232 | "http://www.tereowrap.nz",
|
---|
233 | "https://www.e-agent.nz",
|
---|
234 | "http://www.hrc.co.nz",
|
---|
235 | "http://ngatiporoukiponeke.org.nz",
|
---|
236 | "http://rurued.school.nz",
|
---|
237 | "http://www.twtop.school.nz",
|
---|
238 | "https://www.infinite-electronic.nz",
|
---|
239 | "http://www.huri-translations.pf",
|
---|
240 | "https://admin.teara.govt.nz",
|
---|
241 | "https://tiritiowaitangi.govt.nz",
|
---|
242 | "http://www.tmoa.tki.org.nz",
|
---|
243 | "https://www.komako.org.nz",
|
---|
244 | "http://www.wcl.govt.nz",
|
---|
245 | "https://office.e-agent.nz",
|
---|
246 | "http://punareo.co.nz",
|
---|
247 | "http://www.kurakokiri.maori.nz",
|
---|
248 | "https://rapuatearatika.education.govt.nz",
|
---|
249 | "http://tmmkkm.school.nz",
|
---|
250 | "https://www.components-mart.nz",
|
---|
251 | "http://www.cs.waikato.ac.nz",
|
---|
252 | "http://www.kupengahao.co.nz",
|
---|
253 | "https://www.hapuhauora.health.nz",
|
---|
254 | "https://www.lcds-display.nz",
|
---|
255 | "http://waiata.maori.nz",
|
---|
256 | "http://cms.sunsmartschools.co.nz",
|
---|
257 | "http://www.livingheritage.org.nz",
|
---|
258 | "http://kuraproductions.co.nz",
|
---|
259 | "https://keepourmoneyclean.govt.nz",
|
---|
260 | "http://www.tekura.school.nz",
|
---|
261 | "http://www.tkkmmokopuna.school.nz",
|
---|
262 | "http://hangaraumatihiko.tki.org.nz",
|
---|
263 | "http://www.pakanae.maori.nz"
|
---|
264 | ],
|
---|
265 | "numPagesInMRICount" : 4360,
|
---|
266 | "numPagesContainingMRICount" : 7968
|
---|
267 | }
|
---|
268 |
|
---|
269 | ----------------------------
|
---|
270 |
|
---|
271 | /* 1 */
|
---|
272 | {
|
---|
273 | "_id" : "nz",
|
---|
274 | "count" : 176.0,
|
---|
275 | "domain" : [
|
---|
276 | !! "http://pukekohe.directorybusiness.co.nz", 0/2, 0/2, isMRI = 0!!
|
---|
277 | "http://maori.livingheritage.org.nz", 2/2 2/2
|
---|
278 | "http://pukoro.co.nz", 2/2 0/2
|
---|
279 | "http://www.rakaumanga.school.nz", 0/4 0/4
|
---|
280 | "http://www.ngamanawainc.co.nz", 0/2 0/2
|
---|
281 | "https://office.e-agent.nz",
|
---|
282 | "https://www.components-mart.nz",
|
---|
283 | "http://tmmkkm.school.nz",
|
---|
284 | "http://www.rotoruanz.com",
|
---|
285 | "http://www.huri-translations.pf",
|
---|
286 | "https://admin.teara.govt.nz",
|
---|
287 | "http://hangaraumatihiko.tki.org.nz",
|
---|
288 | "https://sexualviolence.victimsinfo.govt.nz",
|
---|
289 | "http://www.tekura.school.nz",
|
---|
290 | "http://philipbeadle.co.nz",
|
---|
291 | "http://www.cs.waikato.ac.nz",
|
---|
292 | "https://www.hapuhauora.health.nz",
|
---|
293 | "http://cms.sunsmartschools.co.nz",
|
---|
294 | "https://keepourmoneyclean.govt.nz",
|
---|
295 | "http://www.kura-porirua.school.nz",
|
---|
296 | "http://waitarahistory.org.nz",
|
---|
297 | "http://oilcrash.com",
|
---|
298 | "http://videos.e-agent.nz",
|
---|
299 | "https://manawatuheritage.pncc.govt.nz",
|
---|
300 | "https://www.terakipaewhenua.school.nz",
|
---|
301 | "http://dev.nzpcn.org.nz",
|
---|
302 | "https://kotahimiriona.co.nz",
|
---|
303 | "http://kurakokiri.maori.nz",
|
---|
304 | "https://www.sporty.co.nz",
|
---|
305 | "http://kaupare.co.nz",
|
---|
306 | "http://ngatiporoukiponeke.org.nz",
|
---|
307 | "https://www.takitimu.ac.nz",
|
---|
308 | "http://www.tetaurawhiri.govt.nz",
|
---|
309 | "http://www.waiata.maori.nz",
|
---|
310 | "http://conference.tpwt.maori.nz",
|
---|
311 | "http://ngatiwhakaue.iwi.nz",
|
---|
312 | "http://www.nzpcn.org.nz",
|
---|
313 | "http://www.ruralfind.co.nz",
|
---|
314 | "https://www.dnc.org.nz",
|
---|
315 | "https://www.puau.school.nz",
|
---|
316 | "https://kaiiwicamp.nz",
|
---|
317 | "https://www.terito.school.nz",
|
---|
318 | "https://www.pinterest.nz",
|
---|
319 | "https://e-ako-pangarau.nzmaths.co.nz",
|
---|
320 | "http://givealittle.co.nz",
|
---|
321 | "https://teaomaori.news",
|
---|
322 | "https://www.korokikahukura.co.nz",
|
---|
323 | "http://myfathersworld.net.nz",
|
---|
324 | "http://www.firstworldwar.tki.org.nz",
|
---|
325 | "https://www.ashtangatauranga.co.nz",
|
---|
326 | "http://biketorqueyamaha.co.nz",
|
---|
327 | "https://www.rereahu.maori.nz",
|
---|
328 | "http://www.tewikiotereomaori.co.nz",
|
---|
329 | "http://www.brettgraham.co.nz",
|
---|
330 | "http://tewikiotereomaori.nz",
|
---|
331 | "http://anglicanprayerbook.nz",
|
---|
332 | "http://arataua.nz",
|
---|
333 | "http://blog.teara.govt.nz",
|
---|
334 | "http://www.otepoti.school.nz",
|
---|
335 | "http://www.kmk.maori.nz",
|
---|
336 | "http://www.eventcinemas.co.nz",
|
---|
337 | "https://www.stats.govt.nz",
|
---|
338 | "http://www.oag.govt.nz", 2/2 0/2
|
---|
339 | "http://whatonga.school.nz",
|
---|
340 | "http://www.tewhanake.maori.nz",
|
---|
341 | "https://www.maoritelevision.com",
|
---|
342 | "http://kuraaiwi.maori.nz",
|
---|
343 | "http://kurataiao.tki.org.nz",
|
---|
344 | "http://teaohou.natlib.govt.nz",
|
---|
345 | "http://www.tetaumuturunanga.iwi.nz",
|
---|
346 | "http://www.tasteofplenty.co.nz",
|
---|
347 | "http://community.nzdl.org",
|
---|
348 | "https://www.blushandbrows.nz",
|
---|
349 | "https://register.tpota.org.nz",
|
---|
350 | "https://cdn.tehiku.nz",
|
---|
351 | "http://www.wcl.govt.nz",
|
---|
352 | "http://www.jeremybaker.nz",
|
---|
353 | "http://punareo.co.nz",
|
---|
354 | "https://rapuatearatika.education.govt.nz",
|
---|
355 | "http://www.kurakokiri.maori.nz",
|
---|
356 | "https://www.cruisetourstauranga.co.nz",
|
---|
357 | "https://sooty.nz",
|
---|
358 | "http://rakaumanga.school.nz",
|
---|
359 | "https://tiritiowaitangi.govt.nz",
|
---|
360 | "http://www.tmoa.tki.org.nz",
|
---|
361 | "http://www.w3vietnam.org.nz",
|
---|
362 | "https://www.infinite-electronic.nz",
|
---|
363 | "https://www.komako.org.nz",
|
---|
364 | "http://nzpostcard.co.nz",
|
---|
365 | "http://artizani.co.nz",
|
---|
366 | "http://www.finlaysonpark.school.nz",
|
---|
367 | "http://crimson.co.nz",
|
---|
368 | "http://holyspirit.nz",
|
---|
369 | "http://www.tkkmmokopuna.school.nz",
|
---|
370 | "http://www.pakanae.maori.nz",
|
---|
371 | "http://www.teipukarea.maori.nz",
|
---|
372 | "http://archerpix.com",
|
---|
373 | "https://2019.nethui.nz",
|
---|
374 | "http://www.kupengahao.co.nz",
|
---|
375 | "https://www.lcds-display.nz",
|
---|
376 | "http://waiata.maori.nz",
|
---|
377 | "http://kuraproductions.co.nz",
|
---|
378 | "http://www.biketorqueyamaha.co.nz",
|
---|
379 | "http://www.livingheritage.org.nz",
|
---|
380 | "http://www.zoomin.co.nz",
|
---|
381 | "http://rsnz.natlib.govt.nz",
|
---|
382 | "http://otorohanga.directorybusiness.co.nz",
|
---|
383 | "http://reoora.co.nz",
|
---|
384 | "http://w3vietnam.org.nz",
|
---|
385 | "https://rehuamarae.co.nz",
|
---|
386 | "https://www.electionresults.org.nz",
|
---|
387 | "https://www.ngamanawainc.co.nz",
|
---|
388 | "https://www.rotorua-rafting.co.nz",
|
---|
389 | "https://www.taitokerautrust.org.nz",
|
---|
390 | "https://www.wingspan.co.nz",
|
---|
391 | "http://www.kkmmaungarongo.co.nz",
|
---|
392 | "http://kete.wcl.govt.nz",
|
---|
393 | "http://www.heartland.co.nz",
|
---|
394 | "http://www.electionresults.govt.nz",
|
---|
395 | "https://www.tematawai.maori.nz",
|
---|
396 | "http://hana.co.nz",
|
---|
397 | "http://www.tereowrap.nz",
|
---|
398 | "http://rurued.school.nz",
|
---|
399 | "http://www.twtop.school.nz",
|
---|
400 | "http://rexedra.gen.nz",
|
---|
401 | "http://archive.stats.govt.nz",
|
---|
402 | "https://liveresults.co.nz",
|
---|
403 | "https://www.e-agent.nz",
|
---|
404 | "http://tiritiowaitangi.govt.nz",
|
---|
405 | "http://www.hrc.co.nz",
|
---|
406 | "http://animations.tewhanake.maori.nz",
|
---|
407 | "https://interactives.stuff.co.nz",
|
---|
408 | "http://avonside.net",
|
---|
409 | "http://www.methodist.org.nz",
|
---|
410 | "https://www.tasteofplenty.co.nz",
|
---|
411 | "http://www.maoriinvestments.co.nz",
|
---|
412 | "https://m.wairarapatv.co.nz",
|
---|
413 | "http://www.gans.co.nz",
|
---|
414 | "https://ttw1.cwp.govt.nz",
|
---|
415 | "http://ngarauhuia.ngatiapakiterato.iwi.nz",
|
---|
416 | "https://www.tuiatematangi.ac.nz",
|
---|
417 | "http://tetaurawhiri.govt.nz",
|
---|
418 | "http://maori.tki.org.nz",
|
---|
419 | "http://www.topomap.co.nz",
|
---|
420 | "https://www.puhaandpakeha.co.nz",
|
---|
421 | "https://haereheikaiako.co.nz",
|
---|
422 | "https://paekupu.co.nz",
|
---|
423 | "https://curriculumtool.education.govt.nz",
|
---|
424 | "http://firstworldwar.tki.org.nz",
|
---|
425 | "http://www.28maoribattalion.org.nz",
|
---|
426 | "https://hepatakakupu.nz",
|
---|
427 | "https://www.zenbu.co.nz",
|
---|
428 | "http://www.matarikifestival.org.nz",
|
---|
429 | "http://pukapuka.nz",
|
---|
430 | "http://ngatipahauwera.co.nz", 2/2 2/2
|
---|
431 | "http://southerntribes.co.nz",
|
---|
432 | "https://player.vimeo.com",
|
---|
433 | "http://tmoa.tki.org.nz",
|
---|
434 | "http://www.writersfestival.co.nz",
|
---|
435 | "http://talkingtothecan.com",
|
---|
436 | "https://www.whanau-tahi.school.nz",
|
---|
437 | "http://satellites.co.nz",
|
---|
438 | "http://auturoa.nz",
|
---|
439 | "http://www.tuwharetoa.iwi.nz",
|
---|
440 | "http://kmpmusic.co.nz",
|
---|
441 | "http://www.temarareo.org",
|
---|
442 | "http://archive.electionresults.govt.nz",
|
---|
443 | "http://kaiiwicamp.nz",
|
---|
444 | "http://tehauora.org.nz",
|
---|
445 | "http://temahurehure.maori.nz",
|
---|
446 | "http://www.runanga.co.nz"
|
---|
447 | ],
|
---|
448 | "numPagesInMRICount" : 4360,
|
---|
449 | "numPagesContainingMRICount" : 9641
|
---|
450 | }
|
---|
451 |
|
---|
452 |
|
---|