source: other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json@ 33884

Last change on this file since 33884 was 33884, checked in by ak19, 4 years ago
  1. The previous commit had many modifications, and only 2 files matched its simple commit message about clarifications. The code changes in that commit incorporated the processing of a domains file (of curated sites) and wrote out all webPages in each of those sites where isMRI=true. They then calculated a representative sample size n out of N total isMRI webPages, shuffled that list of isMRI webPages and wrote out the first n webPage URLs in that list.
  2. This commit: incorporates the country code alongside URLs, as Dr Bainbridge requested.
File size: 19.9 KB
/*

db.Websites.aggregate([
    {
        $match: {
            $and: [
                {numPagesInMRI: {$gt: 0}},
                {$or: [{geoLocationCountryCode:"NZ"},{domain: /\.nz/}]}
            ]
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "nz",
            count: { $sum: 1 },
            domain: { $addToSet: '$domain' },
            numPagesInMRICount: { $sum: '$numPagesInMRI' },
            numPagesContainingMRICount: { $sum: '$numPagesContainingMRI' }
        }
    },
    { $sort : { count : -1} }
]);

For sites originating in NZ or with a .nz TLD, none of the URLs are manually inspected and all URLs are accepted.

For all countries other than NZ, get the final-column results with:
    db.getCollection('Websites').find({domain:/coggle\.it/})
And check for URLs with:
    db.getCollection('Webpages').find({URL: /coggle\.it/, isMRI: true})


NOTES:
1. DE:

"de","2.0","0+1","9+35 misdetected", http://www.cartogiraffe.com, https://www.cartogiraffe.com,
Ought to be 2+2 numPagesInMRICount and 9+2 numPagesContainingMRICount:
- Both cartogiraffe.com pages were identical and consisted mostly of MRI sentences, with one name not being MRI. So isMRI should have been true for both pages.
- Only one of the 2 MRI translations of the Universal Declaration of Human Rights at http://www.udhr.de got downloaded. A total of 75 pages were downloaded, but more translated pages appeared to be on the website. Not sure why the crawl had a _SUCCESS file indicating a completed download.
- Then http://www.udhr.de had 35-1 non-MRI language translations of the Universal Declaration of Human Rights in which one or more sentences were misdetected as MRI. With the additional MRI page that didn't get downloaded, there should be 9+2 = 11 pages containing MRI.

So instead of
"de","2.0","1","44", http://www.cartogiraffe.com, https://www.cartogiraffe.com, http://www.udhr.de
use
"de","2.0","4","11", http://www.cartogiraffe.com, https://www.cartogiraffe.com, http://www.udhr.de


"au","3.0",7+0+1,83+1+3,https://www.kiwiproperty.com, https://infogram.com/te-marautanga-o-aotearoa-moe-pld-allocations-2012-1go502ygvn562jd,https://koreromaori.com

2. US:
aclhokiangarocks.blogspot.com contains at least one page with MRI paragraphs. See http://aclhokiangarocks.blogspot.com/feeds/posts/default under the section "Nga Tuhinga o tatou Tupuna".
Although this page was crawled by Nutch, the blog presents its contents in a complex way, so the text wasn't retrieved here. See also the dedicated page this text should have been in: http://aclhokiangarocks.blogspot.com/2012/05/nga-tuhinga-o-tatou-tupuna.html

"_id","siteCount","numPagesInMRICount","numPagesContainingMRICount","URLs of pages detected as inMRI"
"nz","176.0" (containsMRI sites; 96 sites have isMRI pages),"4360","9641" (across the 176 containsMRI sites; 7968 across the 96 isMRI sites)
"us","29.0",
    1+2+0+0+4+166+0+39 +257+2+21+12+25+13+53+0+1+0+1+11 +32+37+4 +0+0+0 = 681,
    31+2+2+20+58+166+3+91 +258+2+25+12+66+22+53+6+1+1+2+10 +58+54+6 +1+2+1 = 953,
    anglicanhistory.org,unicode.org,static-promote.weebly.com,aclhokiangarocks.blogspot.com,bahaiprayers.net,biblehub.com,muhammad.com,godrules.net,m.biblepub.com, krassotkin.ru,gotquestions.org,
    maorinews.com,maaori.com,kiaorahola.blogspot.com,kjohnsonnz.blogspot.com,pumanawawhangara.blogspot.com,dannykahei.tripod.com,burkekm001.tripod.com,tkkpipipaopao.blogspot.com, manateina.blogspot.com,
    tatai09.blogspot.com,twttoa.com,tuhua2010.blogspot.com,
    breaker.audio,drive.google.com/file/d/1NwuzafjddaP8gxI7O_Zapts5bM7mrtwn/preview,in.pinterest.com/pin/317363104978423418/
"au","2.0","8","86", https://www.kiwiproperty.com, https://koreromaori.com
"de","2.0","4","11", http://www.cartogiraffe.com, https://www.cartogiraffe.com, http://www.udhr.de
"dk","2.0","4","7", *.ngapuhitelevision.com, *.ngapuhiradio.com
"bg","1.0","2","2", http://anitra.net/activism/humanrights/UDHR/mbf_print.htm, http://anitra.net/activism/humanrights/UDHR/rrt_print.htm
"cz","1.0","0","1", http://www.henryklahola.nazory.cz/094.Maori.htm, http://henryklahola.nazory.cz/094.Maori.htm
"es","1.0","1","1", https://www.uv.es/~pla/red.net/intmaori.html
"fr","1.0","1","1", http://chantsdeluttes.free.fr/versionsinter/page%20maori.html
"ie","1.0","1","3", https://coggle.it/diagram/WSYB0mLA2QABD5BH/t/ko-au-ko-koe


--------------

    https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/find-sample-size/#CI1
    https://stats.stackexchange.com/questions/207584/sample-size-choice-with-binary-outcome
    https://www.statisticshowto.datasciencecentral.com/z-alpha2-za2/

    N (NZ pages where isMRI comes out true) = 4360
    solving for n, the sample size
    confidence level = 90%
    m, margin of error = 5%

    From the "z alpha/2" table, for 90% confidence, we get a z alpha/2 value of 1.6449 (or 1.645).

    Then the sample size we need is n = 1.6449^2 * 4360 / ( 1.6449^2 + (4 * 4359) * 0.05^2) = 255 (rounded up)


    For N = 681,
    the sample size is n = 1.6449^2 * 681 / ( 1.6449^2 + (4 * 680) * 0.05^2) = 194 (rounded up)


    sample size for NZ: 255 (90% confidence with 5% margin of error, including a finite population correction factor)
    sample size for US: 194

*/
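
The two sample sizes above can be reproduced with a short script. This is a minimal sketch in plain JavaScript (it should run in Node or in the mongo shell); the function name sampleSize and its parameter names are illustrative assumptions, not part of the project code. It assumes the worst-case proportion p = 0.5, which is where the factor 4 in the formula comes from.

```javascript
// Sample size with a finite population correction, assuming p = 0.5:
//   n = z^2 * N / (z^2 + 4 * (N - 1) * m^2)
// sampleSize and its parameter names are hypothetical, for illustration only.
function sampleSize(populationSize, z, marginOfError) {
    const zSquared = z * z;
    const n = (zSquared * populationSize) /
        (zSquared + 4 * (populationSize - 1) * marginOfError * marginOfError);
    return Math.ceil(n); // round up to a whole number of pages
}

const Z_90 = 1.6449; // z alpha/2 for 90% confidence, as quoted above

console.log(sampleSize(4360, Z_90, 0.05)); // NZ: 255
console.log(sampleSize(681, Z_90, 0.05));  // US: 194
```

Rounding up rather than to the nearest integer matches the "(rounded up)" notes above, so the sample is never smaller than the formula asks for.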


// To add column: "URLs of pages detected as inMRI"
"_id","siteCount containsMRI","numPagesInMRICount","numPagesContainingMRICount"
"nz","176.0","4360","9641"
"us","29.0","681","953"
"au","2.0","8","86"
"de","2.0","4","11"
"dk","2.0","4","7"
"bg","1.0","2","2"
"cz","1.0","0","1"
"es","1.0","1","1"
"fr","1.0","1","1"
"ie","1.0","1","3"

Total sites containing MRI: 216
[of which 96 isMRI sites from NZ]
Total pages detected as being in MRI: 5062
Total pages detected as containing MRI sentences: 10706
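
As a quick cross-check, the three totals can be recomputed from the per-country rows. The sketch below hard-codes the rows from the table above; the names countryRows and columnTotal are illustrative only and do not come from the project code.

```javascript
// Rows copied from the table above:
// [_id, siteCount, numPagesInMRICount, numPagesContainingMRICount]
const countryRows = [
    ["nz", 176, 4360, 9641],
    ["us", 29, 681, 953],
    ["au", 2, 8, 86],
    ["de", 2, 4, 11],
    ["dk", 2, 4, 7],
    ["bg", 1, 2, 2],
    ["cz", 1, 0, 1],
    ["es", 1, 1, 1],
    ["fr", 1, 1, 1],
    ["ie", 1, 1, 3],
];

// Sum one numeric column (selected by index) over all rows.
function columnTotal(rows, columnIndex) {
    return rows.reduce((sum, row) => sum + row[columnIndex], 0);
}

console.log("Total sites containing MRI:", columnTotal(countryRows, 1));          // 216
console.log("Total pages detected as being in MRI:", columnTotal(countryRows, 2)); // 5062
console.log("Total pages containing MRI sentences:", columnTotal(countryRows, 3)); // 10706
```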



NZ - sample 255 pages from:
/*
db.Websites.aggregate([
    {
        $match: {
            $and: [
                {numPagesContainingMRI: {$gt: 0}},
                {$or: [{geoLocationCountryCode:"NZ"},{domain: /\.nz/}]}
            ]
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "nz",
            count: { $sum: 1 },
            domain: { $addToSet: '$domain' },
            numPagesInMRICount: { $sum: '$numPagesInMRI' },
            numPagesContainingMRICount: { $sum: '$numPagesContainingMRI' }
        }
    },
    { $sort : { count : -1} }
]);


OR is this better (only numPagesInMRI):

db.Websites.aggregate([
    {
        $match: {
            $and: [
                {numPagesInMRI: {$gt: 0}},
                {$or: [{geoLocationCountryCode:"NZ"},{domain: /\.nz/}]}
            ]
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "nz",
            count: { $sum: 1 },
            domain: { $addToSet: '$domain' },
            numPagesInMRICount: { $sum: '$numPagesInMRI' },
            numPagesContainingMRICount: { $sum: '$numPagesContainingMRI' }
        }
    },
    { $sort : { count : -1} }
]);
*/

num NZ sites with > 0 isMRI pages = 96
Total numPagesInMRI in NZ sites = 4360
Total numPagesContainingMRI in NZ sites = 7968

The results give a list of the domains that matched: 171 nz domains, though it should be 176? -1

Copy each domain (up to 255 of them) and look at the first 1 or 2 pages (max) that match isMRI:

1. db.getCollection('Webpages').find({URL:/pukekohe\.directorybusiness\.co\.nz/, isMRI: true}) - check that it returns a positive number of pages in MRI and check the first 1-2 pages to make sure they are indeed in MRI. Note down the ratio of MRI finds, e.g. 2/2.

2. Find those pages that are containsMRI but not isMRI and check whether they do indeed have sentences in MRI. Note down the ratio for the first 2 pages.
db.getCollection('Webpages').find({URL:/maori\.livingheritage\.org\.nz/, isMRI: false, containsMRI: true})


First column: n pages that are in MRI / n sampled isMRI pages
Second column: n pages that do contain MRI / n sampled pages that are not isMRI yet contain MRI
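
The per-domain ratios in the two columns can be tallied into overall figures. A minimal sketch, using only the first three annotated entries of the list that follows as sample input; tallyRatios and the shape of sampledDomains are assumptions made for illustration, not part of the project code.

```javascript
// Each entry: [domain, first column "a/b", second column "a/b"],
// copied from the first three annotated domains in the list that follows.
const sampledDomains = [
    ["http://www.teipukarea.maori.nz", "3/3", "1/3"],
    ["http://ngatipahauwera.co.nz", "2/2", "2/2"],
    ["http://www.oag.govt.nz", "2/2", "0/2"],
];

// Add up the numerators and denominators of a list of "a/b" strings.
function tallyRatios(ratios) {
    let correct = 0;
    let sampled = 0;
    for (const ratio of ratios) {
        const [a, b] = ratio.split("/").map(Number);
        correct += a;
        sampled += b;
    }
    return { correct, sampled };
}

const isMRITally = tallyRatios(sampledDomains.map(d => d[1]));
const containsMRITally = tallyRatios(sampledDomains.map(d => d[2]));

console.log(isMRITally);       // { correct: 7, sampled: 7 }
console.log(containsMRITally); // { correct: 3, sampled: 7 }
```

Entries with free-text notes instead of a plain ratio (for example "0 [no containsMRI outside isMRI pages]") would need to be handled by hand before being fed to a tally like this.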

/* 1 */
{
    "_id" : "nz",
    "count" : 96.0,
    "domain" : [
        "http://www.teipukarea.maori.nz", 3/3 1/3
        "http://ngatipahauwera.co.nz", 2/2, 2/2
        "http://www.oag.govt.nz", 2/2 0/2
        "https://sexualviolence.victimsinfo.govt.nz", 3/3 0/3
        "http://tmoa.tki.org.nz", 3/3 3/3
        "http://www.tewhanake.maori.nz", 3/3 2/3
        "http://www.matarikifestival.org.nz", 4/4 0/3
        "http://www.otepoti.school.nz", 3/3 0/4
!!      "https://www.maoritelevision.com", 3/4, 0 [no containsMRI outside isMRI pages]
        "http://pukapuka.nz", 3/3 1/4 [lorem ipsum used on first 3 pages]
        "http://community.nzdl.org", 3/3 0/3 [containsMRI has detected Te Taka Keegan as MRI sentence]
!!      "http://kmpmusic.co.nz", 0-4/4? [but CD listing of some MRI song titles] 0 [no other pages containsMRI]
        "http://maori.livingheritage.org.nz", 2/2 2/2
        "http://pukoro.co.nz", 2/2 0/2
        "https://register.tpota.org.nz", 0/1 [form] 0/2
X       "https://cdn.tehiku.nz" => DOMAIN: "tehiku.nz", 0/4, 1/3 [but audio content may be in MRI]
!!      "http://www.runanga.co.nz", 3/3 0 [no containsMRI outside isMRI pages]
!       "http://kuraaiwi.maori.nz", 2/4 [navigation only downloaded. But site content checked] 2/3
        "http://kurataiao.tki.org.nz", 3/3, 1/total 3

!!      "http://satellites.co.nz", 3/3 [kpop], 0 [no containsMRI outside isMRI pages]
        "http://teaohou.natlib.govt.nz", 4/4, 2/4
        "http://www.tuwharetoa.iwi.nz", 2/3 0/3
X       "http://auturoa.nz", 0/4 0/3 [lots of MRI terms among English] - COMMUNITY
        "https://www.terito.school.nz", 3/3, 0/2 total
        "https://ttw1.cwp.govt.nz", 3/3 3/3
        "https://www.whanau-tahi.school.nz", 4/4, 1/2 total
        "https://e-ako-pangarau.nzmaths.co.nz", 3/3 total, 1/1 total
        "https://teaomaori.news", 3/3, 0/1 total
        "http://tetaurawhiri.govt.nz", 3/3 3/3 [Māori Language Commission site]
        "https://www.tuiatematangi.ac.nz", 4/4 3/3
        "http://animations.tewhanake.maori.nz", 3/3 3/3
!!      "https://www.dnc.org.nz", 1/1 total, 0 [no containsMRI outside isMRI pages]
!!      "http://firstworldwar.tki.org.nz", 3/3, 0 [no containsMRI outside isMRI pages]
        "http://www.28maoribattalion.org.nz", 3/3, 1/3
        "http://www.tewikiotereomaori.co.nz", 1/1 total, 3/3
        "http://www.brettgraham.co.nz", 1/1 total, 0/3
!!      "https://hepatakakupu.nz", 3/3, 0 [no containsMRI outside isMRI pages]

        "http://anglicanprayerbook.nz", 3/3 3/3
        "http://arataua.nz", 4/4, 2/3
        "http://blog.teara.govt.nz", 3/3, 0/3 [AS: teara.govt.nz]
        "http://maori.tki.org.nz", 3/3 3/3
DONE (with/out www): "http://www.firstworldwar.tki.org.nz",
X       "http://www.topomap.co.nz", 0/2 [all placenames], 0 [no containsMRI outside isMRI pages]
        "https://paekupu.co.nz", 4/4, 0 [no containsMRI outside isMRI pages]
        "https://haereheikaiako.co.nz", 1/1, 0 [no containsMRI outside isMRI pages]
        "https://curriculumtool.education.govt.nz", 4/4, 3/3
        "http://kurakokiri.maori.nz", 3/3, 3/3 [same nav menus on each page]
        "http://kete.wcl.govt.nz", 2/5 [first 3 misdetected: Tokelauan (American Samoa), Kiribati, Tongan], 0/3
        "http://www.kkmmaungarongo.co.nz", 3/3, 3/3
        "http://www.heartland.co.nz", 3/3, 1/1 total
        "http://oilcrash.com", 2/2 total, 0/3
        "http://www.kura-porirua.school.nz", 4/4, 2/3
        "http://videos.e-agent.nz", [AT: e-agent.nz] 3/3, 3/3 [repeated nav]
        "https://www.sporty.co.nz", 3/3, 0 [no containsMRI outside isMRI pages]
        "https://www.tematawai.maori.nz", 3/3, 3/3

        "https://www.terakipaewhenua.school.nz",
        "http://www.tetaurawhiri.govt.nz",
        "http://archive.stats.govt.nz",
        "http://tiritiowaitangi.govt.nz",
        "http://www.waiata.maori.nz",
        "http://hana.co.nz",
        "http://kaupare.co.nz",
        "http://www.tereowrap.nz",
        "https://www.e-agent.nz",
        "http://www.hrc.co.nz",
        "http://ngatiporoukiponeke.org.nz",
        "http://rurued.school.nz",
        "http://www.twtop.school.nz",
        "https://www.infinite-electronic.nz",
        "http://www.huri-translations.pf",
        "https://admin.teara.govt.nz",
        "https://tiritiowaitangi.govt.nz",
        "http://www.tmoa.tki.org.nz",
        "https://www.komako.org.nz",
        "http://www.wcl.govt.nz",
        "https://office.e-agent.nz",
        "http://punareo.co.nz",
        "http://www.kurakokiri.maori.nz",
        "https://rapuatearatika.education.govt.nz",
        "http://tmmkkm.school.nz",
        "https://www.components-mart.nz",
        "http://www.cs.waikato.ac.nz",
        "http://www.kupengahao.co.nz",
        "https://www.hapuhauora.health.nz",
        "https://www.lcds-display.nz",
        "http://waiata.maori.nz",
        "http://cms.sunsmartschools.co.nz",
        "http://www.livingheritage.org.nz",
        "http://kuraproductions.co.nz",
        "https://keepourmoneyclean.govt.nz",
        "http://www.tekura.school.nz",
        "http://www.tkkmmokopuna.school.nz",
        "http://hangaraumatihiko.tki.org.nz",
        "http://www.pakanae.maori.nz"
    ],
    "numPagesInMRICount" : 4360,
    "numPagesContainingMRICount" : 7968
}

----------------------------

/* 1 */
{
    "_id" : "nz",
    "count" : 176.0,
    "domain" : [
!!      "http://pukekohe.directorybusiness.co.nz", 0/2, 0/2, isMRI = 0!!
        "http://maori.livingheritage.org.nz", 2/2 2/2
        "http://pukoro.co.nz", 2/2 0/2
        "http://www.rakaumanga.school.nz", 0/4 0/4
        "http://www.ngamanawainc.co.nz", 0/2 0/2
        "https://office.e-agent.nz",
        "https://www.components-mart.nz",
        "http://tmmkkm.school.nz",
        "http://www.rotoruanz.com",
        "http://www.huri-translations.pf",
        "https://admin.teara.govt.nz",
        "http://hangaraumatihiko.tki.org.nz",
        "https://sexualviolence.victimsinfo.govt.nz",
        "http://www.tekura.school.nz",
        "http://philipbeadle.co.nz",
        "http://www.cs.waikato.ac.nz",
        "https://www.hapuhauora.health.nz",
        "http://cms.sunsmartschools.co.nz",
        "https://keepourmoneyclean.govt.nz",
        "http://www.kura-porirua.school.nz",
        "http://waitarahistory.org.nz",
        "http://oilcrash.com",
        "http://videos.e-agent.nz",
        "https://manawatuheritage.pncc.govt.nz",
        "https://www.terakipaewhenua.school.nz",
        "http://dev.nzpcn.org.nz",
        "https://kotahimiriona.co.nz",
        "http://kurakokiri.maori.nz",
        "https://www.sporty.co.nz",
        "http://kaupare.co.nz",
        "http://ngatiporoukiponeke.org.nz",
        "https://www.takitimu.ac.nz",
        "http://www.tetaurawhiri.govt.nz",
        "http://www.waiata.maori.nz",
        "http://conference.tpwt.maori.nz",
        "http://ngatiwhakaue.iwi.nz",
        "http://www.nzpcn.org.nz",
        "http://www.ruralfind.co.nz",
        "https://www.dnc.org.nz",
        "https://www.puau.school.nz",
        "https://kaiiwicamp.nz",
        "https://www.terito.school.nz",
        "https://www.pinterest.nz",
        "https://e-ako-pangarau.nzmaths.co.nz",
        "http://givealittle.co.nz",
        "https://teaomaori.news",
        "https://www.korokikahukura.co.nz",
        "http://myfathersworld.net.nz",
        "http://www.firstworldwar.tki.org.nz",
        "https://www.ashtangatauranga.co.nz",
        "http://biketorqueyamaha.co.nz",
        "https://www.rereahu.maori.nz",
        "http://www.tewikiotereomaori.co.nz",
        "http://www.brettgraham.co.nz",
        "http://tewikiotereomaori.nz",
        "http://anglicanprayerbook.nz",
        "http://arataua.nz",
        "http://blog.teara.govt.nz",
        "http://www.otepoti.school.nz",
        "http://www.kmk.maori.nz",
        "http://www.eventcinemas.co.nz",
        "https://www.stats.govt.nz",
        "http://www.oag.govt.nz", 2/2 0/2
        "http://whatonga.school.nz",
        "http://www.tewhanake.maori.nz",
        "https://www.maoritelevision.com",
        "http://kuraaiwi.maori.nz",
        "http://kurataiao.tki.org.nz",
        "http://teaohou.natlib.govt.nz",
        "http://www.tetaumuturunanga.iwi.nz",
        "http://www.tasteofplenty.co.nz",
        "http://community.nzdl.org",
        "https://www.blushandbrows.nz",
        "https://register.tpota.org.nz",
        "https://cdn.tehiku.nz",
        "http://www.wcl.govt.nz",
        "http://www.jeremybaker.nz",
        "http://punareo.co.nz",
        "https://rapuatearatika.education.govt.nz",
        "http://www.kurakokiri.maori.nz",
        "https://www.cruisetourstauranga.co.nz",
        "https://sooty.nz",
        "http://rakaumanga.school.nz",
        "https://tiritiowaitangi.govt.nz",
        "http://www.tmoa.tki.org.nz",
        "http://www.w3vietnam.org.nz",
        "https://www.infinite-electronic.nz",
        "https://www.komako.org.nz",
        "http://nzpostcard.co.nz",
        "http://artizani.co.nz",
        "http://www.finlaysonpark.school.nz",
        "http://crimson.co.nz",
        "http://holyspirit.nz",
        "http://www.tkkmmokopuna.school.nz",
        "http://www.pakanae.maori.nz",
        "http://www.teipukarea.maori.nz",
        "http://archerpix.com",
        "https://2019.nethui.nz",
        "http://www.kupengahao.co.nz",
        "https://www.lcds-display.nz",
        "http://waiata.maori.nz",
        "http://kuraproductions.co.nz",
        "http://www.biketorqueyamaha.co.nz",
        "http://www.livingheritage.org.nz",
        "http://www.zoomin.co.nz",
        "http://rsnz.natlib.govt.nz",
        "http://otorohanga.directorybusiness.co.nz",
        "http://reoora.co.nz",
        "http://w3vietnam.org.nz",
        "https://rehuamarae.co.nz",
        "https://www.electionresults.org.nz",
        "https://www.ngamanawainc.co.nz",
        "https://www.rotorua-rafting.co.nz",
        "https://www.taitokerautrust.org.nz",
        "https://www.wingspan.co.nz",
        "http://www.kkmmaungarongo.co.nz",
        "http://kete.wcl.govt.nz",
        "http://www.heartland.co.nz",
        "http://www.electionresults.govt.nz",
        "https://www.tematawai.maori.nz",
        "http://hana.co.nz",
        "http://www.tereowrap.nz",
        "http://rurued.school.nz",
        "http://www.twtop.school.nz",
        "http://rexedra.gen.nz",
        "http://archive.stats.govt.nz",
        "https://liveresults.co.nz",
        "https://www.e-agent.nz",
        "http://tiritiowaitangi.govt.nz",
        "http://www.hrc.co.nz",
        "http://animations.tewhanake.maori.nz",
        "https://interactives.stuff.co.nz",
        "http://avonside.net",
        "http://www.methodist.org.nz",
        "https://www.tasteofplenty.co.nz",
        "http://www.maoriinvestments.co.nz",
        "https://m.wairarapatv.co.nz",
        "http://www.gans.co.nz",
        "https://ttw1.cwp.govt.nz",
        "http://ngarauhuia.ngatiapakiterato.iwi.nz",
        "https://www.tuiatematangi.ac.nz",
        "http://tetaurawhiri.govt.nz",
        "http://maori.tki.org.nz",
        "http://www.topomap.co.nz",
        "https://www.puhaandpakeha.co.nz",
        "https://haereheikaiako.co.nz",
        "https://paekupu.co.nz",
        "https://curriculumtool.education.govt.nz",
        "http://firstworldwar.tki.org.nz",
        "http://www.28maoribattalion.org.nz",
        "https://hepatakakupu.nz",
        "https://www.zenbu.co.nz",
        "http://www.matarikifestival.org.nz",
        "http://pukapuka.nz",
        "http://ngatipahauwera.co.nz", 2/2 2/2
        "http://southerntribes.co.nz",
        "https://player.vimeo.com",
        "http://tmoa.tki.org.nz",
        "http://www.writersfestival.co.nz",
        "http://talkingtothecan.com",
        "https://www.whanau-tahi.school.nz",
        "http://satellites.co.nz",
        "http://auturoa.nz",
        "http://www.tuwharetoa.iwi.nz",
        "http://kmpmusic.co.nz",
        "http://www.temarareo.org",
        "http://archive.electionresults.govt.nz",
        "http://kaiiwicamp.nz",
        "http://tehauora.org.nz",
        "http://temahurehure.maori.nz",
        "http://www.runanga.co.nz"
    ],
    "numPagesInMRICount" : 4360,
    "numPagesContainingMRICount" : 9641
}