Using the updated code for generating maps from the 6a and 6b manual site counts, generated corrected maps (and their geojson files) for the number of PAGES in MRI and the number of PAGES containing MRI.

other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json

https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/find-sample-size/#CI1
https://stats.stackexchange.com/questions/207584/sample-size-choice-with-binary-outcome
https://www.statisticshowto.datasciencecentral.com/z-alpha2-za2/

N (NZ pages where isMRI comes out true) = 4360
solving for n, the sample size
confidence level = 90%
m, margin of error = 5%

From the "z alpha/2" table, for 90% confidence, we get a z alpha/2 value of 1.6449 (or 1.645).

Then the sample size, n, we need is = 1.6449^2 * 4360 / ( 1.6449^2 + (4 * 4359) * 0.05^2) = 255 (rounded up)

For N = 681,
sample size n is = 1.6449^2 * 681 / ( 1.6449^2 + (4 * 680) * 0.05^2) = 194 (rounded up)

sample size for NZ: 255 (90% confidence with 5% margin of error, including a finite population correction factor)
sample size for US: 194
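
The two sample sizes above can be re-checked with a short script. This is only a sketch of the same formula, n = z^2 * N / (z^2 + 4 * (N - 1) * m^2), which is the finite-population form with the worst-case proportion p = 0.5. The function names z_alpha_over_2 and sample_size are made up for this note, and z is computed from the standard normal distribution rather than read off the table, so it comes out as roughly 1.6449.

```python
import math
from statistics import NormalDist


def z_alpha_over_2(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def sample_size(population, confidence=0.90, margin=0.05):
    """Sample size with finite population correction, assuming p = 0.5.

    Implements n = z^2 * N / (z^2 + 4 * (N - 1) * m^2), rounded up.
    """
    z2 = z_alpha_over_2(confidence) ** 2
    n = z2 * population / (z2 + 4 * (population - 1) * margin ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # N values are the numPagesInMRICount figures for NZ and US from these notes.
    for label, population in (("NZ", 4360), ("US", 681)):
        print(label, sample_size(population))
```

Changing confidence or margin in the sample_size call shows how quickly the required sample grows for stricter settings.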
"_id","siteCount","numPagesInMRICount","numPagesContainingMRICount","URLs of pages detected as inMRI"
"_id","siteCount containsMRI","numPagesInMRICount","numPagesContainingMRICount","URLs of pages detected as inMRI"
"nz","176.0","4360","9641"
"us","29.0","681","953"

Total sites containing MRI: 216
[of which 96 isMRI sites from NZ]
Total pages detected as being in MRI: 5062
Total pages detected as containing MRI sentences: 10706