Changeset 33868

Timestamp:
23.01.2020 21:16:44
Message:

Using the updated code for generating the maps from the 6a and 6b manual site counts, generated corrected maps, and their geojson files, for the number of pages in MRI and the number of pages containing MRI. (Also applied some tabbing to the 6table file.)

Location:
other-projects/maori-lang-detection/mongodb-data
Files:
1 modified

• other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json

r33854 r33868
47 47
48 48
49
50
51
52 49 --------------
53 50
54 https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/find-sample-size/#CI1
55 https://stats.stackexchange.com/questions/207584/sample-size-choice-with-binary-outcome
56 https://www.statisticshowto.datasciencecentral.com/z-alpha2-za2/
57
58 N (NZ pages where isMRI comes out true) = 4360
59 solving for n, the sample size
60 confidence level = 90%
61 m, margin of error = 5%
62
63 From the "z alpha/2" table, for 90% confidence, we get a z alpha/2 value of 1.6449 (or 1.645).
64
65 Then the sample size, n, we need is = 1.6449^2 * 4360 / ( 1.6449^2 + (4 * 4359) * 0.05^2) = 255 (rounded up)
66
67
68 For N = 681,
69 sample size n is = 1.6449^2 * 681 / ( 1.6449^2 + (4 * 680) * 0.05^2) = 194 (rounded up)
70
71
72 sample size for NZ: 255 (90% confidence with 5% margin of error, including a finite correction factor)
73 sample size for US: 194
51    https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/find-sample-size/#CI1
52    https://stats.stackexchange.com/questions/207584/sample-size-choice-with-binary-outcome
53    https://www.statisticshowto.datasciencecentral.com/z-alpha2-za2/
54
55    N (NZ pages where isMRI comes out true) = 4360
56    solving for n, the sample size
57    confidence level = 90%
58    m, margin of error = 5%
59
60    From the "z alpha/2" table, for 90% confidence, we get a z alpha/2 value of 1.6449 (or 1.645).
61
62    Then the sample size, n, we need is = 1.6449^2 * 4360 / ( 1.6449^2 + (4 * 4359) * 0.05^2) = 255 (rounded up)
63
64
65    For N = 681,
66    sample size n is = 1.6449^2 * 681 / ( 1.6449^2 + (4 * 680) * 0.05^2) = 194 (rounded up)
67
68
69    sample size for NZ: 255 (90% confidence with 5% margin of error, including a finite correction factor)
70    sample size for US: 194
74 71
75 72 */

77 74
78 75
79 "_id","siteCount","numPagesInMRICount","numPagesContainingMRICount","URLs of pages detected as inMRI"
76 "_id","siteCount containsMRI","numPagesInMRICount","numPagesContainingMRICount","URLs of pages detected as inMRI"
80 77 "nz","176.0","4360","9641"
81 78 "us","29.0","681","953"

90 87
91 88 Total sites containing MRI: 216
89 [of which 96 isMRI sites from NZ]
92 90 Total pages detected as being in MRI: 5062
93 91 Total pages detected as containing MRI sentences: 10706
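
The sample-size arithmetic in the comment block of this file (N = 4360 for NZ and N = 681 for US, 90% confidence, 5% margin of error) can be reproduced with a short Python sketch. The function name sample_size and its parameters are illustrative and not part of the changeset. The sketch assumes the worst-case proportion p = 0.5, which is what the factor of 4 in the denominator implies, and it computes the z alpha/2 value from the standard library instead of reading it from a table.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.90, margin=0.05):
    """Sample size for estimating a proportion, with finite population correction.

    Hypothetical helper mirroring the formula in the comment block:
        n = z^2 * N / (z^2 + 4 * (N - 1) * m^2)
    The factor 4 assumes the worst-case proportion p = 0.5.
    """
    # Two-sided critical value z_{alpha/2}; roughly 1.6449 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2 * population) / (z ** 2 + 4 * (population - 1) * margin ** 2)
    # Round up, as the comment block does.
    return math.ceil(n)


if __name__ == "__main__":
    print(sample_size(4360))  # NZ pages where isMRI comes out true
    print(sample_size(681))   # US pages
```

With the defaults above, the two calls should give the same values as the comment block (255 for NZ and 194 for US).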