source: other-projects/maori-lang-detection/mongodb-data/counts_allCrawledSites.json@ 33813

Last change on this file since 33813 was 33813, checked in by ak19, 4 years ago

With yesterday's bugfix, the inclusion of http(s):mi.* type URLs when setting the urlContainsLangCodeInPath property of the Websites mongodb collection, and the updated and improved mongodb queries and their results, I have now regenerated the latest geojson data and maps.

/*
Num websites:
db.getCollection('Websites').find({}).count()
= 1445

Num webpages:
db.getCollection('Webpages').find({}).count()
= 117496

Count of country codes for all sites:
db.Websites.aggregate([
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "$geoLocationCountryCode",
            count: { $sum: 1 }
        }
    },
    { $sort : { count : -1} }
]);
*/
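
For readers without a MongoDB instance at hand, the aggregation in the comment above can be approximated in plain JavaScript. This is only a sketch of what the `$unwind`, `$group` and `$sort` stages do: the function name `countByCountryCode` and the sample documents are hypothetical, and only the field name `geoLocationCountryCode` is taken from the query.

```javascript
// Hypothetical in-memory stand-in for the $unwind / $group / $sort pipeline.
// `websites` is assumed to be an array of plain objects shaped like Websites documents.
function countByCountryCode(websites) {
  const counts = new Map();
  for (const site of websites) {
    const value = site.geoLocationCountryCode;
    // By default, $unwind drops documents where the field is missing or null.
    if (value === undefined || value === null) continue;
    // $unwind treats a non-array value as a single-element array;
    // an empty array produces no output documents.
    const codes = Array.isArray(value) ? value : [value];
    for (const code of codes) {
      // $group with { $sum: 1 } adds one per unwound document.
      counts.set(code, (counts.get(code) || 0) + 1);
    }
  }
  // $sort: { count: -1 } orders the groups by descending count.
  return [...counts.entries()]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
}

// Made-up sample documents, for illustration only.
const sampleSites = [
  { geoLocationCountryCode: "US" },
  { geoLocationCountryCode: "NZ" },
  { geoLocationCountryCode: "US" },
  { geoLocationCountryCode: ["NZ", "AU"] },
  {}
];
const sampleResult = countByCountryCode(sampleSites);
console.log(sampleResult);
```

Unlike the shell output below, which prints counts as doubles such as `696.0`, this sketch yields plain JavaScript numbers.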

/* 1 */
{
    "_id" : "US",
    "count" : 696.0
}

/* 2 */
{
    "_id" : "UNKNOWN",
    "count" : 173.0
}

/* 3 */
{
    "_id" : "CN",
    "count" : 125.0
}

/* 4 */
{
    "_id" : "NZ",
    "count" : 115.0
}

/* 5 */
{
    "_id" : "FR",
    "count" : 69.0
}

/* 6 */
{
    "_id" : "DE",
    "count" : 52.0
}

/* 7 */
{
    "_id" : "AU",
    "count" : 43.0
}

/* 8 */
{
    "_id" : "NL",
    "count" : 32.0
}

/* 9 */
{
    "_id" : "CA",
    "count" : 19.0
}

/* 10 */
{
    "_id" : "GB",
    "count" : 18.0
}

/* 11 */
{
    "_id" : "DK",
    "count" : 10.0
}

/* 12 */
{
    "_id" : "JP",
    "count" : 10.0
}

/* 13 */
{
    "_id" : "ES",
    "count" : 8.0
}

/* 14 */
{
    "_id" : "RU",
    "count" : 7.0
}

/* 15 */
{
    "_id" : "HK",
    "count" : 7.0
}

/* 16 */
{
    "_id" : "CZ",
    "count" : 7.0
}

/* 17 */
{
    "_id" : "UA",
    "count" : 5.0
}

/* 18 */
{
    "_id" : "IE",
    "count" : 4.0
}

/* 19 */
{
    "_id" : "SE",
    "count" : 4.0
}

/* 20 */
{
    "_id" : "IT",
    "count" : 4.0
}

/* 21 */
{
    "_id" : "RO",
    "count" : 3.0
}

/* 22 */
{
    "_id" : "SG",
    "count" : 3.0
}

/* 23 */
{
    "_id" : "AT",
    "count" : 3.0
}

/* 24 */
{
    "_id" : "CH",
    "count" : 3.0
}

/* 25 */
{
    "_id" : "IL",
    "count" : 3.0
}

/* 26 */
{
    "_id" : "IN",
    "count" : 3.0
}

/* 27 */
{
    "_id" : "PL",
    "count" : 2.0
}

/* 28 */
{
    "_id" : "ZA",
    "count" : 2.0
}

/* 29 */
{
    "_id" : "VG",
    "count" : 2.0
}

/* 30 */
{
    "_id" : "CK",
    "count" : 2.0
}

/* 31 */
{
    "_id" : "BG",
    "count" : 1.0
}

/* 32 */
{
    "_id" : "PF",
    "count" : 1.0
}

/* 33 */
{
    "_id" : "IO",
    "count" : 1.0
}

/* 34 */
{
    "_id" : "GR",
    "count" : 1.0
}

/* 35 */
{
    "_id" : "MX",
    "count" : 1.0
}

/* 36 */
{
    "_id" : "TR",
    "count" : 1.0
}

/* 37 */
{
    "_id" : "ME",
    "count" : 1.0
}

/* 38 */
{
    "_id" : "FI",
    "count" : 1.0
}

/* 39 */
{
    "_id" : "EU",
    "count" : 1.0
}

/* 40 */
{
    "_id" : "IR",
    "count" : 1.0
}

/* 41 */
{
    "_id" : "PT",
    "count" : 1.0
}
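
As a consistency check, the 41 per-country counts listed above add up to 1445, the same figure the header comment reports for the number of websites. The sketch below reproduces that sum; the helper name `totalCount` and the compact `[code, count]` pair layout are illustrative choices, not part of the original file.

```javascript
// Counts copied from the results above, as [countryCode, count] pairs.
const results = [
  ["US", 696], ["UNKNOWN", 173], ["CN", 125], ["NZ", 115], ["FR", 69],
  ["DE", 52], ["AU", 43], ["NL", 32], ["CA", 19], ["GB", 18],
  ["DK", 10], ["JP", 10], ["ES", 8], ["RU", 7], ["HK", 7],
  ["CZ", 7], ["UA", 5], ["IE", 4], ["SE", 4], ["IT", 4],
  ["RO", 3], ["SG", 3], ["AT", 3], ["CH", 3], ["IL", 3],
  ["IN", 3], ["PL", 2], ["ZA", 2], ["VG", 2], ["CK", 2],
  ["BG", 1], ["PF", 1], ["IO", 1], ["GR", 1], ["MX", 1],
  ["TR", 1], ["ME", 1], ["FI", 1], ["EU", 1], ["IR", 1],
  ["PT", 1]
];

// Hypothetical helper: sums the count column of the pairs.
function totalCount(pairs) {
  return pairs.reduce((sum, [, count]) => sum + count, 0);
}

const total = totalCount(results);
console.log(results.length, total); // 41 1445
```

The match suggests that each website contributed exactly one country code to the aggregation, although the data alone does not rule out other combinations that happen to give the same total.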