source: other-projects/maori-lang-detection/MoreReading/mongodb.txt@ 33710

Last change on this file since 33710 was 33710, checked in by ak19, 4 years ago

Working queries and map coords for geojson.tools (ironically, Lat and Lng are different from Google maps).

File size: 26.5 KB
MongoDB
Installation:
    https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
    https://docs.mongodb.com/manual/administration/install-on-linux/
    https://hevodata.com/blog/install-mongodb-on-ubuntu/
    https://www.digitalocean.com/community/tutorials/how-to-install-mongodb-on-ubuntu-16-04
    CENTOS (Analytics): https://tecadmin.net/install-mongodb-on-centos/
    FROM SOURCE: https://github.com/mongodb/mongo/wiki/Build-Mongodb-From-Source
GUI:
    https://robomongo.org/
    Robomongo is Robo 3T now

https://www.tutorialspoint.com/mongodb/mongodb_java.htm
JAR FILE:
    http://central.maven.org/maven2/org/mongodb/mongo-java-driver/
    https://mongodb.github.io/mongo-java-driver/

https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
http://www.programmersought.com/article/6500308940/

   52  sudo apt-get install mongodb-clients
   53  mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p

Failed with
    Error: HostAndPort: host is empty at src/mongo/shell/mongo.js:148
    exception: connect failed

This is due to a version incompatibility between the mongo client and the MongoDB server.
The solution is to follow the instructions at http://www.programmersought.com/article/6500308940/
and then https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
as below:

   54  sudo apt-get purge mongodb-clients
   55  sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
   56  echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
   57  sudo apt-get update
   58  sudo apt-get install mongodb-clients
   59  mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p
(still doesn't work)
   60  sudo apt-get install -y mongodb-org
The above ensures an up-to-date mongo client, but it installs the MongoDB server too. Maybe this is the only step needed to install both an up-to-date mongo client and the MongoDB server?
   72  sudo service mongod status

  103  sudo service mongod start
"mongod" stands for mongo daemon. It runs the MongoDB server, which listens for client connections.
  104  sudo service mongod status
   88  sudo service mongod stop


DETAILS:

wharariki:[879]/Scratch/ak19/gs3-extensions/maori-lang-detection>mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p

This didn't work with the password. It failed with:

    MongoDB shell version: 2.6.10
    Enter password:
    connecting to: mongodb://mongodb.cms.waikato.ac.nz:27017
    2019-11-04T20:02:47.970+1300 Assertion: 13110:HostAndPort: host is empty
    2019-11-04T20:02:47.970+1300 0x6b75c9 0x659e9f 0x636f69 0x4fa55c 0x501249 0x4fa7f1 0x6006fd 0x5eb869 0x7f7bfbd47d76 0x1f3c10d06362
     mongo(_ZN5mongo15printStackTraceERSo+0x39) [0x6b75c9]
     mongo(_ZN5mongo10logContextEPKc+0x21f) [0x659e9f]
     mongo(_ZN5mongo11msgassertedEiPKc+0xd9) [0x636f69]
     mongo(_ZN5mongo16ConnectionString12_fillServersENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x50c) [0x4fa55c]
     mongo(_ZN5mongo16ConnectionStringC1ENS0_14ConnectionTypeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_+0x99) [0x501249]
     mongo(_ZN5mongo16ConnectionString5parseERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERS6_+0x201) [0x4fa7f1]
     mongo(_ZN5mongo17mongoConsExternalEPNS_7V8ScopeERKN2v89ArgumentsE+0x11d) [0x6006fd]
     mongo(_ZN5mongo7V8Scope10v8CallbackERKN2v89ArgumentsE+0xa9) [0x5eb869]
     /usr/lib/libv8.so.3.14.5(+0x99d76) [0x7f7bfbd47d76]
     [0x1f3c10d06362]
    2019-11-04T20:02:47.971+1300 Error: HostAndPort: host is empty at src/mongo/shell/mongo.js:148
    exception: connect failed


This is due to a version incompatibility between the mongo client and the MongoDB server.
The client version appears in the output above (2.6.10).
The server version can be found from within the mongo client shell. Starting the shell without loading a db:


    wharariki:[880]/Scratch/ak19/gs3-extensions/maori-lang-detection>mongo --shell -nodb
    MongoDB shell version: 2.6.10        <<<<<<<<<-------------------<<<< MONGO CLIENT VERSION
    type "help" for help
    > help
        db.help()                    help on db methods
        db.mycoll.help()             help on collection methods
        sh.help()                    sharding helpers
        rs.help()                    replica set helpers
        help admin                   administrative help
        help connect                 connecting to a db help
        help keys                    key shortcuts
        help misc                    misc things to know
        help mr                      mapreduce

        show dbs                     show database names
        show collections             show collections in current database
        show users                   show users in current database
        show profile                 show most recent system.profile entries with time >= 1ms
        show logs                    show the accessible logger names
        show log [name]              prints out the last segment of log in memory, 'global' is default
        use <db_name>                set current database
        db.foo.find()                list objects in collection foo
        db.foo.find( { a : 1 } )     list objects in foo where a == 1
        it                           result of the last line evaluated; use to further iterate
        DBQuery.shellBatchSize = x   set default number of items to display on shell
        exit                         quit the mongo shell

    > help connect

    Normally one specifies the server on the mongo shell command line. Run mongo --help to see those options.
    Additional connections may be opened:

        var x = new Mongo('host[:port]');
        var mydb = x.getDB('mydb');
      or
        var mydb = connect('host[:port]/mydb');

    Note: the REPL prompt only auto-reports getLastError() for the shell command line connection.

    Getting help on connect options:

    > var x = new Mongo('mongodb.cms.waikato.ac.nz:27017');
    > var mydb = x.getDB('anupama');

    > mydb.connect.help()
    DBCollection help
        db.connect.find().help() - show DBCursor help
        db.connect.count()
        db.connect.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.
        db.connect.convertToCapped(maxBytes) - calls {convertToCapped:'connect', size:maxBytes}} command
        db.connect.dataSize()
        db.connect.distinct( key ) - e.g. db.connect.distinct( 'x' )
        db.connect.drop() drop the collection
        db.connect.dropIndex(index) - e.g. db.connect.dropIndex( "indexName" ) or db.connect.dropIndex( { "indexKey" : 1 } )
        db.connect.dropIndexes()
        db.connect.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups
        db.connect.reIndex()
        db.connect.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.
                                            e.g. db.connect.find( {x:77} , {name:1, x:1} )
        db.connect.find(...).count()
        db.connect.find(...).limit(n)
        db.connect.find(...).skip(n)
        db.connect.find(...).sort(...)
        db.connect.findOne([query])
        db.connect.findAndModify( { update : ... , remove : bool [, query: {}, sort: {}, 'new': false] } )
        db.connect.getDB() get DB object associated with collection
        db.connect.getPlanCache() get query plan cache associated with collection
        db.connect.getIndexes()
        db.connect.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )
        db.connect.insert(obj)
        db.connect.mapReduce( mapFunction , reduceFunction , <optional params> )
        db.connect.aggregate( [pipeline], <optional params> ) - performs an aggregation on a collection; returns a cursor
        db.connect.remove(query)
        db.connect.renameCollection( newName , <dropTarget> ) renames the collection.
        db.connect.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name
        db.connect.save(obj)
        db.connect.stats()
        db.connect.storageSize() - includes free space allocated to this collection
        db.connect.totalIndexSize() - size in bytes of all the indexes
        db.connect.totalSize() - storage allocated for all data and indexes
        db.connect.update(query, object[, upsert_bool, multi_bool]) - instead of two flags, you can pass an object with fields: upsert, multi
        db.connect.validate( <full> ) - SLOW
        db.connect.getShardVersion() - only for use with sharding
        db.connect.getShardDistribution() - prints statistics about data distribution in the cluster
        db.connect.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function
        db.connect.getWriteConcern() - returns the write concern used for any operations on this collection, inherited from server/db if set
        db.connect.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the collection
        db.connect.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the collection
    > mydb.version()
    4.0.13        <<<<<<<<<-------------------<<<< MONGODB SERVER VERSION

(Check Mongo server version: https://stackoverflow.com/questions/38160412/how-to-find-the-exact-version-of-installed-mongodb)

So the MongoDB server version is 4.0.13.
This version doesn't work with our mongo client (shell) version, 2.6.10.


DETAILS OF INSTALLING MONGO-CLIENT AND UPDATING IT, AND INSTALLING MONGODB SERVER:


   54  sudo apt-get purge mongodb-clients
   55  sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
   56  echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
   57  sudo apt-get update
   58  sudo apt-get install mongodb-clients
   59  mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p
   60  sudo apt-get install -y mongodb-org
   61  mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p
   62  sudo service apache2 status
   63  sudo service sshd status
   64  sudo service mongodb status
   65  sudo service mongo status
   66  mongod
   67  mongod --help
   68  mongod --help | less
   69  mongod -f /etc/mongod.conf
   70  sudo mongod -f /etc/mongod.conf
   71  less /etc/mongod.conf
   72  sudo service mongod status
   73  sudo service mongod start
   74  sudo service mongod status
   75  ls -l /var/log/mongodb/mongod.log
   76  sudo rm /var/log/mongodb/mongod.log
   77  sudo service mongod status
   78  sudo service mongod start
   79  sudo service mongod status
   80  sudo service mongod stop
   81  ps auxww | grep mongo
   82  sudo service mongod start
   83  sudo service mongod status
   84  ps auxww | grep mongo
   85  sudo dmsg
   86  sudo dmesg
   87  sudo service mongod status
   88  sudo service mongod stop
   89  sudo service mongod start
   90  sudo dmesg
   91  sudo less /var/log/mongodb/mongod.log
   92  ls /var/lib/
   93  ls -ld /var/lib/
   94  ls -l /var/log/mongodb/mongod.log
   95  ls -ld /var/lib/
   96  groups mongodb
   97  less /etc/mongod.conf
   98  sudo less /var/log/mongodb/mongod.log
   99  less /etc/mongod.conf
  100  ls -l /var/lib/mongodb/
  101  sudo chown -R mongodb /var/lib/mongodb/
  102  sudo chgrp -R mongodb /var/lib/mongodb/
  103  sudo service mongod start
  104  sudo service mongod status
  105  history



MONGO DB ROBO 3T
1. Download "Double Pack" from https://robomongo.org/
2. Untar its contents. Then untar the tarball inside it.
3. Run:
    wharariki:[110]~/Downloads/robo3t-1.3.1-linux-x86_64-7419c406>./bin/robo3t

===================
On analytics, vagrant node1, we've installed the MongoDB server and client.
We're able to successfully create collections there.


vagrant@node1:~$ mongo
MongoDB shell version v4.2.1
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("87bb585c-4685-47f6-bf89-a93801daeb2d") }
MongoDB server version: 4.2.1
Server has startup warnings:
2019-11-04T07:48:14.197+0000 I STORAGE [initandlisten]
2019-11-04T07:48:14.198+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2019-11-04T07:48:14.198+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2019-11-04T07:48:14.624+0000 I CONTROL [initandlisten]
2019-11-04T07:48:14.624+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-11-04T07:48:14.624+0000 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2019-11-04T07:48:14.624+0000 I CONTROL [initandlisten]
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).

The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---

> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
> use db ateacrawldata
2019-11-05T05:24:20.155+0000 E QUERY [js] Error: [db ateacrawldata] is not a valid database name :
Mongo.prototype.getDB@src/mongo/shell/mongo.js:51:12
getDatabase@src/mongo/shell/session.js:913:28
DB.prototype.getSiblingDB@src/mongo/shell/db.js:22:12
shellHelper.use@src/mongo/shell/utils.js:803:10
shellHelper@src/mongo/shell/utils.js:790:15
@(shellhelp2):1:1
> db.createCollection('webpages');
{ "ok" : 1 }
> db.webpages.drop();
... ^C

> db.webpages.drop();
true
> use ateacrawldata
switched to db ateacrawldata
> db.createCollection('webpages');
{ "ok" : 1 }
> show collections
webpages
> db.createCollection('websites');
{ "ok" : 1 }
>

------------------------

Ask Clint to rename the "anupama" database to "ateacrawldata", following the instructions at:
    https://stackoverflow.com/questions/9201832/how-do-you-rename-a-mongodb-database
I don't have permissions to do this.
Nor do I have permissions to create Mongo collections within a new database that I create, like ateacrawldata.
I only seem to have rights to the "anupama" database.


-----------------------

MONGODB QUERIES:

db.getCollection('webpages').find({"isMRI": true, "singleSentences.langCode": "mri"})
db.getCollection('webpages').find({"singleSentences": { $elemMatch: {"langCode":"mri"} } }, {"singleSentences.$": 1})

# Gets a single English-language sentence from each page that is overall in MRI
db.getCollection('Webpages').find({"isMRI": true, "singleSentences": { $elemMatch: {"langCode":"eng"} } }, {"singleSentences.$": 1})

# Gets the 1st MRI sentence of each doc that contains sentences in MRI
db.getCollection('Webpages').find({"containsMRI": true, "singleSentences": { $elemMatch: {"langCode":"mri"} } }, {"singleSentences.$": 1})
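
What the $elemMatch filter combined with the positional "singleSentences.$" projection returns can be sketched in plain JavaScript, without a server. This is only an emulation under assumptions: the sample documents, the "sentence" field and the helper name findFirstSentence are invented for illustration; only the singleSentences and langCode names come from the queries above.

```javascript
// Hypothetical sample documents shaped like the Webpages collection.
// Only "singleSentences" and "langCode" come from the notes; the rest is invented.
const webpages = [
  {
    _id: 1,
    singleSentences: [
      { langCode: "eng", sentence: "Hello." },
      { langCode: "mri", sentence: "Kia ora." },
      { langCode: "mri", sentence: "Ka kite." },
    ],
  },
  { _id: 2, singleSentences: [{ langCode: "eng", sentence: "Only English." }] },
];

// Emulates find({singleSentences: {$elemMatch: {langCode}}}, {"singleSentences.$": 1}):
// keep documents with at least one matching array element, and for each of them
// return _id plus an array holding only the FIRST matching element.
function findFirstSentence(docs, langCode) {
  return docs
    .filter((doc) => doc.singleSentences.some((s) => s.langCode === langCode))
    .map((doc) => ({
      _id: doc._id,
      singleSentences: [doc.singleSentences.find((s) => s.langCode === langCode)],
    }));
}

console.log(JSON.stringify(findFirstSentence(webpages, "mri"), null, 2));
```

The positional projection never returns more than one array element per document, so getting every MRI sentence of a page would need a different approach (for example an aggregation).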


READING

mongodb java convert class
https://www.quora.com/What-are-the-ways-of-converting-a-Java-object-to-a-MongoDB-document-and-vice-versa
https://stackoverflow.com/questions/39320825/pojo-to-org-bson-document-and-vice-versa
X https://mongodb.github.io/morphia/
https://stackoverflow.com/questions/10170506/inserting-java-object-to-mongodb-collection-using-java
X https://www.google.com/search?q=morphia+example&oq=morphia+example&aqs=chrome.0.0l6.4223j0j9&sourceid=chrome&ie=UTF-8
https://www.baeldung.com/mongodb-morphia
X https://web.archive.org/web/20171117121335/http://mongodb.github.io/morphia/1.3/getting-started/
=> https://morphia.dev/1.4/getting-started/quick-tour/
https://github.com/MorphiaOrg/morphia/tree/master/docs/reference


mongodb querying
https://docs.mongodb.com/manual/tutorial/query-embedded-documents/
https://docs.mongodb.com/manual/tutorial/query-arrays/
https://www.google.com/search?q=mongodb+find+subdocument&oq=mongodb+find+&aqs=chrome.0.69i59j69i57j0l4.7607j1j8&sourceid=chrome&ie=UTF-8
https://stackoverflow.com/questions/25586901/how-to-find-document-and-single-subdocument-matching-given-criterias-in-mongodb
https://stackoverflow.com/questions/21113543/mongodb-get-subdocument
https://stackoverflow.com/questions/36948856/find-subdocuments-in-mongo
https://docs.mongodb.com/v3.0/reference/operator/projection/positional/#proj._S_
https://www.google.com/search?q=mongodb+query+tutorial&oq=mongodb+query+tutorial&aqs=chrome..69i57j0l2j69i60l3.4719j0j7&sourceid=chrome&ie=UTF-8
https://blog.exploratory.io/an-introduction-to-mongodb-query-for-beginners-bd463319aa4c
https://docs.mongodb.com/manual/reference/method/db.collection.find/
https://docs.mongodb.com/manual/reference/method/db.collection.find/#find-projection
https://stackoverflow.com/questions/39641925/mongodb-aggregation-framework-to-get-frequencies-of-fields-values

https://exploratory.io/note/kanaugust/0961813761939766
https://docs.mongodb.com/manual/tutorial/project-fields-from-query-results/
https://docs.mongodb.com/manual/aggregation/


Mongo Studio 3T documentation:
https://studio3t.com/download/ (also has uninstall information)
https://studio3t.com/download-thank-you/?OS=x64

Google: MongoDB visualization
MongoDB visualization map
MongoDB Charts
    (Open source visualisation tools)

json map visualizer
    geojson.tools
-------------------

Some queries with results:

# Num websites
db.getCollection('Websites').find({}).count()
1446

# Num webpages
db.getCollection('Webpages').find({}).count()
X75139
117496

# Number of websites that have 1 or more pages in Maori (a positive numPagesInMRI)
db.getCollection('Websites').find({numPagesInMRI: { $gt: 0}}).count()
361

# Number of webpages that are deemed to be overall in MRI (pages where isMRI=true)
db.getCollection('Webpages').find({isMRI:true}).count()
X5224
X5215
7818

# Number of pages that contain any number of MRI sentences
db.getCollection('Webpages').find({containsMRI: true}).count()
X12858
20371


# Number of sites with URLs containing /mi(/)
db.getCollection('Websites').find({urlContainsLangCodeInpath:true}).count()
153

# Number of websites outside NZ that contain /mi(/) in any of their sub-urls
db.getCollection('Websites').find({urlContainsLangCodeInpath:true, geoLocationCountryCode: {$ne : "NZ"} }).count()
148

# 5 sites with URLs containing /mi(/) that are in NZ
db.getCollection('Websites').find({urlContainsLangCodeInpath:true, geoLocationCountryCode: "NZ"}).count()
5

# Sort websites that contain /mi(/) in path by geoLocationCountryCode
# https://www.quackit.com/mongodb/tutorial/mongodb_sort_query_results.cfm
db.getCollection('Websites').find({urlContainsLangCodeInpath:true}).sort({geoLocationCountryCode: 1})

Actually, I want to sort by count. See https://docs.mongodb.com/manual/reference/operator/aggregation/sortByCount/


# PROJECTION:
db.getCollection('Websites').find({geoLocationCountryCode: {$ne:"NZ"}}, {geoLocationCountryCode:1, urlContainsLangCodeInpath: 1})

https://docs.mongodb.com/manual/aggregation/
EXAMPLE:
db.orders.aggregate([
    { $match: { status: "A" } },
    { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])

# Fails: the $group stage is not wrapped in { } and has no _id
X db.Websites.aggregate([{ $match:{urlContainsLangCodeInPath:true}}, $group: {geoLocationCountryCode:1, total: $count}])


# Fails: a $group stage must specify an _id, and any other field must be an accumulator expression
X db.Websites.aggregate([
    { $match:{urlContainsLangCodeInPath:true}},
    {$group: {geoLocationCountryCode:1}}
])

WORKS (but adding an "$unwind" stage gets rid of the "null" group):
db.Websites.aggregate([
    { $match: { urlContainsLangCodeInPath: true } },
    { $group: { _id: "$geoLocationCountryCode", count: { $sum: 1 } } },
    { $sort : { count : -1 } }
])
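
To see where the "null" group comes from, the $match, $group and $sort stages can be emulated in plain JavaScript. This is a rough sketch under assumptions, not how the server implements them: the sample documents and the names countByCountry and skipMissing are invented; only urlContainsLangCodeInPath and geoLocationCountryCode come from the pipeline above. A document with no geoLocationCountryCode is grouped under _id null, and skipping such documents (which is what $unwind does for a missing or null field) removes that group.

```javascript
// Hypothetical sample documents; only the two field names are taken from the notes.
const websites = [
  { domain: "a.example", urlContainsLangCodeInPath: true, geoLocationCountryCode: "US" },
  { domain: "b.example", urlContainsLangCodeInPath: true, geoLocationCountryCode: "US" },
  { domain: "c.example", urlContainsLangCodeInPath: true, geoLocationCountryCode: "NZ" },
  { domain: "d.example", urlContainsLangCodeInPath: true }, // no country code
  { domain: "e.example", urlContainsLangCodeInPath: false, geoLocationCountryCode: "NZ" },
];

// Emulates: $match on urlContainsLangCodeInPath, $group by country code with
// count: {$sum: 1}, then $sort by count descending.
// skipMissing = true mimics the effect of a preceding $unwind on the country field.
function countByCountry(sites, skipMissing = false) {
  const counts = new Map();
  for (const site of sites) {
    if (site.urlContainsLangCodeInPath !== true) continue; // $match
    const code = site.geoLocationCountryCode;
    const key = code === undefined ? null : code; // a missing field groups under null
    if (skipMissing && key === null) continue;
    counts.set(key, (counts.get(key) || 0) + 1); // $sum: 1
  }
  return Array.from(counts, ([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count); // $sort: { count: -1 }
}

console.log(countByCountry(websites));       // includes a group with _id null
console.log(countByCountry(websites, true)); // the null group is gone
```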


# COUNT OF ALL GEOLOCATION COUNTRIES
# https://stackoverflow.com/questions/14924495/mongodb-count-num-of-distinct-values-per-field-key
    # LIST
    db.Websites.distinct('geoLocationCountryCode');

    # COUNT
    db.Websites.distinct('geoLocationCountryCode').length;

    # A COUNT WITH QUERY - https://docs.mongodb.com/manual/reference/command/distinct/#dbcmd.distinct

    db.runCommand ( { distinct: "Websites", key: "geoLocationCountryCode", query: { "urlContainsLangCodeInPath": true} } );

    # DISTINCT WITH QUERY WITHOUT COUNT - https://docs.mongodb.com/manual/reference/method/db.collection.distinct/
    db.Websites.distinct('geoLocationCountryCode', {"urlContainsLangCodeInPath": true});

    # SORTED - https://stackoverflow.com/questions/4759437/get-distinct-values-with-sorted-data
    db.Websites.distinct('geoLocationCountryCode', {"urlContainsLangCodeInPath": true}).sort();


# AGGREGATION QUERIES THAT WORK:
# https://stackoverflow.com/questions/14924495/mongodb-count-num-of-distinct-values-per-field-key

db.Websites.aggregate([
    {
        $match: {
            urlContainsLangCodeInPath: true
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: {$toLower: '$geoLocationCountryCode'},
            count: { $sum: 1 }
        }
    },
    { $sort : { count : -1} },
    { $limit : 100 }
]);


WORKS:
db.Websites.aggregate([
    {
        $match: {
            geoLocationCountryCode: {$ne : "UNKNOWN"}
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "$geoLocationCountryCode",
            count: { $sum: 1 }
        }
    },
    { $sort : { count : -1} },
    { $limit : 100 }
]);

WORKS:
db.Websites.aggregate([
    {
        $match: {
            "urlContainsLangCodeInPath": true
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "$geoLocationCountryCode",
            count: { $sum: 1 }
        }
    },
    { $sort : { count : -1} },
    { $limit : 100 }
]);


KEEP ADDITIONAL FIELDS - https://stackoverflow.com/questions/16662405/mongo-group-query-how-to-keep-fields:
    a. KEEPS ONLY FIRST DOMAIN URL FOR EACH COUNTED COUNTRY CODE:

    db.Websites.aggregate([
        {
            $match: {
                "urlContainsLangCodeInPath": true
            }
        },
        { $unwind: "$geoLocationCountryCode" },
        {
            $group: {
                _id: "$geoLocationCountryCode",
                count: { $sum: 1 },
                domain: { $first: '$domain' }
            }
        },
        { $sort : { count : -1} }
    ]);

    b. KEEP ALL DOMAIN URLS:
    db.Websites.aggregate([
        {
            $match: {
                "urlContainsLangCodeInPath": true
            }
        },
        { $unwind: "$geoLocationCountryCode" },
        {
            $group: {
                _id: "$geoLocationCountryCode",
                count: { $sum: 1 },
                domain: { $addToSet: '$domain' }
            }
        },
        { $sort : { count : -1} }
    ]);


# WANT TO GET THE ABOVE ONTO A WORLD MAP: use geojson.tools, found by Dr Bainbridge
geojson.tools
USAGE: https://www.here.xyz/viewer-tool/


AIMS:
* Identify where Maori language is online.
* How can we identify high quality sites that would be good for a corpus?
(Related work for other languages to quantifiably answer that)


data-preparation
docs


Country counts (aggregation output), with map coordinates for geojson.tools given as longitude, latitude:

  #   _id   count   longitude   latitude   note
  1   US    93.0      -95.8      40.33
  2   AU     7.0      135.8     -25.33
  3   CN     7.0      100.8      32.33
  4   NZ     5.0      175.8     -40.33
  5   DE     5.0       10.8      50.33
  6   HK     5.0      114        22.33
  7   RU     4.0       38.4      55.5
  8   JP     3.0      137.8      36
  9   GB     3.0       -2        53.33
 10   CA     2.0     -105.8      55.33
 11   FR     2.0        3        47.33
 12   DK     2.0        9.5      55.33
 13   VG     2.0      -64.8      18.35     British Virgin Islands
 14   UA     1.0       31.5      48.5      Ukraine
 15   CZ     1.0       16.2      49.7
 16   CH     1.0        8.5      47        Switzerland
 17   ZA     1.0       24.2     -30.7      South Africa
 18   NL     1.0        5.8      52.33
 19   KR     1.0      127.8      36.8


/** http://geojson.tools/


{
  "type": "MultiPoint",
  "coordinates": [
    [-95.8, 40.33],
    [135.8, -25.33],
    [100.8, 32.33],
    [175.8, -40.33],
    [10.8, 50.33],
    [10.8, 50.33],
    [114, 22.33],
    [38.4, 55.5],
    [-2, 53.33],
    [137.8, 36],
    [-105.8, 55.33],
    [3, 47.33],
    [9.5, 55.33],
    [-64.8, 18.35],
    [31.5, 48.5],
    [16.2, 49.7],
    [8.5, 47],
    [24.2, -30.7],
    [5.8, 52.33],
    [127.8, 36.8]
  ]
}

*/
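
Rather than typing the coordinates by hand, a GeoJSON object like the one above could be generated from the country counts. The following is a minimal sketch, assuming each row carries a country code, a count and a longitude/latitude pair as in the notes; the function name toFeatureCollection and the property names countryCode and count are invented here. It emits a FeatureCollection of Points instead of a single MultiPoint so that each point can keep its country code and count. GeoJSON positions are ordered [longitude, latitude], the reverse of the "latitude, longitude" order that Google Maps displays.

```javascript
// Rows copied from the country counts above: [country code, count, longitude, latitude].
// Only the first four rows are listed; the remaining rows have the same shape.
const countryCounts = [
  ["US", 93, -95.8, 40.33],
  ["AU", 7, 135.8, -25.33],
  ["CN", 7, 100.8, 32.33],
  ["NZ", 5, 175.8, -40.33],
];

// Builds a GeoJSON FeatureCollection with one Point feature per country.
// The property names "countryCode" and "count" are illustrative choices only.
function toFeatureCollection(rows) {
  return {
    type: "FeatureCollection",
    features: rows.map(([code, count, lon, lat]) => ({
      type: "Feature",
      geometry: { type: "Point", coordinates: [lon, lat] }, // GeoJSON order: lon, lat
      properties: { countryCode: code, count: count },
    })),
  };
}

// Print JSON that can be pasted into a GeoJSON viewer.
console.log(JSON.stringify(toFeatureCollection(countryCounts), null, 2));
```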