1 | MongoDB
|
---|
2 | Installation:
|
---|
3 | https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
|
---|
4 | https://docs.mongodb.com/manual/administration/install-on-linux/
|
---|
5 | https://hevodata.com/blog/install-mongodb-on-ubuntu/
|
---|
6 | https://www.digitalocean.com/community/tutorials/how-to-install-mongodb-on-ubuntu-16-04
|
---|
7 | CENTOS (Analytics): https://tecadmin.net/install-mongodb-on-centos/
|
---|
8 | FROM SOURCE: https://github.com/mongodb/mongo/wiki/Build-Mongodb-From-Source
|
---|
9 | GUI:
|
---|
10 | https://robomongo.org/
|
---|
11 | Robomongo is Robo 3T now
|
---|
12 |
|
---|
13 | https://www.tutorialspoint.com/mongodb/mongodb_java.htm
|
---|
14 | JAR FILE:
|
---|
15 | http://central.maven.org/maven2/org/mongodb/mongo-java-driver/
|
---|
16 | https://mongodb.github.io/mongo-java-driver/
|
---|
17 |
|
---|
18 |
|
---|
19 |
|
---|
20 | https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
|
---|
21 | http://www.programmersought.com/article/6500308940/
|
---|
22 |
|
---|
23 | 52 sudo apt-get install mongodb-clients
|
---|
24 | 53 mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p
|
---|
25 |
|
---|
26 | Failed with
|
---|
27 | Error: HostAndPort: host is empty at src/mongo/shell/mongo.js:148
|
---|
28 | exception: connect failed
|
---|
29 |
|
---|
30 | This is due to a version incompatibility between the mongo client (shell) and the MongoDB server.
|
---|
31 | The solution is to follow instructions at http://www.programmersought.com/article/6500308940/
|
---|
32 | and then https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
|
---|
33 | as below:
|
---|
34 |
|
---|
35 | 54 sudo apt-get purge mongodb-clients
|
---|
36 | 55 sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
|
---|
37 | 56 echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
|
---|
38 | 57 sudo apt-get update
|
---|
39 | 58 sudo apt-get install mongodb-clients
|
---|
40 | 59 mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p
|
---|
41 | (still doesn't work)
|
---|
42 | 60 sudo apt-get install -y mongodb-org
|
---|
43 | The above ensures an up-to-date mongo client but also installs the MongoDB server. This may be the only step needed to install both an up-to-date mongo client and the MongoDB server.
|
---|
44 | 72 sudo service mongod status
|
---|
45 |
|
---|
46 | 103 sudo service mongod start
|
---|
47 | "mongod" stands for mongo daemon. It runs the MongoDB server, which listens for client connections.
|
---|
48 | 104 sudo service mongod status
|
---|
49 | 88 sudo service mongod stop
|
---|
50 |
|
---|
51 |
|
---|
52 | DETAILS:
|
---|
53 |
|
---|
54 | wharariki:[879]/Scratch/ak19/gs3-extensions/maori-lang-detection>mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p
|
---|
55 |
|
---|
56 | This did not work, even with the password. It failed with:
|
---|
57 |
|
---|
58 | MongoDB shell version: 2.6.10
|
---|
59 | Enter password:
|
---|
60 | connecting to: mongodb://mongodb.cms.waikato.ac.nz:27017
|
---|
61 | 2019-11-04T20:02:47.970+1300 Assertion: 13110:HostAndPort: host is empty
|
---|
62 | 2019-11-04T20:02:47.970+1300 0x6b75c9 0x659e9f 0x636f69 0x4fa55c 0x501249 0x4fa7f1 0x6006fd 0x5eb869 0x7f7bfbd47d76 0x1f3c10d06362
|
---|
63 | mongo(_ZN5mongo15printStackTraceERSo+0x39) [0x6b75c9]
|
---|
64 | mongo(_ZN5mongo10logContextEPKc+0x21f) [0x659e9f]
|
---|
65 | mongo(_ZN5mongo11msgassertedEiPKc+0xd9) [0x636f69]
|
---|
66 | mongo(_ZN5mongo16ConnectionString12_fillServersENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x50c) [0x4fa55c]
|
---|
67 | mongo(_ZN5mongo16ConnectionStringC1ENS0_14ConnectionTypeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_+0x99) [0x501249]
|
---|
68 | mongo(_ZN5mongo16ConnectionString5parseERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERS6_+0x201) [0x4fa7f1]
|
---|
69 | mongo(_ZN5mongo17mongoConsExternalEPNS_7V8ScopeERKN2v89ArgumentsE+0x11d) [0x6006fd]
|
---|
70 | mongo(_ZN5mongo7V8Scope10v8CallbackERKN2v89ArgumentsE+0xa9) [0x5eb869]
|
---|
71 | /usr/lib/libv8.so.3.14.5(+0x99d76) [0x7f7bfbd47d76]
|
---|
72 | [0x1f3c10d06362]
|
---|
73 | 2019-11-04T20:02:47.971+1300 Error: HostAndPort: host is empty at src/mongo/shell/mongo.js:148
|
---|
74 | exception: connect failed
|
---|
75 |
|
---|
76 |
|
---|
77 | This is due to a version incompatibility between the mongo client (shell) and the MongoDB server.
|
---|
78 | The client version appears in the output above (2.6.10).
|
---|
79 | The server version can be found from within the mongo client shell. Start the shell without connecting to a database:
|
---|
80 |
|
---|
81 |
|
---|
82 | wharariki:[880]/Scratch/ak19/gs3-extensions/maori-lang-detection>mongo --shell -nodb
|
---|
83 | MongoDB shell version: 2.6.10 <<<<<<<<<-------------------<<<< MONGO CLIENT VERSION
|
---|
84 | type "help" for help
|
---|
85 | > help
|
---|
86 | db.help() help on db methods
|
---|
87 | db.mycoll.help() help on collection methods
|
---|
88 | sh.help() sharding helpers
|
---|
89 | rs.help() replica set helpers
|
---|
90 | help admin administrative help
|
---|
91 | help connect connecting to a db help
|
---|
92 | help keys key shortcuts
|
---|
93 | help misc misc things to know
|
---|
94 | help mr mapreduce
|
---|
95 |
|
---|
96 | show dbs show database names
|
---|
97 | show collections show collections in current database
|
---|
98 | show users show users in current database
|
---|
99 | show profile show most recent system.profile entries with time >= 1ms
|
---|
100 | show logs show the accessible logger names
|
---|
101 | show log [name] prints out the last segment of log in memory, 'global' is default
|
---|
102 | use <db_name> set current database
|
---|
103 | db.foo.find() list objects in collection foo
|
---|
104 | db.foo.find( { a : 1 } ) list objects in foo where a == 1
|
---|
105 | it result of the last line evaluated; use to further iterate
|
---|
106 | DBQuery.shellBatchSize = x set default number of items to display on shell
|
---|
107 | exit quit the mongo shell
|
---|
108 |
|
---|
109 | > help connect
|
---|
110 |
|
---|
111 | Normally one specifies the server on the mongo shell command line. Run mongo --help to see those options.
|
---|
112 | Additional connections may be opened:
|
---|
113 |
|
---|
114 | var x = new Mongo('host[:port]');
|
---|
115 | var mydb = x.getDB('mydb');
|
---|
116 | or
|
---|
117 | var mydb = connect('host[:port]/mydb');
|
---|
118 |
|
---|
119 | Note: the REPL prompt only auto-reports getLastError() for the shell command line connection.
|
---|
120 |
|
---|
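121 | Connecting from within the shell and asking for help (note that mydb.connect.help() prints the generic DBCollection help for a collection named "connect", not help on connecting):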
|
---|
122 |
|
---|
123 | > var x = new Mongo('mongodb.cms.waikato.ac.nz:27017');
|
---|
124 | > var mydb = x.getDB('anupama');
|
---|
125 |
|
---|
126 | > mydb.connect.help()
|
---|
127 | DBCollection help
|
---|
128 | db.connect.find().help() - show DBCursor help
|
---|
129 | db.connect.count()
|
---|
130 | db.connect.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.
|
---|
131 | db.connect.convertToCapped(maxBytes) - calls {convertToCapped:'connect', size:maxBytes}} command
|
---|
132 | db.connect.dataSize()
|
---|
133 | db.connect.distinct( key ) - e.g. db.connect.distinct( 'x' )
|
---|
134 | db.connect.drop() drop the collection
|
---|
135 | db.connect.dropIndex(index) - e.g. db.connect.dropIndex( "indexName" ) or db.connect.dropIndex( { "indexKey" : 1 } )
|
---|
136 | db.connect.dropIndexes()
|
---|
137 | db.connect.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups
|
---|
138 | db.connect.reIndex()
|
---|
139 | db.connect.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.
|
---|
140 | e.g. db.connect.find( {x:77} , {name:1, x:1} )
|
---|
141 | db.connect.find(...).count()
|
---|
142 | db.connect.find(...).limit(n)
|
---|
143 | db.connect.find(...).skip(n)
|
---|
144 | db.connect.find(...).sort(...)
|
---|
145 | db.connect.findOne([query])
|
---|
146 | db.connect.findAndModify( { update : ... , remove : bool [, query: {}, sort: {}, 'new': false] } )
|
---|
147 | db.connect.getDB() get DB object associated with collection
|
---|
148 | db.connect.getPlanCache() get query plan cache associated with collection
|
---|
149 | db.connect.getIndexes()
|
---|
150 | db.connect.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )
|
---|
151 | db.connect.insert(obj)
|
---|
152 | db.connect.mapReduce( mapFunction , reduceFunction , <optional params> )
|
---|
153 | db.connect.aggregate( [pipeline], <optional params> ) - performs an aggregation on a collection; returns a cursor
|
---|
154 | db.connect.remove(query)
|
---|
155 | db.connect.renameCollection( newName , <dropTarget> ) renames the collection.
|
---|
156 | db.connect.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name
|
---|
157 | db.connect.save(obj)
|
---|
158 | db.connect.stats()
|
---|
159 | db.connect.storageSize() - includes free space allocated to this collection
|
---|
160 | db.connect.totalIndexSize() - size in bytes of all the indexes
|
---|
161 | db.connect.totalSize() - storage allocated for all data and indexes
|
---|
162 | db.connect.update(query, object[, upsert_bool, multi_bool]) - instead of two flags, you can pass an object with fields: upsert, multi
|
---|
163 | db.connect.validate( <full> ) - SLOW
|
---|
164 | db.connect.getShardVersion() - only for use with sharding
|
---|
165 | db.connect.getShardDistribution() - prints statistics about data distribution in the cluster
|
---|
166 | db.connect.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function
|
---|
167 | db.connect.getWriteConcern() - returns the write concern used for any operations on this collection, inherited from server/db if set
|
---|
168 | db.connect.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the collection
|
---|
169 | db.connect.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the collection
|
---|
170 | > mydb.version()
|
---|
171 | 4.0.13 <<<<<<<<<-------------------<<<< MONGODB SERVER VERSION
|
---|
172 |
|
---|
173 | (Check Mongo server version: https://stackoverflow.com/questions/38160412/how-to-find-the-exact-version-of-installed-mongodb)
|
---|
174 |
|
---|
175 | So the MongoDB server version is 4.0.13.
|
---|
176 | This server version doesn't work with our mongo client (shell) version, 2.6.10.
|
---|
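A minimal sketch of the version comparison made above, in plain JavaScript. In the mongo shell the two strings come from version() (shell) and db.version() (server). The helper names parseVersion and majorVersionMismatch are made up for illustration, and treating a differing major version as a mismatch is an assumption, not an official compatibility rule.

```javascript
// Hypothetical helper: split "major.minor.patch" into numbers.
function parseVersion(versionString) {
  return versionString.split(".").map(Number);
}

// Assumption for illustration only: flag a mismatch when the major versions differ.
function majorVersionMismatch(shellVersion, serverVersion) {
  return parseVersion(shellVersion)[0] !== parseVersion(serverVersion)[0];
}

// The versions recorded in these notes.
console.log(majorVersionMismatch("2.6.10", "4.0.13")); // true
console.log(majorVersionMismatch("4.0.13", "4.0.13")); // false
```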
177 |
|
---|
178 |
|
---|
179 | DETAILS OF INSTALLING MONGO-CLIENT AND UPDATING IT, AND INSTALLING MONGODB SERVER:
|
---|
180 |
|
---|
181 |
|
---|
182 | 54 sudo apt-get purge mongodb-clients
|
---|
183 | 55 sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
|
---|
184 | 56 echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
|
---|
185 | 57 sudo apt-get update
|
---|
186 | 58 sudo apt-get install mongodb-clients
|
---|
187 | 59 mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p
|
---|
188 | 60 sudo apt-get install -y mongodb-org
|
---|
189 | 61 mongo 'mongodb://mongodb.cms.waikato.ac.nz:27017' -u anupama -p
|
---|
190 | 62 sudo service apache2 status
|
---|
191 | 63 sudo service sshd status
|
---|
192 | 64 sudo service mongodb status
|
---|
193 | 65 sudo service mongo status
|
---|
194 | 66 mongod
|
---|
195 | 67 mongod --help
|
---|
196 | 68 mongod --help | less
|
---|
197 | 69 mongod -f /etc/mongod.conf
|
---|
198 | 70 sudo mongod -f /etc/mongod.conf
|
---|
199 | 71 less /etc/mongod.conf
|
---|
200 | 72 sudo service mongod status
|
---|
201 | 73 sudo service mongod start
|
---|
202 | 74 sudo service mongod status
|
---|
203 | 75 ls -l /var/log/mongodb/mongod.log
|
---|
204 | 76 sudo rm /var/log/mongodb/mongod.log
|
---|
205 | 77 sudo service mongod status
|
---|
206 | 78 sudo service mongod start
|
---|
207 | 79 sudo service mongod status
|
---|
208 | 80 sudo service mongod stop
|
---|
209 | 81 ps auxww | grep mongo
|
---|
210 | 82 sudo service mongod start
|
---|
211 | 83 sudo service mongod status
|
---|
212 | 84 ps auxww | grep mongo
|
---|
213 | 85 sudo dmsg
|
---|
214 | 86 sudo dmesg
|
---|
215 | 87 sudo service mongod status
|
---|
216 | 88 sudo service mongod stop
|
---|
217 | 89 sudo service mongod start
|
---|
218 | 90 sudo dmesg
|
---|
219 | 91 sudo less /var/log/mongodb/mongod.log
|
---|
220 | 92 ls /var/lib/
|
---|
221 | 93 ls -ld /var/lib/
|
---|
222 | 94 ls -l /var/log/mongodb/mongod.log
|
---|
223 | 95 ls -ld /var/lib/
|
---|
224 | 96 groups mongodb
|
---|
225 | 97 less /etc/mongod.conf
|
---|
226 | 98 sudo less /var/log/mongodb/mongod.log
|
---|
227 | 99 less /etc/mongod.conf
|
---|
228 | 100 ls -l /var/lib/mongodb/
|
---|
229 | 101 sudo chown -R mongodb /var/lib/mongodb/
|
---|
230 | 102 sudo chgrp -R mongodb /var/lib/mongodb/
|
---|
231 | 103 sudo service mongod start
|
---|
232 | 104 sudo service mongod status
|
---|
233 | 105 history
|
---|
234 |
|
---|
235 |
|
---|
236 |
|
---|
237 | MONGO DB ROBO 3T
|
---|
238 | 1. Download "Double Pack" from https://robomongo.org/
|
---|
239 | 2. Untar its contents. Then untar the tarball in that.
|
---|
240 | 3. Run:
|
---|
241 | wharariki:[110]~/Downloads/robo3t-1.3.1-linux-x86_64-7419c406>./bin/robo3t
|
---|
242 |
|
---|
243 | ===================
|
---|
244 | On analytics, on Vagrant node1, we've installed the MongoDB server and client.
|
---|
245 | We're able to create collections on it successfully.
|
---|
246 |
|
---|
247 |
|
---|
248 | vagrant@node1:~$ mongo
|
---|
249 | MongoDB shell version v4.2.1
|
---|
250 | connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
|
---|
251 | Implicit session: session { "id" : UUID("87bb585c-4685-47f6-bf89-a93801daeb2d") }
|
---|
252 | MongoDB server version: 4.2.1
|
---|
253 | Server has startup warnings:
|
---|
254 | 2019-11-04T07:48:14.197+0000 I STORAGE [initandlisten]
|
---|
255 | 2019-11-04T07:48:14.198+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
|
---|
256 | 2019-11-04T07:48:14.198+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
|
---|
257 | 2019-11-04T07:48:14.624+0000 I CONTROL [initandlisten]
|
---|
258 | 2019-11-04T07:48:14.624+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
|
---|
259 | 2019-11-04T07:48:14.624+0000 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
|
---|
260 | 2019-11-04T07:48:14.624+0000 I CONTROL [initandlisten]
|
---|
261 | ---
|
---|
262 | Enable MongoDB's free cloud-based monitoring service, which will then receive and display
|
---|
263 | metrics about your deployment (disk utilization, CPU, operation statistics, etc).
|
---|
264 |
|
---|
265 | The monitoring data will be available on a MongoDB website with a unique URL accessible to you
|
---|
266 | and anyone you share the URL with. MongoDB may use this information to make product
|
---|
267 | improvements and to suggest MongoDB products and deployment options to you.
|
---|
268 |
|
---|
269 | To enable free monitoring, run the following command: db.enableFreeMonitoring()
|
---|
270 | To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
|
---|
271 | ---
|
---|
272 |
|
---|
273 | > show dbs
|
---|
274 | admin 0.000GB
|
---|
275 | config 0.000GB
|
---|
276 | local 0.000GB
|
---|
277 | > use db ateacrawldata
|
---|
278 | 2019-11-05T05:24:20.155+0000 E QUERY [js] Error: [db ateacrawldata] is not a valid database name :
|
---|
279 | Mongo.prototype.getDB@src/mongo/shell/mongo.js:51:12
|
---|
280 | getDatabase@src/mongo/shell/session.js:913:28
|
---|
281 | DB.prototype.getSiblingDB@src/mongo/shell/db.js:22:12
|
---|
282 | shellHelper.use@src/mongo/shell/utils.js:803:10
|
---|
283 | shellHelper@src/mongo/shell/utils.js:790:15
|
---|
284 | @(shellhelp2):1:1
|
---|
285 | > db.createCollection('webpages');
|
---|
286 | { "ok" : 1 }
|
---|
287 | > db.webpages.drop();
|
---|
288 | ... ^C
|
---|
289 |
|
---|
290 | > db.webpages.drop();
|
---|
291 | true
|
---|
292 | > use ateacrawldata
|
---|
293 | switched to db ateacrawldata
|
---|
294 | > db.createCollection('webpages');
|
---|
295 | { "ok" : 1 }
|
---|
296 | > show collections
|
---|
297 | webpages
|
---|
298 | > db.createCollection('websites');
|
---|
299 | { "ok" : 1 }
|
---|
300 | >
|
---|
301 |
|
---|
302 | ------------------------
|
---|
303 |
|
---|
304 | Ask Clint to rename "anupama" database to "ateacrawldata" database following the instructions at:
|
---|
305 | https://stackoverflow.com/questions/9201832/how-do-you-rename-a-mongodb-database
|
---|
306 | I don't have permissions to do this.
|
---|
307 | Nor do I have permissions to create Mongo collections within a new database that I create, like ateacrawldata.
|
---|
308 | I only seem to have rights to the "anupama" database.
|
---|
309 |
|
---|
310 |
|
---|
311 |
|
---|
312 | -----------------------
|
---|
313 | The Vagrant virtual machine node1 has MongoDB installed.
|
---|
314 |
|
---|
315 | After doing "vagrant up" on node1 to start node1:
|
---|
316 |
|
---|
317 | [anupama@analytics vagrant-hadoop-hive-spark]$ vagrant ssh
|
---|
318 | vagrant@node1:~$ mongo
|
---|
319 | MongoDB shell version v4.2.1
|
---|
320 | connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
|
---|
321 | 2019-11-13T09:22:46.996+0000 E QUERY [js] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused :
|
---|
322 | connect@src/mongo/shell/mongo.js:341:17
|
---|
323 | @(connect):2:6
|
---|
324 | 2019-11-13T09:22:46.999+0000 F - [main] exception: connect failed
|
---|
325 | 2019-11-13T09:22:46.999+0000 E - [main] exiting with code 1
|
---|
326 | vagrant@node1:~$ sudo service mongod status
|
---|
327 | ● mongod.service - MongoDB Database Server
|
---|
328 | Loaded: loaded (/lib/systemd/system/mongod.service; disabled; vendor preset: enabled)
|
---|
329 | Active: inactive (dead)
|
---|
330 | Docs: https://docs.mongodb.org/manual
|
---|
331 | vagrant@node1:~$ sudo service mongod start
|
---|
332 | vagrant@node1:~$ sudo service mongod status
|
---|
333 | ● mongod.service - MongoDB Database Server
|
---|
334 | Loaded: loaded (/lib/systemd/system/mongod.service; disabled; vendor preset: enabled)
|
---|
335 | Active: active (running) since Wed 2019-11-13 09:24:07 UTC; 2s ago
|
---|
336 | Docs: https://docs.mongodb.org/manual
|
---|
337 | Main PID: 4383 (mongod)
|
---|
338 | Tasks: 32
|
---|
339 | Memory: 199.3M
|
---|
340 | CPU: 754ms
|
---|
341 | CGroup: /system.slice/mongod.service
|
---|
342 | └─4383 /usr/bin/mongod --config /etc/mongod.conf
|
---|
343 |
|
---|
344 | Nov 13 09:24:07 node1 systemd[1]: Started MongoDB Database Server.
|
---|
345 | vagrant@node1:~$
|
---|
346 |
|
---|
347 |
|
---|
348 | So now mongodb is running on node1 on localhost:27017.
|
---|
349 |
|
---|
350 | Next, in another x-term connected to analytics' node1 Vagrant VM, port forward node1's localhost:27017 to analytics' localhost:27017:
|
---|
351 | vagrant ssh -- -L 27017:localhost:27017
|
---|
352 |
|
---|
353 |
|
---|
354 |
|
---|
355 | Finally, in another x-term, port-forward from analytics:27017 to current machine's 27017:
|
---|
356 | ssh -L 27017:localhost:27017 analytics
|
---|
357 |
|
---|
358 |
|
---|
359 | Robo 3T running on the current machine can now connect to localhost:27017.
|
---|
360 |
|
---|
361 | Then, in a new x-term, the mongo client shell can connect as well (by default to localhost:27017):
|
---|
362 |
|
---|
363 | wharariki:[122]/Scratch/ak19/GS309>mongo --shell
|
---|
364 | MongoDB shell version v4.0.13
|
---|
365 | connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
|
---|
366 | ...
|
---|
367 | > show dbs
|
---|
368 | admin 0.000GB
|
---|
369 | ateacrawldata 1.532GB
|
---|
370 | config 0.000GB
|
---|
371 | local 0.000GB
|
---|
372 | > use ateacrawldata
|
---|
373 |
|
---|
374 | > show collections
|
---|
375 | Webpages
|
---|
376 | Websites
|
---|
377 | oldwebpages
|
---|
378 | oldwebsites
|
---|
379 | -------------------
|
---|
380 |
|
---|
381 | Country code to geolocation CSV file found by Dr Bainbridge:
|
---|
382 | https://developers.google.com/public-data/docs/canonical/countries_csv
|
---|
383 |
|
---|
384 | Import into mongodb with:
|
---|
385 | https://stackoverflow.com/questions/4686500/how-to-use-mongoimport-to-import-csv
|
---|
386 |
|
---|
387 |
|
---|
388 |
|
---|
389 | NOTE: mongoimport is a command-line utility, not a command to be run from within the mongo shell. See https://jira.mongodb.org/browse/DOCS-11072
|
---|
390 | This means: in an x-term, DON'T START THE MONGO SHELL/client first. Instead, run the following directly from the x-term to import the countrycodes.csv file:
|
---|
391 |
|
---|
392 |
|
---|
393 | mongoimport -d ateacrawldata -c countrylocations --type csv --file /Scratch/ak19/maori-lang-detection/MoreReading/countrycodes.csv --headerline
|
---|
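A rough sketch of what --headerline means: the first CSV line supplies the field names and each later line becomes one document. The sample rows and the function name csvToDocuments are made up for illustration; the real countrycodes.csv may have different columns. The parser is naive (no quoted fields), and mongoimport applies its own type handling, so the numeric conversion here is a simplification.

```javascript
// Hypothetical sample data; the real countrycodes.csv may differ.
const csv = [
  "country,latitude,longitude,name",
  "NZ,-40.900557,174.885971,New Zealand",
  "AU,-25.274398,133.775136,Australia",
].join("\n");

// Naive CSV reader: the header line gives the field names, each row becomes one object.
function csvToDocuments(text) {
  const [headerLine, ...rows] = text.trim().split("\n");
  const fields = headerLine.split(",");
  return rows.map((row) => {
    const values = row.split(",");
    const doc = {};
    fields.forEach((field, i) => {
      const asNumber = Number(values[i]);
      // Keep numeric-looking values as numbers, everything else as strings.
      doc[field] = values[i] !== "" && !Number.isNaN(asNumber) ? asNumber : values[i];
    });
    return doc;
  });
}

const docs = csvToDocuments(csv);
console.log(docs[0]);
```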
394 |
|
---|
395 |
|
---|
396 | -------------------------
|
---|
397 |
|
---|
398 | MONGODB QUERIES:
|
---|
399 |
|
---|
400 | db.getCollection('webpages').find({"isMRI": true, "singleSentences.langCode": "mri"})
|
---|
401 | db.getCollection('webpages').find({"singleSentences": { $elemMatch: {"langCode":"mri"} } }, {"singleSentences.$": "mri"})
|
---|
402 | db.getCollection('Webpages').find({"isMRI": true, "singleSentences": { $elemMatch: {"langCode":"eng"} } }, {"singleSentences.$": "eng"}) [single English lang sentence]
|
---|
403 | db.getCollection('Webpages').find({"containsMRI": true, "singleSentences": { $elemMatch: {"langCode":"mri"} } }, {"singleSentences.$": "mri"}) [gets 1st sentence of docs which have sentences containing MRI]
|
---|
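A plain JavaScript sketch of what the $elemMatch filter combined with the positional "singleSentences.$" projection returns: for each matching page, only the first array element that satisfied the filter. The sample documents, the sentence field, and the function name firstMatchingSentence are hypothetical. This emulates the behaviour in memory and is not how the server implements it.

```javascript
// Hypothetical documents shaped like the Webpages collection in these notes.
const webpages = [
  {
    _id: 1,
    containsMRI: true,
    singleSentences: [
      { langCode: "eng", sentence: "Hello." },
      { langCode: "mri", sentence: "Kia ora." },
      { langCode: "mri", sentence: "Tēnā koe." },
    ],
  },
  {
    _id: 2,
    containsMRI: false,
    singleSentences: [{ langCode: "eng", sentence: "Goodbye." }],
  },
];

// Emulates: find({containsMRI: true, singleSentences: {$elemMatch: {langCode}}},
//                {"singleSentences.$": 1})
function firstMatchingSentence(docs, langCode) {
  return docs
    .filter((d) => d.containsMRI && d.singleSentences.some((s) => s.langCode === langCode))
    .map((d) => ({
      _id: d._id,
      // Only the first matching element survives the positional projection.
      singleSentences: [d.singleSentences.find((s) => s.langCode === langCode)],
    }));
}

const result = firstMatchingSentence(webpages, "mri");
console.log(JSON.stringify(result));
```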
404 |
|
---|
405 |
|
---|
406 | READING
|
---|
407 |
|
---|
408 | mongodb java convert class
|
---|
409 | https://www.quora.com/What-are-the-ways-of-converting-a-Java-object-to-a-MongoDB-document-and-vice-versa
|
---|
410 | https://stackoverflow.com/questions/39320825/pojo-to-org-bson-document-and-vice-versa
|
---|
411 | X https://mongodb.github.io/morphia/
|
---|
412 | https://stackoverflow.com/questions/10170506/inserting-java-object-to-mongodb-collection-using-java
|
---|
413 | X https://www.google.com/search?q=morphia+example&oq=morphia+example&aqs=chrome.0.0l6.4223j0j9&sourceid=chrome&ie=UTF-8
|
---|
414 | https://www.baeldung.com/mongodb-morphia
|
---|
415 | X https://web.archive.org/web/20171117121335/http://mongodb.github.io/morphia/1.3/getting-started/
|
---|
416 | => https://morphia.dev/1.4/getting-started/quick-tour/
|
---|
417 | https://github.com/MorphiaOrg/morphia/tree/master/docs/reference
|
---|
418 |
|
---|
419 |
|
---|
420 | mongodb querying
|
---|
421 | https://docs.mongodb.com/manual/tutorial/query-embedded-documents/
|
---|
422 | https://docs.mongodb.com/manual/tutorial/query-arrays/
|
---|
423 | https://www.google.com/search?q=mongodb+find+subdocument&oq=mongodb+find+&aqs=chrome.0.69i59j69i57j0l4.7607j1j8&sourceid=chrome&ie=UTF-8
|
---|
424 | https://stackoverflow.com/questions/25586901/how-to-find-document-and-single-subdocument-matching-given-criterias-in-mongodb
|
---|
425 | https://stackoverflow.com/questions/21113543/mongodb-get-subdocument
|
---|
426 | https://stackoverflow.com/questions/36948856/find-subdocuments-in-mongo
|
---|
427 | https://docs.mongodb.com/v3.0/reference/operator/projection/positional/#proj._S_
|
---|
428 | https://www.google.com/search?q=mongodb+query+tutorial&oq=mongodb+query+tutorial&aqs=chrome..69i57j0l2j69i60l3.4719j0j7&sourceid=chrome&ie=UTF-8
|
---|
429 | https://blog.exploratory.io/an-introduction-to-mongodb-query-for-beginners-bd463319aa4c
|
---|
430 | https://docs.mongodb.com/manual/reference/method/db.collection.find/
|
---|
431 | https://docs.mongodb.com/manual/reference/method/db.collection.find/#find-projection
|
---|
432 | https://stackoverflow.com/questions/39641925/mongodb-aggregation-framework-to-get-frequencies-of-fields-values
|
---|
433 |
|
---|
434 | https://exploratory.io/note/kanaugust/0961813761939766
|
---|
435 | https://docs.mongodb.com/manual/tutorial/project-fields-from-query-results/
|
---|
436 | https://docs.mongodb.com/manual/aggregation/
|
---|
437 |
|
---|
438 |
|
---|
439 | Mongo Studio 3T documentation:
|
---|
440 | https://studio3t.com/download/ (also has uninstall information)
|
---|
441 | https://studio3t.com/download-thank-you/?OS=x64
|
---|
442 |
|
---|
443 | Google: MongoDB visualization
|
---|
444 | MongoDB visualization map
|
---|
445 | MongoDB Charts
|
---|
446 | (Open source visualisation tools)
|
---|
447 |
|
---|
448 | json map visualizer
|
---|
449 | geojson.tools
|
---|
450 | -------------------
|
---|
451 |
|
---|
452 | Some queries with results:
|
---|
453 |
|
---|
454 | # Num websites
|
---|
455 | db.getCollection('Websites').find({}).count()
|
---|
456 | 1445
|
---|
457 |
|
---|
458 | # Num webpages
|
---|
459 | db.getCollection('Webpages').find({}).count()
|
---|
460 | X75139
|
---|
461 | 117496
|
---|
462 |
|
---|
463 | # Find number of websites that have 1 or more pages in Maori (a positive numPagesInMRI)
|
---|
464 | db.getCollection('Websites').find({numPagesInMRI: { $gt: 0}}).count()
|
---|
465 | 361
|
---|
466 |
|
---|
467 | # Number of sites containing at least one sentence for which OpenNLP detected the best language = MRI
|
---|
468 | db.getCollection('Websites').find({numPagesContainingMRI: {$gt: 0}}).count()
|
---|
469 | 868
|
---|
470 |
|
---|
471 | # The union of the above two matches the numPagesContainingMRI count, presumably because every site with a page in MRI also has a page containing MRI:
|
---|
472 | db.getCollection('Websites').find({ $or: [ { numPagesInMRI: { $gt: 0 } }, { numPagesContainingMRI: {$gt: 0} } ] } ).count()
|
---|
473 | 868
|
---|
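A small sketch of why the $or count can equal the larger of the two counts: when one set of sites is contained in the other, the union adds nothing new. The sample sites are hypothetical; whether the subset relation holds for the real data is suggested, not proven, by the matching 868 counts.

```javascript
// Hypothetical sites; numPagesInMRI > 0 implies numPagesContainingMRI > 0 in this sample.
const websites = [
  { domain: "a.example", numPagesInMRI: 2, numPagesContainingMRI: 5 },
  { domain: "b.example", numPagesInMRI: 0, numPagesContainingMRI: 1 },
  { domain: "c.example", numPagesInMRI: 0, numPagesContainingMRI: 0 },
];

const inMRI = websites.filter((w) => w.numPagesInMRI > 0).length;
const containingMRI = websites.filter((w) => w.numPagesContainingMRI > 0).length;
// Emulates the $or query: either condition is enough.
const union = websites.filter(
  (w) => w.numPagesInMRI > 0 || w.numPagesContainingMRI > 0
).length;

console.log(inMRI, containingMRI, union); // 1 2 2
```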
474 |
|
---|
475 | # Find number of webpages that are deemed to be overall in MRI (pages where isMRI=true)
|
---|
476 | db.getCollection('Webpages').find({isMRI:true}).count()
|
---|
477 | X5224
|
---|
478 | X5215
|
---|
479 | db.getCollection('Webpages').find({isMRI:true}).count()
|
---|
480 | 7818
|
---|
481 |
|
---|
482 | # Number of pages that contain any number of MRI sentences
|
---|
483 | db.getCollection('Webpages').find({containsMRI: true}).count()
|
---|
484 | X12858
|
---|
485 | 20371
|
---|
486 |
|
---|
487 |
|
---|
488 | # Number of sites with URLs containing /mi(/)
|
---|
489 | db.getCollection('Websites').find({urlContainsLangCodeInPath:true}).count()
|
---|
490 | 153
|
---|
491 |
|
---|
492 | # Number of websites outside NZ that contain /mi(/) in any of their sub-URLs
|
---|
493 | db.getCollection('Websites').find({urlContainsLangCodeInPath:true, geoLocationCountryCode: {$ne : "NZ"} }).count()
|
---|
494 | 147
|
---|
495 |
|
---|
496 | # Number of sites in NZ with URLs containing /mi(/)
|
---|
497 | db.getCollection('Websites').find({urlContainsLangCodeInPath:true, geoLocationCountryCode: "NZ"}).count()
|
---|
498 | 6
|
---|
499 |
|
---|
500 |
|
---|
501 | # sort websites that contain /mi(/) in path by geoLocationCountryCode
|
---|
502 | # https://www.quackit.com/mongodb/tutorial/mongodb_sort_query_results.cfm
|
---|
503 | db.getCollection('Websites').find({urlContainsLangCodeInPath:true}).sort({geoLocationCountryCode: 1})
|
---|
504 |
|
---|
505 | Actually, I want to sort by count. See https://docs.mongodb.com/manual/reference/operator/aggregation/sortByCount/
|
---|
506 |
|
---|
507 |
|
---|
508 | # PROJECTION:
|
---|
509 | db.getCollection('Websites').find({geoLocationCountryCode: {$ne:"nz"}}, {geoLocationCountryCode:1, urlContainsLangCodeInPath: 1})
|
---|
510 |
|
---|
511 | https://docs.mongodb.com/manual/aggregation/
|
---|
512 | EXAMPLE:
|
---|
513 | db.orders.aggregate([
|
---|
514 | { $match: { status: "A" } },
|
---|
515 | { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
|
---|
516 | ])
|
---|
517 |
|
---|
518 | X db.Websites.aggregate([{ $match:{urlContainsLangCodeInPath:true}}, $group: {geoLocationCountryCode:1, total: $count}])
|
---|
519 |
|
---|
520 |
|
---|
521 | X db.Websites.aggregate([
|
---|
522 | { $match:{urlContainsLangCodeInPath:true}},
|
---|
523 | {$group: {geoLocationCountryCode:1}}
|
---|
524 | ])
|
---|
525 |
|
---|
526 | WORKS (but sites with no geoLocationCountryCode are grouped under null; an $unwind stage on that field drops them):
|
---|
527 | db.Websites.aggregate([
|
---|
528 | { $match:{urlContainsLangCodeInPath:true}},
|
---|
529 | {$group: {_id: "$geoLocationCountryCode", count: {$sum: 1}}},
|
---|
530 | { $sort : { count : -1} }
|
---|
531 | ])
|
---|
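An in-memory sketch of the $match, $group with $sum: 1, and $sort stages above, to show where the null group comes from. The sample sites and the function name countByCountry are hypothetical. A document with no country code is counted under a null key here, mirroring how $group treats a missing field.

```javascript
// Hypothetical sites; d.example has no geoLocationCountryCode at all.
const websites = [
  { domain: "a.example", urlContainsLangCodeInPath: true, geoLocationCountryCode: "US" },
  { domain: "b.example", urlContainsLangCodeInPath: true, geoLocationCountryCode: "US" },
  { domain: "c.example", urlContainsLangCodeInPath: true, geoLocationCountryCode: "NZ" },
  { domain: "d.example", urlContainsLangCodeInPath: true },
  { domain: "e.example", urlContainsLangCodeInPath: false, geoLocationCountryCode: "NZ" },
];

function countByCountry(docs) {
  const counts = new Map();
  // $match: keep only sites with the language code in the URL path.
  for (const doc of docs.filter((d) => d.urlContainsLangCodeInPath === true)) {
    // $group: a missing field is grouped under null.
    const key = doc.geoLocationCountryCode === undefined ? null : doc.geoLocationCountryCode;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // $sort: highest count first.
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
}

const grouped = countByCountry(websites);
console.log(grouped);
```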
532 |
|
---|
533 |
|
---|
534 | # COUNT OF ALL GEOLOCATION COUNTRIES
|
---|
535 | #https://stackoverflow.com/questions/14924495/mongodb-count-num-of-distinct-values-per-field-key
|
---|
536 | # LIST
|
---|
537 | db.Websites.distinct('geoLocationCountryCode');
|
---|
538 |
|
---|
539 | # COUNT
|
---|
540 | db.Websites.distinct('geoLocationCountryCode').length;
|
---|
541 |
|
---|
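542 | # DISTINCT WITH QUERY VIA runCommand (the result's "values" array holds the distinct values; its length gives the count) - https://docs.mongodb.com/manual/reference/command/distinct/#dbcmd.distinct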
|
---|
543 |
|
---|
544 | db.runCommand ( { distinct: "Websites", key: "geoLocationCountryCode", query: { "urlContainsLangCodeInPath": true} } );
|
---|
545 |
|
---|
546 | # DISTINCT WITH QUERY WITHOUT COUNT - https://docs.mongodb.com/manual/reference/method/db.collection.distinct/
|
---|
547 | db.Websites.distinct('geoLocationCountryCode', {"urlContainsLangCodeInPath": true});
|
---|
548 |
|
---|
549 | #SORTED - https://stackoverflow.com/questions/4759437/get-distinct-values-with-sorted-data
|
---|
550 | db.Websites.distinct('geoLocationCountryCode', {"urlContainsLangCodeInPath": true}).sort();
|
---|
551 |
|
---|
552 |
|
---|
553 | # count of all sites for which the geolocation is UNKNOWN
|
---|
554 | db.getCollection('Websites').find({geoLocationCountryCode: {$eq:"UNKNOWN"}}).count()
|
---|
555 |
|
---|
556 |
|
---|
557 | # AGGREGATION QUERIES THAT WORK:
|
---|
558 | #https://stackoverflow.com/questions/14924495/mongodb-count-num-of-distinct-values-per-field-key
|
---|
559 |
|
---|
560 | WORKS:
|
---|
561 | // count of country codes for all sites
|
---|
562 | db.Websites.aggregate([
|
---|
563 |
|
---|
564 | { $unwind: "$geoLocationCountryCode" },
|
---|
565 | {
|
---|
566 | $group: {
|
---|
567 | _id: "$geoLocationCountryCode",
|
---|
568 | count: { $sum: 1 }
|
---|
569 | }
|
---|
570 | },
|
---|
571 | { $sort : { count : -1} }
|
---|
572 | ]);
|
---|
573 |
|
---|
574 | // count of country codes for sites that have at least one page detected as MRI
|
---|
575 |
|
---|
576 | db.Websites.aggregate([
|
---|
577 | {
|
---|
578 | $match: {
|
---|
579 | numPagesInMRI: {$gt: 0}
|
---|
580 | }
|
---|
581 | },
|
---|
582 | { $unwind: "$geoLocationCountryCode" },
|
---|
583 | {
|
---|
584 | $group: {
|
---|
585 | _id: {$toLower: '$geoLocationCountryCode'},
|
---|
586 | count: { $sum: 1 }
|
---|
587 | }
|
---|
588 | },
|
---|
589 | { $sort : { count : -1} }
|
---|
590 | ]);
|
---|
591 |
|
---|
592 | // count of country codes for sites that have at least one page containing at least one sentence detected as MRI
|
---|
593 | db.Websites.aggregate([
|
---|
594 | {
|
---|
595 | $match: {
|
---|
596 | numPagesContainingMRI: {$gt: 0}
|
---|
597 | }
|
---|
598 | },
|
---|
599 | { $unwind: "$geoLocationCountryCode" },
|
---|
600 | {
|
---|
601 | $group: {
|
---|
602 | _id: {$toLower: '$geoLocationCountryCode'},
|
---|
603 | count: { $sum: 1 }
|
---|
604 | }
|
---|
605 | },
|
---|
606 | { $sort : { count : -1} }
|
---|
607 | ]);
|
---|
608 |
|
---|
609 |
|
---|
WORKS:
// count of country codes for sites that have /mi(/) in path
db.Websites.aggregate([
    {
        $match: {
            urlContainsLangCodeInPath: true
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: {$toLower: '$geoLocationCountryCode'},
            count: { $sum: 1 }
        }
    },
    { $sort : { count : -1} }
]);

WORKS:
db.Websites.aggregate([
    {
        $match: {
            geoLocationCountryCode: {$ne : "UNKNOWN"}
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "$geoLocationCountryCode",
            count: { $sum: 1 }
        }
    },
    { $sort : { count : -1} }
]);

WORKS:
db.Websites.aggregate([
    {
        $match: {
            "urlContainsLangCodeInPath": true
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "$geoLocationCountryCode",
            count: { $sum: 1 }
        }
    },
    { $sort : { count : -1} }
]);

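The queries above all share the same $unwind, $group and $sort stages and differ only in the $match filter and in whether the country code is lowercased. A minimal sketch of a helper that builds such a pipeline is shown here; the name countryCountPipeline and its lowercase option are hypothetical and not part of MongoDB.

```javascript
// Hypothetical helper (not part of MongoDB): builds the country-code count
// pipeline that the queries above repeat by hand.
function countryCountPipeline(match, options = {}) {
    const lowercase = options.lowercase === true;
    const pipeline = [];
    // Only add a $match stage when a filter was supplied.
    if (match) {
        pipeline.push({ $match: match });
    }
    pipeline.push(
        { $unwind: "$geoLocationCountryCode" },
        {
            $group: {
                _id: lowercase
                    ? { $toLower: "$geoLocationCountryCode" }
                    : "$geoLocationCountryCode",
                count: { $sum: 1 }
            }
        },
        { $sort: { count: -1 } }
    );
    return pipeline;
}

// Example: the same stages as the urlContainsLangCodeInPath query above.
const pipeline = countryCountPipeline({ urlContainsLangCodeInPath: true }, { lowercase: true });
console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell the result could then be passed along as db.Websites.aggregate(pipeline).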
KEEP ADDITIONAL FIELDS - https://stackoverflow.com/questions/16662405/mongo-group-query-how-to-keep-fields:
a. KEEPS ONLY FIRST DOMAIN URL FOR EACH COUNTED COUNTRY CODE:

db.Websites.aggregate([
    {
        $match: {
            "urlContainsLangCodeInPath": true
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "$geoLocationCountryCode",
            count: { $sum: 1 },
            domain: {$first: '$domain'}
        }
    },
    { $sort : { count : -1} }
]);

b. KEEP ALL DOMAIN URLS:
db.Websites.aggregate([
    {
        $match: {
            "urlContainsLangCodeInPath": true
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: "$geoLocationCountryCode",
            count: { $sum: 1 },
            domain: { $addToSet: '$domain' }
        }
    },
    { $sort : { count : -1} }
]);

# WANT TO GET THE ABOVE INTO A WORLD MAP, use geojson.tools found by Dr Bainbridge
geojson.tools
USAGE: https://www.here.xyz/viewer-tool/


AIMS:
* Identify where the Maori language is found online.
* How can we identify high quality sites that would be good for a corpus?
(See related work for other languages to answer that quantifiably)

data-preparation
docs


------------------------------------------

BUILDING TOWARDS NEW MONGODB QUERY: Counts by country code of TENTATIVE NON-PRODUCT SITES that are in Maori
---

# https://stackoverflow.com/questions/16902930/mongodb-aggregation-framework-match-or
# https://docs.mongodb.com/manual/reference/operator/query/and/

# 1. all the websites which are from NZ:
db.getCollection('Websites').find({geoLocationCountryCode: "NZ"}).count()
128

# 2. all the websites that have /mi in URL path which are from NZ:
db.getCollection('Websites').find({$and: [{urlContainsLangCodeInPath: true}, {geoLocationCountryCode: "NZ"}]}).count()
6

# 3. all the websites that don't have /mi in URL path:
db.getCollection('Websites').find({urlContainsLangCodeInPath: false}).count()
1292

# 4. all the websites that don't have /mi, or if they do are from NZ
# (should be the sum of points 2 and 3 above: 6 + 1292 = 1298)
db.getCollection('Websites').find({$or: [{urlContainsLangCodeInPath: false}, {$and: [{urlContainsLangCodeInPath: true}, {geoLocationCountryCode: "NZ"}]}]}).count()
1298

# 5. All the websites that have at least 1 page containing MRI (numPagesContainingMRI > 0) AND either don't have /mi in URL path or if they do are from NZ
# These are the TENTATIVE NON-PRODUCT SITES
# Should be no more than the count in point 4, since it only adds a condition to that query
db.getCollection('Websites').find({$and: [{numPagesContainingMRI: {$gt: 0}},{$or: [{urlContainsLangCodeInPath: false}, {$and: [{urlContainsLangCodeInPath: true}, {geoLocationCountryCode: "NZ"}]}]}]}).count()
859

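The nested $and/$or filter of point 5 can be sanity-checked on a few in-memory documents before running it against the collection. The sketch below restates the filter as a plain JavaScript predicate. The sample documents are made up for illustration, and the simplified variant assumes urlContainsLangCodeInPath is always stored as true or false (never missing).

```javascript
// Plain-JavaScript restatement of the point 5 filter (illustrative only).
function isTentativeNonProductSite(site) {
    const noLangCodeOrFromNZ =
        site.urlContainsLangCodeInPath === false ||
        (site.urlContainsLangCodeInPath === true && site.geoLocationCountryCode === "NZ");
    return site.numPagesContainingMRI > 0 && noLangCodeOrFromNZ;
}

// Simplified form: "false OR (true AND NZ)" reduces to "false OR NZ".
// ASSUMPTION: urlContainsLangCodeInPath is always a boolean.
function isTentativeNonProductSiteSimplified(site) {
    return site.numPagesContainingMRI > 0 &&
        (site.urlContainsLangCodeInPath === false || site.geoLocationCountryCode === "NZ");
}

// Made-up sample documents, one per interesting case.
const sampleSites = [
    { domain: "a.example", urlContainsLangCodeInPath: false, geoLocationCountryCode: "US", numPagesContainingMRI: 3 },
    { domain: "b.example", urlContainsLangCodeInPath: true,  geoLocationCountryCode: "NZ", numPagesContainingMRI: 1 },
    { domain: "c.example", urlContainsLangCodeInPath: true,  geoLocationCountryCode: "US", numPagesContainingMRI: 5 },
    { domain: "d.example", urlContainsLangCodeInPath: false, geoLocationCountryCode: "NZ", numPagesContainingMRI: 0 }
];

const kept = sampleSites.filter(isTentativeNonProductSite).map(site => site.domain);
console.log(kept); // [ 'a.example', 'b.example' ]
```

The two branches of the $or can never both match the same document, which is why the count in point 4 equals the counts of points 2 and 3 added together.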
# 6. Now do the counts by country code of the above, by pasting the query of point 5 as the $match clause (i.e. without the .count() suffix)
# Counts by country code of TENTATIVE NON-PRODUCT SITES that are in Maori
db.Websites.aggregate([
    {
        $match: {$and: [{numPagesContainingMRI: {$gt: 0}},{$or: [{urlContainsLangCodeInPath: false}, {$and: [{urlContainsLangCodeInPath: true}, {geoLocationCountryCode: "NZ"}]}]}]}
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: {$toLower: '$geoLocationCountryCode'},
            count: { $sum: 1 }
        }
    },
    { $sort : { count : -1} }
]);

The result is very close to the same aggregate on just numPagesContainingMRI.

That's because very few websites both contain /mi/ in the URL path AND have numPagesContainingMRI > 0:

db.Websites.aggregate([
    {
        $match: {
            $and: [{numPagesContainingMRI: {$gt: 0}},{urlContainsLangCodeInPath: true}]
        }
    },
    { $unwind: "$geoLocationCountryCode" },
    {
        $group: {
            _id: {$toLower: '$geoLocationCountryCode'},
            count: { $sum: 1 }
        }
    },
    { $sort : { count : -1} }
]);


_id   count
us    4.0
nz    4.0
au    3.0
ru    1.0
de    1.0

Total: 13 sites that have /mi/ and are detected as having MRI content:
db.getCollection('Websites').find({$and: [{numPagesContainingMRI: {$gt: 0}},{urlContainsLangCodeInPath: true}]}).count()
13

Of these 13, the 4 from NZ were already included in steps 5 and 6. So the difference is only 9 sites (13 - 4) that have /mi/ in the URL path and contain MRI but are not from NZ.

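The subtraction can be checked directly from the result table above. This is a small sketch with the rows copied by hand from that table; the variable names are only illustrative.

```javascript
// Rows copied from the "_id / count" result table above.
const langCodeAndMriCounts = [
    { _id: "us", count: 4 },
    { _id: "nz", count: 4 },
    { _id: "au", count: 3 },
    { _id: "ru", count: 1 },
    { _id: "de", count: 1 }
];

// Sum the count column for the rows that pass a given test.
function sumCounts(rows, keep) {
    return rows.filter(keep).reduce((sum, row) => sum + row.count, 0);
}

const total = sumCounts(langCodeAndMriCounts, () => true);
const fromNZ = sumCounts(langCodeAndMriCounts, row => row._id === "nz");
console.log(total, fromNZ, total - fromNZ); // 13 4 9
```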
Let's get a listing of the sites' domains. 3 of the sites whose country codes are NOT NZ have an NZ TLD (.nz)!
/* 1 */
{
    "_id" : "nz",
    "count" : 4.0,
    "domain" : [
        "http://firstworldwar.tki.org.nz",
        "http://www.firstworldwar.tki.org.nz",
        "https://admin.teara.govt.nz",
        "http://community.nzdl.org"
    ]
}

/* 2 */
{
    "_id" : "us",
    "count" : 4.0,
    "domain" : [
        "https://sexualviolence.victimsinfo.govt.nz",
        "https://follow3rs.com",
        "http://www.church-of-christ.org",
        "http://www.mytrickstips.com"
    ]
}

/* 3 */
{
    "_id" : "au",
    "count" : 3.0,
    "domain" : [
        "https://rapuatearatika.education.govt.nz",
        "https://www.kiwiproperty.com",
        "https://curriculumtool.education.govt.nz"
    ]
}

/* 4 */
{
    "_id" : "ru",
    "count" : 1.0,
    "domain" : [
        "http://www.treningmozga.com"
    ]
}

/* 5 */
{
    "_id" : "de",
    "count" : 1.0,
    "domain" : [
        "http://www.almancax.com" # Website to learn German, autotranslated
    ]
}

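The mismatch between geolocated country code and TLD noted above can also be found mechanically. The sketch below uses the grouped results from the listing above, pasted in by hand, and flags every domain whose hostname ends in .nz but whose group is not nz. The function name hasNzTld is hypothetical.

```javascript
// Grouped results copied from the listing above.
const groupedDomains = [
    { _id: "nz", domain: ["http://firstworldwar.tki.org.nz", "http://www.firstworldwar.tki.org.nz",
                          "https://admin.teara.govt.nz", "http://community.nzdl.org"] },
    { _id: "us", domain: ["https://sexualviolence.victimsinfo.govt.nz", "https://follow3rs.com",
                          "http://www.church-of-christ.org", "http://www.mytrickstips.com"] },
    { _id: "au", domain: ["https://rapuatearatika.education.govt.nz", "https://www.kiwiproperty.com",
                          "https://curriculumtool.education.govt.nz"] },
    { _id: "ru", domain: ["http://www.treningmozga.com"] },
    { _id: "de", domain: ["http://www.almancax.com"] }
];

// True if the URL's hostname ends in ".nz".
function hasNzTld(url) {
    return new URL(url).hostname.endsWith(".nz");
}

// Collect .nz domains that were geolocated outside NZ.
const nzTldMismatches = [];
for (const group of groupedDomains) {
    if (group._id === "nz") continue;
    for (const url of group.domain) {
        if (hasNzTld(url)) {
            nzTldMismatches.push({ countryCode: group._id, domain: url });
        }
    }
}
console.log(nzTldMismatches);
```

On this data the check reports the three .govt.nz sites that were geolocated to us and au.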
But we're not catching a potentially large number of auto-translated sites, like
- https://www.gigalight.com/all-languages.html
- http://www.hzhinew.com/

--------------
GETTING TABLE DATA OUT OF MONGO DB:

https://stackoverflow.com/questions/28733692/how-to-export-json-from-mongodb-using-robomongo
"export to file" as in a spreadsheet like to a .csv?

IMO this is the EASIEST way to do this in Robo 3T (formerly robomongo):

1. In the top right of the Robo 3T GUI there is a "View Results in text mode" button, click it and copy everything

2. paste everything into this website: https://json-csv.com/

3. click the download button and now you have it in a spreadsheet.


https://json-csv.com/

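As an alternative to pasting results into a website, the conversion can be sketched in a few lines of JavaScript. This is only a minimal sketch: toCsv is a hypothetical helper, array values are joined with ";" as an arbitrary choice, and the rows are assumed to be plain result documents such as those an aggregate cursor's toArray() returns in the shell.

```javascript
// Hypothetical helper: converts an array of flat result documents to CSV text.
function toCsv(rows, columns) {
    const escapeCell = (value) => {
        let text;
        if (Array.isArray(value)) {
            text = value.join(";");          // arbitrary separator for list fields
        } else if (value === undefined || value === null) {
            text = "";
        } else {
            text = String(value);
        }
        // Quote cells containing commas, quotes or newlines; double any quotes.
        return /[",\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
    };
    const lines = [columns.join(",")];
    for (const row of rows) {
        lines.push(columns.map(column => escapeCell(row[column])).join(","));
    }
    return lines.join("\n");
}

// Two rows in the shape produced by the "KEEP ALL DOMAIN URLS" query.
const exampleRows = [
    { _id: "ru", count: 1, domain: ["http://www.treningmozga.com"] },
    { _id: "de", count: 1, domain: ["http://www.almancax.com"] }
];
console.log(toCsv(exampleRows, ["_id", "count", "domain"]));
```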
---------------------

_id   count   longitude,latitude
US    93.0    -95.8,40.33
AU    7.0     135.8,-25.33
CN    7.0     100.8,32.33
NZ    5.0     175.8,-40.33
DE    5.0     10.8,50.33
HK    5.0     114,22.33
RU    4.0     38.4,55.5
JP    3.0     137.8,36
GB    3.0     -2,53.33
CA    2.0     -105.8,55.33
FR    2.0     3,47.33
DK    2.0     9.5,55.33
VG    2.0     -64.8,18.35     (British Virgin Islands)
UA    1.0     31.5,48.5       (Ukraine)
CZ    1.0     16.2,49.7
CH    1.0     8.5,47          (Switzerland)
ZA    1.0     24.2,-30.7      (South Africa)
NL    1.0     5.8,52.33
KR    1.0     127.8,36.8

/** http://geojson.tools/

{
    "type": "MultiPoint",
    "coordinates": [
        [-95.8, 40.33],
        [135.8, -25.33],
        [100.8, 32.33],
        [175.8, -40.33],
        [10.8, 50.33],
        [10.8, 50.33],
        [114, 22.33],
        [38.4, 55.5],
        [-2, 53.33],
        [137.8, 36],
        [-105.8, 55.33],
        [3, 47.33],
        [9.5, 55.33],
        [-64.8, 18.35],
        [31.5, 48.5],
        [16.2, 49.7],
        [8.5, 47],
        [24.2, -30.7],
        [5.8, 52.33],
        [127.8, 36.8]
    ]
}

*/

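A MultiPoint geometry like the one above carries only positions, so the per-country counts are lost on the map. One possible alternative, sketched here, is a GeoJSON FeatureCollection with one Point feature per country and the count stored in its properties. The helper name toFeatureCollection and the lonLat field are hypothetical. The three sample rows reuse coordinates from these notes, in GeoJSON's [longitude, latitude] order.

```javascript
// A few rows from the notes: country code, count, and [longitude, latitude].
const countryCounts = [
    { _id: "US", count: 93, lonLat: [-95.8, 40.33] },
    { _id: "AU", count: 7,  lonLat: [135.8, -25.33] },
    { _id: "NZ", count: 5,  lonLat: [175.8, -40.33] }
];

// Hypothetical helper: one Point feature per country, with the count as a property.
function toFeatureCollection(rows) {
    return {
        type: "FeatureCollection",
        features: rows.map(row => ({
            type: "Feature",
            geometry: { type: "Point", coordinates: row.lonLat },
            properties: { countryCode: row._id, count: row.count }
        }))
    };
}

const featureCollection = toFeatureCollection(countryCounts);
console.log(JSON.stringify(featureCollection, null, 2));
```

The printed JSON can be pasted into a GeoJSON viewer in the same way as the MultiPoint object.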