Changeset 33675 for other-projects


Timestamp: 2019-11-15T00:22:34+13:00
Author: ak19
Message: Committing the newer query results (from before today's reingestion into MongoDB) and some more MongoDB-related links to read, suggested by Dr Bainbridge and his Google searches today.

File:
1 edited

  • other-projects/maori-lang-detection/MoreReading/mongodb.txt

r33666  r33675
   347     347  https://docs.mongodb.com/manual/reference/method/db.collection.find/#find-projection
   348     348
   349
           349  Mongo Studio 3T documentation:
           350  https://studio3t.com/download/ (also has uninstall information)
           351  https://studio3t.com/download-thank-you/?OS=x64
           352
           353  Google: MongoDB visualization
           354  MongoDB visualization map
           355  MongoDB Charts
           356      (Open source visualisation tools)
           357
           358  json map visualizer
           359      geojson.tools
   350     360  -------------------
   351     361

   358     368  # Num webpages
   359     369  db.getCollection('Webpages').find({}).count()
   360          75139
           370  X75139
           371  117496
   361     372
   362     373  # Find number of websites who have 1 or more pages in Maori (a positive numPagesInMRI)

   367     378  db.getCollection('Webpages').find({isMRI:true}).count()
   368     379  X5224
   369          5215
           380  X5215
           381  db.getCollection('Webpages').find({isMRI:true}).count()
           382  7818
   370     383
   371     384  # Number of pages that contain any number of MRI sentences
   372     385  db.getCollection('Webpages').find({containsMRI: true}).count()
   373          12858
           386  X12858
           387  20371
           388
   374     389
   375     390  # Number of sites with URLs containing /mi(/)

   389     404  db.getCollection('Websites').find({urlContainsLangCodeInpath:true}).sort({geoLocationCountryCode: 1})
   390     405
   391
           406  Actually, I want to sort by count. See https://docs.mongodb.com/manual/reference/operator/aggregation/sortByCount/
           407
           408
           409
           410  * Identify where Maori language is online.
           411  * How can we identify high quality sites that would be good for a corpus.
           412  (Related work for other languages to quantifiably answer that)
           413
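The note about sorting by count (pointing at $sortByCount) could be sketched as an aggregation pipeline along the following lines. This is only an illustration, not part of the changeset. It assumes the count wanted is the number of matching sites per geoLocationCountryCode, which the notes do not state outright. The collection and field names come from the queries above; the sortByCount helper and the sample documents are hypothetical and exist only so the result shape can be checked without a MongoDB server.

```javascript
// Pipeline that could be passed to aggregate() on the Websites collection.
// Assumption: we want the number of matching sites per country code.
const pipeline = [
  { $match: { urlContainsLangCodeInpath: true } },
  { $sortByCount: "$geoLocationCountryCode" }
];

// Hypothetical in-memory stand-in for $sortByCount: group documents by one
// field, count each group, and order the groups by descending count.
function sortByCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
}

// Made-up sample documents, not real query results.
const sampleSites = [
  { geoLocationCountryCode: "NZ", urlContainsLangCodeInpath: true },
  { geoLocationCountryCode: "AU", urlContainsLangCodeInpath: true },
  { geoLocationCountryCode: "NZ", urlContainsLangCodeInpath: true },
  { geoLocationCountryCode: "US", urlContainsLangCodeInpath: false },
  { geoLocationCountryCode: "NZ", urlContainsLangCodeInpath: true },
  { geoLocationCountryCode: "AU", urlContainsLangCodeInpath: true }
];

// Emulate the $match stage, then the $sortByCount stage.
const matching = sampleSites.filter(d => d.urlContainsLangCodeInpath === true);
const perCountry = sortByCount(matching, "geoLocationCountryCode");
console.log(perCountry);
```

In the mongo shell, the same pipeline would be run as db.getCollection('Websites').aggregate(pipeline), which returns one { _id, count } document per country code, largest count first.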