Changeset 33675

Timestamp: 15.11.2019 00:22:34
Author: ak19
Message:

Committing the newer query results (but from before today's reingestion into MongoDB) and some more MongoDB-related links to read, suggested by Dr Bainbridge and his Google searches today.

Files: 1 modified

--- other-projects/maori-lang-detection/MoreReading/mongodb.txt (r33666)
+++ other-projects/maori-lang-detection/MoreReading/mongodb.txt (r33675)
@@ -347,5 +347,15 @@
 https://docs.mongodb.com/manual/reference/method/db.collection.find/#find-projection
 
-
+Mongo Studio 3T documentation:
+https://studio3t.com/download/ (also has uninstall information)
+https://studio3t.com/download-thank-you/?OS=x64
+
+Google: MongoDB visualization
+MongoDB visualization map
+MongoDB Charts
+    (Open source visualisation tools)
+
+json map visualizer
+    geojson.tools
 -------------------
 
     
@@ -358,5 +368,6 @@
 # Num webpages
 db.getCollection('Webpages').find({}).count()
-75139
+X75139
+117496
 
 # Find number of websites who have 1 or more pages in Maori (a positive numPagesInMRI)
     
@@ -367,9 +378,13 @@
 db.getCollection('Webpages').find({isMRI:true}).count()
 X5224
-5215
+X5215
+db.getCollection('Webpages').find({isMRI:true}).count()
+7818
 
 # Number of pages that contain any number of MRI sentences
 db.getCollection('Webpages').find({containsMRI: true}).count()
-12858
+X12858
+20371
+
 
 # Number of sites with URLs containing /mi(/)
     
@@ -389,3 +404,10 @@
 db.getCollection('Websites').find({urlContainsLangCodeInpath:true}).sort({geoLocationCountryCode: 1})
 
-
+Actually, I want to sort by count. See https://docs.mongodb.com/manual/reference/operator/aggregation/sortByCount/
+
+
+
+* Identify where Maori language is online.
+* How can we identify high quality sites that would be good for a corpus.
+(Related work for other languages to quantifiably answer that)
+
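
The last added note says the goal is to sort by count rather than by country code. The following is a minimal sketch of how that might look with the `$sortByCount` aggregation stage from the linked page. It assumes the `Websites` collection and the `urlContainsLangCodeInpath` and `geoLocationCountryCode` fields used in the query above. The `pipeline` variable name is only illustrative, and none of this is part of the committed file.

```javascript
// Sketch only: collection and field names are copied from the find() query
// in the notes; `pipeline` is a hypothetical variable name.
const pipeline = [
  // Keep only sites whose URL path contains the language code.
  { $match: { urlContainsLangCodeInpath: true } },
  // Group by country code and sort the groups by descending count.
  { $sortByCount: "$geoLocationCountryCode" }
];

// `db` is predefined in the mongo shell; outside the shell this step is skipped.
if (typeof db !== "undefined") {
  db.getCollection("Websites").aggregate(pipeline).forEach(printjson);
}
```

Each result document has the form `{ _id: <country code>, count: <number of sites> }`, with the most frequent country code first.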