Timestamp: 2019-12-12T18:04:10+13:00
Author: ak19
Message:

Removed an adult site from the crawled contents and added its URL to the blacklist conf file (in case anyone ever crawls our MRI set of Common Crawl sites again).

File: 1 edited

  • other-projects/maori-lang-detection/MoreReading/mongodb.txt

    r33787 -> r33800 (lines 479-502)

     
     # Number of sites with URLs containing /mi(/)
    -db.getCollection('Websites').find({urlContainsLangCodeInpath:true}).count()
    +db.getCollection('Websites').find({urlContainsLangCodeInPath:true}).count()
     153
     
     # Number of websites that are outside NZ that contain /mi(/) in any of its sub-urls
    -0db.getCollection('Websites').find({urlContainsLangCodeInpath:true, geoLocationCountryCode: {$ne : "NZ"} }).count()
    +db.getCollection('Websites').find({urlContainsLangCodeInPath:true, geoLocationCountryCode: {$ne : "NZ"} }).count()
     148
     
     # 5 sites with URLs containing /mi(/) that are in NZ
    -db.getCollection('Websites').find({urlContainsLangCodeInpath:true, geoLocationCountryCode: "NZ"}).count()
    +db.getCollection('Websites').find({urlContainsLangCodeInPath:true, geoLocationCountryCode: "NZ"}).count()
     5
     
     # sort websites that contain /mi(/) in path by geoLocationCountryCode
     #    https://www.quackit.com/mongodb/tutorial/mongodb_sort_query_results.cfm
    -db.getCollection('Websites').find({urlContainsLangCodeInpath:true}).sort({geoLocationCountryCode: 1})
    +db.getCollection('Websites').find({urlContainsLangCodeInPath:true}).sort({geoLocationCountryCode: 1})
     
     Actually, I want to sort by count. See https://docs.mongodb.com/manual/reference/operator/aggregation/sortByCount/
     
     
     # PROJECTION:
    -db.getCollection('Websites').find({geoLocationCountryCode: {$ne:"nz"}}, {geoLocationCountryCode:1, urlContainsLangCodeInpath: 1})
    +db.getCollection('Websites').find({geoLocationCountryCode: {$ne:"nz"}}, {geoLocationCountryCode:1, urlContainsLangCodeInPath: 1})
     
     https://docs.mongodb.com/manual/aggregation/
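The note "Actually, I want to sort by count" points at MongoDB's `$sortByCount` aggregation stage. That stage is available since MongoDB 3.4 and is documented as shorthand for a `$group` with `count: { $sum: 1 }` followed by `$sort: { count: -1 }`. Below is a minimal sketch that assumes the stored field is spelled `urlContainsLangCodeInPath`, as in the corrected queries. The names `langCodeCountryPipeline`, `emulateSortByCount` and `sampleWebsites` are hypothetical. The in-memory function is only a stand-in so the result shape can be checked without a server. It is not how MongoDB evaluates the pipeline.

```javascript
// Hypothetical names throughout: langCodeCountryPipeline, emulateSortByCount
// and sampleWebsites are illustrations, not part of the original notes.

// Pipeline to pass to the mongo shell, e.g.
//   db.getCollection('Websites').aggregate(langCodeCountryPipeline)
const langCodeCountryPipeline = [
  // same filter as the find() queries above (field names are case-sensitive)
  { $match: { urlContainsLangCodeInPath: true } },
  // group by country code, then sort groups by document count, largest first
  { $sortByCount: "$geoLocationCountryCode" }
];

// In-memory stand-in for the two stages. $sortByCount behaves like
//   { $group: { _id: <expr>, count: { $sum: 1 } } }, { $sort: { count: -1 } }
function emulateSortByCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    // MongoDB groups documents with a missing field under _id: null
    const key = doc[field] === undefined ? null : doc[field];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // note: MongoDB does not guarantee the order of groups with equal counts
  return Array.from(counts, ([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
}

// Hypothetical sample documents (not real crawl data)
const sampleWebsites = [
  { geoLocationCountryCode: "US", urlContainsLangCodeInPath: true },
  { geoLocationCountryCode: "NZ", urlContainsLangCodeInPath: true },
  { geoLocationCountryCode: "US", urlContainsLangCodeInPath: true },
  { geoLocationCountryCode: "DE", urlContainsLangCodeInPath: false },
  { geoLocationCountryCode: "US", urlContainsLangCodeInPath: true },
  { geoLocationCountryCode: "NZ", urlContainsLangCodeInPath: true }
];

// equivalent of the $match stage on the sample data
const matched = sampleWebsites.filter(d => d.urlContainsLangCodeInPath === true);
console.log(emulateSortByCount(matched, "geoLocationCountryCode"));
// -> [ { _id: 'US', count: 3 }, { _id: 'NZ', count: 2 } ]
```

Against the real collection, the shell call in the first comment should return documents of the same `{ _id, count }` shape. The counts above come only from the made-up sample. Because field names are case-sensitive, a `$match` on the old spelling `urlContainsLangCodeInpath` would match no documents if the stored field uses a capital P.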