Changeset 33847 for other-projects/maori-lang-detection/MoreReading
Timestamp: 2020-01-17T19:32:16+13:00
File: 1 edited
other-projects/maori-lang-detection/MoreReading/mongodb.txt
--- other-projects/maori-lang-detection/MoreReading/mongodb.txt (revision 33843)
+++ other-projects/maori-lang-detection/MoreReading/mongodb.txt (revision 33847)
@@ -1488,5 +1488,5 @@
 TIDIED:
 NZ: 176
-US: 25+ 3 from US with mi in URL path = 28
+US: 25+4 from US with mi in URL path = 29
 AU: 3
 DE: 2
@@ -1497,5 +1497,5 @@
 FR: 1
 IE: 1
-TOTAL: 213+ 3 from US with mi in URL path = 216
+TOTAL: 213+4 from US with mi in URL path = 217
 
 
@@ -1525,5 +1525,5 @@
 Of interest or possible interest:
 US:
-!! http://indigenousblogs.com [15/18 blogs work]
+!! http://indigenousblogs.com [15/18 blogs work] - has one page in Maori (http://indigenousblogs.com/feeds/mi.xml)
 X https://biblia.gospelprime.com.br - misdetection (containsMRI)
 X ?https://follow3rs.com - seems dodgy and possibly auto-translated. Can't spell account, misspelled as accout
@@ -1559,2 +1559,36 @@
 db.getCollection('Webpages').find({$and: [{isMRI: true}, {URL: /indigenousblogs\.com/}]})
 => http://indigenousblogs.com/mi/
+
+--------------------------
+
+
+db.Websites.aggregate([
+    {
+        $match: {
+            $and: [
+                {geoLocationCountryCode: {$ne: "NZ"}},
+                {domain: {$not: /\.nz/}},
+                {numPagesContainingMRI: {$gt: 0}},
+                {$or: [{geoLocationCountryCode: "AU"}, {urlContainsLangCodeInPath: false}]}
+            ]
+        }
+    },
+    { $unwind: "$geoLocationCountryCode" },
+    {
+        $group: {
+            _id: {$toLower: '$geoLocationCountryCode'},
+            count: { $sum: 1 },
+            domain: { $addToSet: '$domain' },
+            numPagesInMRI: { $addToSet: '$numPagesInMRI' },
+            numPagesContainingMRI: { $addToSet: '$numPagesContainingMRI' },
+            numPagesInMRICount: { $sum: '$numPagesInMRI' },
+            numPagesContainingMRICount: { $sum: '$numPagesContainingMRI' }
+        }
+    },
+    { $sort : { count : -1} }
+]);
+
+
+To convert json to csv
+In gedit replace
+\/\*\s*\d+\s*\*\/ => ,
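The "To convert json to csv" note added at the end of this change gives only the search-and-replace step. The pattern appears to target the `/* 1 */`, `/* 2 */` markers that some MongoDB GUI clients print before each result document. That is an assumption, since the note does not say which tool produced the output. A minimal sketch of the same idea in JavaScript follows. The helper names `parseNumberedRecords` and `toCsv`, the sample data, and the choice to join array values with `;` are all hypothetical and not part of the changeset. It also assumes the records are plain JSON with no shell-only types such as `ObjectId(...)`.

```javascript
// Same pattern as the gedit search in the note, with the global flag added.
const RECORD_MARKER = /\/\*\s*\d+\s*\*\//g;

// Hypothetical helper: turn "/* 1 */ {...} /* 2 */ {...}" into an array of objects.
// Assumes each record is valid JSON and no string value contains a marker.
function parseNumberedRecords(text) {
  const withCommas = text.replace(RECORD_MARKER, ",");
  // Drop the comma left in front of the first record, then wrap in brackets.
  const body = withCommas.trim().replace(/^,/, "");
  return JSON.parse("[" + body + "]");
}

// Hypothetical helper: write the chosen top-level fields as CSV, quoting every cell.
// Array values (such as the $addToSet results) are joined with ";".
function toCsv(records, fields) {
  const quote = (value) => {
    const cell = Array.isArray(value) ? value.join(";") : String(value ?? "");
    return '"' + cell.replace(/"/g, '""') + '"';
  };
  const rows = records.map((r) => fields.map((f) => quote(r[f])).join(","));
  return [fields.join(","), ...rows].join("\n");
}

// Made-up sample in the assumed output layout; the values are placeholders.
const sample = `/* 1 */
{ "_id" : "us", "count" : 2, "domain" : ["example.com", "example.org"] }

/* 2 */
{ "_id" : "au", "count" : 1, "domain" : ["example.net"] }
`;

const records = parseNumberedRecords(sample);
const csv = toCsv(records, ["_id", "count", "domain"]);
console.log(csv);
```

Replacing each marker with a comma leaves one stray comma before the first record, so the sketch removes it and adds the enclosing brackets before parsing. Doing the conversion by hand in an editor would presumably need the same two fix-ups.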