#
# ChangeLog for other-projects/maori-lang-detection/MoreReading
#
# Generated by Trac 1.4.2
# 2024-04-28T11:28:36+12:00

Mon, 13 Jan 2020 06:45:21 GMT ak19 [33823]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
	* other-projects/maori-lang-detection/mongodb-data (added)
	* other-projects/maori-lang-detection/mongodb-data/1a_counts_miInUrlPath.json (added)
	* other-projects/maori-lang-detection/mongodb-data/1a_geojson-features_miInUrlPath.json (added)
	* other-projects/maori-lang-detection/mongodb-data/1a_multipoint_miInUrlPath.json (added)
	* other-projects/maori-lang-detection/mongodb-data/1b_counts_noMiInUrlPath.json (added)
	* other-projects/maori-lang-detection/mongodb-data/1b_geojson-features_noMiInUrlPath.json (added)
	* other-projects/maori-lang-detection/mongodb-data/1b_multipoint_noMiInUrlPath.json (added)
	* other-projects/maori-lang-detection/mongodb-data/1counts_allCrawledSites.json (added)
	* other-projects/maori-lang-detection/mongodb-data/1geojson-features_allCrawledSites.json (added)
	* other-projects/maori-lang-detection/mongodb-data/1map_allCrawledSites.png (added)
	* other-projects/maori-lang-detection/mongodb-data/1multipoint_allCrawledSites.json (added)
	* other-projects/maori-lang-detection/mongodb-data/2counts_sitesWithPagesInMRI.json (added)
	* other-projects/maori-lang-detection/mongodb-data/2geojson-features_sitesWithPagesInMRI.json (added)
	* other-projects/maori-lang-detection/mongodb-data/2map_sitesWithPagesInMRI.png (added)
	* other-projects/maori-lang-detection/mongodb-data/2multipoint_sitesWithPagesInMRI.json (added)
	* other-projects/maori-lang-detection/mongodb-data/3counts_sitesWithPagesContainingMRI.json (added)
	* other-projects/maori-lang-detection/mongodb-data/3geojson-features_sitesWithPagesContainingMRI.json (added)
	* other-projects/maori-lang-detection/mongodb-data/3map_sitesWithPagesContainingMRI.png (added)
	* other-projects/maori-lang-detection/mongodb-data/3multipoint_sitesWithPagesContainingMRI.json (added)
	* other-projects/maori-lang-detection/mongodb-data/4counts_tentativeNonProductSites.json (added)
	* other-projects/maori-lang-detection/mongodb-data/4geojson-features_tentativeNonProductSites.json (added)
	* other-projects/maori-lang-detection/mongodb-data/4map_exclTentativeAutotranslatedSites.png (added)
	* other-projects/maori-lang-detection/mongodb-data/4multipoint_tentativeNonProductSites.json (added)
	* other-projects/maori-lang-detection/mongodb-data/5counts_tentativeNonProductSites1.json (added)
	* other-projects/maori-lang-detection/mongodb-data/5geojson-features_tentativeNonProductSites1.json (added)
	* other-projects/maori-lang-detection/mongodb-data/5map_exclTentativeAutotranslatedSites1.png (added)
	* other-projects/maori-lang-detection/mongodb-data/5multipoint_tentativeNonProductSites1.json (added)
	* other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (added)
	* other-projects/maori-lang-detection/mongodb-data/6geojson-features_nonProductSites1_manualShortlist.json (added)
	* other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (added)
	* other-projects/maori-lang-detection/mongodb-data/6multipoint_nonProductSites1_manualShortlist.json (added)

	Recommitting mongo-data folder with renamed files with numbering.

Thu, 19 Dec 2019 09:33:08 GMT ak19 [33816]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

	Finished manually going through the sites that I couldn't easily ...

Wed, 18 Dec 2019 08:38:44 GMT ak19 [33813]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/mongodb-data/counts_allCrawledSites.json (moved)
	* other-projects/maori-lang-detection/mongodb-data/counts_miInUrlPath.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/counts_noMiInUrlPath.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesContainingMRI.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesInMRI.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/counts_tentativeNonProductSites.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/counts_tentativeNonProductSites1.json (added)
	* other-projects/maori-lang-detection/mongodb-data/geojson-features_allCrawledSites.json (moved)
	* other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesContainingMRI.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesInMRI.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/geojson-features_tentativeNonProductSites.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/geojson-features_tentativeNonProductSites1.json (added)
	* other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath.png (deleted)
	* other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath_UnknownSelected.png (deleted)
	* other-projects/maori-lang-detection/mongodb-data/histo_worldmap_miInUrlPath.png (deleted)
	* other-projects/maori-lang-detection/mongodb-data/map_exclTentativeAutotranslatedSites1.png (added)
	* other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesContainingMRI.png (modified)
	* other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesInMRI.png (modified)
	* other-projects/maori-lang-detection/mongodb-data/map_tentativeNonProductSites.png (deleted)
	* other-projects/maori-lang-detection/mongodb-data/multipoint_allCrawledSites.json (moved)
	* other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesContainingMRI.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesInMRI.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/multipoint_tentativeNonProductSites.json (modified)
	* other-projects/maori-lang-detection/mongodb-data/multipoint_tentativeNonProductSites1.json (added)
	* other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_UnknownSelected.png (deleted)
	* other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (deleted)

	With the bugfix from yesterday and the inclusion of http(s)://mi.* ...

Tue, 17 Dec 2019 06:29:58 GMT ak19 [33807]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

	Trying to manually go through a shortlisted set of domains to see if ...

Fri, 13 Dec 2019 08:31:11 GMT ak19 [33806]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/mongodb-data/counts_tentativeNonProductSites.json (added)
	* other-projects/maori-lang-detection/mongodb-data/geojson-features_tentativeNonProductSites.json (added)
	* other-projects/maori-lang-detection/mongodb-data/map_tentativeNonProductSites.png (added)
	* other-projects/maori-lang-detection/mongodb-data/multipoint_tentativeNonProductSites.json (added)

	More mongodb querying revealed that excluding tentative product sites ...

Fri, 13 Dec 2019 07:00:53 GMT ak19 [33804]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

	1. Updated results from mongodb querying after yesterday's ...

Thu, 12 Dec 2019 05:04:10 GMT ak19 [33800]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
	* other-projects/maori-lang-detection/crawledNode2.tar (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

	Removed an adult site from crawled contents and added its url to ...

Tue, 10 Dec 2019 07:36:30 GMT ak19 [33787]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

	Documented another mongodb query that I'm using, the one to produce ...

Mon, 25 Nov 2019 08:29:42 GMT ak19 [33722]
	* other-projects/maori-lang-detection/MoreReading/countrycodes.csv (added)
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

	Adding in additional instructions in mongodb.txt, before I forgot how ...

Wed, 20 Nov 2019 10:23:29 GMT ak19 [33710]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

	Working queries and map coords for geojson.tools (ironically, Lat and ...

Fri, 15 Nov 2019 10:14:48 GMT ak19 [33698]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/LanguageInfo.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

	Links to more reading

Thu, 14 Nov 2019 11:22:34 GMT ak19 [33675]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

	Committing the newer query results (but from before today's ...

Wed, 13 Nov 2019 10:08:37 GMT ak19 [33666]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
	* other-projects/maori-lang-detection/crawledNode6.tar (modified)
	* other-projects/maori-lang-detection/hdfs-cc-work/conf/regex-urlfilter.GS_TEMPLATE (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
	* other-projects/maori-lang-detection/to_crawl.tar.gz (added)

	Having finished sending all the crawl data to mongodb 1. Recrawled ...

Tue, 12 Nov 2019 07:51:48 GMT ak19 [33653]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/lib/classgraph-4.8.52.jar (added)
	* other-projects/maori-lang-detection/lib/core-1.5.8.jar (added)
	* other-projects/maori-lang-detection/lib/logging-slf4j-1.5.8.jar (added)
	* other-projects/maori-lang-detection/lib/slf4j-api-1.7.9.jar (added)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (moved)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebpageInfo.java (moved)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (moved)

	1. As suggested by Dr Bainbridge, made the code changes to use ...

Mon, 11 Nov 2019 05:46:24 GMT ak19 [33646]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

	Saving the mongodb queries and learning links that Dr Bainbridge ...

Sun, 10 Nov 2019 22:50:29 GMT ak19 [33644]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (added)

	Just committing the growing mongodb.txt file with links and ...

Sun, 10 Nov 2019 20:38:55 GMT ak19 [33635]
	* other-projects/maori-lang-detection (moved)

	Maori-language-detection doesn't use Greenstone 3 at present, it's ...

Tue, 05 Nov 2019 08:04:09 GMT ak19 [33623]
	* gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/config.properties (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)

	1. Incorporated Dr Nichols earlier suggestion of storing page ...