#
# ChangeLog for other-projects/maori-lang-detection/src
#
# Generated by Trac 1.4.2
# 2024-06-08T12:14:31+12:00

Thu, 30 Jan 2020 07:21:31 GMT ak19 [33879]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

	Have the 2 mongodb aggregate() calls working that

Wed, 29 Jan 2020 08:48:52 GMT ak19 [33876]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

	Some missteps, but have got complex collection.aggregate() working at ...

Fri, 24 Jan 2020 08:49:44 GMT ak19 [33873]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (added)

	Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge ...

Fri, 24 Jan 2020 07:59:42 GMT ak19 [33871]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (modified)

	Removed mostly duplicated older version of method but left the ...

Fri, 24 Jan 2020 07:48:17 GMT ak19 [33870]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (modified)

	Got the mongodb query working in Java in 2 different ways: the fully ...

Thu, 23 Jan 2020 09:59:46 GMT ak19 [33869]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (added)

	First cut at the RandomURLsForDomainGenerator.java class and the ...

Thu, 23 Jan 2020 08:12:17 GMT ak19 [33867]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

	Moved the code handling of special case large rectangles and those ...

Wed, 22 Jan 2020 06:31:09 GMT ak19 [33858]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

	Fixes to the code committed yesterday: correct calculation of the ...

Tue, 21 Jan 2020 08:58:29 GMT ak19 [33853]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

	Handling map coordinates that are horizontally excessive (beyond ...

Wed, 18 Dec 2019 08:36:07 GMT ak19 [33812]
	* other-projects/maori-lang-detection/conf/countrycodes.json (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

	Better handling of multi-line comment symbols, so I can now include ...

Wed, 18 Dec 2019 03:51:34 GMT ak19 [33811]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

	Returning to using a single variable, urlContainsLangCodeInPath, to ...

Tue, 17 Dec 2019 08:48:08 GMT ak19 [33810]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

	Bugfix: mi in url path should be checked for for each page of site, ...

Tue, 17 Dec 2019 06:31:28 GMT ak19 [33808]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

	Storing not just whether /mi(/) suffix is in path, but also whether ...

Fri, 13 Dec 2019 07:08:14 GMT ak19 [33805]
	* other-projects/maori-lang-detection/conf/countrycodes.json (moved)
	* other-projects/maori-lang-detection/mongodb-data/countrycodes1.json (deleted)
	* other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesInMRI.json (added)
	* other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesInMRI.json (added)
	* other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesInMRI.png (added)
	* other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesInMRI.json (added)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

	1. Moving the static countrycodes.json file to conf folder and ...

Fri, 13 Dec 2019 05:40:46 GMT ak19 [33801]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

	1. NutchTextDumpToMongoDB Added an extra field to each document in ...

Thu, 12 Dec 2019 05:04:10 GMT ak19 [33800]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
	* other-projects/maori-lang-detection/crawledNode2.tar (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

	Removed an adult site from crawled contents and added its url to ...

Thu, 12 Dec 2019 03:08:08 GMT ak19 [33799]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

	1. Adding breadcrumb for next step at end of running ...

Thu, 12 Dec 2019 02:42:19 GMT ak19 [33796]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

	Instead of a hack for US' count being too great that its histogram ...

Wed, 11 Dec 2019 08:57:02 GMT ak19 [33794]
	* other-projects/maori-lang-detection/mongodb-data/geojson-features.json (added)
	* other-projects/maori-lang-detection/mongodb-data/multipoint.json (added)
	* other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (added)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

	Wrote the geojson map data created from the site counts per ...

Tue, 10 Dec 2019 07:43:53 GMT ak19 [33790]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

	Got the MultiPoint geojson mapdata of the country code counts ...

Mon, 09 Dec 2019 08:55:27 GMT ak19 [33778]
	* other-projects/maori-lang-detection/lib/gson-1.7.1.jar (added)
	* other-projects/maori-lang-detection/lib/sf-geojson-2.0.3.jar (added)
	* other-projects/maori-lang-detection/mongodb-data (added)
	* other-projects/maori-lang-detection/mongodb-data/countrycodes.json (added)
	* other-projects/maori-lang-detection/mongodb-data/countrycodes1.json (added)
	* other-projects/maori-lang-detection/mongodb-data/counts.json (added)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (added)

	Made a beginning on getting the geojson map data automated. Couldn't ...

Fri, 15 Nov 2019 10:14:48 GMT ak19 [33698]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/LanguageInfo.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

	Links to more reading

Thu, 14 Nov 2019 11:21:31 GMT ak19 [33674]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/LanguageInfo.java (added)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (modified)

	Changes to support the top 5 predicted langcodes and their confidence ...

Wed, 13 Nov 2019 10:08:37 GMT ak19 [33666]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
	* other-projects/maori-lang-detection/crawledNode6.tar (modified)
	* other-projects/maori-lang-detection/hdfs-cc-work/conf/regex-urlfilter.GS_TEMPLATE (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
	* other-projects/maori-lang-detection/to_crawl.tar.gz (added)

	Having finished sending all the crawl data to mongodb 1. Recrawled ...

Tue, 12 Nov 2019 08:33:57 GMT ak19 [33657]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

	Some fixes after brief testing against 1/3 of the crawl. Restarted ...

Tue, 12 Nov 2019 08:11:05 GMT ak19 [33656]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

	Final minor changes before I start processing the crawls of node2.

Tue, 12 Nov 2019 07:56:53 GMT ak19 [33655]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

	Minor change to print statement

Tue, 12 Nov 2019 07:51:48 GMT ak19 [33653]
	* other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
	* other-projects/maori-lang-detection/lib/classgraph-4.8.52.jar (added)
	* other-projects/maori-lang-detection/lib/core-1.5.8.jar (added)
	* other-projects/maori-lang-detection/lib/logging-slf4j-1.5.8.jar (added)
	* other-projects/maori-lang-detection/lib/slf4j-api-1.7.9.jar (added)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (moved)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebpageInfo.java (moved)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (moved)

	1. As suggested by Dr Bainbridge, made the code changes to use ...

Tue, 12 Nov 2019 07:41:13 GMT ak19 [33652]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/morphia (added)

	Introducing morphia subpackage

Tue, 12 Nov 2019 05:11:39 GMT ak19 [33651]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
	* other-projects/maori-lang-detection/src/org/greenstone/atea/WebpageInfo.java (modified)

	1. Bugfix: overlappingSentences works. 2. storing numSentencesInMaor

Mon, 11 Nov 2019 05:45:29 GMT ak19 [33645]
	* other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)

	Fix to 2 bugs when sending data to MongoDB: 1. overlappingSentences ...

Sun, 10 Nov 2019 20:38:55 GMT ak19 [33635]
	* other-projects/maori-lang-detection (moved)

	Maori-language-detection doesn't use Greenstone 3 at present, it's ...

Fri, 08 Nov 2019 10:59:07 GMT ak19 [33634]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToCSV.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/SentenceInfo.java (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebpageInfo.java (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebsiteInfo.java (added)

	Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which ...