#
# ChangeLog for other-projects/maori-lang-detection
#
# Generated by Trac 1.4.2
# 2024-05-31T16:18:05+12:00

Thu, 19 Dec 2019 04:13:26 GMT ak19 [33814]
  * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)

  Put the important mongodb queries and results into hdfs-cc- ...

Wed, 18 Dec 2019 08:38:44 GMT ak19 [33813]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
  * other-projects/maori-lang-detection/mongodb-data/counts_allCrawledSites.json (moved)
  * other-projects/maori-lang-detection/mongodb-data/counts_miInUrlPath.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/counts_noMiInUrlPath.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesContainingMRI.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesInMRI.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/counts_tentativeNonProductSites.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/counts_tentativeNonProductSites1.json (added)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_allCrawledSites.json (moved)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesContainingMRI.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesInMRI.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_tentativeNonProductSites.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_tentativeNonProductSites1.json (added)
  * other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath.png (deleted)
  * other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath_UnknownSelected.png (deleted)
  * other-projects/maori-lang-detection/mongodb-data/histo_worldmap_miInUrlPath.png (deleted)
  * other-projects/maori-lang-detection/mongodb-data/map_exclTentativeAutotranslatedSites1.png (added)
  * other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesContainingMRI.png (modified)
  * other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesInMRI.png (modified)
  * other-projects/maori-lang-detection/mongodb-data/map_tentativeNonProductSites.png (deleted)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_allCrawledSites.json (moved)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesContainingMRI.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesInMRI.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_tentativeNonProductSites.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_tentativeNonProductSites1.json (added)
  * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_UnknownSelected.png (deleted)
  * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (deleted)

  With the bugfix from yesterday and the inclusion of http(s)://mi.* ...

Wed, 18 Dec 2019 08:36:07 GMT ak19 [33812]
  * other-projects/maori-lang-detection/conf/countrycodes.json (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

  Better handling of multi-line comment symbols, so I can now include ...

Wed, 18 Dec 2019 03:51:34 GMT ak19 [33811]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

  Returning to using a single variable, urlContainsLangCodeInPath, to ...

Tue, 17 Dec 2019 08:48:08 GMT ak19 [33810]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

  Bugfix: mi in url path should be checked for each page of site, ...

Tue, 17 Dec 2019 06:53:17 GMT ak19 [33809]
  * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)

  Some more GS_README.txt instructions. Not put the mongodb queries in ...

Tue, 17 Dec 2019 06:31:28 GMT ak19 [33808]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

  Storing not just whether /mi(/) suffix is in path, but also whether ...

Tue, 17 Dec 2019 06:29:58 GMT ak19 [33807]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

  Trying to manually go through a shortlisted set of domains to see if ...

Fri, 13 Dec 2019 08:31:11 GMT ak19 [33806]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
  * other-projects/maori-lang-detection/mongodb-data/counts_tentativeNonProductSites.json (added)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_tentativeNonProductSites.json (added)
  * other-projects/maori-lang-detection/mongodb-data/map_tentativeNonProductSites.png (added)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_tentativeNonProductSites.json (added)

  More mongodb querying revealed that excluding tentative product sites ...

Fri, 13 Dec 2019 07:08:14 GMT ak19 [33805]
  * other-projects/maori-lang-detection/conf/countrycodes.json (moved)
  * other-projects/maori-lang-detection/mongodb-data/countrycodes1.json (deleted)
  * other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesInMRI.json (added)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesInMRI.json (added)
  * other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesInMRI.png (added)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesInMRI.json (added)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

  1. Moving the static countrycodes.json file to conf folder and ...

Fri, 13 Dec 2019 07:00:53 GMT ak19 [33804]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

  1. Updated results from mongodb querying after yesterday's ...

Fri, 13 Dec 2019 06:27:52 GMT ak19 [33803]
  * other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesContainingMRI.json (added)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesContainingMRI.json (added)
  * other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesContainingMRI.png (added)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesContainingMRI.json (added)

  geojson mapdata and map for mongodb results on ...

Fri, 13 Dec 2019 05:42:05 GMT ak19 [33802]
  * other-projects/maori-lang-detection/mongodb-data/counts.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/multipoint.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_UnknownSelected.png (modified)
  * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (modified)

  With an extra adult site removed and with setting countrycodes that ...

Fri, 13 Dec 2019 05:40:46 GMT ak19 [33801]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

  1. NutchTextDumpToMongoDB Added an extra field to each document in ...

Thu, 12 Dec 2019 05:04:10 GMT ak19 [33800]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
  * other-projects/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
  * other-projects/maori-lang-detection/crawledNode2.tar (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

  Removed an adult site from crawled contents and added its url to ...

Thu, 12 Dec 2019 03:08:08 GMT ak19 [33799]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

  1. Adding breadcrumb for next step at end of running ...

Thu, 12 Dec 2019 02:57:56 GMT ak19 [33798]
  * other-projects/maori-lang-detection/mongodb-data/counts_noMiInUrlPath.json (added)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_noMiInUrlPath.json (added)
  * other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath.png (added)
  * other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath_UnknownSelected.png (added)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_noMiInUrlPath.json (added)

  Adding the geojson related files related to querying mongodb for ...

Thu, 12 Dec 2019 02:42:47 GMT ak19 [33797]
  * other-projects/maori-lang-detection/mongodb-data/counts_miInUrlPath.json (added)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features.json (modified)
  * other-projects/maori-lang-detection/mongodb-data/geojson-features_miInUrlPath.json (added)
  * other-projects/maori-lang-detection/mongodb-data/histo_worldmap_miInUrlPath.png (added)
  * other-projects/maori-lang-detection/mongodb-data/multipoint_miInUrlPath.json (added)
  * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_UnknownSelected.png (added)
  * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (modified)

  Updated json and image files, and new files for when /mi(/) is in ...

Thu, 12 Dec 2019 02:42:19 GMT ak19 [33796]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

  Instead of a hack for US' count being too great that its histogram ...

Wed, 11 Dec 2019 08:57:02 GMT ak19 [33794]
  * other-projects/maori-lang-detection/mongodb-data/geojson-features.json (added)
  * other-projects/maori-lang-detection/mongodb-data/multipoint.json (added)
  * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (added)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

  Wrote the geojson map data created from the site counts per ...

Tue, 10 Dec 2019 07:43:53 GMT ak19 [33790]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

  Got the MultiPoint geojson mapdata of the country code counts ...

Tue, 10 Dec 2019 07:39:46 GMT ak19 [33789]
  * other-projects/maori-lang-detection/mongodb-data/counts.json (modified)

  Redid the mongodb query to get the countrycode counts for all the ...

Tue, 10 Dec 2019 07:39:06 GMT ak19 [33788]
  * other-projects/maori-lang-detection/lib/jackson-annotations-2.10.0.jar (added)
  * other-projects/maori-lang-detection/lib/jackson-core-2.10.0.jar (added)
  * other-projects/maori-lang-detection/lib/jackson-databind-2.10.0.jar (added)
  * other-projects/maori-lang-detection/lib/sf-2.0.2.jar (added)
  * other-projects/maori-lang-detection/lib/sf-geojson-2.0.2.jar (added)
  * other-projects/maori-lang-detection/lib/sf-geojson-2.0.3.jar (deleted)

  Adding all the jar files needed to work in Java with geojson Simple ...

Tue, 10 Dec 2019 07:36:30 GMT ak19 [33787]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

  Documented another mongodb query that I'm using, the one to produce ...

Mon, 09 Dec 2019 08:55:27 GMT ak19 [33778]
  * other-projects/maori-lang-detection/lib/gson-1.7.1.jar (added)
  * other-projects/maori-lang-detection/lib/sf-geojson-2.0.3.jar (added)
  * other-projects/maori-lang-detection/mongodb-data (added)
  * other-projects/maori-lang-detection/mongodb-data/countrycodes.json (added)
  * other-projects/maori-lang-detection/mongodb-data/countrycodes1.json (added)
  * other-projects/maori-lang-detection/mongodb-data/counts.json (added)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (added)

  Made a beginning on getting the geojson map data automated. Couldn't ...

Mon, 25 Nov 2019 08:29:42 GMT ak19 [33722]
  * other-projects/maori-lang-detection/MoreReading/countrycodes.csv (added)
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

  Adding in additional instructions in mongodb.txt, before I forget how ...

Wed, 20 Nov 2019 10:23:29 GMT ak19 [33710]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

  Working queries and map coords for geojson.tools (ironically, Lat and ...

Fri, 15 Nov 2019 10:14:48 GMT ak19 [33698]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/LanguageInfo.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

  Links to more reading

Thu, 14 Nov 2019 11:22:34 GMT ak19 [33675]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

  Committing the newer query results (but from before today's ...

Thu, 14 Nov 2019 11:21:31 GMT ak19 [33674]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/LanguageInfo.java (added)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (modified)

  Changes to support the top 5 predicted langcodes and their confidence ...

Wed, 13 Nov 2019 10:08:37 GMT ak19 [33666]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
  * other-projects/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  * other-projects/maori-lang-detection/crawledNode6.tar (modified)
  * other-projects/maori-lang-detection/hdfs-cc-work/conf/regex-urlfilter.GS_TEMPLATE (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
  * other-projects/maori-lang-detection/to_crawl.tar.gz (added)

  Having finished sending all the crawl data to mongodb:
  1. Recrawled ...

Tue, 12 Nov 2019 08:33:57 GMT ak19 [33657]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

  Some fixes after brief testing against 1/3 of the crawl. Restarted ...

Tue, 12 Nov 2019 08:11:05 GMT ak19 [33656]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

  Final minor changes before I start processing the crawls of node2.

Tue, 12 Nov 2019 07:56:53 GMT ak19 [33655]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

  Minor change to print statement

Tue, 12 Nov 2019 07:54:06 GMT ak19 [33654]
  * other-projects/maori-lang-detection/lib/logging-slf4j-1.5.8.jar (deleted)

  Removing jar file that wasn't used after all.

Tue, 12 Nov 2019 07:51:48 GMT ak19 [33653]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
  * other-projects/maori-lang-detection/lib/classgraph-4.8.52.jar (added)
  * other-projects/maori-lang-detection/lib/core-1.5.8.jar (added)
  * other-projects/maori-lang-detection/lib/logging-slf4j-1.5.8.jar (added)
  * other-projects/maori-lang-detection/lib/slf4j-api-1.7.9.jar (added)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (moved)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebpageInfo.java (moved)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (moved)

  1. As suggested by Dr Bainbridge, made the code changes to use ...

Tue, 12 Nov 2019 07:41:13 GMT ak19 [33652]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia (added)

  Introducing morphia subpackage

Tue, 12 Nov 2019 05:11:39 GMT ak19 [33651]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
  * other-projects/maori-lang-detection/src/org/greenstone/atea/WebpageInfo.java (modified)

  1. Bugfix: overlappingSentences works.
  2. storing numSentencesInMaor

Mon, 11 Nov 2019 05:46:24 GMT ak19 [33646]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

  Saving the mongodb queries and learning links that Dr Bainbridge ...

Mon, 11 Nov 2019 05:45:29 GMT ak19 [33645]
  * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)

  Fix to 2 bugs when sending data to MongoDB:
  1. overlappingSentences ...

Sun, 10 Nov 2019 22:50:29 GMT ak19 [33644]
  * other-projects/maori-lang-detection/MoreReading/mongodb.txt (added)

  Just committing the growing mongodb.txt file with links and ...

Sun, 10 Nov 2019 22:46:48 GMT ak19 [33643]
  * other-projects/maori-lang-detection/conf/config.properties.in (moved)
  * other-projects/maori-lang-detection/conf/log4j.properties (deleted)
  * other-projects/maori-lang-detection/conf/log4j.properties.in (modified)

  Brought the template log4j.properties.in back up to speed. I forgot ...

Sun, 10 Nov 2019 22:06:48 GMT ak19 [33642]
  * other-projects/maori-lang-detection/lib/mongo-java-driver-3.9.1.jar (added)

  Forgot to commit the java driver for mongodb when I committed the ...

Sun, 10 Nov 2019 20:38:55 GMT ak19 [33635]
  * other-projects/maori-lang-detection (moved)

  Maori-language-detection doesn't use Greenstone 3 at present, it's ...

Fri, 08 Nov 2019 10:59:07 GMT ak19 [33634]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToCSV.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/SentenceInfo.java (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebpageInfo.java (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebsiteInfo.java (added)

  Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which ...