#
# ChangeLog for other-projects
#
# Generated by Trac 1.4.2
# 2024-04-26T23:14:44+12:00

Fri, 21 Feb 2020 08:00:55 GMT ak19 [33966]
    * other-projects/maori-lang-detection/mongodb-data/random260.ods (added)
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data/random260_results.txt (added)

    Added the origSequence and basicDomain columns to the random 260 web ...

Fri, 21 Feb 2020 07:59:07 GMT ak19 [33965]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/ManualURLInspection.java (modified)

    1. Adding a basicDomain column (stripped of http/https and www ...

Fri, 21 Feb 2020 06:57:38 GMT ak19 [33964]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    2 records were missing a value for the qualityLevel column.

Thu, 20 Feb 2020 09:12:43 GMT ak19 [33963]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/ManualURLInspection.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBQueryer.java (modified)

    Added a new helper method to MongoDBQueryer.java to add numPagesInMRI ...

Thu, 20 Feb 2020 09:07:20 GMT ak19 [33962]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    2 fields changed, as one was missed out and the other incorrectly ...

Thu, 20 Feb 2020 07:24:19 GMT ak19 [33961]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/ManualURLInspection.java (modified)

    New category, LINK_TEXT, introduced for the random web page URL samples.

Thu, 20 Feb 2020 07:22:38 GMT ak19 [33960]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    Reviewed all the random sample web page URLs marked ...

Thu, 20 Feb 2020 07:06:41 GMT ak19 [33959]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/ManualURLInspection.java (modified)

    URIEncoding the mapData makes it unparseable by geojson.io

Tue, 18 Feb 2020 10:35:35 GMT ak19 [33952]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/ManualURLInspection.java (modified)

    Minor changes for processing

Tue, 18 Feb 2020 10:33:29 GMT ak19 [33951]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    Reviewed the qualityLevel column where LITTLE_TEXT was assigned.

Tue, 18 Feb 2020 10:28:55 GMT ak19 [33950]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    Reviewed the qualityLevel column where MIXED_TEXT was assigned.

Tue, 18 Feb 2020 10:22:53 GMT ak19 [33949]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    Reviewed the qualityLevel column where NAV was assigned.

Tue, 18 Feb 2020 09:56:44 GMT ak19 [33948]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/ManualURLInspection.java (modified)

    Reviewed the random sampled web page URLs marked as ...

Tue, 18 Feb 2020 09:07:33 GMT ak19 [33947]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    Some more questionmarked field values assigned.

Tue, 18 Feb 2020 08:58:42 GMT ak19 [33946]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/ManualURLInspection.java (modified)

    1. New function to handle user input assigning the newly introduced ...

Tue, 18 Feb 2020 08:48:14 GMT ak19 [33945]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    Added a 4th column for all 260 sample web page URLs and have used the ...

Tue, 18 Feb 2020 03:44:21 GMT ak19 [33944]
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    Added the isReallyInMRI column after manually inspecting the ...

Tue, 18 Feb 2020 02:18:00 GMT ak19 [33941]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/ManualURLInspection.java (modified)

    1. Uppercase 3rd field (Y/N/? field) read back in from file before ...

Mon, 17 Feb 2020 09:16:40 GMT ak19 [33940]
    * other-projects/maori-lang-detection/lib/commons-csv-1.7.jar (deleted)
    * other-projects/maori-lang-detection/lib/commons-csv-1.8.jar (added)
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/ManualURLInspection.java (added)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBQueryer.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/SummaryTool.java (modified)

    1. In order to make it easier to do the manual work of inspecting 260 ...

Mon, 17 Feb 2020 03:22:08 GMT ak19 [33939]
    * other-projects/maori-lang-detection/mongodb-data/isMRI_full_manualList_globalDomains_whereAPageContainsMRI.txt (added)
    * other-projects/maori-lang-detection/mongodb-data/random255_domainsNZ_IsMRI.txt (deleted)
    * other-projects/maori-lang-detection/mongodb-data/random260_manualList_globalDomains_whereAPageContainsMRI.txt (added)

    1. Old random samples file doesn't apply as we're not sampling by ...

Mon, 17 Feb 2020 03:10:00 GMT ak19 [33938]
    * other-projects/maori-lang-detection/conf/log4j.properties.in (modified)
    * other-projects/maori-lang-detection/lib/gutil.jar (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/SummaryTool.java (modified)

    1. Don't regenerate random sample of web page urls and full web page ...

Mon, 17 Feb 2020 03:06:40 GMT ak19 [33937]
    * other-projects/maori-lang-detection/mongodb-data/6counts_sitesWithPagesContainingMRI_manualShortlist.json (added)

    New counts of manual sites after reingesting into MongoDB. Forgot to ...

Mon, 17 Feb 2020 03:05:55 GMT ak19 [33936]
    * other-projects/maori-lang-detection/mongodb-data/6counts_sitesWithPagesContainingMRI_manualShortlist.jsonOLD (moved)
    * other-projects/maori-lang-detection/mongodb-data/ManualShortlisting2_afterMongoDBReingest.txt (modified)

    Renaming old file to place with new counts after reingesting into ...

Fri, 14 Feb 2020 10:03:21 GMT ak19 [33926]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/SummaryTool.java (modified)

    Investigated some other options for screen capturing and Google ...

Fri, 14 Feb 2020 07:41:20 GMT ak19 [33925]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/SummaryTool.java (modified)

    1. Bugfix: oversight, should return uri encoded URL for mapData, ...

Fri, 14 Feb 2020 06:22:40 GMT ak19 [33924]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/SummaryTool.java (modified)

    Adding in Dr Bainbridge's command to check the JSON generated is ...

Thu, 13 Feb 2020 09:40:41 GMT ak19 [33919]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/lib/jna-platform.jar (added)
    * other-projects/maori-lang-detection/lib/jna.jar (added)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBQueryer.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/SummaryTool.java (modified)

    SummaryTool now uses the CountryCodeCountsMapData.java class to ...

Thu, 13 Feb 2020 06:34:14 GMT ak19 [33918]
    * other-projects/maori-lang-detection/mongodb-data/ManualShortlisting2_afterMongoDBReingest.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data/manualList_globalDomains_whereAPageContainsMRI.txt (modified)

    Country codes added to each domain's URL of the manual site/domain ...

Thu, 13 Feb 2020 05:18:13 GMT ak19 [33917]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBQueryer.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/SummaryTool.java (modified)

    Added some better reporting when confirming sample size was correct

Thu, 13 Feb 2020 04:42:11 GMT ak19 [33916]
    * other-projects/maori-lang-detection/mongodb-data/ManualShortlisting2_afterMongoDBReingest.txt (modified)

    Updated the rest of the file after reingest

Thu, 13 Feb 2020 04:12:06 GMT ak19 [33915]
    * other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (added)
    * other-projects/maori-lang-detection/mongodb-data/ManualShortlisting2_afterMongoDBReingest.txt (moved)

    Forgot to add a (manual) counts file created last week, and am now ...

Thu, 13 Feb 2020 04:09:07 GMT ak19 [33914]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data/ManualShortlisting.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data/ManualShortlisting2.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data/manualList_globalDomains_whereAPageContainsMRI.txt (added)

    Shortlisted just the domain sites by country into ...

Wed, 12 Feb 2020 08:27:02 GMT ak19 [33913]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBQueryer.java (modified)

    1. Adjusted table mongodb query statements to be more exact, but same ...

Wed, 12 Feb 2020 06:53:48 GMT ak19 [33912]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBQueryer.java (added)

    Forgot to svn add the new MongoDBQueryer.java class with commit ...

Wed, 12 Feb 2020 06:12:42 GMT ak19 [33911]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/SummaryTool.java (moved)

    Correct commit message for previous and current commit: 1. After ...

Wed, 12 Feb 2020 06:05:50 GMT ak19 [33910]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)

    1. Implementing tables 3 to 5. 2. Rolled back the introduction of the ...

Wed, 12 Feb 2020 06:02:44 GMT ak19 [33909]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

    1. Implementing tables 3 to 5. 2. Rolled back the introduction of the ...

Wed, 05 Feb 2020 10:38:57 GMT ak19 [33907]
    * other-projects/maori-lang-detection/mongodb-data/ManualShortlisting2.txt (added)

    See previous commit message. This will be the file with the results ...

Wed, 05 Feb 2020 10:36:37 GMT ak19 [33906]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

    Code is intermediate state. 1. Introduced basicDomain field to ...

Wed, 05 Feb 2020 05:49:16 GMT ak19 [33905]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)

    More notes

Wed, 05 Feb 2020 05:48:33 GMT ak19 [33904]
    * other-projects/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
    * other-projects/maori-lang-detection/conf/url-greylist-filter.txt (modified)
    * other-projects/maori-lang-detection/crawledNode6.tar (modified)
    * other-projects/maori-lang-detection/to_crawl.tar.gz (modified)

    Shouldn't greylist anglican.org, as this prevented crawling of ...

Tue, 04 Feb 2020 02:50:43 GMT ak19 [33903]
    * other-projects/maori-lang-detection/journal-paper/MRI_slideNotes.txt (added)

    My notes when preparing for today's meetings. Some of this may be ...

Mon, 03 Feb 2020 10:29:59 GMT ak19 [33896]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Clarification in comments

Mon, 03 Feb 2020 10:20:53 GMT ak19 [33895]
    * other-projects/maori-lang-detection/mongodb-data/5b_counts_containsMRI_groupedByNZorOverseasNoFilter.json (moved)

    Minor rename

Mon, 03 Feb 2020 10:20:33 GMT ak19 [33894]
    * other-projects/maori-lang-detection/mongodb-data/5b_count_containsMRI_groupedByNZorOverseasNoFilter.json (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_geojson-features_containsMRI_groupedByNZorOverseasNoFilter.json (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_map_containsMRI_groupedByNZorOverseasNoFilter.png (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_multipoint_containsMRI_groupedByNZorOverseasNoFilter.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6counts_sitesWithPagesContainingMRI_manualShortlist.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/6geojson-features_sitesWithPagesContainingMRI_manualShortlist.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/6map_sitesWithPagesContainingMRI_manualShortlist.png (moved)
    * other-projects/maori-lang-detection/mongodb-data/6multipoint_sitesWithPagesContainingMRI_manualShortlist.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (modified)

    1. Adding map, counts.json and geo-json files for 5b count of sites ...

Mon, 03 Feb 2020 09:41:47 GMT ak19 [33893]
    * other-projects/maori-lang-detection/mongodb-data/8TableOfNumDetectedVsManualSITESWithMRI.ods (modified)
    * other-projects/maori-lang-detection/mongodb-data/8table_siteCountSummary.png (modified)

    1. Left out region code column. 2. Two more sheets of work in ...

Mon, 03 Feb 2020 09:28:44 GMT ak19 [33892]
    * other-projects/maori-lang-detection/mongodb-data/8TableOfNumDetectedVsManualSITESWithMRI.ods (moved)

    Sheets renamed and spreadsheet renamed

Mon, 03 Feb 2020 09:27:37 GMT ak19 [33891]
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/8table_siteCountSummary.png (added)
    * other-projects/maori-lang-detection/mongodb-data/ManualShortlisting.txt (added)
    * other-projects/maori-lang-detection/mongodb-data/TableOfNumDetectedVsManualSITESWithMRI.ods (added)

    Site level detected vs manual inspected data: working shown in file ...

Mon, 03 Feb 2020 07:31:33 GMT ak19 [33890]
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)

    Finished going through NZ sites listing of numPagesContainingMRI > 0 ...

Mon, 03 Feb 2020 02:48:40 GMT ak19 [33889]
    * other-projects/maori-lang-detection/mongodb-data/1a_table_miInUrlPath.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/1a_table_miInUrlPath.png (added)
    * other-projects/maori-lang-detection/mongodb-data/1b_table_noMiInUrlPath.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/1b_table_noMiInUrlPath.png (added)
    * other-projects/maori-lang-detection/mongodb-data/1table_allCrawledSites.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/1table_allCrawledSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/2table_sitesWithPagesInMRI.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/2table_sitesWithPagesInMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/3table_sitesWithPagesContainingMRI.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/3table_sitesWithPagesContainingMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/4table_tentativeNonProductSites.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/4table_tentativeNonProductSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_table_containsMRI_groupedByNZorOverseasNoFilter.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_table_containsMRI_groupedByNZorOverseasNoFilter.png (added)
    * other-projects/maori-lang-detection/mongodb-data/5table_tentativeNonProductSites1.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/5table_tentativeNonProductSites1.png (added)
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (modified)

    1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of ...

Fri, 31 Jan 2020 10:49:11 GMT ak19 [33887]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    1. Added support for writing out tables in csv format too. 2. Second ...

Fri, 31 Jan 2020 10:17:47 GMT ak19 [33886]
    * other-projects/maori-lang-detection/mongodb-data/2table_sitesWithPagesInMRI.csv (moved)

    Minor. File rename

Fri, 31 Jan 2020 09:54:15 GMT ak19 [33885]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Attempting to write the tables. csv not yet supported. Table 1 done.

Fri, 31 Jan 2020 09:21:40 GMT ak19 [33884]
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    0. Previous commit had lots of modifications, and only 2 files ...

Fri, 31 Jan 2020 08:50:34 GMT ak19 [33883]
    * other-projects/maori-lang-detection/mongodb-data/5table_tentativeNonProductSites1.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Clarifications

Thu, 30 Jan 2020 09:54:39 GMT ak19 [33882]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Code now writes both a listing of all non-autotranslated websites and ...

Thu, 30 Jan 2020 09:08:00 GMT ak19 [33881]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)

    Uses lambda expression to process each doc in a mongodb aggregate ...

Thu, 30 Jan 2020 08:17:40 GMT ak19 [33880]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Write out the 5counts_tentativeNonAutotranslatedSites.json file with ...

Thu, 30 Jan 2020 07:21:31 GMT ak19 [33879]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Have the 2 mongodb aggregate() calls working that

Thu, 30 Jan 2020 07:18:09 GMT ak19 [33878]
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (modified)

    Better comment

Thu, 30 Jan 2020 07:07:59 GMT ak19 [33877]
    * other-projects/maori-lang-detection/mongodb-data/5counts_tentativeNonProductSites1.json (modified)

    Reordering to have proper descending order of counts

Wed, 29 Jan 2020 08:48:52 GMT ak19 [33876]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Some missteps, but have got complex collection.aggregate() working at ...

Wed, 29 Jan 2020 06:18:29 GMT ak19 [33875]
    * other-projects/maori-lang-detection/mongodb-data/6b_geojson-features_manualShortlist_numPagesContainingMRI.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/6b_multipoint_manualShortlist_numPagesContainingMRI.json (moved)

    Renaming 2 more files correctly

Wed, 29 Jan 2020 06:15:29 GMT ak19 [33874]
    * other-projects/maori-lang-detection/mongodb-data/6a_geojson-features_manualShortlist_numPagesInMRI.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/6a_multipoint_manualShortlist_numPagesInMRI.json (moved)

    Renaming 2 files correctly

Fri, 24 Jan 2020 08:49:44 GMT ak19 [33873]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (added)

    Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge ...

Fri, 24 Jan 2020 08:44:04 GMT ak19 [33872]
    * other-projects/maori-lang-detection/mongodb-data/4counts_tentativeNonProductSites.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/5counts_tentativeNonProductSites1.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/random255_domainsNZ_IsMRI.txt (added)
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (modified)

    1. Added the file containing the 255 random NZ page URLs to sample. ...

Fri, 24 Jan 2020 07:59:42 GMT ak19 [33871]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (modified)

    Removed mostly duplicated older version of method but left the ...

Fri, 24 Jan 2020 07:48:17 GMT ak19 [33870]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (modified)

    Got the mongodb query working in Java in 2 different ways: the fully ...

Thu, 23 Jan 2020 09:59:46 GMT ak19 [33869]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (added)

    First cut at the RandomURLsForDomainGenerator.java class and the ...

Thu, 23 Jan 2020 08:16:44 GMT ak19 [33868]
    * other-projects/maori-lang-detection/mongodb-data/6a_counts_geojson-features_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_counts_multipoint_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_map_numPagesInMRI_fromManualInspectedSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_counts_geojson-features_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_counts_multipoint_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_map_numPagesContainingMRI_fromManualInspectedSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)

    With the updated code for generating the maps from 6a and 6b manual ...

Thu, 23 Jan 2020 08:12:17 GMT ak19 [33867]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Moved the code handling of special case large rectangles and those ...

Thu, 23 Jan 2020 05:56:36 GMT ak19 [33866]
    * other-projects/the-macronizer/trunk/web/jsp/en/main.jsp (modified)
    * other-projects/the-macronizer/trunk/web/jsp/mi/main.jsp (modified)

    Dr Bainbridge's fix to Android mobile macronizer user (on Chrome ...

Thu, 23 Jan 2020 05:49:56 GMT ak19 [33865]
    * other-projects/the-macronizer/trunk/build.xml (modified)
    * other-projects/the-macronizer/trunk/web/macronizer.xml.in (modified)

    1. The gs3 context name changed from macronizer to macron- ...

Wed, 22 Jan 2020 06:31:09 GMT ak19 [33858]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Fixes to the code committed yesterday: correct calculation of the ...

Wed, 22 Jan 2020 03:33:31 GMT ak19 [33856]
    * other-projects/maori-lang-detection/journal-paper/CommonCrawl_flow.pdf (added)
    * other-projects/maori-lang-detection/journal-paper/CommonCrawl_flow.svg (modified)

    Forgot to commit. Last week, Dr Bainbridge had properly cropped the ...

Tue, 21 Jan 2020 09:01:07 GMT ak19 [33854]
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)

    Manually gone over around 150 webpages of sample size of 255 webpages ...

Tue, 21 Jan 2020 08:58:29 GMT ak19 [33853]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Handling map coordinates that are horizontally excessive (beyond ...

Fri, 17 Jan 2020 09:38:24 GMT ak19 [33851]
    * other-projects/maori-lang-detection/mongodb-data/6a_map_manuallyInspected_numPagesInMRI.png (deleted)
    * other-projects/maori-lang-detection/mongodb-data/6b_map_manuallyInspected_numPagesContainingMRI.png (deleted)

    Deleting faulty maps. NZ numPages inMRI and containingMRI count is ...

Fri, 17 Jan 2020 09:38:00 GMT ak19 [33850]
    * other-projects/maori-lang-detection/mongodb-data/6a_map_manuallyInspected_numPagesInMRI.png (moved)
    * other-projects/maori-lang-detection/mongodb-data/6b_map_manuallyInspected_numPagesContainingMRI.png (moved)

    Renames before deleting faulty maps. NZ numPages inMRI and ...

Fri, 17 Jan 2020 09:22:18 GMT ak19 [33849]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/journal-paper/writeup (modified)

    One less Australian site as it was an infographic containing Maori ...

Fri, 17 Jan 2020 09:21:14 GMT ak19 [33848]
    * other-projects/maori-lang-detection/mongodb-data/1a_counts_miInUrlPath.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/1a_table_miInUrlPath.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/1b_counts_noMiInUrlPath.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/1b_table_noMiInUrlPath.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/1table_allCrawledSites.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/2table__sitesWithPagesInMRI.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/3table_sitesWithPagesContainingMRI.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/4table_tentativeNonProductSites.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/5table_tentativeNonProductSites1.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_counts_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_geojson-features_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_manuallyInspected_numPagesInMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_multipoint_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_counts_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_geojson-features_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_manuallyInspected_numPagesContainingMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_multipoint_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6geojson-features_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (added)
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (added)

    Tables of mongodb counts (1-5 table) and manual counts (6table). ...

Fri, 17 Jan 2020 06:32:16 GMT ak19 [33847]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6geojson-features_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)

    indigenousblogs.com did have one page actually in Maori (an XML ...

Fri, 17 Jan 2020 03:49:05 GMT ak19 [33846]
    * other-projects/maori-lang-detection/mongodb-data/1map_allCrawledSites.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/2map_sitesWithPagesInMRI.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/3map_sitesWithPagesContainingMRI.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/4map_exclTentativeAutotranslatedSites.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/5map_exclTentativeAutotranslatedSites1.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)

    Cropped out the json portion

Fri, 17 Jan 2020 03:34:11 GMT ak19 [33845]
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)

    Cropped out the json portion

Fri, 17 Jan 2020 03:33:24 GMT ak19 [33844]
    * other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6geojson-features_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/7miInURLPath_exclNZ_byCountryCode.json (added)

    Regenerated

Fri, 17 Jan 2020 03:24:28 GMT ak19 [33843]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Counting the 3 non-NZ sites that had mi in the URL path that manual ...

Thu, 16 Jan 2020 09:30:09 GMT ak19 [33842]
    * other-projects/maori-lang-detection/journal-paper/writeup (modified)

    Jotted down some further paragraphs and notes of interest. ...

Thu, 16 Jan 2020 08:23:09 GMT ak19 [33841]
    * other-projects/maori-lang-detection/journal-paper/CommonCrawl_flow.svg (modified)

    Latest version of the flowchart of the process of getting Common ...

Thu, 16 Jan 2020 08:22:15 GMT ak19 [33840]
    * other-projects/maori-lang-detection/journal-paper/CommonCrawl_flow.svg (added)

    Older flowchart of the process of getting Common Crawl data into ...

Thu, 16 Jan 2020 08:18:43 GMT ak19 [33839]
    * other-projects/maori-lang-detection/journal-paper (added)
    * other-projects/maori-lang-detection/journal-paper/writeup (moved)

    Moving writeup text file into new folder so I can add the SVG ...

Thu, 16 Jan 2020 04:56:50 GMT ak19 [33838]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Updated after checking non-NZ and non-nz TLD sites with mi in URL path

Tue, 14 Jan 2020 09:09:43 GMT ak19 [33828]
    * other-projects/maori-lang-detection/writeup (modified)

    Additions and modifications to the write-up.

Mon, 13 Jan 2020 08:47:33 GMT ak19 [33825]
    * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)
    * other-projects/maori-lang-detection/writeup (added)

    Beginnings of first draft of write up.

Mon, 13 Jan 2020 07:14:59 GMT ak19 [33824]
    * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)

    More instructions and explaining the contents of the mongodb-data folder.

Mon, 13 Jan 2020 06:45:21 GMT ak19 [33823] * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified) * other-projects/maori-lang-detection/conf/url-blacklist-filter.txt (modified) * other-projects/maori-lang-detection/mongodb-data (added) * other-projects/maori-lang-detection/mongodb-data/1a_counts_miInUrlPath.json (added) * other-projects/maori-lang-detection/mongodb-data/1a_geojson-features_miInUrlPath.json (added) * other-projects/maori-lang-detection/mongodb-data/1a_multipoint_miInUrlPath.json (added) * other-projects/maori-lang-detection/mongodb-data/1b_counts_noMiInUrlPath.json (added) * other-projects/maori-lang-detection/mongodb-data/1b_geojson-features_noMiInUrlPath.json (added) * other-projects/maori-lang-detection/mongodb-data/1b_multipoint_noMiInUrlPath.json (added) * other-projects/maori-lang-detection/mongodb-data/1counts_allCrawledSites.json (added) * other-projects/maori-lang-detection/mongodb-data/1geojson-features_allCrawledSites.json (added) * other-projects/maori-lang-detection/mongodb-data/1map_allCrawledSites.png (added) * other-projects/maori-lang-detection/mongodb-data/1multipoint_allCrawledSites.json (added) * other-projects/maori-lang-detection/mongodb-data/2counts_sitesWithPagesInMRI.json (added) * other-projects/maori-lang-detection/mongodb-data/2geojson-features_sitesWithPagesInMRI.json (added) * other-projects/maori-lang-detection/mongodb-data/2map_sitesWithPagesInMRI.png (added) * other-projects/maori-lang-detection/mongodb-data/2multipoint_sitesWithPagesInMRI.json (added) * other-projects/maori-lang-detection/mongodb-data/3counts_sitesWithPagesContainingMRI.json (added) * other-projects/maori-lang-detection/mongodb-data/3geojson-features_sitesWithPagesContainingMRI.json (added) * other-projects/maori-lang-detection/mongodb-data/3map_sitesWithPagesContainingMRI.png (added) * other-projects/maori-lang-detection/mongodb-data/3multipoint_sitesWithPagesContainingMRI.json (added) * 
other-projects/maori-lang-detection/mongodb-data/4counts_tentativeNonProductSites.json (added) * other-projects/maori-lang-detection/mongodb-data/4geojson-features_tentativeNonProductSites.json (added) * other-projects/maori-lang-detection/mongodb-data/4map_exclTentativeAutotranslatedSites.png (added) * other-projects/maori-lang-detection/mongodb-data/4multipoint_tentativeNonProductSites.json (added) * other-projects/maori-lang-detection/mongodb-data/5counts_tentativeNonProductSites1.json (added) * other-projects/maori-lang-detection/mongodb-data/5geojson-features_tentativeNonProductSites1.json (added) * other-projects/maori-lang-detection/mongodb-data/5map_exclTentativeAutotranslatedSites1.png (added) * other-projects/maori-lang-detection/mongodb-data/5multipoint_tentativeNonProductSites1.json (added) * other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (added) * other-projects/maori-lang-detection/mongodb-data/6geojson-features_nonProductSites1_manualShortlist.json (added) * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (added) * other-projects/maori-lang-detection/mongodb-data/6multipoint_nonProductSites1_manualShortlist.json (added) Recommitting mongodb-data folder with renamed files with numbering. Mon, 13 Jan 2020 06:43:53 GMT ak19 [33822] * other-projects/maori-lang-detection/mongodb-data (deleted) Removing as I'm renaming all the files with prefixes. There are too ... 
Mon, 13 Jan 2020 06:26:06 GMT ak19 [33821] * other-projects/maori-lang-detection/mongodb-data/counts_nonProductSites1_manualShortlist.json (added) * other-projects/maori-lang-detection/mongodb-data/geojson-features_nonProductSites1_manualShortlist.json (added) * other-projects/maori-lang-detection/mongodb-data/map_exclAutotranslatedSites1_manualShortlist.png (added) * other-projects/maori-lang-detection/mongodb-data/multipoint_nonProductSites1_manualShortlist.json (added) Manually created a shortlist of MRI sites from longer ... Mon, 13 Jan 2020 06:25:12 GMT ak19 [33820] * other-projects/maori-lang-detection/mongodb-data/map_allCrawledSites.png (added) * other-projects/maori-lang-detection/mongodb-data/map_exclTentativeAutotranslatedSites.png (added) Forgot to commit before holidays. Thu, 19 Dec 2019 09:33:08 GMT ak19 [33816] * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified) Finished manually going through the sites that I couldn't easily ... Thu, 19 Dec 2019 04:17:16 GMT ak19 [33815] * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified) Removed old results from before bugfix and improvement to ...