#
# ChangeLog for other-projects/maori-lang-detection
#
# Generated by Trac 1.4.2
# 2024-05-18T05:38:20+12:00

Mon, 03 Feb 2020 10:29:59 GMT ak19 [33896]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Clarification in comments

Mon, 03 Feb 2020 10:20:53 GMT ak19 [33895]
    * other-projects/maori-lang-detection/mongodb-data/5b_counts_containsMRI_groupedByNZorOverseasNoFilter.json (moved)

    Minor rename

Mon, 03 Feb 2020 10:20:33 GMT ak19 [33894]
    * other-projects/maori-lang-detection/mongodb-data/5b_count_containsMRI_groupedByNZorOverseasNoFilter.json (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_geojson-features_containsMRI_groupedByNZorOverseasNoFilter.json (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_map_containsMRI_groupedByNZorOverseasNoFilter.png (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_multipoint_containsMRI_groupedByNZorOverseasNoFilter.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6counts_sitesWithPagesContainingMRI_manualShortlist.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/6geojson-features_sitesWithPagesContainingMRI_manualShortlist.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/6map_sitesWithPagesContainingMRI_manualShortlist.png (moved)
    * other-projects/maori-lang-detection/mongodb-data/6multipoint_sitesWithPagesContainingMRI_manualShortlist.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (modified)

    1. Adding map, counts.json and geo-json files for 5b count of sites ...

Mon, 03 Feb 2020 09:41:47 GMT ak19 [33893]
    * other-projects/maori-lang-detection/mongodb-data/8TableOfNumDetectedVsManualSITESWithMRI.ods (modified)
    * other-projects/maori-lang-detection/mongodb-data/8table_siteCountSummary.png (modified)

    1. Left out region code column. 2. Two more sheets of work in ...

Mon, 03 Feb 2020 09:28:44 GMT ak19 [33892]
    * other-projects/maori-lang-detection/mongodb-data/8TableOfNumDetectedVsManualSITESWithMRI.ods (moved)

    Sheets renamed and spreadsheet renamed

Mon, 03 Feb 2020 09:27:37 GMT ak19 [33891]
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/8table_siteCountSummary.png (added)
    * other-projects/maori-lang-detection/mongodb-data/ManualShortlisting.txt (added)
    * other-projects/maori-lang-detection/mongodb-data/TableOfNumDetectedVsManualSITESWithMRI.ods (added)

    Site level detected vs manual inspected data: working shown in file ...

Mon, 03 Feb 2020 07:31:33 GMT ak19 [33890]
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)

    Finished going through NZ sites listing of numPagesContainingMRI > 0 ...

Mon, 03 Feb 2020 02:48:40 GMT ak19 [33889]
    * other-projects/maori-lang-detection/mongodb-data/1a_table_miInUrlPath.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/1a_table_miInUrlPath.png (added)
    * other-projects/maori-lang-detection/mongodb-data/1b_table_noMiInUrlPath.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/1b_table_noMiInUrlPath.png (added)
    * other-projects/maori-lang-detection/mongodb-data/1table_allCrawledSites.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/1table_allCrawledSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/2table_sitesWithPagesInMRI.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/2table_sitesWithPagesInMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/3table_sitesWithPagesContainingMRI.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/3table_sitesWithPagesContainingMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/4table_tentativeNonProductSites.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/4table_tentativeNonProductSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_table_containsMRI_groupedByNZorOverseasNoFilter.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/5b_table_containsMRI_groupedByNZorOverseasNoFilter.png (added)
    * other-projects/maori-lang-detection/mongodb-data/5table_tentativeNonProductSites1.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/5table_tentativeNonProductSites1.png (added)
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (modified)

    1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of ...

Fri, 31 Jan 2020 10:49:11 GMT ak19 [33887]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    1. Added support for writing out tables in csv format too. 2. Second ...

Fri, 31 Jan 2020 10:17:47 GMT ak19 [33886]
    * other-projects/maori-lang-detection/mongodb-data/2table_sitesWithPagesInMRI.csv (moved)

    Minor. File rename

Fri, 31 Jan 2020 09:54:15 GMT ak19 [33885]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Attempting to write the tables. csv not yet supported. Table 1 done.

Fri, 31 Jan 2020 09:21:40 GMT ak19 [33884]
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    0. Previous commit had lots of modifications, and only 2 files ...

Fri, 31 Jan 2020 08:50:34 GMT ak19 [33883]
    * other-projects/maori-lang-detection/mongodb-data/5table_tentativeNonProductSites1.csv (modified)
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Clarifications

Thu, 30 Jan 2020 09:54:39 GMT ak19 [33882]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Code now writes both a listing of all non-autotranslated websites and ...

Thu, 30 Jan 2020 09:08:00 GMT ak19 [33881]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)

    Uses lambda expression to process each doc in a mongodb aggregate ...

Thu, 30 Jan 2020 08:17:40 GMT ak19 [33880]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Write out the 5counts_tentativeNonAutotranslatedSites.json file with ...

Thu, 30 Jan 2020 07:21:31 GMT ak19 [33879]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Have the 2 mongodb aggregate() calls working that

Thu, 30 Jan 2020 07:18:09 GMT ak19 [33878]
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (modified)

    Better comment

Thu, 30 Jan 2020 07:07:59 GMT ak19 [33877]
    * other-projects/maori-lang-detection/mongodb-data/5counts_tentativeNonProductSites1.json (modified)

    Reordering to have proper descending order of counts

Wed, 29 Jan 2020 08:48:52 GMT ak19 [33876]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (modified)

    Some missteps, but have got complex collection.aggregate() working at ...

Wed, 29 Jan 2020 06:18:29 GMT ak19 [33875]
    * other-projects/maori-lang-detection/mongodb-data/6b_geojson-features_manualShortlist_numPagesContainingMRI.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/6b_multipoint_manualShortlist_numPagesContainingMRI.json (moved)

    Renaming 2 more files correctly

Wed, 29 Jan 2020 06:15:29 GMT ak19 [33874]
    * other-projects/maori-lang-detection/mongodb-data/6a_geojson-features_manualShortlist_numPagesInMRI.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/6a_multipoint_manualShortlist_numPagesInMRI.json (moved)

    Renaming 2 files correctly

Fri, 24 Jan 2020 08:49:44 GMT ak19 [33873]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebPageURLsListing.java (added)

    Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge ...

Fri, 24 Jan 2020 08:44:04 GMT ak19 [33872]
    * other-projects/maori-lang-detection/mongodb-data/4counts_tentativeNonProductSites.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/5counts_tentativeNonProductSites1.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/random255_domainsNZ_IsMRI.txt (added)
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (modified)

    1. Added the file containing the 255 random NZ page URLs to sample. ...

Fri, 24 Jan 2020 07:59:42 GMT ak19 [33871]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (modified)

    Removed mostly duplicated older version of method but left the ...

Fri, 24 Jan 2020 07:48:17 GMT ak19 [33870]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (modified)

    Got the mongodb query working in Java in 2 different ways: the fully ...

Thu, 23 Jan 2020 09:59:46 GMT ak19 [33869]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/RandomURLsForDomainGenerator.java (added)

    First cut at the RandomURLsForDomainGenerator.java class and the ...

Thu, 23 Jan 2020 08:16:44 GMT ak19 [33868]
    * other-projects/maori-lang-detection/mongodb-data/6a_counts_geojson-features_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_counts_multipoint_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_map_numPagesInMRI_fromManualInspectedSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_counts_geojson-features_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_counts_multipoint_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_map_numPagesContainingMRI_fromManualInspectedSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)

    With the updated code for generating the maps from 6a and 6b manual ...

Thu, 23 Jan 2020 08:12:17 GMT ak19 [33867]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Moved the code handling of special case large rectangles and those ...

Wed, 22 Jan 2020 06:31:09 GMT ak19 [33858]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Fixes to the code committed yesterday: correct calculation of the ...

Wed, 22 Jan 2020 03:33:31 GMT ak19 [33856]
    * other-projects/maori-lang-detection/journal-paper/CommonCrawl_flow.pdf (added)
    * other-projects/maori-lang-detection/journal-paper/CommonCrawl_flow.svg (modified)

    Forgot to commit. Last week, Dr Bainbridge had properly cropped the ...

Tue, 21 Jan 2020 09:01:07 GMT ak19 [33854]
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (modified)

    Manually gone over around 150 webpages of sample size of 255 webpages ...

Tue, 21 Jan 2020 08:58:29 GMT ak19 [33853]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Handling map coordinates that are horizontally excessive (beyond ...

Fri, 17 Jan 2020 09:38:24 GMT ak19 [33851]
    * other-projects/maori-lang-detection/mongodb-data/6a_map_manuallyInspected_numPagesInMRI.png (deleted)
    * other-projects/maori-lang-detection/mongodb-data/6b_map_manuallyInspected_numPagesContainingMRI.png (deleted)

    Deleting faulty maps. NZ numPages inMRI and containingMRI count is ...

Fri, 17 Jan 2020 09:38:00 GMT ak19 [33850]
    * other-projects/maori-lang-detection/mongodb-data/6a_map_manuallyInspected_numPagesInMRI.png (moved)
    * other-projects/maori-lang-detection/mongodb-data/6b_map_manuallyInspected_numPagesContainingMRI.png (moved)

    Renames before deleting faulty maps. NZ numPages inMRI and ...

Fri, 17 Jan 2020 09:22:18 GMT ak19 [33849]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/journal-paper/writeup (modified)

    One less Australian site as it was an infographic containing Maori ...

Fri, 17 Jan 2020 09:21:14 GMT ak19 [33848]
    * other-projects/maori-lang-detection/mongodb-data/1a_counts_miInUrlPath.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/1a_table_miInUrlPath.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/1b_counts_noMiInUrlPath.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/1b_table_noMiInUrlPath.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/1table_allCrawledSites.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/2table__sitesWithPagesInMRI.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/3table_sitesWithPagesContainingMRI.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/4table_tentativeNonProductSites.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/5table_tentativeNonProductSites1.csv (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_counts_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_geojson-features_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_manuallyInspected_numPagesInMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/6a_multipoint_manualShortlist_numPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_counts_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_geojson-features_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_manuallyInspected_numPagesContainingMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/6b_multipoint_manualShortlist_numPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6geojson-features_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/6table_nonProductSites1_manualShortlist.json (added)
    * other-projects/maori-lang-detection/mongodb-data/tables.txt (added)

    Tables of mongodb counts (1-5 table) and manual counts (6table). ...

Fri, 17 Jan 2020 06:32:16 GMT ak19 [33847]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6geojson-features_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)

    indigenousblogs.com did have one page actually in Maori (an XML ...

Fri, 17 Jan 2020 03:49:05 GMT ak19 [33846]
    * other-projects/maori-lang-detection/mongodb-data/1map_allCrawledSites.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/2map_sitesWithPagesInMRI.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/3map_sitesWithPagesContainingMRI.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/4map_exclTentativeAutotranslatedSites.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/5map_exclTentativeAutotranslatedSites1.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)

    Cropped out the json portion

Fri, 17 Jan 2020 03:34:11 GMT ak19 [33845]
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)

    Cropped out the json portion

Fri, 17 Jan 2020 03:33:24 GMT ak19 [33844]
    * other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6geojson-features_nonProductSites1_manualShortlist.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/7miInURLPath_exclNZ_byCountryCode.json (added)

    Regenerated

Fri, 17 Jan 2020 03:24:28 GMT ak19 [33843]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Counting the 3 non-NZ sites that had mi in the URL path that manual ...

Thu, 16 Jan 2020 09:30:09 GMT ak19 [33842]
    * other-projects/maori-lang-detection/journal-paper/writeup (modified)

    Jotted down some further paragraphs and notes of interest. ...

Thu, 16 Jan 2020 08:23:09 GMT ak19 [33841]
    * other-projects/maori-lang-detection/journal-paper/CommonCrawl_flow.svg (modified)

    Latest version of the flowchart of the process of getting Common ...

Thu, 16 Jan 2020 08:22:15 GMT ak19 [33840]
    * other-projects/maori-lang-detection/journal-paper/CommonCrawl_flow.svg (added)

    Older flowchart of the process of getting Common Crawl data into ...

Thu, 16 Jan 2020 08:18:43 GMT ak19 [33839]
    * other-projects/maori-lang-detection/journal-paper (added)
    * other-projects/maori-lang-detection/journal-paper/writeup (moved)

    Moving writeup text file into new folder so I can add the SVG ...

Thu, 16 Jan 2020 04:56:50 GMT ak19 [33838]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Updated after checking non-NZ and non-nz TLD sites with mi in URL path

Tue, 14 Jan 2020 09:09:43 GMT ak19 [33828]
    * other-projects/maori-lang-detection/writeup (modified)

    Additions and modifications to the write-up.

Mon, 13 Jan 2020 08:47:33 GMT ak19 [33825]
    * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)
    * other-projects/maori-lang-detection/writeup (added)

    Beginnings of first draft of write up.

Mon, 13 Jan 2020 07:14:59 GMT ak19 [33824]
    * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)

    More instructions and explaining the contents of the mongodb-data folder.

Mon, 13 Jan 2020 06:45:21 GMT ak19 [33823]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data (added)
    * other-projects/maori-lang-detection/mongodb-data/1a_counts_miInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/1a_geojson-features_miInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/1a_multipoint_miInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/1b_counts_noMiInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/1b_geojson-features_noMiInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/1b_multipoint_noMiInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/1counts_allCrawledSites.json (added)
    * other-projects/maori-lang-detection/mongodb-data/1geojson-features_allCrawledSites.json (added)
    * other-projects/maori-lang-detection/mongodb-data/1map_allCrawledSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/1multipoint_allCrawledSites.json (added)
    * other-projects/maori-lang-detection/mongodb-data/2counts_sitesWithPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/2geojson-features_sitesWithPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/2map_sitesWithPagesInMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/2multipoint_sitesWithPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/3counts_sitesWithPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/3geojson-features_sitesWithPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/3map_sitesWithPagesContainingMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/3multipoint_sitesWithPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/4counts_tentativeNonProductSites.json (added)
    * other-projects/maori-lang-detection/mongodb-data/4geojson-features_tentativeNonProductSites.json (added)
    * other-projects/maori-lang-detection/mongodb-data/4map_exclTentativeAutotranslatedSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/4multipoint_tentativeNonProductSites.json (added)
    * other-projects/maori-lang-detection/mongodb-data/5counts_tentativeNonProductSites1.json (added)
    * other-projects/maori-lang-detection/mongodb-data/5geojson-features_tentativeNonProductSites1.json (added)
    * other-projects/maori-lang-detection/mongodb-data/5map_exclTentativeAutotranslatedSites1.png (added)
    * other-projects/maori-lang-detection/mongodb-data/5multipoint_tentativeNonProductSites1.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6counts_nonProductSites1_manualShortlist.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6geojson-features_nonProductSites1_manualShortlist.json (added)
    * other-projects/maori-lang-detection/mongodb-data/6map_exclAutotranslatedSites1_manualShortlist.png (added)
    * other-projects/maori-lang-detection/mongodb-data/6multipoint_nonProductSites1_manualShortlist.json (added)

    Recommitting mongo-data folder with renamed files with numbering.

Mon, 13 Jan 2020 06:43:53 GMT ak19 [33822]
    * other-projects/maori-lang-detection/mongodb-data (deleted)

    Removing as I'm renaming all the files with prefixes. There are too ...

Mon, 13 Jan 2020 06:26:06 GMT ak19 [33821]
    * other-projects/maori-lang-detection/mongodb-data/counts_nonProductSites1_manualShortlist.json (added)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_nonProductSites1_manualShortlist.json (added)
    * other-projects/maori-lang-detection/mongodb-data/map_exclAutotranslatedSites1_manualShortlist.png (added)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_nonProductSites1_manualShortlist.json (added)

    Manually created a shortlist of MRI sites from longer ...

Mon, 13 Jan 2020 06:25:12 GMT ak19 [33820]
    * other-projects/maori-lang-detection/mongodb-data/map_allCrawledSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/map_exclTentativeAutotranslatedSites.png (added)

    Forgot to commit before holidays.

Thu, 19 Dec 2019 09:33:08 GMT ak19 [33816]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Finished manually going through the sites that I couldn't easily ...

Thu, 19 Dec 2019 04:17:16 GMT ak19 [33815]
    * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)

    Removed old results from before bugfix and improvement to ...

Thu, 19 Dec 2019 04:13:26 GMT ak19 [33814]
    * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)

    Put the important mongodb queries and results into hdfs-cc- ...

Wed, 18 Dec 2019 08:38:44 GMT ak19 [33813]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data/counts_allCrawledSites.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/counts_miInUrlPath.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/counts_noMiInUrlPath.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesContainingMRI.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesInMRI.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/counts_tentativeNonProductSites.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/counts_tentativeNonProductSites1.json (added)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_allCrawledSites.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesContainingMRI.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesInMRI.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_tentativeNonProductSites.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_tentativeNonProductSites1.json (added)
    * other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath.png (deleted)
    * other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath_UnknownSelected.png (deleted)
    * other-projects/maori-lang-detection/mongodb-data/histo_worldmap_miInUrlPath.png (deleted)
    * other-projects/maori-lang-detection/mongodb-data/map_exclTentativeAutotranslatedSites1.png (added)
    * other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesContainingMRI.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesInMRI.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/map_tentativeNonProductSites.png (deleted)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_allCrawledSites.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesContainingMRI.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesInMRI.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_tentativeNonProductSites.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_tentativeNonProductSites1.json (added)
    * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_UnknownSelected.png (deleted)
    * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (deleted)

    With the bugfix from yesterday and the inclusion of http(s)://mi.* ...

Wed, 18 Dec 2019 08:36:07 GMT ak19 [33812]
    * other-projects/maori-lang-detection/conf/countrycodes.json (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Better handling of multi-line comment symbols, so I can now include ...

Wed, 18 Dec 2019 03:51:34 GMT ak19 [33811]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

    Returning to using a single variable, urlContainsLangCodeInPath, to ...

Tue, 17 Dec 2019 08:48:08 GMT ak19 [33810]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

    Bugfix: mi in url path should be checked for for each page of site, ...

Tue, 17 Dec 2019 06:53:17 GMT ak19 [33809]
    * other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)

    Some more GS_README.txt instructions. Not put the mongodb queries in ...

Tue, 17 Dec 2019 06:31:28 GMT ak19 [33808]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

    Storing not just whether /mi(/) suffix is in path, but also whether ...

Tue, 17 Dec 2019 06:29:58 GMT ak19 [33807]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Trying to manually go through a shortlisted set of domains to see if ...

Fri, 13 Dec 2019 08:31:11 GMT ak19 [33806]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/mongodb-data/counts_tentativeNonProductSites.json (added)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_tentativeNonProductSites.json (added)
    * other-projects/maori-lang-detection/mongodb-data/map_tentativeNonProductSites.png (added)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_tentativeNonProductSites.json (added)

    More mongodb querying revealed that excluding tentative product sites ...

Fri, 13 Dec 2019 07:08:14 GMT ak19 [33805]
    * other-projects/maori-lang-detection/conf/countrycodes.json (moved)
    * other-projects/maori-lang-detection/mongodb-data/countrycodes1.json (deleted)
    * other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesInMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesInMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesInMRI.json (added)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    1. Moving the static countrycodes.json file to conf folder and ...

Fri, 13 Dec 2019 07:00:53 GMT ak19 [33804]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    1. Updated results from mongodb querying after yesterday's ...

Fri, 13 Dec 2019 06:27:52 GMT ak19 [33803]
    * other-projects/maori-lang-detection/mongodb-data/counts_sitesWithPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_sitesWithPagesContainingMRI.json (added)
    * other-projects/maori-lang-detection/mongodb-data/map_sitesWithPagesContainingMRI.png (added)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_sitesWithPagesContainingMRI.json (added)

    geojson mapdata and map for mongodb results on ...

Fri, 13 Dec 2019 05:42:05 GMT ak19 [33802]
    * other-projects/maori-lang-detection/mongodb-data/counts.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/multipoint.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_UnknownSelected.png (modified)
    * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (modified)

    With an extra adult site removed and with setting countrycodes that ...

Fri, 13 Dec 2019 05:40:46 GMT ak19 [33801]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

    1. NutchTextDumpToMongoDB Added an extra field to each document in ...

Thu, 12 Dec 2019 05:04:10 GMT ak19 [33800]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
    * other-projects/maori-lang-detection/crawledNode2.tar (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

    Removed an adult site from crawled contents and added its url to ...

Thu, 12 Dec 2019 03:08:08 GMT ak19 [33799]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    1. Adding breadcrumb for next step at end of running ...

Thu, 12 Dec 2019 02:57:56 GMT ak19 [33798]
    * other-projects/maori-lang-detection/mongodb-data/counts_noMiInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_noMiInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath.png (added)
    * other-projects/maori-lang-detection/mongodb-data/histo_noMiInUrlPath_UnknownSelected.png (added)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_noMiInUrlPath.json (added)

    Adding the geojson related files related to querying mongodb for ...

Thu, 12 Dec 2019 02:42:47 GMT ak19 [33797]
    * other-projects/maori-lang-detection/mongodb-data/counts_miInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features.json (modified)
    * other-projects/maori-lang-detection/mongodb-data/geojson-features_miInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/histo_worldmap_miInUrlPath.png (added)
    * other-projects/maori-lang-detection/mongodb-data/multipoint_miInUrlPath.json (added)
    * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_UnknownSelected.png (added)
    * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (modified)

    Updated json and images files, and new files for when /mi(/) is in ...

Thu, 12 Dec 2019 02:42:19 GMT ak19 [33796]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Instead of a hack for US' count being too great that its histogram ...
Wed, 11 Dec 2019 08:57:02 GMT ak19 [33794]
    * other-projects/maori-lang-detection/mongodb-data/geojson-features.json (added)
    * other-projects/maori-lang-detection/mongodb-data/multipoint.json (added)
    * other-projects/maori-lang-detection/mongodb-data/siteOriginCountryCounts_worldmap.png (added)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Wrote the geojson map data created from the site counts per ...

Tue, 10 Dec 2019 07:43:53 GMT ak19 [33790]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (modified)

    Got the MultiPoint geojson mapdata of the country code counts ...

Tue, 10 Dec 2019 07:39:46 GMT ak19 [33789]
    * other-projects/maori-lang-detection/mongodb-data/counts.json (modified)

    Redid the mongodb query to get the countrycode counts for all the ...

Tue, 10 Dec 2019 07:39:06 GMT ak19 [33788]
    * other-projects/maori-lang-detection/lib/jackson-annotations-2.10.0.jar (added)
    * other-projects/maori-lang-detection/lib/jackson-core-2.10.0.jar (added)
    * other-projects/maori-lang-detection/lib/jackson-databind-2.10.0.jar (added)
    * other-projects/maori-lang-detection/lib/sf-2.0.2.jar (added)
    * other-projects/maori-lang-detection/lib/sf-geojson-2.0.2.jar (added)
    * other-projects/maori-lang-detection/lib/sf-geojson-2.0.3.jar (deleted)

    Adding all the jar files needed to work in Java with geojson Simple ...

Tue, 10 Dec 2019 07:36:30 GMT ak19 [33787]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Documented another mongodb query that I'm using, the one to produce ...

Mon, 09 Dec 2019 08:55:27 GMT ak19 [33778]
    * other-projects/maori-lang-detection/lib/gson-1.7.1.jar (added)
    * other-projects/maori-lang-detection/lib/sf-geojson-2.0.3.jar (added)
    * other-projects/maori-lang-detection/mongodb-data (added)
    * other-projects/maori-lang-detection/mongodb-data/countrycodes.json (added)
    * other-projects/maori-lang-detection/mongodb-data/countrycodes1.json (added)
    * other-projects/maori-lang-detection/mongodb-data/counts.json (added)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CountryCodeCountsMapData.java (added)

    Made a beginning on getting the geojson map data automated. Couldn't ...

Mon, 25 Nov 2019 08:29:42 GMT ak19 [33722]
    * other-projects/maori-lang-detection/MoreReading/countrycodes.csv (added)
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Adding in additional instructions in mongodb.txt, before I forgot how ...

Wed, 20 Nov 2019 10:23:29 GMT ak19 [33710]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Working queries and map coords for geojson.tools (ironically, Lat and ...

Fri, 15 Nov 2019 10:14:48 GMT ak19 [33698]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/LanguageInfo.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (modified)

    Links to more reading

Thu, 14 Nov 2019 11:22:34 GMT ak19 [33675]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Committing the newer query results (but from before today's ...

Thu, 14 Nov 2019 11:21:31 GMT ak19 [33674]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/LanguageInfo.java (added)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (modified)

    Changes to support the top 5 predicted langcodes and their confidence ...

Wed, 13 Nov 2019 10:08:37 GMT ak19 [33666]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
    * other-projects/maori-lang-detection/crawledNode6.tar (modified)
    * other-projects/maori-lang-detection/hdfs-cc-work/conf/regex-urlfilter.GS_TEMPLATE (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
    * other-projects/maori-lang-detection/to_crawl.tar.gz (added)

    Having finished sending all the crawl data to mongodb 1. Recrawled ...

Tue, 12 Nov 2019 08:33:57 GMT ak19 [33657]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

    Some fixes after brief testing against 1/3 of the crawl. Restarted ...

Tue, 12 Nov 2019 08:11:05 GMT ak19 [33656]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

    Final minor changes before I start processing the crawls of node2.

Tue, 12 Nov 2019 07:56:53 GMT ak19 [33655]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)

    Minor change to print statement

Tue, 12 Nov 2019 07:54:06 GMT ak19 [33654]
    * other-projects/maori-lang-detection/lib/logging-slf4j-1.5.8.jar (deleted)

    Removing jar file that wasn't used after all.

Tue, 12 Nov 2019 07:51:48 GMT ak19 [33653]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)
    * other-projects/maori-lang-detection/lib/classgraph-4.8.52.jar (added)
    * other-projects/maori-lang-detection/lib/core-1.5.8.jar (added)
    * other-projects/maori-lang-detection/lib/logging-slf4j-1.5.8.jar (added)
    * other-projects/maori-lang-detection/lib/slf4j-api-1.7.9.jar (added)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/SentenceInfo.java (moved)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebpageInfo.java (moved)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java (moved)

    1. As suggested by Dr Bainbridge, made the code changes to use ...

Tue, 12 Nov 2019 07:41:13 GMT ak19 [33652]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/morphia (added)

    Introducing morphia subpackage

Tue, 12 Nov 2019 05:11:39 GMT ak19 [33651]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
    * other-projects/maori-lang-detection/src/org/greenstone/atea/WebpageInfo.java (modified)

    1. Bugfix: overlappingSentences works. 2. storing numSentencesInMaor

Mon, 11 Nov 2019 05:46:24 GMT ak19 [33646]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (modified)

    Saving the mongodb queries and learning links that Dr Bainbridge ...

Mon, 11 Nov 2019 05:45:29 GMT ak19 [33645]
    * other-projects/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)

    Fix to 2 bugs when sending data to MongoDB: 1. overlappingSentences ...

Sun, 10 Nov 2019 22:50:29 GMT ak19 [33644]
    * other-projects/maori-lang-detection/MoreReading/mongodb.txt (added)

    Just committing the growing mongodb.txt file with links and ...

Sun, 10 Nov 2019 22:46:48 GMT ak19 [33643]
    * other-projects/maori-lang-detection/conf/config.properties.in (moved)
    * other-projects/maori-lang-detection/conf/log4j.properties (deleted)
    * other-projects/maori-lang-detection/conf/log4j.properties.in (modified)

    Brought the template log4j.properties.in back up to speed. I forgot ...

Sun, 10 Nov 2019 22:06:48 GMT ak19 [33642]
    * other-projects/maori-lang-detection/lib/mongo-java-driver-3.9.1.jar (added)

    Forgot to commit the java driver for mongodb when I committed the ...

Sun, 10 Nov 2019 20:38:55 GMT ak19 [33635]
    * other-projects/maori-lang-detection (moved)

    Maori-language-detection doesn't use Greenstone 3 at present, it's ...

Fri, 08 Nov 2019 10:59:07 GMT ak19 [33634]
    * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
    * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToCSV.java (modified)
    * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (added)
    * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/SentenceInfo.java (added)
    * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
    * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
    * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebpageInfo.java (added)
    * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebsiteInfo.java (added)

    Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which ...