#
# ChangeLog for gs3-extensions/maori-lang-detection/MoreReading
#
# Generated by Trac 1.4.2
# 2024-05-29T07:35:58+12:00

Mon, 19 Aug 2019 08:31:23 GMT ak19 [33428]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)

	Working commoncrawl cc-warc-examples' WET wordcount example using ...

Fri, 16 Aug 2019 10:15:40 GMT ak19 [33425]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)

	A few more links now that I got past getting the vagrant VM with ...

Thu, 15 Aug 2019 08:07:04 GMT ak19 [33423]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)

	Adding in the link to the vagrant VM with Hadoop, Spark for cluster ...

Thu, 15 Aug 2019 05:52:19 GMT ak19 [33422]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)

	Some more links.

Thu, 15 Aug 2019 04:20:03 GMT ak19 [33419]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)

	Last evening, I had found some links about how language-detection is ...

Tue, 13 Aug 2019 09:57:58 GMT ak19 [33414]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)

	Adding important links

Tue, 13 Aug 2019 03:59:29 GMT ak19 [33409]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)
	* gs3-extensions/maori-lang-detection/MoreReading/WebScraping.txt (added)
	* gs3-extensions/maori-lang-detection/MoreReading/macrons_with_emacs.txt (added)
	* gs3-extensions/maori-lang-detection/MoreReading/other.txt (modified)

	Forgot to commit 2 files with links and shuffling some links around ...

Tue, 13 Aug 2019 03:09:28 GMT ak19 [33408]
	* gs3-extensions/maori-lang-detection/MoreReading/other.txt (modified)

	Some rough notes. Will move into appropriate file later.

Mon, 12 Aug 2019 08:35:48 GMT ak19 [33404]
	* gs3-extensions/maori-lang-detection/MoreReading/other.txt (modified)

	1. Links to other Java ways of extracting text from web content. 2. ...

Fri, 09 Aug 2019 06:57:12 GMT ak19 [33393]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)
	* gs3-extensions/maori-lang-detection/bin/script/get_commoncrawl_nz_urls.sh (modified)

	Modified the get_commoncrawl_nz_urls.sh to also create a reduced urls ...

Wed, 07 Aug 2019 07:11:12 GMT ak19 [33391]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)

	Some rough bash scripting lines that work but aren't complete.

Wed, 31 Jul 2019 06:39:24 GMT ak19 [33376]
	* gs3-extensions/maori-lang-detection/MoreReading (added)
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (added)
	* gs3-extensions/maori-lang-detection/MoreReading/Heritrix-and-WCT.txt (added)
	* gs3-extensions/maori-lang-detection/MoreReading/other.txt (added)

	Links and extracts I've read so far on the Web Curator Tool (WCT), ...