#
# ChangeLog for gs3-extensions/maori-lang-detection/conf
#
# Generated by Trac 1.4.2
# 2024-05-28T14:29:55+12:00

Mon, 14 Oct 2019 08:04:58 GMT ak19 [33565]
	* gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)

	CCWETProcessor: domain url now goes in as a seedURL after the ...

Fri, 11 Oct 2019 08:52:40 GMT ak19 [33562]
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
	* gs3-extensions/maori-lang-detection/lib/LICENSE.txt (added)
	* gs3-extensions/maori-lang-detection/lib/NOTICE.txt (added)
	* gs3-extensions/maori-lang-detection/lib/commons-csv-1.7.jar (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)

	1. The sites-too-big-to-exhaustively-crawl.txt is now a csv file of a ...

Fri, 11 Oct 2019 07:49:05 GMT ak19 [33561]
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)

	1. sites-too-big-to-exhaustively-crawl.txt is now a comma separated ...

Thu, 10 Oct 2019 10:44:31 GMT ak19 [33559]
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/url-whitelist-filter.txt (modified)

	1. Special string COPY changed to SUBDOMAIN-COPY after Dr Bainbridge ...

Wed, 09 Oct 2019 05:58:30 GMT ak19 [33556]
	* gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (modified)

	Blacklisted wikipedia pages that are actually in other languages ...

Wed, 09 Oct 2019 05:43:47 GMT ak19 [33555]
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)

	Modified top sites list as Dr Bainbridge described: suffixes for the ...

Wed, 09 Oct 2019 05:11:19 GMT ak19 [33554]
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt (modified)

	Added more to blacklist and greylist. And removed remaining ...

Fri, 04 Oct 2019 09:19:20 GMT ak19 [33553]
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)

	Comments

Fri, 04 Oct 2019 06:35:06 GMT ak19 [33551]
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)

	Added in top 500 urls from moz.com/top500 and removed duplicates, and ...

Fri, 04 Oct 2019 06:06:51 GMT ak19 [33550]
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (added)
	* gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt (modified)

	First stage of introducing sites-too-big-to-exhaustively-crawl.tx: ...

Thu, 26 Sep 2019 11:06:11 GMT ak19 [33532]
	* gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt (modified)

	Found the other top 500 sites link again at last which Dr Bainbridge ...

Thu, 26 Sep 2019 11:03:01 GMT ak19 [33531]
	* gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/url-whitelist-filter.txt (added)

	Added whitelist for mi.wikipedia.org, and updates to blacklist and ...

Mon, 23 Sep 2019 11:11:29 GMT ak19 [33502]
	* gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (added)
	* gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt (added)

	Current url pattern blacklist and greylist filter files. Used by ...

Mon, 16 Sep 2019 07:45:01 GMT ak19 [33480]
	* gs3-extensions/maori-lang-detection/conf/config.properties (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WETProcessor.java (modified)

	Much harder to remove pages where words are fused together as some ...

Fri, 13 Sep 2019 05:44:41 GMT ak19 [33467]
	* gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt (modified)
	* gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/config.properties (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WETProcessor.java (modified)

	Improved the code to use a static block to load the needed properties ...

Tue, 13 Aug 2019 09:54:31 GMT ak19 [33412]
	* gs3-extensions/maori-lang-detection/conf/config.properties (modified)

	config command for wgetting a single file

Sun, 11 Aug 2019 09:15:26 GMT ak19 [33400]
	* gs3-extensions/maori-lang-detection/conf/log4j.properties (added)
	* gs3-extensions/maori-lang-detection/conf/log4j.properties.in (added)
	* gs3-extensions/maori-lang-detection/lib/log4j-1.2.8.jar (added)

	1. Setting up log4j.properties based on the macronizer's basic one ...

Sun, 11 Aug 2019 08:48:54 GMT ak19 [33399]
	* gs3-extensions/maori-lang-detection/conf (added)
	* gs3-extensions/maori-lang-detection/conf/config.properties (moved)
	* gs3-extensions/maori-lang-detection/lib/gutil.jar (added)

	Putting properties files into the conf folder and keeping the lib ...