source: gs3-extensions/maori-lang-detection/conf

Legend:
(add) Added
(edit) Modified
Copied or renamed
Change  Rev  Age  Author  Log Message
(edit) @33569   5 years ak19 1. batchcrawl.sh now does what it should have from the start, which is …
(edit) @33568   5 years ak19 1. More sites greylisted and blacklisted, discovered as I attempted to …
(edit) @33565   5 years ak19 CCWETProcessor: domain url now goes in as a seedURL after the …
(edit) @33562   5 years ak19 1. The sites-too-big-to-exhaustively-crawl.txt is now a csv file of a …
(edit) @33561   5 years ak19 1. sites-too-big-to-exhaustively-crawl.txt is now a comma separated …
(edit) @33559   5 years ak19 1. Special string COPY changed to SUBDOMAIN-COPY after Dr Bainbridge …
(edit) @33556   5 years ak19 Blacklisted wikipedia pages that are actually in other languages which …
(edit) @33555   5 years ak19 Modified top sites list as Dr Bainbridge described: suffixes for the …
(edit) @33554   5 years ak19 Added more to blacklist and greylist. And removed remaining duplicates …
(edit) @33553   5 years ak19 Comments
(edit) @33551   5 years ak19 Added in top 500 urls from moz.com/top500 and removed duplicates, and …
(edit) @33550   5 years ak19 First stage of introducing sites-too-big-to-exhaustively-crawl.tx: …
(edit) @33532   5 years ak19 Found the other top 500 sites link again at last which Dr Bainbridge …
(edit) @33531   5 years ak19 Added whitelist for mi.wikipedia.org, and updates to blacklist and …
(edit) @33502   5 years ak19 Current url pattern blacklist and greylist filter files. Used by …
(edit) @33480   5 years ak19 Much harder to remove pages where words are fused together as some are …
(edit) @33467   5 years ak19 Improved the code to use a static block to load the needed properties …
(edit) @33412   5 years ak19 config command for wgetting a single file
(edit) @33400   5 years ak19 1. Setting up log4j.properties based on the macronizer's basic one …
(add) @33399   5 years ak19 Putting properties files into the conf folder and keeping the lib …