source: main/trunk/greenstone2/perllib/plugins/NutchTextDumpPlugin.pm

Diff Rev Age Author Log Message
(edit) @34137   4 years ak19 Have only been able to incorporate one of Dr Bainbridge's improvements …
(edit) @34131   4 years ak19 Allowing input keep-urls-file to contain a comma followed by country …
(edit) @34130   4 years ak19 Some more tidying up while isMRI filtered collection rebuilding
(edit) @34129   4 years ak19 Implemented Kathy's suggestions: 1. Explicit ex prefix to ex meta …
(edit) @34126   4 years ak19 When I'd modified the code to make the keep_urls_file non-compulsory, …
(edit) @34125   4 years ak19 Commit message went awry. Cleaned up some comments to recommit with …
(edit) @34124   4 years ak19 Decoding the title and text using the encoding seemed to have turned …
(edit) @34123   4 years ak19 Some more minor changes
(edit) @34122   4 years ak19 1. After some testing of building the complete commoncrawl collection, …
(add) @34121   4 years ak19 1. Introducing NutchTextDumpPlugin to process the records …