source: gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh

Change Rev Age Author Log Message
(edit) @33608   5 years ak19 1. New script to export from HBase so that we could in theory reimport …
(edit) @33574   5 years ak19 If nutch stores a crawled site in more than 1 file, then cat all of …
(edit) @33573   5 years ak19 Forgot to document that spaces were also allowed as separator in the …
(edit) @33571   5 years ak19 Adding Dr Bainbridge's suggestion of appending the crawlId of each …
(edit) @33570   5 years ak19 Need to check if UNFINISHED file actually exists before moving it …
(edit) @33569   5 years ak19 1. batchcrawl.sh now does what it should have from the start, which is …
(edit) @33567   5 years ak19 batchcrawl.sh now supports -all flag (and prints usage on 0 args). The …
(edit) @33566   5 years ak19 batchcrawl.sh script now supports taking a comma or space separated …
(edit) @33564   5 years ak19 batchcrawl.sh now does the crawl and logs output of the crawl, dumps …
(add) @33563   5 years ak19 Committing inactive testing batch scripts (only creates the …