source: gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts

Diff Rev Age Author Log Message
(edit) @33574   5 years ak19 If nutch stores a crawled site in more than 1 file, then cat all of …
(edit) @33573   5 years ak19 Forgot to document that spaces were also allowed as separator in the …
(edit) @33571   5 years ak19 Adding Dr Bainbridge's suggestion of appending the crawlId of each …
(edit) @33570   5 years ak19 Need to check if UNFINISHED file actually exists before moving it …
(edit) @33569   5 years ak19 1. batchcrawl.sh now does what it should have from the start, which is …
(edit) @33567   5 years ak19 batchcrawl.sh now supports -all flag (and prints usage on 0 args). The …
(edit) @33566   5 years ak19 batchcrawl.sh script now supports taking a comma or space separated …
(edit) @33564   5 years ak19 batchcrawl.sh now does the crawl and logs output of the crawl, dumps …
(edit) @33563   5 years ak19 Committing inactive testing batch scripts (only creates the …
(edit) @33538   5 years ak19 Some additions to the setup.sh script to query commoncrawl for MRI …
(edit) @33535   5 years ak19 1. New setup.sh script for on a hadoop system to setup the git …
(edit) @33534   5 years ak19 Correction: toplevel script has to be placed inside cc-index-table not …
(copy) @33527   5 years ak19 Name change for folder (copied from gs3-extensions/maori-lang-detection/hdfs-instructions/scripts)
(edit) @33526   5 years ak19 Moved hadoop related scripts from bin/script into hdfs-instructions