source: gs3-extensions/maori-lang-detection/hdfs-cc-work

Change Rev Age Author Log Message
(edit) @33618   4 years ak19 Adding in the download URL
(edit) @33608   4 years ak19 1. New script to export from HBase so that we could in theory reimport …
(edit) @33598   5 years ak19 More instructions on setting up Nutch now that I've remembered to …
(edit) @33597   5 years ak19 Committing active version of template file which has a newline at end …
(edit) @33596   5 years ak19 Adding in the nutch-site.xml and regex-urlfilter.GS_TEMPLATE template …
(edit) @33574   5 years ak19 If nutch stores a crawled site in more than 1 file, then cat all of …
(edit) @33573   5 years ak19 Forgot to document that spaces were also allowed as separator in the …
(edit) @33571   5 years ak19 Adding Dr Bainbridge's suggestion of appending the crawlId of each …
(edit) @33570   5 years ak19 Need to check if UNFINISHED file actually exists before moving it …
(edit) @33569   5 years ak19 1. batchcrawl.sh now does what it should have from the start, which is …
(edit) @33567   5 years ak19 batchcrawl.sh now supports -all flag (and prints usage on 0 args). The …
(edit) @33566   5 years ak19 batchcrawl.sh script now supports taking a comma or space separated …
(edit) @33564   5 years ak19 batchcrawl.sh now does the crawl and logs output of the crawl, dumps …
(edit) @33563   5 years ak19 Committing inactive testing batch scripts (only creates the …
(edit) @33545   5 years ak19 Mainly changes to crawling-Nutch.txt and some minor changes to other …
(edit) @33543   5 years ak19 Filled in some missing instructions
(edit) @33541   5 years ak19 1. hdfs-cc-work/GS_README.txt now contains the complete instructions …
(edit) @33539   5 years ak19 File rename
(edit) @33538   5 years ak19 Some additions to the setup.sh script to query commoncrawl for MRI …
(edit) @33536   5 years ak19 Changes required to the commoncrawl related Vagrant github project to …
(edit) @33535   5 years ak19 1. New setup.sh script for use on a hadoop system to set up the git …
(edit) @33534   5 years ak19 Correction: toplevel script has to be placed inside cc-index-table not …
(edit) @33530   5 years ak19 Completed sentence that was left hanging.
(copy) @33527   5 years ak19 Name change for folder
copied from gs3-extensions/maori-lang-detection/hdfs-instructions
(edit) @33526   5 years ak19 Moved hadoop related scripts from bin/script into hdfs-instructions