| Rev | Age | Author | Log Message |
|---|---|---|---|
| @33608 | 5 years | ak19 | 1. New script to export from HBase so that we could in theory reimport … |
| @33598 | 5 years | ak19 | More instructions on setting up Nutch now that I've remembered to … |
| @33597 | 5 years | ak19 | Committing active version of template file which has a newline at end … |
| @33596 | 5 years | ak19 | Adding in the nutch-site.xml and regex-urlfilter.GS_TEMPLATE template … |
| @33574 | 5 years | ak19 | If nutch stores a crawled site in more than 1 file, then cat all of … |
| @33573 | 5 years | ak19 | Forgot to document that spaces were also allowed as separator in the … |
| @33571 | 5 years | ak19 | Adding Dr Bainbridge's suggestion of appending the crawlId of each … |
| @33570 | 5 years | ak19 | Need to check if UNFINISHED file actually exists before moving it … |
| @33569 | 5 years | ak19 | 1. batchcrawl.sh now does what it should have from the start, which is … |
| @33567 | 5 years | ak19 | batchcrawl.sh now supports -all flag (and prints usage on 0 args). The … |
| @33566 | 5 years | ak19 | batchcrawl.sh script now supports taking a comma or space separated … |
| @33564 | 5 years | ak19 | batchcrawl.sh now does the crawl and logs output of the crawl, dumps … |
| @33563 | 5 years | ak19 | Committing inactive testing batch scripts (only creates the … |
| @33545 | 5 years | ak19 | Mainly changes to crawling-Nutch.txt and some minor changes to other … |
| @33543 | 5 years | ak19 | Filled in some missing instructions |
| @33541 | 5 years | ak19 | 1. hdfs-cc-work/GS_README.txt now contains the complete instructions … |
| @33539 | 5 years | ak19 | File rename |
| @33538 | 5 years | ak19 | Some additions to the setup.sh script to query commoncrawl for MRI … |
| @33536 | 5 years | ak19 | Changes required to the commoncrawl related Vagrant github project to … |
| @33535 | 5 years | ak19 | 1. New setup.sh script for on a hadoop system to setup the git … |
| @33534 | 5 years | ak19 | Correction: toplevel script has to be placed inside cc-index-table not … |
| @33530 | 5 years | ak19 | Completed sentence that was left hanging. |
| @33527 | 5 years | ak19 | Name change for folder (copied from gs3-extensions/maori-lang-detection/hdfs-instructions) |
| @33526 | 5 years | ak19 | Moved hadoop related scripts from bin/script into hdfs-instructions |