Changeset 33574
- Timestamp: 2019-10-16T23:35:45+13:00
- Files: 1 edited
gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh
--- gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (r33573)
+++ gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (r33574)
@@ -36,8 +36,9 @@
 
 # $siteDir parameter is the folder containing seedURLs.txt
+crawl_cmd="./$CRAWL_COMMAND $siteDir $crawlId $CRAWL_ITERATIONS"
+
+# Since we're going to crawl from scratch, create log.out file
+# Logging to terminal and log file simultaenously
 # https://stackoverflow.com/questions/418896/how-to-redirect-output-to-a-file-and-stdout
-crawl_cmd="./$CRAWL_COMMAND $siteDir $crawlId $CRAWL_ITERATIONS"
-
-# Since we're going to crawl from scratch, create log.out file
 echo "Going to run nutch crawl command (and copy output to ${siteDir}log.out):" 2>&1 | tee ${siteDir}log.out
 # append to log.out file hereafter
@@ -69,5 +70,5 @@
 ./$NUTCH_COMMAND readdb -dump $outputDir/$crawlId -text -crawlId $crawlId
 ./$NUTCH_COMMAND readdb -stats -crawlId $crawlId > $outputDir/$crawlId/stats
-cat $outputDir/$crawlId/part-r-00000 > $outputDir/$crawlId/dump.txt
+cat $outputDir/$crawlId/part-r-* > $outputDir/$crawlId/dump.txt
 else
 # appending to log.out
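
The two shell idioms this changeset touches can be tried in isolation. The following is a minimal sketch, not the script itself: a scratch directory stands in for `$siteDir` and `$outputDir/$crawlId`, the variable `dumpDir` and the file contents are made up for illustration, and the use of `tee -a` for the later log lines is an assumption based on the "append to log.out file hereafter" comment. The change from `part-r-00000` to `part-r-*` presumably covers dumps that are split over more than one part file.

```shell
#!/usr/bin/env bash
# Sketch only: a scratch directory stands in for the real crawl folders.
set -eu

workDir="$(mktemp -d)"
siteDir="$workDir/site/"     # trailing slash, matching "${siteDir}log.out"
dumpDir="$workDir/dump"      # hypothetical stand-in for $outputDir/$crawlId
mkdir -p "$siteDir" "$dumpDir"

# First message: tee without -a creates (or truncates) log.out
# while still printing to the terminal.
echo "Going to run crawl" 2>&1 | tee "${siteDir}log.out"

# Later messages (assumed): tee -a appends, so earlier lines are kept.
echo "Crawl finished" 2>&1 | tee -a "${siteDir}log.out"

# Pretend the dump was split into two part files (made-up contents).
printf 'url-a\n' > "$dumpDir/part-r-00000"
printf 'url-b\n' > "$dumpDir/part-r-00001"

# The glob expands in sorted order, so every part file is concatenated.
# Naming part-r-00000 alone would leave url-b out of dump.txt.
cat "$dumpDir"/part-r-* > "$dumpDir/dump.txt"
```

After the sketch runs, `log.out` holds both messages and `dump.txt` holds the contents of both part files in order.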