Timestamp:
2019-10-30T23:02:26+13:00 (4 years ago)
Author:
ak19
Message:
  1. New script to export from HBase so that we can, in theory, reimport into HBase. I have not tried the reimport, but I followed the instructions to export and got a non-zero-length output file, so I assume it worked.
  2. Committing today's new crawls in crawledNode4.tar. Each crawled site's folder inside it now includes a file called part-m-* that is the exported HBase table from that node VM.
  3. Updated the hdfs-related GS_README.txt with instructions on viewing the contents of a table in HBase and a link about exporting from and importing into HBase (example invocations are sketched below).
  4. Minor changes, such as renaming the tar files so they are no longer called tar.gz.
File:
1 edited
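
For reference, the standard HBase MapReduce Export/Import invocations and a shell scan look like the following. This is a sketch only: the table name and HDFS paths are placeholders, not values taken from this changeset.

    # Export a table to a directory of HDFS sequence files (produces part-m-* files):
    hbase org.apache.hadoop.hbase.mapreduce.Export tablename /user/me/exportdir

    # Reimport into an existing table with the same column families:
    hbase org.apache.hadoop.hbase.mapreduce.Import tablename /user/me/exportdir

    # View the contents of a table from the HBase shell:
    echo "scan 'tablename'" | hbase shell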

  • gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh

    r33574 r33608  
    64 64          echo "2. copy the regex-urlfilter file:" 2>&1 | tee -a ${siteDir}UNFINISHED
    65 65          echo "   cp $NUTCH_URLFILTER_TEMPLATE $NUTCH_URLFILTER_FILE" 2>&1 | tee -a ${siteDir}UNFINISHED
    66    -        echo "3. Adjust # crawl iterations in old crawl command:\n$crawl_cmd" 2>&1 | tee -a ${siteDir}UNFINISHED
       66 +        echo "3. Adjust # crawl iterations in old crawl command:" 2>&1 | tee -a ${siteDir}UNFINISHED
       67 +        echo "   $crawl_cmd" 2>&1 | tee -a ${siteDir}UNFINISHED
    67 68      fi
    68 69
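
The split into two echo calls above works around the fact that bash's builtin echo does not interpret backslash escapes such as \n by default, so the old single call printed a literal \n into the UNFINISHED file. A minimal demonstration in a default bash session (not part of the script; the strings are placeholders):

    $ echo "iterations:\n10"
    iterations:\n10
    $ echo "iterations:"; echo "   10"
    iterations:
       10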