Changeset 33538

Timestamp: 01.10.2019 21:36:06
Author: ak19
Message:

Some additions to the setup.sh script to query commoncrawl for MRI data on hadoop before I commit what I've done to crawl with Autistici's crawl software.

Location: gs3-extensions/maori-lang-detection/hdfs-cc-work
Files: 2 modified

  • gs3-extensions/maori-lang-detection/hdfs-cc-work/Readme.txt

    r33535 r33538  
    @@ -5,4 +5,8 @@
     B.  Create IAM role on Amazon AWS to use S3a
     C.  Configure Spark on your vagrant VM with the AWS authentication details
    +---
    +Script scripts/setup.sh now is automated to do the steps in D-F below
    +and prints out the main instruction for G.
    +---
     D.  OPTIONAL? Further configuration for Hadoop to work with Amazon AWS
     E.  Setup cc-index-table git project
    @@ -152,4 +156,9 @@
     ]
     
    +----------------------------------------------------------------------
    +NOTE:
    +Script scripts/setup.sh now is automated to do the steps in D-F below
    +and prints out the main instruction for G.
    +
     
     ----------------------------------------------------------------------
  • gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/setup.sh

    r33535 r33538  
    @@ -71,5 +71,16 @@
     fi
     
    -echo "Done compiling and setting up."
    +echo "Done compiling and automated parts of setting up."
    +echo "NEXT STEP:"
    +echo "Ensure you have sudo edited $SPARK_HOME/conf/spark-defaults.conf"
    +echo "  (/usr/local/spark-2.3.0-bin-hadoop2.7/conf/spark-defaults.conf)"
    +echo "to contain the following 3 lines with YOUR Amazon AWS IAM Role access and secret keys:"
    +echo "   spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem"
    +echo "   spark.hadoop.fs.s3a.access.key=YOUR_AWS_IAM-ROLE_ACCESSKEY_HERE"
    +echo "   spark.hadoop.fs.s3a.secret.key=YOUR_AWS_IAM-ROLE_SECRETKEY_HERE"
    +echo "Consult GS_README.TXT section B (and C) for instructions on setting up an AWS IAM role."
    +echo "Only when that's done will you be ready to run the following script."
    +echo ""
    +echo "THEN:"
     echo "To get MRI warc to wet for a particular crawl timestamp, cd into cc-index-table and RUN:"
     echo "./get_maori_WET_records_for_crawl.sh CC-MAIN-<YYYY-##>"
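
Note: the new setup.sh output leaves the spark-defaults.conf edit as a manual step. That step could be checked before running get_maori_WET_records_for_crawl.sh. The following is only a sketch and is not part of this changeset: check_s3a_conf is a hypothetical helper name, and the check assumes each property sits on its own line as "key=value" or "key value". It confirms only that the three property names are present, not that the keys are valid.

```shell
#!/bin/sh
# Hypothetical helper (not part of setup.sh): reports which of the three
# s3a properties listed in the setup.sh output are absent from the given
# spark-defaults.conf file. Returns 0 if all are present, 1 otherwise.
check_s3a_conf() {
    conf_file="$1"
    missing=0
    for key in spark.hadoop.fs.s3a.impl \
               spark.hadoop.fs.s3a.access.key \
               spark.hadoop.fs.s3a.secret.key; do
        # Escape the dots so they match literally in the regular expression.
        pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
        # Accept "key=value" or "key value" at the start of a line;
        # commented-out lines (starting with #) do not match.
        if ! grep -Eq "^[[:space:]]*${pattern}[[:space:]=]" "$conf_file" 2>/dev/null; then
            echo "MISSING: $key"
            missing=1
        fi
    done
    return $missing
}
```

One possible use is to call check_s3a_conf "$SPARK_HOME/conf/spark-defaults.conf" and continue only if it returns 0.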