Changeset 33538 for gs3-extensions


Timestamp: 2019-10-01T21:36:06+13:00
Author: ak19
Message:

Some additions to the setup.sh script to query commoncrawl for MRI data on hadoop before I commit what I've done to crawl with Autistici's crawl software.

Location: gs3-extensions/maori-lang-detection/hdfs-cc-work
Files: 2 edited

  • gs3-extensions/maori-lang-detection/hdfs-cc-work/Readme.txt

    --- Readme.txt (revision 33535)
    +++ Readme.txt (revision 33538)
    @@ -5,4 +5,8 @@
     B.  Create IAM role on Amazon AWS to use S3a
     C.  Configure Spark on your vagrant VM with the AWS authentication details
    +---
    +Script scripts/setup.sh now is automated to do the steps in D-F below
    +and prints out the main instruction for G.
    +---
     D.  OPTIONAL? Further configuration for Hadoop to work with Amazon AWS
     E.  Setup cc-index-table git project
    @@ -152,4 +156,9 @@
     ]
     
    +----------------------------------------------------------------------
    +NOTE:
    +Script scripts/setup.sh now is automated to do the steps in D-F below
    +and prints out the main instruction for G.
    +
     
     ----------------------------------------------------------------------
  • gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/setup.sh

    --- setup.sh (revision 33535)
    +++ setup.sh (revision 33538)
    @@ -71,5 +71,16 @@
     fi
     
    -echo "Done compiling and setting up."
    +echo "Done compiling and automated parts of setting up."
    +echo "NEXT STEP:"
    +echo "Ensure you have sudo edited $SPARK_HOME/conf/spark-defaults.conf"
    +echo "  (/usr/local/spark-2.3.0-bin-hadoop2.7/conf/spark-defaults.conf)"
    +echo "to contain the following 3 lines with YOUR Amazon AWS IAM Role access and secret keys:"
    +echo "   spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem"
    +echo "   spark.hadoop.fs.s3a.access.key=YOUR_AWS_IAM-ROLE_ACCESSKEY_HERE"
    +echo "   spark.hadoop.fs.s3a.secret.key=YOUR_AWS_IAM-ROLE_SECRETKEY_HERE"
    +echo "Consult GS_README.TXT section B (and C) for instructions on setting up an AWS IAM role."
    +echo "Only when that's done will you be ready to run the following script."
    +echo ""
    +echo "THEN:"
     echo "To get MRI warc to wet for a particular crawl timestamp, cd into cc-index-table and RUN:"
     echo "./get_maori_WET_records_for_crawl.sh CC-MAIN-<YYYY-##>"
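
The manual step that setup.sh prints (adding the three s3a lines to spark-defaults.conf) could be verified before running the crawl script. The following is only a sketch and is not part of this changeset: the function name check_s3a_conf is hypothetical, and it assumes the properties are written as key=value (as printed above) or as whitespace-separated key/value pairs.

```shell
#!/bin/bash
# Hypothetical helper, NOT part of setup.sh: checks that the three s3a
# properties named in the setup.sh output are present in a Spark config
# file and that the placeholder key values have been replaced.
# Returns 0 if all is well, 1 if something is missing, 2 if no such file.
check_s3a_conf() {
    local conf_file="$1"
    local status=0
    local key

    if [ ! -f "$conf_file" ]; then
        echo "Cannot find $conf_file" >&2
        return 2
    fi

    for key in spark.hadoop.fs.s3a.impl \
               spark.hadoop.fs.s3a.access.key \
               spark.hadoop.fs.s3a.secret.key; do
        # Require a non-empty value; commented-out lines do not count.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]=]+[^[:space:]]" "$conf_file"; then
            echo "Missing property: $key" >&2
            status=1
        fi
    done

    # The placeholder text from the setup.sh message must be gone.
    if grep -Eq '^[[:space:]]*spark\.hadoop\.fs\.s3a\..*YOUR_AWS_IAM-ROLE_' "$conf_file"; then
        echo "Placeholder key values are still present in $conf_file" >&2
        status=1
    fi

    return "$status"
}
```

A possible use, assuming SPARK_HOME is set as in setup.sh: check_s3a_conf "$SPARK_HOME/conf/spark-defaults.conf" && echo "Ready to run get_maori_WET_records_for_crawl.sh". The check only looks at the text of the file; it does not confirm that the keys are valid with AWS.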