1. Most of the scripts in this folder are based on https://github.com/commoncrawl/cc-index-table/tree/master/src/script/convert_url_index.sh for setting up the environment, Spark, etc., in order to run cc-index-table related tasks that process the data we're particularly interested in. Several of the modifications are further based on the instructions and examples at https://github.com/commoncrawl/cc-index-table
2. The third and final phase of the script get_maori_WET_records_for_crawl.sh is based on the instructions at https://groups.google.com/forum/#!topic/common-crawl/hsb90GHq6to
3. The script get_Maori_WET_records_from_CCSep2018_on.sh is merely a batch processing script.
4. The script limit10_export_index.sh never really worked, but it was written for testing purposes only.