Greenstone3 collection resulting from Anu's work with CommonCrawl web dumps.

The PRE-IMPORT-PREPARE.sh script currently downloads archives.tar.gz and index.tar.gz from the Atea Google Drive area and untars them.
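
As a rough illustration of the untar step only, the following is a minimal sketch and not the actual contents of PRE-IMPORT-PREPARE.sh. It assumes the two tarballs have already been downloaded into a local directory; the download from Google Drive is not shown because the location is not given here. The function name `extract_tarballs` and both directory arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real PRE-IMPORT-PREPARE.sh.
# Assumption: archives.tar.gz and index.tar.gz were already fetched into SRC_DIR.

# extract_tarballs SRC_DIR DEST_DIR
# Unpacks archives.tar.gz and index.tar.gz from SRC_DIR into DEST_DIR.
# Returns non-zero if a tarball is missing or extraction fails.
extract_tarballs() {
    src_dir=$1
    dest_dir=$2

    mkdir -p "$dest_dir" || return 1

    for name in archives index; do
        tarball="$src_dir/$name.tar.gz"
        if [ ! -f "$tarball" ]; then
            echo "missing: $tarball" >&2
            return 1
        fi
        # -x extract, -z gunzip, -f read from file, -C extract inside DEST_DIR
        tar -xzf "$tarball" -C "$dest_dir" || return 1
    done
}
```

A call might look like `extract_tarballs ./downloads ./collection-dir`, where both paths are placeholders. Checking for each tarball before extracting means a failed or partial download is reported with a clear message rather than as a tar error.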