README covering the full workflow: bringing up the Vagrant VM for Hadoop/Spark, fetching the various Git projects (and the JARs they need), compiling them, and finally running the scripts that execute the Spark jobs and produce the WET files for MRI records in Common Crawl since September 2018.