Vagrant provisioning files to spin up a modest (4-node) Hadoop cluster for experiments in processing HTRC Extracted Feature JSON files so that they are suitable for ingesting into Solr.

The top-level code is written for Apache Spark and processes JSON files stored in HDFS, hence the need for an underlying Hadoop cluster.

Provisioning is based on the following online resources, but updated to use newer versions of Ubuntu, Java, and Hadoop:

  http://cscarioni.blogspot.co.nz/2012/09/setting-up-hadoop-virtual-cluster-with.html
  https://github.com/calo81/vagrant-hadoop-cluster
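
For orientation, a four-node layout of this kind could be declared in a Vagrantfile roughly as sketched below. This is only an illustration, not the repository's actual Vagrantfile: the helper `cluster_nodes`, the hostnames, the IP addresses, the memory sizes, and the box name are all assumptions chosen for the example, and the Hadoop provisioning step itself is left out.

```ruby
# Illustrative sketch only: every name and value here is an assumption,
# not taken from this repository's real Vagrantfile.

# Placeholder box name; substitute the Ubuntu release actually in use.
BOX = "ubuntu/xenial64"

# Hypothetical helper: describe one master plus `worker_count` workers,
# each with a fixed address on a private subnet.
def cluster_nodes(worker_count = 3, subnet = "192.168.56")
  master = { name: "master", ip: "#{subnet}.10", memory: 2048 }
  workers = (1..worker_count).map do |i|
    { name: "worker#{i}", ip: "#{subnet}.#{10 + i}", memory: 1024 }
  end
  [master] + workers
end

# The Vagrant constant only exists when this file is loaded by Vagrant,
# so the guard lets the node layout also be inspected with plain Ruby.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    config.vm.box = BOX

    cluster_nodes.each do |node|
      config.vm.define node[:name] do |machine|
        machine.vm.hostname = node[:name]
        machine.vm.network "private_network", ip: node[:ip]
        machine.vm.provider "virtualbox" do |vb|
          vb.memory = node[:memory]
        end
        # Provisioning of Java and Hadoop would be attached here.
      end
    end
  end
end
```

With a file like this in place, `vagrant up` would bring up all four machines, and `vagrant ssh master` would open a shell on the assumed master node.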