Vagrant provisioning files to spin up a modest (4-node) Hadoop cluster for
experiments processing HTRC Extracted Features JSON files into a form suitable
for ingesting into Solr.  The top-level code uses Apache Spark to process JSON
files stored in HDFS, hence the need for an underlying Hadoop cluster.

Provisioning is based on the following online resources, but updated to use
newer versions of Ubuntu, Java, and Hadoop:

  http://cscarioni.blogspot.co.nz/2012/09/setting-up-hadoop-virtual-cluster-with.html
  https://github.com/calo81/vagrant-hadoop-cluster

Supporting Resources
====================

---- Basic Hadoop Cluster ----

For useful documentation about setting up a Hadoop cluster, read:

  http://chaalpritam.blogspot.co.nz/2015/05/hadoop-270-single-node-cluster-setup-on.html
then
  http://chaalpritam.blogspot.co.nz/2015/05/hadoop-270-multi-node-cluster-setup-on.html

OR

  https://xuri.me/2015/03/09/setup-hadoop-on-ubuntu-single-node-cluster.html
then
  https://xuri.me/2016/03/22/setup-hadoop-on-ubuntu-multi-node-cluster.html

For working with a newer Linux OS and newer versions of the software:

  http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php

---- Hadoop + Apache Ambari in 3 lines ----

  https://blog.codecentric.de/en/2014/04/hadoop-cluster-automation/

but this looks like it uses a fairly old version of the software (currently unused).

---- Vagrant ----

To get rid of 'Guest Additions' warnings (about potentially incompatible
version numbers), use the 'vbguest' plugin:

  vagrant plugin install vagrant-vbguest

For more details see:

  http://kvz.io/blog/2013/01/16/vagrant-tip-keep-virtualbox-guest-additions-in-sync/
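To illustrate the multi-node setup described above, here is a minimal sketch of
what a Vagrantfile for a 4-node cluster (one master plus three slaves on a
private network) can look like.  The base box, hostnames, IP addresses, memory
size, and provisioning script path are illustrative assumptions, not the actual
values used by the provisioning files in this repository:

```ruby
# Sketch only: one master and three slave VMs on a host-only network.
# All names, addresses, and paths below are assumptions for illustration.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"        # assumed base box

  nodes = {
    "master" => "192.168.50.10",
    "slave1" => "192.168.50.11",
    "slave2" => "192.168.50.12",
    "slave3" => "192.168.50.13",
  }

  nodes.each do |name, ip|
    config.vm.define name do |node|
      node.vm.hostname = name
      node.vm.network "private_network", ip: ip
      node.vm.provider "virtualbox" do |vb|
        vb.memory = 2048                   # adjust for Hadoop/Spark workloads
      end
      # Hypothetical per-node shell provisioner installing Java and Hadoop
      node.vm.provision "shell", path: "provision/setup-hadoop.sh"
    end
  end
end
```

With a Vagrantfile along these lines, `vagrant up` brings up all four VMs, and
`vagrant up master` (for example) brings up a single named node.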