root/other-projects/hathitrust/vagrant-hadoop-cluster/trunk/README.txt @ 30905

Revision 30905, 1.6 KB (checked in by davidb, 4 years ago)

Additional resources

Vagrant provisioning files to spin up a modest (4 node) Hadoop
cluster for experiments processing HTRC Extracted Feature JSON files
into a form suitable for ingesting into Solr.


The top-level code uses Apache Spark to process JSON files stored in
HDFS, hence the need for an underlying Hadoop cluster.

Provisioning is based on the following online resources, but updated
to use newer versions of Ubuntu, Java, and Hadoop.

  http://cscarioni.blogspot.co.nz/2012/09/setting-up-hadoop-virtual-cluster-with.html

  https://github.com/calo81/vagrant-hadoop-cluster


Supporting Resources
====================

----
Basic Hadoop Cluster
----

For useful documentation about setting up a Hadoop cluster, read:

  http://chaalpritam.blogspot.co.nz/2015/05/hadoop-270-single-node-cluster-setup-on.html
then
  http://chaalpritam.blogspot.co.nz/2015/05/hadoop-270-multi-node-cluster-setup-on.html

OR

  https://xuri.me/2015/03/09/setup-hadoop-on-ubuntu-single-node-cluster.html
then
  https://xuri.me/2016/03/22/setup-hadoop-on-ubuntu-multi-node-cluster.html

For working with a newer Linux OS and newer versions of the software:

  http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php

----
Hadoop + Apache Ambari in 3 lines:
----

  https://blog.codecentric.de/en/2014/04/hadoop-cluster-automation/

However, this appears to rely on fairly old versions of the software,
and is currently unused.

----
Vagrant
----

To get rid of 'Guest Additions' warnings (about potentially
incompatible version numbers) use the 'vbguest' plugin:

  vagrant plugin install vagrant-vbguest

For more details see:

  http://kvz.io/blog/2013/01/16/vagrant-tip-keep-virtualbox-guest-additions-in-sync/
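
If the plugin install is scripted, it can be made safe to re-run by
checking first whether the plugin is already present. The following is
only a minimal sketch and is not part of the provisioning files here:
the helper name vbguest_missing is hypothetical, and the check assumes
that 'vagrant plugin list' prints one plugin per line in the form
'name (version, scope)'.

```shell
# Hypothetical helper (not part of this repository).
# Reads `vagrant plugin list` output on stdin and succeeds (exit 0)
# only when no line starts with "vagrant-vbguest" followed by whitespace.
vbguest_missing() {
  ! grep -q '^vagrant-vbguest[[:space:]]'
}

# Install the plugin only if Vagrant is on the PATH and the plugin is absent.
if command -v vagrant >/dev/null 2>&1; then
  if vagrant plugin list | vbguest_missing; then
    vagrant plugin install vagrant-vbguest
  fi
fi
```

Keeping the check in a small function that reads from stdin means it
can be tried against saved 'vagrant plugin list' output without
Vagrant being installed.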