----
Introduction
----

Vagrant provisioning files to spin up a modest Spark cluster (master
+ 3 slaves + backup) for experiments in processing HTRC Extracted
Feature JSON files in parallel, suitable for ingesting into Solr.


*Assumptions*

  * You have VirtualBox and Vagrant installed
    (at time of writing VirtualBox v5.0.28, Vagrant 1.8.6)


*Useful*

  * Install the Vagrant VirtualBox Guest Additions plugin to stop warnings
    about potentially incompatible versions:

      vagrant plugin install vagrant-vbguest


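To confirm the assumptions above before going further, a check along
these lines may help. It is only a sketch: 'have_tool' and
'report_missing' are names made up for this example, and they test only
that a command is on the PATH, not which version is installed.

```shell
# have_tool is a hypothetical helper, not part of these provisioning files.
# It succeeds if the named command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_missing (also hypothetical) prints one line for each listed
# command that cannot be found, and prints nothing if all are present.
report_missing() {
    for tool in "$@"; do
        have_tool "$tool" || echo "missing: $tool"
    done
}
```

For example, 'report_missing vagrant VBoxManage' prints nothing when both
are installed. The versions themselves can then be read with
'vagrant --version' and 'VBoxManage --version'.
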
----
Setup Procedure
----

This is a 2 step process:

  Step 1: Setting up the cluster
  Step 2: Checking out the Java code to process the JSON files


Step 1 is covered by this README file, ending with an svn checkout of
the Java code on the 'master' node that processes the JSON files. The
files checked out include the README file covering Step 2.

----
Step 1
----

From within the directory where this README.txt is located, enter:

    vagrant up

The first time this is run, there is a lot of downloading and setup to
do. Subsequent use of this command spins the cluster up much faster.

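Before moving on, it can be useful to confirm that every machine came
up. A minimal sketch, assuming the comma-separated layout that
'vagrant status --machine-readable' emits (timestamp, target, type,
data); 'not_running' is a name made up for this example:

```shell
# not_running is a hypothetical helper. It reads machine-readable status
# lines on stdin and prints the name of every machine whose reported
# state is something other than "running".
not_running() {
    awk -F, '$3 == "state" && $4 != "running" { print $2 }'
}
```

It would be used as 'vagrant status --machine-readable | not_running';
no output means all machines report the running state.
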
Once the cluster is set up, you need to get the Spark framework up and
running, which in turn uses Hadoop's HDFS. You do this as the user
'htrc' on the 'master' node:

    vagrant ssh master
    sudo su - htrc

If this is the first time, you need to format an HDFS area to use:

    hdfs namenode -format

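Formatting is a one-off step, and re-formatting an existing NameNode
discards its file system metadata, so a guard can be worthwhile. The
sketch below relies on a formatted NameNode directory containing a
'current/VERSION' file. 'namenode_formatted' is a name invented here,
and the directory passed to it is an assumption: it must match the
'dfs.namenode.name.dir' setting of this cluster, which this README does
not state.

```shell
# namenode_formatted is a hypothetical helper. It succeeds if the given
# NameNode storage directory has already been formatted, which it detects
# by the presence of the current/VERSION file that formatting creates.
namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Usage (the path is a placeholder, not taken from this cluster's config):
#   namenode_formatted /path/to/namenode/dir || hdfs namenode -format
```
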
Then, first time or not, start up the HDFS and Spark daemon processes:

    start-dfs.sh
    spark-start-all.sh

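To check that the daemons actually started, the JDK's 'jps' tool lists
running Java processes by name. A minimal sketch, assuming that on the
'master' node the HDFS NameNode shows up as 'NameNode' and the Spark
standalone master as 'Master'; 'missing_daemons' is a name made up for
this example:

```shell
# missing_daemons is a hypothetical helper. It reads "pid name" lines, as
# printed by jps, on stdin and prints each expected daemon (given as an
# argument) that does not appear among them. Names are matched exactly,
# so SecondaryNameNode does not count as NameNode.
missing_daemons() {
    listing=$(cat)
    for daemon in "$@"; do
        printf '%s\n' "$listing" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "not running: $daemon"
    done
}
```

It would be used as 'jps | missing_daemons NameNode Master'; no output
means both were found.
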
You can visit the Spark cluster monitoring page at:

    http://10.10.0.52:8080/

----
Getting ready for Step 2
----

With the Spark cluster and HDFS up and running, you are now ready to
proceed to Step 2, running the JSON processing code.


There are a couple of packages the 'master' node needs for this ('svn'
and 'mvn'), which we install as the 'vagrant' user. Then we are in a
position to check out the Java code, which in turn includes the README
file for Step 2.

Install subversion and maven using the 'vagrant' user's sudo ability:

    vagrant ssh master
    sudo apt-get install subversion
    sudo apt-get install maven

Now switch from the 'vagrant' user to 'htrc' and check out the Java code:

    sudo su - htrc

    svn co http://svn.greenstone.org/other-projects/hathitrust/solr-extracted-features/trunk solr-extracted-features

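If these steps are repeated on a cluster that has already been set up,
a fresh checkout is not needed; 'svn update' inside the existing working
copy is enough. A minimal sketch of that choice, assuming a working copy
can be recognised by the '.svn' directory at its top level; 'svn_action'
is a name invented for this example:

```shell
# svn_action is a hypothetical helper. It prints "update" if the given
# directory already looks like a Subversion working copy (it contains a
# .svn directory), and "checkout" otherwise.
svn_action() {
    if [ -d "$1/.svn" ]; then
        echo update
    else
        echo checkout
    fi
}
```

Running 'svn_action solr-extracted-features' from the 'htrc' home
directory then indicates which of the two svn commands applies.
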
Now follow the README file for Step 2:

    cd solr-extracted-features
    less README.txt

----