README.txt@ 31161

Last change on this file since 31161 was 30914, checked in by davidb, 7 years ago
Tidy up of setup description

----
Introduction
----

Vagrant provisioning files to spin up a modest Spark cluster (master
+ 3 slaves + backup) for experiments in processing HTRC Extracted
Feature JSON files in parallel, in a form suitable for ingesting into
Solr.


Assumptions

* You have VirtualBox and Vagrant installed
(at time of writing VirtualBox v5.0.28, Vagrant 1.8.6)
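Before running anything, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch in POSIX shell functions you can paste into a shell or source from a file. The names 'have_cmd' and 'check_prereqs' are hypothetical, and it assumes VirtualBox's command-line tool is installed under its usual name, 'VBoxManage'. Both tools accept '--version', so the output can be compared with the versions listed above.

```shell
# Sketch: confirm the host tools this README assumes are on the PATH.
# Assumption: VirtualBox's CLI is installed as 'VBoxManage' (its usual name).

# Succeeds if the named command can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints the version of each required tool; returns non-zero if any is missing.
check_prereqs() {
    status=0
    for cmd in VBoxManage vagrant; do
        if have_cmd "$cmd"; then
            # Both tools accept --version.
            printf '%s: %s\n' "$cmd" "$("$cmd" --version)"
        else
            printf '%s: not found on PATH\n' "$cmd" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example: check_prereqs || echo "install the missing tools first"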


Useful

* Install the Vagrant VirtualBox Guest Additions plugin (vagrant-vbguest)
to stop warnings about potentially incompatible versions:

vagrant plugin install vagrant-vbguest
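If you repeat the setup on several hosts, a guarded install avoids reinstalling the plugin each time. This is a sketch under the assumption that 'vagrant plugin list' prints one plugin per line, starting with the plugin name followed by a space and its version in parentheses (as Vagrant 1.8 does); 'plugin_listed' and 'ensure_vbguest' are hypothetical helper names.

```shell
# Sketch: install vagrant-vbguest only if it is not already present.
# Assumption: 'vagrant plugin list' prints one plugin per line, starting
# with the plugin name, a space and its version in parentheses.

# Reads 'vagrant plugin list' output on stdin; succeeds if plugin $1 is listed.
plugin_listed() {
    grep -q "^$1 ("
}

ensure_vbguest() {
    if vagrant plugin list | plugin_listed vagrant-vbguest; then
        echo "vagrant-vbguest is already installed"
    else
        vagrant plugin install vagrant-vbguest
    fi
}
```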


----
Setup Procedure
----

This is a two-step process:

Step 1: Setting up the cluster
Step 2: Running the Java code that processes the JSON files


Step 1 is covered by this README file, ending with an svn checkout of
the Java code on the 'master' node that processes the JSON files. The
checked-out files include the README file covering Step 2.

----
Step 1
----

From within the directory where this README.txt is located, enter:

vagrant up

The first time this is run, there is a lot of downloading and setup to
do. Subsequent runs of this command spin the cluster up much faster.
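To check which VMs actually came up, 'vagrant status --machine-readable' prints comma-separated lines of the form timestamp,target,type,data, where lines of type 'state' carry the machine's state. The sketch below filters those lines. 'running_machines' and 'machine_running' are hypothetical names, and 'master' is the only machine name this README confirms; the slave and backup VMs have whatever names the Vagrantfile defines.

```shell
# Sketch: report which Vagrant machines are running.
# Machine-readable lines look like: timestamp,target,type,data
# and lines whose type is 'state' carry the machine state in 'data'.

# Reads 'vagrant status --machine-readable' output on stdin and prints
# the name of each machine whose state is 'running'.
running_machines() {
    awk -F, '$3 == "state" && $4 == "running" { print $2 }'
}

# Succeeds if machine $1 is running. Run from the Vagrantfile's directory.
machine_running() {
    vagrant status --machine-readable | running_machines | grep -qx "$1"
}
```

For example, from the directory containing the Vagrantfile: machine_running master && vagrant ssh master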

Once the cluster is set up, you need to get the Spark framework up and
running, which in turn uses Hadoop's HDFS. Do this as the user 'htrc'
on the 'master' node:

vagrant ssh master
sudo su - htrc

If this is the first time, you need to format an HDFS area to use:

hdfs namenode -format

Then, and on every later occasion, start up the HDFS and Spark daemon
processes:

start-dfs.sh
spark-start-all.sh
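The first-time check can be automated. This sketch assumes the standard NameNode storage layout, in which a formatted storage directory contains 'current/VERSION', and reads that directory from the 'dfs.namenode.name.dir' property via 'hdfs getconf -confKey'. The helper names 'namenode_formatted' and 'start_cluster' are hypothetical; 'spark-start-all.sh' is the script name used above. Run it as 'htrc' on 'master'.

```shell
# Sketch: format HDFS only on first use, then start HDFS and Spark.
# Assumption: a formatted NameNode storage directory contains
# current/VERSION (the standard HDFS layout).

# Succeeds if storage directory $1 (a plain path or file:// URI)
# has already been formatted.
namenode_formatted() {
    dir=${1#file://}
    [ -f "$dir/current/VERSION" ]
}

# Run as 'htrc' on 'master'.
start_cluster() {
    name_dir=$(hdfs getconf -confKey dfs.namenode.name.dir) || return 1
    # The property may list several directories; check the first one.
    name_dir=${name_dir%%,*}
    if ! namenode_formatted "$name_dir"; then
        # -nonInteractive makes the command abort rather than prompt if
        # the directory turns out to be formatted already.
        hdfs namenode -format -nonInteractive || return 1
    fi
    start-dfs.sh && spark-start-all.sh
}
```

Formatting only when 'current/VERSION' is absent guards against wiping an existing HDFS namespace, which a confirmed re-run of 'hdfs namenode -format' would do.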

You can visit the Spark cluster monitoring page at:

http://10.10.0.52:8080/
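Straight after the daemons start, the page may take a few seconds to respond. Here is a small polling sketch using curl, assuming curl is installed on the machine you check from and that it can reach 10.10.0.52. 'wait_for_url' is a hypothetical name, and the defaults of 30 tries with a 2-second delay are arbitrary.

```shell
# Sketch: poll URL $1 until it answers with a non-error HTTP status.
# Tries up to $2 times (default 30), sleeping $3 seconds (default 2)
# between attempts.
wait_for_url() {
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$tries" ]; do
        # -f: fail on HTTP errors; -sS: quiet except for errors.
        if curl -fsS --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example: wait_for_url http://10.10.0.52:8080/ && echo "Spark master UI is up"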

----
Getting ready for Step 2
----

With the Spark cluster and HDFS up and running, you are now ready to
proceed to Step 2, running the JSON processing code.


The 'master' node needs two extra packages for this, Subversion and
Maven (providing the 'svn' and 'mvn' commands). We install these as
the 'vagrant' user, after which we are in a position to check out the
Java code, which in turn includes the README file for Step 2.

Install Subversion and Maven using the 'vagrant' user's sudo rights:

vagrant ssh master
sudo apt-get install subversion
sudo apt-get install maven
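The two installs can be made idempotent and non-interactive. This is a sketch assuming an apt-based guest, as the commands above imply. 'missing_packages' and 'install_build_tools' are hypothetical names, and the 'command:package' pairing is only a convention of this sketch.

```shell
# Sketch: install only the packages whose commands are missing.

# Given "command:package" pairs, prints (space-separated) the packages
# whose command is not yet on the PATH.
missing_packages() {
    missing=
    for pair in "$@"; do
        cmd=${pair%%:*}
        pkg=${pair#*:}
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $pkg"
        fi
    done
    # Trim the leading space added by the loop above.
    printf '%s\n' "${missing# }"
}

# Installs Subversion and Maven on an apt-based guest if either is missing.
install_build_tools() {
    pkgs=$(missing_packages svn:subversion mvn:maven)
    if [ -n "$pkgs" ]; then
        sudo apt-get update
        # $pkgs is left unquoted on purpose so each package is a separate word.
        sudo apt-get install -y $pkgs
    fi
}
```

Run 'install_build_tools' on 'master' as the 'vagrant' user; the '-y' flag answers apt-get's confirmation prompt.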

Now switch from the 'vagrant' user to 'htrc' and check out the Java code:

sudo su - htrc

svn co http://svn.greenstone.org/other-projects/hathitrust/solr-extracted-features/trunk solr-extracted-features
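When you return to the cluster later, there is no need to check the code out again; updating the existing working copy is enough. This is a sketch in which 'EF_URL', 'is_working_copy' and 'checkout_or_update' are hypothetical names; the URL and default directory name are the ones given in the checkout command.

```shell
# Sketch: check out on first use, update an existing working copy later.
# EF_URL is a hypothetical variable holding the repository URL from this README.
EF_URL=http://svn.greenstone.org/other-projects/hathitrust/solr-extracted-features/trunk

# Succeeds if $1 already looks like a Subversion working copy.
is_working_copy() {
    [ -d "$1/.svn" ]
}

checkout_or_update() {
    dest=${1:-solr-extracted-features}
    if is_working_copy "$dest"; then
        svn update "$dest"
    else
        svn checkout "$EF_URL" "$dest"
    fi
}
```

Run 'checkout_or_update' as 'htrc' from its home directory.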

Now follow the README file for Step 2:

cd solr-extracted-features
less README.txt

----