----
Introduction
----

Java code for processing HTRC Extracted Feature JSON files, suitable for
ingesting into Solr. Designed to be used on a Spark cluster with HDFS.
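
For orientation, each Extracted Features file is a single JSON document,
roughly of this shape (heavily abbreviated; see the HTRC Extracted Features
documentation for the authoritative field list):

 {
   "id": "<HathiTrust volume id>",
   "metadata": { ...bibliographic fields... },
   "features": { "pageCount": ..., "pages": [ ...per-page token data... ] }
 }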

----
Setup Procedure
----

This is Step 2 of a two-step setup procedure.

For Step 1, see:

 http://svn.greenstone.org/other-projects/hathitrust/vagrant-spark-hdfs-cluster/trunk/README.txt

*Assumptions*

 * You have 'svn' and 'mvn' on your PATH
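
A quick way to confirm both are present:

 which svn mvn
 svn --version
 mvn --version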

----
Step 2
----

Compile the code:

 ./COMPILE.bash

The first time this is run, a variety of Maven/Java dependencies will be
downloaded.
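
Assuming COMPILE.bash wraps a standard Maven build, the equivalent direct
invocation would be something like:

 mvn clean package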

Next acquire some JSON files to process. For example:

 ./scripts/PD-GET-FULL-FILE-LIST.sh
 ./scripts/PD-SELECT-EVERY-10000.sh
 ./scripts/PD-DOWNLOAD-EVERY-10000.sh
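
The SELECT step thins the full listing down to every 10,000th entry.
Assuming the listing is one file path per line, it amounts to something
like the following (file names here are illustrative):

 awk 'NR % 10000 == 1' full-file-list.txt > pd-ef-json-filelist.txt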

Now run the code:

 ./RUN.bash pd-ef-json-filelist.txt
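
RUN.bash presumably wraps spark-submit; a direct submission would look
something like this (the class and jar names are hypothetical stand-ins,
not necessarily the project's actual ones):

 spark-submit --master spark://<master-host>:7077 \
     --class org.hathitrust.ProcessForSolrIngest \
     target/solr-extracted-features-*.jar \
     pd-ef-json-filelist.txt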

To check that the expected Hadoop and Spark daemons are running, use 'jps'.
On the master node the output should look something like:

 % jps
 19468 SecondaryNameNode
 19604 Master
 19676 Jps
 19212 NameNode
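
If any of these are missing, start HDFS and the Spark master first (paths
assume standard Hadoop and Spark installations):

 $HADOOP_HOME/sbin/start-dfs.sh
 $SPARK_HOME/sbin/start-master.sh
 $SPARK_HOME/sbin/start-slaves.sh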

On a fresh cluster, create the HDFS home directory and copy the file
listing into HDFS:

 hdfs dfs -mkdir /user
 hdfs dfs -mkdir /user/htrc
 hdfs dfs -put pd-file-listing-step10000.txt /user/htrc/.
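
To confirm the copy:

 hdfs dfs -ls /user/htrc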