----
Introduction
----

Java code for processing HTRC Extracted Feature JSON files, suitable for
ingesting into Solr. Designed to be used on a Spark cluster with HDFS.

----
Setup Procedure
----

This is Step 2 of a two-step setup procedure.

For Step 1, see:

  http://svn.greenstone.org/other-projects/hathitrust/vagrant-spark-hdfs-cluster/trunk/README.txt

*Assumptions*

  * You have 'svn' and 'mvn' on your PATH

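A quick way to confirm both tools are available before starting:

  svn --version --quiet
  mvn --version
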
----
Step 2
----

Compile the code:

  ./COMPILE.bash

The first time this is run, a variety of Maven/Java dependencies will be
downloaded.
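As a rough indication of what the compile step involves, COMPILE.bash can be
thought of as a wrapper around a standard Maven build. A minimal sketch (the
actual goals and options the script uses may differ):

  # Build the project jar; the first run populates ~/.m2/repository
  # with the required Maven/Java dependencies
  mvn package
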
Next, acquire some JSON files to process. For example:

  ./scripts/PD-GET-FULL-FILE-LIST.sh
  ./scripts/PD-SELECT-EVERY-10000.sh
  ./scripts/PD-DOWNLOAD-EVERY-10000.sh

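Going by their names, these scripts retrieve the full list of public-domain
Extracted Feature files, thin it down to every 10,000th entry, and download
the resulting sample. A hypothetical sketch of the selection step (the input
filename is a placeholder; the output name matches the file list used below,
but the real script may work differently):

  # Keep every 10,000th line of the full file list
  awk 'NR % 10000 == 0' pd-ef-full-filelist.txt > pd-ef-json-filelist.txt
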
Now run the code:

  ./RUN.bash pd-ef-json-filelist.txt
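
Since the code is designed to run on a Spark cluster, RUN.bash presumably
wraps a spark-submit invocation that passes the file list through as an
argument. A sketch of the general shape (the class name, master URL, and jar
path are placeholders, not taken from the script):

  spark-submit \
    --class org.hathitrust.PrepareForIngest \
    --master spark://localhost:7077 \
    target/solr-extracted-features.jar \
    pd-ef-json-filelist.txt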