Last change on this file since 30915 was 30915, checked in by davidb, 7 years ago
("Initial cut at instructions to follow to get code set up and running"; file size: 798 bytes)

----
Introduction
----

Java code for processing HTRC Extracted Feature JSON files, suitable for
ingesting into Solr. Designed to be used on a Spark cluster with HDFS.

----
Setup Procedure
----

This is Step 2 of a two-step setup procedure.

For Step 1, see:

http://svn.greenstone.org/other-projects/hathitrust/vagrant-spark-hdfs-cluster/trunk/README.txt

*Assumptions*

* You have 'svn' and 'mvn' on your PATH
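
The PATH assumption can be checked up front with a short shell helper.
This is a hypothetical snippet, not a script shipped with this repository;
the function name 'check_tools' is invented for illustration:

```shell
# Hypothetical preflight helper (not part of this repository):
# reports any named tool missing from PATH, and returns non-zero
# if at least one is absent.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing from PATH: $tool" >&2
      missing=1
    fi
  done
  return $missing
}

# Example usage: check_tools svn mvn || exit 1
```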
----
Step 2
----

Compile the code:

./COMPILE.bash

The first time this is run, a variety of Maven/Java dependencies will be
downloaded.

Next acquire some JSON files to process. For example:

./scripts/PD-GET-FULL-FILE-LIST.sh
./scripts/PD-SELECT-EVERY-10000.sh
./scripts/PD-DOWNLOAD-EVERY-10000.sh

Now run the code:

./RUN.bash pd-ef-json-filelist.txt
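
Taken together, Step 2 amounts to the following sequence. This is a sketch
(not a script shipped with this repository) that only prints each command by
default; set DRY_RUN=0 to actually execute them from a checkout:

```shell
# Sketch of the full Step 2 sequence (not part of this repository).
# With DRY_RUN=1 (the default here) each command is printed rather
# than executed; set DRY_RUN=0 to run the steps for real.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run ./COMPILE.bash
run ./scripts/PD-GET-FULL-FILE-LIST.sh
run ./scripts/PD-SELECT-EVERY-10000.sh
run ./scripts/PD-DOWNLOAD-EVERY-10000.sh
run ./RUN.bash pd-ef-json-filelist.txt
```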