Last change on this file since 30915 was 30915, checked in by davidb, 8 years ago:
"Initial cut at instructions to follow to get code set up and running"
File size: 798 bytes

----
Introduction
----

Java code for processing HTRC Extracted Feature JSON files, suitable for
ingesting into Solr. Designed to be used on a Spark cluster with HDFS.

----
Setup Procedure
----

This is Step 2 of a two-step setup procedure.

For Step 1, see:

http://svn.greenstone.org/other-projects/hathitrust/vagrant-spark-hdfs-cluster/trunk/README.txt

*Assumptions*

* You have 'svn' and 'mvn' on your PATH

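A quick way to confirm the assumption above before continuing is a small PATH check. This is a convenience sketch only, not one of the repository's scripts:

```shell
# Report any required tool that is missing from PATH.
check_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "Missing from PATH: $tool"
  done
}

# Check the tools the README assumes; no output means all present.
check_tools svn mvn
```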
----
Step 2
----

Compile the code:

./COMPILE.bash

The first time this is run, a variety of Maven/Java dependencies will be
downloaded.

Next acquire some JSON files to process. For example:

./scripts/PD-GET-FULL-FILE-LIST.sh
./scripts/PD-SELECT-EVERY-10000.sh
./scripts/PD-DOWNLOAD-EVERY-10000.sh

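Judging by its name, PD-SELECT-EVERY-10000.sh thins the full file list down to every 10000th entry. As a hedged illustration of that selection step (the file names below are stand-ins, not the scripts' actual files):

```shell
# Stand-in input: a pretend file list with 25000 entries.
seq 1 25000 > full-file-list.txt

# Keep every 10000th line, i.e. lines 1, 10001 and 20001.
awk 'NR % 10000 == 1' full-file-list.txt > every-10000-filelist.txt

# Count the selected entries (3 for this stand-in input).
wc -l < every-10000-filelist.txt
```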
Now run the code:

./RUN.bash pd-ef-json-filelist.txt
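RUN.bash takes the file list produced above as its only argument. The sketch below shows the argument handling a wrapper of this shape needs before handing the list to Spark; the commented spark-submit line uses placeholder names, not the repository's actual class or jar:

```shell
# Hedged sketch of a RUN.bash-style wrapper (illustrative only).
run_job() {
  filelist="$1"
  if [ ! -f "$filelist" ]; then
    echo "File list not found: $filelist" >&2
    return 1
  fi
  echo "Submitting Spark job over $filelist"
  # spark-submit --class <MainClass> <job-jar> "$filelist"   # placeholder names
}

# Toy file list standing in for the real pd-ef-json-filelist.txt contents.
printf '%s\n' example-volume.json > pd-ef-json-filelist.txt
run_job pd-ef-json-filelist.txt   # prints: Submitting Spark job over pd-ef-json-filelist.txt
```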