---- Introduction ----

Java code for processing HTRC Extracted Features JSON files, suitable for
ingesting into Solr. Designed to be used on a Spark cluster with HDFS.

---- Setup Procedure ----

This is Step 2 of a two-step setup procedure. For Step 1, see:

  http://svn.greenstone.org/other-projects/hathitrust/vagrant-spark-hdfs-cluster/trunk/README.txt

*Assumptions*

* You have 'svn' and 'mvn' on your PATH

---- Step 2 ----

Compile the code:

  ./COMPILE.bash

The first time this is run, a variety of Maven/Java dependencies will be
downloaded.

Next, acquire some JSON files to process. For example:

  ./scripts/PD-GET-FULL-FILE-LIST.sh
  ./scripts/PD-SELECT-EVERY-10000.sh
  ./scripts/PD-DOWNLOAD-EVERY-10000.sh

Now run the code:

  ./RUN.bash pd-ef-json-filelist.txt
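
For orientation, the kind of Spark driver this project runs can be sketched
as follows. This is a minimal illustration only: the class name
'ProcessEFFileList' is hypothetical, and the real entry point is whatever
RUN.bash invokes.

  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.JavaSparkContext;

  public class ProcessEFFileList {
      public static void main(String[] args) {
          // args[0] is a text file listing one EF JSON file path per line,
          // e.g. pd-ef-json-filelist.txt (paths may be local or in HDFS)
          SparkConf conf = new SparkConf().setAppName("HTRC-EF-Ingest");
          try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
              JavaRDD<String> filePaths = jsc.textFile(args[0]);
              // Distribute per-file work across the cluster; here we just
              // count the listed files as a stand-in for real per-volume
              // JSON processing
              long numFiles = filePaths.count();
              System.out.println("EF JSON files listed: " + numFiles);
          }
      }
  }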
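
Since the end goal is Solr ingest, each processed volume is ultimately
turned into a Solr document and posted to a Solr collection. A minimal
SolrJ sketch of that final step is below; the URL, collection, and field
names are assumptions for illustration, not this project's actual schema.

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class SolrIngestSketch {
      public static void main(String[] args) throws Exception {
          // Hypothetical Solr URL and collection name
          try (SolrClient solr = new HttpSolrClient.Builder(
                  "http://localhost:8983/solr/htrc-ef").build()) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "example-volume-id");  // hypothetical field names
              doc.addField("title_t", "Example Volume");
              solr.add(doc);
              solr.commit();
          }
      }
  }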