----
Introduction
----

Java code for processing HTRC Extracted Feature JSON files, suitable for
ingesting into Solr. Designed to be used on a Spark cluster with HDFS.

----
Setup Procedure
----

This is Step 2 of a two-step setup procedure.

For Step 1, see:

  http://svn.greenstone.org/other-projects/hathitrust/vagrant-spark-hdfs-cluster/trunk/README.txt

*Assumptions*

  * You have 'svn' and 'mvn' on your PATH

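A quick way to confirm both tools are available before starting:

  svn --version --quiet
  mvn --version
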
----
Step 2
----

Compile the code:

  ./COMPILE.bash

The first time this is run, a variety of Maven/Java dependencies will be
downloaded.
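As a rough indication of what the compile step involves, COMPILE.bash can be
thought of as a wrapper around a standard Maven build. A minimal sketch (the
actual goals and options the script uses may differ):

  # Build the project jar; the first run populates ~/.m2/repository
  # with the required Maven/Java dependencies
  mvn package
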
Next, acquire some JSON files to process. For example:

  ./scripts/PD-GET-FULL-FILE-LIST.sh
  ./scripts/PD-SELECT-EVERY-10000.sh
  ./scripts/PD-DOWNLOAD-EVERY-10000.sh

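Going by their names, these scripts retrieve the full list of public-domain
Extracted Feature files, thin it down to every 10,000th entry, and download
the resulting sample. A hypothetical sketch of the selection step (the input
filename is a placeholder; the output name matches the file list used below,
but the real script may work differently):

  # Keep every 10,000th line of the full file list
  awk 'NR % 10000 == 0' pd-ef-full-filelist.txt > pd-ef-json-filelist.txt
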
Now run the code:

  ./RUN.bash pd-ef-json-filelist.txt
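
Since the code is designed to run on a Spark cluster, RUN.bash presumably
wraps a spark-submit invocation that passes the file list through as an
argument. A sketch of the general shape (the class name, master URL, and jar
path are placeholders, not taken from the script):

  spark-submit \
    --class org.hathitrust.PrepareForIngest \
    --master spark://localhost:7077 \
    target/solr-extracted-features.jar \
    pd-ef-json-filelist.txt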