#
# ChangeLog for other-projects/hathitrust/solr-extracted-features/trunk/src/main/java
#
# Generated by Trac 1.4.2
# 2024-06-01T12:36:12+12:00

Fri, 28 Oct 2016 01:35:52 GMT davidb [30977]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Only have RDD if an output directory was specified on the command- ...

Fri, 28 Oct 2016 01:32:25 GMT davidb [30976]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Change to reflect changed order of command-line arguments

Fri, 28 Oct 2016 01:28:51 GMT davidb [30975]

	* other-projects/hathitrust/solr-extracted-features/trunk/RUN-PD-CLUSTER.bash (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Introduction of new solr-url command line argument, leading to some ...

Fri, 28 Oct 2016 00:47:10 GMT davidb [30974]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	update/add/doc JSON structure needed

Thu, 27 Oct 2016 22:53:02 GMT davidb [30973]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Changed to saving Solr JSON file for debugging purposes

Thu, 27 Oct 2016 22:35:48 GMT davidb [30971]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Adding in post to Solr cloud.
	Changed text_t to _text_

Thu, 27 Oct 2016 22:10:32 GMT davidb [30970]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Added in mapping of EF-JSON to Solr 'add' JSON format

Wed, 26 Oct 2016 05:00:53 GMT davidb [30953]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Need to specify _output_dir as part of output JSON filename

Wed, 26 Oct 2016 04:54:44 GMT davidb [30951]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Save a JSONObject as a file in the output directory

Wed, 26 Oct 2016 04:40:49 GMT davidb [30949]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Use better name than 'foo'.
	Further fix to JSON name generated

Wed, 26 Oct 2016 04:24:44 GMT davidb [30947]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Correction to 'pages-' part of JSON.bz2 output filename used

Wed, 26 Oct 2016 02:47:01 GMT davidb [30946]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Correction to output JSON.bz2 name generated

Wed, 26 Oct 2016 02:37:24 GMT davidb [30945]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Getting closer to writing out JSON files

Wed, 26 Oct 2016 01:47:29 GMT davidb [30944]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Force higher partition count (6) than default, which seems to be 2

Wed, 26 Oct 2016 01:39:12 GMT davidb [30943]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Extra debug info

Wed, 26 Oct 2016 01:27:44 GMT davidb [30942]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Improved output printing for slave node

Wed, 26 Oct 2016 01:16:25 GMT davidb [30941]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Moved to getFileSystemInstance() method to play nice on cluster

Wed, 26 Oct 2016 01:01:01 GMT davidb [30940]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (modified)

	Change to using URI, not fileIn directly

Wed, 26 Oct 2016 00:53:39 GMT davidb [30938]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (added)

	Experiment with using Hadoop's FileSystem class for local file:// access

Wed, 26 Oct 2016 00:44:38 GMT davidb [30937]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Expanded set of ClusterFileIO methods

Tue, 25 Oct 2016 22:05:28 GMT davidb [30934]

	* other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Providing json-filelist is now a compulsory argument, rather than an option

Tue, 25 Oct 2016 21:24:53 GMT davidb [30933]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	More careful parsing of file prefix

Tue, 25 Oct 2016 21:16:06 GMT davidb [30932]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Support both file:// and hdfs://

Tue, 25 Oct 2016 10:49:13 GMT davidb [30924]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Tidy up of code.
	Removed commented-out code

Tue, 25 Oct 2016 10:23:08 GMT davidb [30921]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Code change to read in JSON file over HDFS

Tue, 25 Oct 2016 01:49:36 GMT davidb [30918]

	* other-projects/hathitrust/solr-extracted-features/trunk/RUN.bash (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	More flexible command-line args

Sat, 22 Oct 2016 03:28:03 GMT davidb [30898]

	* other-projects/hathitrust/solr-extracted-features/trunk/COMPILE.bat (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/RUN.bat (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/pom.xml (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/scripts (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-DOWNLOAD-EVERY-1000.sh (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-SELECT-EVERY-1000.sh (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (added)

	Scripts for downloading sample JSON data from public domain extracted ...