#
# ChangeLog for other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust
#
# Generated by Trac 1.4.2
# 2024-06-03T15:34:05+12:00

Sun, 30 Oct 2016 11:07:39 GMT davidb [31002]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/BasePerJSON.java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/PagedJSONForeach.java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Need to separate flatMap and foreach calls in PagedJSON

Sun, 30 Oct 2016 10:51:07 GMT davidb [31001]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/JSONSolrTransform.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Code to work per-volume and per-page

Sun, 30 Oct 2016 09:49:56 GMT davidb [30998]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (deleted)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (added)

	Class name refactoring

Sun, 30 Oct 2016 09:49:39 GMT davidb [30997]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/PagedJSON.java (modified)

	Verbosity control over printing

Sun, 30 Oct 2016 09:25:42 GMT davidb [30996]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (deleted)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (deleted)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/ClusterFileIO.java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/JSONClusterFileIO.java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/JSONSolrTransform.java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/PagedJSON.java (added)

	Code refactoring

Sun, 30 Oct 2016 08:43:02 GMT davidb [30995]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation

Sat, 29 Oct 2016 22:39:31 GMT davidb [30990]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	opt name change

Sat, 29 Oct 2016 22:32:57 GMT davidb [30988]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Changed flag to 'read-only' and changed the field name full text ...

Sat, 29 Oct 2016 03:57:17 GMT davidb [30986]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Debugging for double accumulator added

Sat, 29 Oct 2016 03:17:22 GMT davidb [30985]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Changed to run main processing method as action rather than ...

Sat, 29 Oct 2016 02:45:38 GMT davidb [30984]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Introduction of Spark accumulator to measure progress. Output of ...

Fri, 28 Oct 2016 02:15:20 GMT davidb [30980]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Code added to read response

Fri, 28 Oct 2016 01:44:21 GMT davidb [30979]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	_solr_url needs to be stored in class!

Fri, 28 Oct 2016 01:40:30 GMT davidb [30978]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Additional debug statements

Fri, 28 Oct 2016 01:35:52 GMT davidb [30977]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Only have RDD if an output directory was specified on the command- ...

Fri, 28 Oct 2016 01:32:25 GMT davidb [30976]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Change to reflect changed order of command-line arguments

Fri, 28 Oct 2016 01:28:51 GMT davidb [30975]
	* other-projects/hathitrust/solr-extracted-features/trunk/RUN-PD-CLUSTER.bash (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Introduction of new solr-url command line argument, leading to some ...

Fri, 28 Oct 2016 00:47:10 GMT davidb [30974]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	update/add/doc JSON structure needed

Thu, 27 Oct 2016 22:53:02 GMT davidb [30973]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Changed to saving Solr JSON file for debugging purposes

Thu, 27 Oct 2016 22:35:48 GMT davidb [30971]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Adding in post to Solr cloud. Changed text_t to _text_

Thu, 27 Oct 2016 22:10:32 GMT davidb [30970]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Added in mapping of EF-JSON to Solr 'add' JSON format

Wed, 26 Oct 2016 05:00:53 GMT davidb [30953]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Need to specify _output_dir as part of output JSON filename

Wed, 26 Oct 2016 04:54:44 GMT davidb [30951]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Save a JSONObject as a file in the output directory

Wed, 26 Oct 2016 04:40:49 GMT davidb [30949]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Use better name than 'foo'. Further fix to JSON name generated

Wed, 26 Oct 2016 04:24:44 GMT davidb [30947]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Correction to 'pages-' part of JSON.bz2 output filename used

Wed, 26 Oct 2016 02:47:01 GMT davidb [30946]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Correction to output JSON.bz2 name generated

Wed, 26 Oct 2016 02:37:24 GMT davidb [30945]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Getting closer to writing out JSON files

Wed, 26 Oct 2016 01:47:29 GMT davidb [30944]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Force a higher number of partitions (6) than the default, which seems to be 2

Wed, 26 Oct 2016 01:39:12 GMT davidb [30943]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Extra debug info

Wed, 26 Oct 2016 01:27:44 GMT davidb [30942]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Improved output printing for slave node

Wed, 26 Oct 2016 01:16:25 GMT davidb [30941]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Moved to getFileSystemInstance() method to play nice on cluster

Wed, 26 Oct 2016 01:01:01 GMT davidb [30940]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (modified)

	Change to using URI not fileIn directly

Wed, 26 Oct 2016 00:53:39 GMT davidb [30938]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (added)

	Experiment with using Hadoop's FileSystem class for local file:// access

Wed, 26 Oct 2016 00:44:38 GMT davidb [30937]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Expanded set of ClusterFileIO methods

Tue, 25 Oct 2016 22:05:28 GMT davidb [30934]
	* other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	Providing json-filelist is now a compulsory argument, rather than an option

Tue, 25 Oct 2016 21:24:53 GMT davidb [30933]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	More careful parsing of file prefix

Tue, 25 Oct 2016 21:16:06 GMT davidb [30932]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Support both file:// and hdfs://

Tue, 25 Oct 2016 10:49:13 GMT davidb [30924]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Tidy up of code. Removed commented-out code

Tue, 25 Oct 2016 10:23:08 GMT davidb [30921]
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

	Code change to read in JSON file over HDFS

Tue, 25 Oct 2016 01:49:36 GMT davidb [30918]
	* other-projects/hathitrust/solr-extracted-features/trunk/RUN.bash (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

	More flexible command-line args

Sat, 22 Oct 2016 03:28:03 GMT davidb [30898]
	* other-projects/hathitrust/solr-extracted-features/trunk/COMPILE.bat (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/RUN.bat (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/pom.xml (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/scripts (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-DOWNLOAD-EVERY-1000.sh (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-SELECT-EVERY-1000.sh (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (added)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (added)

	Scripts for downloading sample JSON data from public domain extracted ...