#
# ChangeLog for other-projects/hathitrust/solr-extracted-features
#
# Generated by Trac 1.4.2
# 2024-04-27T02:55:17+12:00

Sat, 29 Oct 2016 03:57:17 GMT davidb [30986]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Debugging for double accumulator added

Sat, 29 Oct 2016 03:17:22 GMT davidb [30985]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Changed to run main processing method as action rather than ...

Sat, 29 Oct 2016 02:45:38 GMT davidb [30984]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Introduction of Spark accumulator to measure progress. Output of ...

Fri, 28 Oct 2016 02:15:20 GMT davidb [30980]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Code added to read response

Fri, 28 Oct 2016 01:44:21 GMT davidb [30979]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    _solr_url needs to be stored in class!

Fri, 28 Oct 2016 01:40:30 GMT davidb [30978]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Additional debug statements

Fri, 28 Oct 2016 01:35:52 GMT davidb [30977]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Only have RDD if an output directory was specified on the command- ...

Fri, 28 Oct 2016 01:32:25 GMT davidb [30976]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Change to reflect changed order of command-line arguments

Fri, 28 Oct 2016 01:28:51 GMT davidb [30975]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN-PD-CLUSTER.bash (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Introduction of new solr-url command line argument, leading to some ...

Fri, 28 Oct 2016 00:47:10 GMT davidb [30974]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    update/add/doc JSON structure needed

Thu, 27 Oct 2016 22:53:02 GMT davidb [30973]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Changed to saving Solr JSON file for debugging purposes

Thu, 27 Oct 2016 22:46:53 GMT davidb [30972]
    * other-projects/hathitrust/solr-extracted-features/trunk/README.txt (modified)

    addition of useful command needed before re-running

Thu, 27 Oct 2016 22:35:48 GMT davidb [30971]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Adding in post to Solr cloud. Changed text_t to _text_

Thu, 27 Oct 2016 22:10:32 GMT davidb [30970]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Added in mapping of EF-JSON to Solr 'add' JSON format

Wed, 26 Oct 2016 09:58:55 GMT davidb [30957]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN.bat (deleted)

    No longer needed. (Local copy taken on Windows laptop.)

Wed, 26 Oct 2016 05:00:53 GMT davidb [30953]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Need to specify _output_dir as part of output JSON filename

Wed, 26 Oct 2016 05:00:21 GMT davidb [30952]
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)

    Further text tidy up

Wed, 26 Oct 2016 04:54:44 GMT davidb [30951]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Save a JSONObject as a file in the output directory

Wed, 26 Oct 2016 04:41:02 GMT davidb [30950]
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)

    Tweak to text

Wed, 26 Oct 2016 04:40:49 GMT davidb [30949]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Use better name than 'foo'. Further fix to JSON name generated

Wed, 26 Oct 2016 04:24:44 GMT davidb [30947]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Correction to 'pages-' part of JSON.bz2 output filename used

Wed, 26 Oct 2016 02:47:01 GMT davidb [30946]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Correction to output JSON.bz2 name generated

Wed, 26 Oct 2016 02:37:24 GMT davidb [30945]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Getting closer to writing out JSON files

Wed, 26 Oct 2016 01:47:29 GMT davidb [30944]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Force higher partition (6) than default, which seems to be 2

Wed, 26 Oct 2016 01:39:12 GMT davidb [30943]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Extra debug info

Wed, 26 Oct 2016 01:27:44 GMT davidb [30942]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Improved output printing for slave node

Wed, 26 Oct 2016 01:16:25 GMT davidb [30941]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Moved to getFileSystemInstance() method to play nice on cluster

Wed, 26 Oct 2016 01:01:01 GMT davidb [30940]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (modified)

    Change to using URI not fileIn directly

Wed, 26 Oct 2016 00:57:19 GMT davidb [30939]
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)

    Minor tweaks

Wed, 26 Oct 2016 00:53:39 GMT davidb [30938]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/ClusterFileIO.java (added)

    Experiment with using Hadoop's FileSystem class for local file:// access

Wed, 26 Oct 2016 00:44:38 GMT davidb [30937]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Expanded set of ClusterFileIO methods

Tue, 25 Oct 2016 22:11:45 GMT davidb [30936]
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)

    Refinement of Spark Monitor echo statements

Tue, 25 Oct 2016 22:09:17 GMT davidb [30935]
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)

    Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec

Tue, 25 Oct 2016 22:05:28 GMT davidb [30934]
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    Providing json-filelist now a compulsory argument, rather than an option

Tue, 25 Oct 2016 21:24:53 GMT davidb [30933]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    More careful parsing of file prefix

Tue, 25 Oct 2016 21:16:06 GMT davidb [30932]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Support both file:// and hdfs://

Tue, 25 Oct 2016 20:58:45 GMT davidb [30931]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN-PD-LOCAL.bash (modified)

    Version that runs using file:// tested

Tue, 25 Oct 2016 20:47:36 GMT davidb [30929]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN-PD-CLUSTER.bash (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)

    Tweaks made while testing the script

Tue, 25 Oct 2016 20:15:31 GMT davidb [30928]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN-PD-CLUSTER.bash (modified)

    Forgot to set json_filelist

Tue, 25 Oct 2016 20:12:42 GMT davidb [30927]
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (modified)

    Fixed silly typo in stdout redirect

Tue, 25 Oct 2016 20:09:03 GMT davidb [30926]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN-PD-CLUSTER.bash (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN-PD-LOCAL.bash (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/_RUN.bash (moved)

    Restructuring of RUN scripts to be more flexible

Tue, 25 Oct 2016 10:49:36 GMT davidb [30925]
    * other-projects/hathitrust/solr-extracted-features/trunk/README.txt (modified)

    Improved instructions

Tue, 25 Oct 2016 10:49:13 GMT davidb [30924]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Tidy up of code. Removed commented out code

Tue, 25 Oct 2016 10:28:22 GMT davidb [30923]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN.bash (modified)

    Rough cut version that reads in each JSON file over HDFS

Tue, 25 Oct 2016 10:27:28 GMT davidb [30922]
    * other-projects/hathitrust/solr-extracted-features/trunk/README.txt (modified)

    Additional rough-cut notes

Tue, 25 Oct 2016 10:23:08 GMT davidb [30921]
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)

    Code change to read in JSON file over HDFS

Tue, 25 Oct 2016 01:52:52 GMT davidb [30919]
    * other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-DOWNLOAD-EVERY-1000.sh (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-DOWNLOAD-EVERY-10000.sh (modified)

    More consistent naming of folders used

Tue, 25 Oct 2016 01:49:36 GMT davidb [30918]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN.bash (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (modified)

    More flexible command-line args

Tue, 25 Oct 2016 00:44:43 GMT davidb [30916]
    * other-projects/hathitrust/solr-extracted-features/trunk/README.txt (modified)

    Some additional details -- note form

Tue, 25 Oct 2016 00:32:09 GMT davidb [30915]
    * other-projects/hathitrust/solr-extracted-features/trunk/README.txt (added)

    Initial cut at instructions to follow to get code set up and running

Mon, 24 Oct 2016 02:55:06 GMT davidb [30912]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN.bash (modified)

    Changed to Unix style line-endings

Mon, 24 Oct 2016 02:54:23 GMT davidb [30911]
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN.bash (modified)

    Changed name of input directory

Mon, 24 Oct 2016 02:40:31 GMT davidb [30910]
    * other-projects/hathitrust/solr-extracted-features/trunk/packages/SETUP.bash.in (modified)

    Additional finesse added in as a result of further testing on Vagrant ...

Mon, 24 Oct 2016 02:39:48 GMT davidb [30909]
    * other-projects/hathitrust/solr-extracted-features/trunk/COMPILE.bash (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN.bash (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/pom.xml (modified)

    Additional finesse added in as a result of further testing on Vagrant ...

Mon, 24 Oct 2016 02:39:07 GMT davidb [30908]
    * other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-DOWNLOAD-EVERY-1000.sh (modified)
    * other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-DOWNLOAD-EVERY-10000.sh (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-GET-FULL-FILE-LIST.sh (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-SELECT-EVERY-10000.sh (added)

    Additional finesse added in as a result of further testing on Vagrant ...

Sun, 23 Oct 2016 09:08:45 GMT davidb [30907]
    * other-projects/hathitrust/solr-extracted-features/trunk/packages/SETUP.bash.in (moved)

    Name change to reflect need for 'bash' not 'sh'

Sun, 23 Oct 2016 09:05:23 GMT davidb [30906]
    * other-projects/hathitrust/solr-extracted-features/trunk/packages/SETUP.sh.in (added)

    Bash version of BAT script

Sat, 22 Oct 2016 06:06:02 GMT davidb [30902]
    * other-projects/hathitrust/solr-extracted-features/trunk/packages/GET-PACKAGES.sh (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/packages/README.txt (added)

    Details of what packages are needed

Sat, 22 Oct 2016 03:33:04 GMT davidb [30901]
    * other-projects/hathitrust/solr-extracted-features/trunk/packages/SETUP.bat.in (added)

    Template setup file

Sat, 22 Oct 2016 03:30:55 GMT davidb [30900]
    * other-projects/hathitrust/solr-extracted-features/trunk/packages (added)

    For support Java packages

Sat, 22 Oct 2016 03:29:25 GMT davidb [30899]
    * other-projects/hathitrust/solr-extracted-features/trunk/.classpath (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/.project (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/.settings (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/.settings/org.eclipse.jdt.core.prefs (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/.settings/org.eclipse.m2e.core.prefs (added)

    Files for compilation using Eclipse

Sat, 22 Oct 2016 03:28:03 GMT davidb [30898]
    * other-projects/hathitrust/solr-extracted-features/trunk/COMPILE.bat (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/RUN.bat (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/pom.xml (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/scripts (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-DOWNLOAD-EVERY-1000.sh (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/scripts/PD-SELECT-EVERY-1000.sh (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/src (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PagedJSON.java (added)
    * other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (added)

    Scripts for downloading sample JSON data from public domain extracted ...

Sat, 22 Oct 2016 03:26:03 GMT davidb [30897]
    * other-projects/hathitrust/solr-extracted-features (added)
    * other-projects/hathitrust/solr-extracted-features/trunk (added)

    Sub-project for converted HTRC Extract Feature dataset into a form ...