#
# ChangeLog for other-projects
#
# Generated by Trac 1.4.2
# 2024-03-29T01:53:12+13:00

Mon, 23 Jan 2017 07:37:32 GMT davidb [31335]
    * other-projects/hathitrust/wcsa/vol-checker/src/org/hathitrust/extractedfeatures/VolumeCheck.java (modified)

    Too expensive to hold pairtree filename in hashmap, so change to ...

Mon, 23 Jan 2017 05:03:51 GMT davidb [31334]
    * other-projects/hathitrust/wcsa/vol-checker/src/org/hathitrust/extractedfeatures/VolumeCheck.java (modified)

    Initial cut at rsync download

Mon, 23 Jan 2017 03:03:20 GMT davidb [31333]
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/index.html (modified)

    Minor word tweak

Mon, 23 Jan 2017 03:02:50 GMT davidb [31332]
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/WEB-INF/lib/jetty-servlets.jar (added)
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/WEB-INF/lib/jetty-util-7.6.21.v20160908.jar (added)

    needed in Jetty CORS support

Mon, 23 Jan 2017 03:01:35 GMT davidb [31331]
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/HT-HTRC_Mashup.user.js (modified)
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/WEB-INF/web.xml (modified)
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/bookworm.png (added)

    Reworked to use CORS and $.ajax() so TamperMonkey doesn't intercede ...
Sun, 22 Jan 2017 23:07:00 GMT davidb [31330]
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/HT-HTRC_Mashup.user.js (added)
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/index.html (added)

    Initial cut at files that explain how to install the user-script

Sun, 22 Jan 2017 21:30:58 GMT davidb [31329]
    * other-projects/hathitrust/wcsa/vol-checker/SETUP.bash.in (modified)

    Tweaks after testing INSTALL.sh

Sun, 22 Jan 2017 21:30:45 GMT davidb [31328]
    * other-projects/hathitrust/wcsa/vol-checker/INSTALL.sh (added)

    Install the necessary files in the jetty webapps dir

Sun, 22 Jan 2017 21:24:47 GMT davidb [31327]
    * other-projects/hathitrust/wcsa/vol-checker/DOWNLOAD-HTRC-VOL-LIST.sh (moved)

    name change to be more consistent

Sun, 22 Jan 2017 21:20:57 GMT davidb [31326]
    * other-projects/hathitrust/wcsa/vol-checker/COMPILE.sh (modified)
    * other-projects/hathitrust/wcsa/vol-checker/DOWNLOAD-JETTY.sh (modified)

    Further tweaks

Sun, 22 Jan 2017 21:20:32 GMT davidb [31325]
    * other-projects/hathitrust/wcsa/vol-checker/SETUP.bash.in (modified)

    Further tweaks

Sun, 22 Jan 2017 21:14:46 GMT davidb [31324]
    * other-projects/hathitrust/wcsa/vol-checker/SETUP.bash.in (moved)

    More accurate name

Sun, 22 Jan 2017 21:08:18 GMT davidb [31323]
    * other-projects/hathitrust/wcsa/vol-checker/DOWNLOAD-JETTY.sh (added)

    Download script plus setup instructions

Sun, 22 Jan 2017 20:59:46 GMT davidb [31322]
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/WEB-INF/classes (added)

    Location for the Java byte compiled code to link in with rest of servlet

Sun, 22 Jan 2017 07:38:08 GMT davidb [31321]
    * other-projects/hathitrust/wcsa/vol-checker/GET-DATA.sh (added)
    * other-projects/hathitrust/wcsa/vol-checker/SETUP.sh.in (added)

    useful scripts

Sun, 22 Jan 2017 03:13:35 GMT davidb [31320]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)

    build Document rather than parse JSON string

Sun, 22 Jan 2017 02:19:50 GMT davidb [31319]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForMongoDBIngest.java (modified)

    Changed to replace existing MongoDB entry. Fixed up print statement

Sun, 22 Jan 2017 01:47:33 GMT davidb [31318]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)

    change to using contains()

Sun, 22 Jan 2017 01:18:15 GMT davidb [31317]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)

    added debug statement

Sat, 21 Jan 2017 11:26:50 GMT davidb [31316]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)

    fixed typo

Sat, 21 Jan 2017 11:17:49 GMT davidb [31315]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)

    Further tweak

Sat, 21 Jan 2017 11:14:48 GMT davidb [31314]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)

    Another go at avoiding concurrency update exception

Sat, 21 Jan 2017 11:09:04 GMT davidb [31313]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)

    Alternative to avoid concurrency update exception

Sat, 21 Jan 2017 10:57:09 GMT davidb [31312]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)

    MongoDB can't have 'period' and 'dollar' in key, as reserved characters

Sat, 21 Jan 2017 08:43:12 GMT davidb [31311]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (modified)

    Processing print statement added

Sat, 21 Jan 2017 08:18:02 GMT davidb [31310]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/pom.xml (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ClusterFileIO.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeMongoDBDocumentsMap.java (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForMongoDBIngest.java (added)

    Initial cut at files for working with MongoDB

Fri, 20 Jan 2017 07:34:36 GMT davidb [31309]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/pom.xml (modified)

    Sparked MongoDB connector added

Fri, 20 Jan 2017 07:33:39 GMT davidb [31308]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Minor tidy-up

Fri, 20 Jan 2017 05:01:03 GMT davidb [31307]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-start-all.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-stop-all.sh (added)

    convenience scripts

Fri, 20 Jan 2017 04:50:33 GMT davidb [31306]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-add-shards-to-routers.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-init-collection.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-stop-routerservers.sh (modified)

    Final part of the mongodb shard puzzle -- router servers

Fri, 20 Jan 2017 01:50:12 GMT davidb [31305]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-init-configservers.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-init-shardservers.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-setup-local-disk-all.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-start-configservers.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-start-routerservers.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-start-shardservers.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-stop-shardservers.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP.bash (modified)

    Next good commit point. Initial testing of shard replset scripts

Thu, 19 Jan 2017 05:10:18 GMT davidb [31304]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-solr-setup-local-disk-all.sh (modified)

    Changes made when (it turned out) the real source of the error was an ...
Thu, 19 Jan 2017 05:07:02 GMT davidb [31303]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-start-configservers.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-start-routerservers.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-stop-configservers.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-stop-routerservers.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP.bash (modified)

    Adding in support to start and stop router server

Thu, 19 Jan 2017 04:25:05 GMT davidb [31302]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-setup-local-disk-all.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-start-configservers.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-mongodb-stop-configservers.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP/setup-mongodb.bash (modified)

    Initial commit of scripts, after some testing, and subsequent changes ...
Thu, 19 Jan 2017 04:22:54 GMT davidb [31301]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP/setup-solr.bash (modified)

    Fix for gsliscluster1

Thu, 19 Jan 2017 02:59:38 GMT davidb [31300]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP/setup-mongodb.bash (modified)

    Need to use NETWORK not PACKAGE

Thu, 19 Jan 2017 02:49:30 GMT davidb [31299]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP.bash (modified)

    Additionally setup MongoDB

Thu, 19 Jan 2017 02:48:04 GMT davidb [31298]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP/setup-mongodb.bash (added)

    Initial cut at setup file for MongoDB

Thu, 19 Jan 2017 02:44:53 GMT davidb [31297]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/GET-PACKAGES-MONGODB.sh (added)

Wed, 18 Jan 2017 04:54:56 GMT davidb [31296]
    * other-projects/hathitrust/wcsa/vol-checker/src/org/hathitrust/extractedfeatures/VolumeCheck.java (modified)

    Make loading in of ID file more portable

Wed, 18 Jan 2017 04:54:24 GMT davidb [31295]
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/WEB-INF/web.xml (modified)

    name change of webapp

Tue, 17 Jan 2017 22:09:33 GMT davidb [31294]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeCatalogLangStreamFlatmap.java (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForCatalogLangCount.java (added)

    Version for language counting the catalog assignment language ...
Tue, 10 Jan 2017 08:39:28 GMT davidb [31283]
    * other-projects/hathitrust/wcsa/vol-checker/COMPILE.sh (modified)

    Fixed typo

Tue, 10 Jan 2017 08:29:00 GMT davidb [31282]
    * other-projects/hathitrust/wcsa/vol-checker/start.jar (added)

    Jetty jar-runnable server

Tue, 10 Jan 2017 08:25:12 GMT davidb [31281]
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/META-INF (added)
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/META-INF/MANIFEST.MF (added)
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/WEB-INF (added)
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/WEB-INF/lib (added)
    * other-projects/hathitrust/wcsa/vol-checker/WebContent/WEB-INF/web.xml (added)

Tue, 10 Jan 2017 08:24:24 GMT davidb [31280]
    * other-projects/hathitrust/wcsa/vol-checker/build/classes (added)

Tue, 10 Jan 2017 08:22:17 GMT davidb [31279]
    * other-projects/hathitrust/wcsa/vol-checker (added)
    * other-projects/hathitrust/wcsa/vol-checker/COMPILE.sh (added)
    * other-projects/hathitrust/wcsa/vol-checker/WebContent (added)
    * other-projects/hathitrust/wcsa/vol-checker/build (added)
    * other-projects/hathitrust/wcsa/vol-checker/jars (added)
    * other-projects/hathitrust/wcsa/vol-checker/jars/servlet-api.jar (added)
    * other-projects/hathitrust/wcsa/vol-checker/src (added)
    * other-projects/hathitrust/wcsa/vol-checker/src/org (added)
    * other-projects/hathitrust/wcsa/vol-checker/src/org/hathitrust (added)
    * other-projects/hathitrust/wcsa/vol-checker/src/org/hathitrust/extractedfeatures (added)
    * other-projects/hathitrust/wcsa/vol-checker/src/org/hathitrust/extractedfeatures/VolumeCheck.java (added)

    First cut at servlet

Thu, 05 Jan 2017 21:04:54 GMT davidb [31278]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)

    To avoid null pointer on ids.iterator()

Thu, 05 Jan 2017 12:56:42 GMT davidb [31277]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

    Tweak to minimum value

Thu, 05 Jan 2017 11:04:53 GMT davidb [31276]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

    Min num partition guard put in

Thu, 05 Jan 2017 10:44:10 GMT davidb [31275]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP.bash (modified)

    Changes to allow gc slave nodes to work with local disk versions of ...

Thu, 05 Jan 2017 10:30:00 GMT davidb [31274]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Need to use JSONArray not JSONObject for a multifield item

Thu, 05 Jan 2017 10:09:29 GMT davidb [31273]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeWordStreamFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Code moved to store fields for multilingual use using dynamic Solr ...
Tue, 03 Jan 2017 20:37:56 GMT davidb [31272]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/JSONClusterFileIO.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForLangCount.java (modified)

    Use disk and memory to store main language RDD

Wed, 28 Dec 2016 01:04:19 GMT davidb [31271]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumePOSStreamFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForLangCount.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForPOSCount.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

    Updating of POS code to new files-per-partition parameter, plus some ...
Tue, 27 Dec 2016 21:36:17 GMT davidb [31270]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForLangCount.java (modified)

    Changed over to repartition approach

Tue, 27 Dec 2016 21:30:08 GMT davidb [31269]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeLangStreamFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

    Some variable name changes, and printing tidy up

Tue, 27 Dec 2016 05:54:09 GMT davidb [31268]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/scripts/FULL-RUN-MASTER-SPARK.sh (modified)

    Adjustments to memory allocation in response to test runs on 10% of ...

Tue, 27 Dec 2016 05:52:41 GMT davidb [31267]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/ef-solr.properties (modified)

    Values trialed on gsliscluster1. Rekindling idea of per-vol processing

Tue, 27 Dec 2016 05:51:42 GMT davidb [31266]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

    Rekindling of per-volume approach. Also some tweaking to verbosity ...
Wed, 21 Dec 2016 00:47:56 GMT davidb [31264]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForLangCount.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForPOSCount.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)

    Switching to 'long' in counts to allow higher number representation

Wed, 21 Dec 2016 00:26:31 GMT davidb [31263]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForPOSCount.java (modified)

    Change to using long for higher word counts

Tue, 20 Dec 2016 11:14:48 GMT davidb [31261]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForLangCount.java (modified)

    Overlooked changes from POS to lang

Tue, 20 Dec 2016 11:12:10 GMT davidb [31260]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/scripts/FULL-RUN-MASTER-SPARK-LANG-COUNT.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeLangStreamFlatmap.java (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForLangCount.java (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForPOSCount.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Language counting

Tue, 20 Dec 2016 10:45:28 GMT davidb [31259]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForPOSCount.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)

    Lambda sort had wrong boolean arg to sort descending. Now fixed

Tue, 20 Dec 2016 10:39:40 GMT davidb [31258]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/scripts/FULL-RUN-MASTER-SPARK-POS-COUNT.sh (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumePOSStreamFlatmap.java (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForPOSCount.java (added)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    POS Label count, similar to Whitelist word count

Tue, 20 Dec 2016 03:52:52 GMT davidb [31257]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)

    Fixed typo

Tue, 20 Dec 2016 03:44:40 GMT davidb [31256]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)

    Earlier check of output directory to prevent large scale processing, ...
Tue, 20 Dec 2016 02:37:26 GMT davidb [31255]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)

    Changed to using lambda functions

Tue, 20 Dec 2016 02:29:56 GMT davidb [31254]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Experimenting with Lucene lowercase filter

Tue, 20 Dec 2016 01:57:38 GMT davidb [31253]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/ef-solr.properties (modified)

    Identified a typo, and changed to being true anyway

Tue, 20 Dec 2016 01:15:05 GMT davidb [31252]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/ef-solr.properties (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeWordStreamFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Support for icu-tokenize property added, plus relevant refactoring.
Mon, 19 Dec 2016 02:13:52 GMT davidb [31251]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)

    Code tidy up. Timed experiment showed sorting by key with ...

Mon, 19 Dec 2016 02:03:27 GMT davidb [31250]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)

    Minor mods

Sun, 18 Dec 2016 07:38:59 GMT davidb [31247]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)

    Change sort order. Pick better output directory name

Sun, 18 Dec 2016 05:25:02 GMT davidb [31246]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (modified)

    Experimenting with sorting

Sun, 18 Dec 2016 04:18:13 GMT davidb [31245]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Refactored so processing of words from TokenPosCount now done by the ...
Sun, 18 Dec 2016 03:57:05 GMT davidb [31244]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Tidy up

Sat, 17 Dec 2016 04:25:08 GMT davidb [31243]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/pom.xml (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Experimenting with Lucene/Solr's ICU tokenizer

Sat, 17 Dec 2016 02:53:23 GMT davidb [31242]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeWordStreamFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Method name refactor

Tue, 13 Dec 2016 22:29:31 GMT davidb [31235]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP/setup-solr.bash (modified)

    More fine-grained testing to help nema setup

Tue, 13 Dec 2016 22:20:57 GMT davidb [31234]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP.bash (modified)

    More selective control of what to source/setup depending on hostname

Tue, 13 Dec 2016 22:12:29 GMT davidb [31233]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP.bash (modified)

    Changes to operate on nema as well as gsliscluster1 and gc0-9

Tue, 13 Dec 2016 22:11:00 GMT davidb [31232]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/CONF/nema-state.json (added)

    Hand edited version of state.json from gsliscluster1 suitable for ...
Tue, 13 Dec 2016 09:41:54 GMT davidb [31231]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/nema-solr-start-all.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-solr-check-local-shardsize-all.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-solr-rsync2nema-local-shard-all.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-solr-setup-local-disk-all.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SCRIPTS/remote-solr-start-all.sh (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/gslis-cluster/SETUP/setup-solr.bash (modified)

    Changes to allow SOLR to run on nodes in /hdfsd05/dbbridge/solr-ef

Tue, 13 Dec 2016 01:02:01 GMT davidb [31228]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ClusterFileIO.java (modified)

    Change to see if code can be made more unified. If so, then ...
Tue, 13 Dec 2016 01:00:15 GMT davidb [31227]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ClusterFileIO.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

    Code tidy up

Tue, 13 Dec 2016 00:53:48 GMT davidb [31226]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)

    Fixed bloom test for init

Tue, 13 Dec 2016 00:46:23 GMT davidb [31225]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)

    Relocated bloomfilter creation to within call() method, so done on ...
Mon, 12 Dec 2016 10:30:27 GMT davidb [31224]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

    Debug added

Mon, 12 Dec 2016 10:28:08 GMT davidb [31223]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ClusterFileIO.java (modified)

    Exception printStackTrace

Mon, 12 Dec 2016 10:22:33 GMT davidb [31222]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ClusterFileIO.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

    Changed to using ClusterFileIO supporting methods

Mon, 12 Dec 2016 07:20:25 GMT davidb [31221]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)

    Missing argument added in

Mon, 12 Dec 2016 07:18:04 GMT davidb [31220]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

    Use of whitelist Bloom filter added to words going into Solr index

Mon, 12 Dec 2016 07:12:02 GMT ak19 [31219]
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Demo-Lucene/etc/oai-inf-tmp.gdb (added)

    Forgot to add to model-collect with previous commit.

Mon, 12 Dec 2016 06:06:08 GMT ak19 [31217]
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Associated-Files/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/CDS-ISIS/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Customization/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/DSpace-To-GS/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Demo-MGPP/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Demo-Section-Tagging/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/MARC-Exploded/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/MARC-Singlefile/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/METS/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Multimedia/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/OAI-Local/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/PDFBox/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Scanned-Img-Advanced/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Scanned-Img-Basic/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Simple-Image/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Small-HTML/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Tudor-Basic/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Tudor-Enhanced/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Web-Tudor/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Word-PDF-Basic/etc/oai-inf-tmp.gdb (added)
    * other-projects/nightly-tasks/diffcol/trunk/model-collect/Word-PDF-Formatting/etc/oai-inf-tmp.gdb (added)

    Adding the new oai-inf.db files, created by rebuilding the model ...

Mon, 12 Dec 2016 04:12:56 GMT davidb [31215]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

    Changed back to Guava 20 API, now mvn shading allows me to have this ...

Mon, 12 Dec 2016 04:08:51 GMT davidb [31214]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/com (deleted)

    Not needed now using mvn shading

Mon, 12 Dec 2016 04:08:06 GMT davidb [31213]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/pom.xml (modified)

    Tidy up

Mon, 12 Dec 2016 04:06:50 GMT davidb [31212]
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/COMPILE.bash (modified)
    * other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/pom.xml (modified)

    Changed from mvn assembly to shadowing, which has more control