#
# ChangeLog for other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust
#
# Generated by Trac 1.4.2
# 2024-06-02T00:56:18+12:00

Mon, 12 Dec 2016 07:20:25 GMT davidb [31221]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)

	Missing argument added in

Mon, 12 Dec 2016 07:18:04 GMT davidb [31220]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

	Use of whitelist Bloom filter added to words going into Solr index

Mon, 12 Dec 2016 04:12:56 GMT davidb [31215]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

	Changed back to Guava 20 API, now mvn shading allows me to have this ...

Mon, 12 Dec 2016 03:01:59 GMT davidb [31211]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

	Changing back to regular Guava classes. Looking to use maven shading ...
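	The whitelist Bloom filter entries above ([31220] onward) filter words before they go into the Solr index. The sketch below shows that idea with a dependency-free Bloom filter; it is NOT Guava's BloomFilter and NOT the project's WhitelistBloomFilter class. The class and method names (WhitelistBloomSketch, keepWhitelisted) are hypothetical, and the double-hashing scheme (String.hashCode plus CRC32) is an assumption chosen only to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical, dependency-free stand-in for a Guava-backed whitelist filter.
// This is NOT the Guava API and NOT the project's WhitelistBloomFilter.
class WhitelistBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    WhitelistBloomSketch(int expectedInsertions, double falsePositiveRate) {
        if (expectedInsertions <= 0 || falsePositiveRate <= 0 || falsePositiveRate >= 1) {
            throw new IllegalArgumentException("bad Bloom filter parameters");
        }
        double ln2 = Math.log(2);
        // Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        numBits = Math.max(1, (int) Math.ceil(-expectedInsertions * Math.log(falsePositiveRate) / (ln2 * ln2)));
        numHashes = Math.max(1, (int) Math.round((double) numBits / expectedInsertions * ln2));
        bits = new BitSet(numBits);
    }

    // Second hash from CRC32 of the UTF-8 bytes, forced odd so successive probes differ.
    private static int secondHash(String word) {
        CRC32 crc = new CRC32();
        crc.update(word.getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue() | 1;
    }

    // Double hashing: probe i is (h1 + i * h2) mod m; int overflow wraps, floorMod keeps it non-negative.
    private int probe(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void put(String word) {
        int h1 = word.hashCode();
        int h2 = secondHash(word);
        for (int i = 0; i < numHashes; i++) {
            bits.set(probe(h1, h2, i));
        }
    }

    // false means "definitely not whitelisted"; true means "probably whitelisted".
    boolean mightContain(String word) {
        int h1 = word.hashCode();
        int h2 = secondHash(word);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(probe(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    // Keep only tokens that pass the whitelist before they are added to a Solr document.
    static List<String> keepWhitelisted(WhitelistBloomSketch whitelist, List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (whitelist.mightContain(t)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

	A Bloom filter never rejects a word that was added, so whitelisted words always reach the index; the trade-off (measured in [31175] against a HashMap) is that a small fraction of non-whitelisted words may slip through in exchange for a much smaller memory footprint.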
Mon, 12 Dec 2016 01:28:20 GMT davidb [31204]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/com (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/com/google (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/com/google/common (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/com/google/common/hash (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/com/google/common/hash/BloomFilterAdvanced.java (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

	Splicing in Guava version 20 of BloomFilter into code as own class ...

Mon, 12 Dec 2016 00:57:01 GMT davidb [31203]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

	Use class provided stringFunnel

Mon, 12 Dec 2016 00:53:06 GMT davidb [31202]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

	Turns out Spark uses Guava 14.0 not 20.0. Additional code to fill in ...
Sun, 11 Dec 2016 21:35:42 GMT davidb [31201]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (added)

	Trigger serialization of whitelist in main program

Sun, 11 Dec 2016 21:35:05 GMT davidb [31200]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/TestWhitelistBloomFilter.java (modified)

	Better output statement

Sun, 11 Dec 2016 21:04:55 GMT davidb [31199]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/TestWhitelistBloomFilter.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/TestWhitelistDictionaryMain.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/TestWhitelistHashmap.java (modified)

	Renaming of classname to reflect filename rename

Sun, 11 Dec 2016 21:03:20 GMT davidb [31198]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/TestWhitelistHashmap.java (moved)

	File renaming to make way for newer version of classes needed in the ...

Sun, 11 Dec 2016 21:02:37 GMT davidb [31197]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/DictionaryWhitelist.java (deleted)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (deleted)

	File renaming to make way for newer version of classes needed in the ...
Sun, 11 Dec 2016 21:01:30 GMT davidb [31196]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/TestWhitelistBloomFilter.java (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/TestWhitelistDictionaryMain.java (added)

	File renaming to make way for newer version of classes needed in the ...

Sun, 11 Dec 2016 21:00:08 GMT davidb [31195]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/TESTWhitelistHashmap.java (moved)

	File renaming to make way for newer version of classes needed in the ...

Sun, 11 Dec 2016 20:51:07 GMT davidb [31194]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (modified)

	Serialize in and out methods added

Sat, 03 Dec 2016 08:16:38 GMT davidb [31176]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java (modified)

	Support added for producing whitelist word count

Sat, 03 Dec 2016 08:15:52 GMT davidb [31175]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/DictionaryWhitelist.java (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeWordStreamFlatmap.java (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistBloomFilter.java (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/WhitelistHashmap.java (added)

	Trial to find memory difference between Hashmap and Bloom filters

Thu, 10 Nov 2016 09:58:19 GMT davidb [31100]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/ef-solr.properties (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Change to using solr-cloud-nodes that include port number

Thu, 10 Nov 2016 06:25:14 GMT davidb [31096]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerVolumeJSON.java (modified)

	Only need to create a volume's pages output directory if _output_dir ...

Thu, 10 Nov 2016 05:58:06 GMT davidb [31095]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/ef-solr.properties (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Introduced num-partitions property

Thu, 10 Nov 2016 03:15:30 GMT davidb [31091]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Change of number of cores for 'gsliscluster1' machine; commented out ...
Thu, 10 Nov 2016 03:14:21 GMT davidb [31090]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)

	Memory monitor debugging code, commented out

Thu, 10 Nov 2016 03:13:12 GMT davidb [31089]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/JSONClusterFileIO.java (modified)

	Change in way the JSON file is read in. Motivation was an out-of- ...

Thu, 10 Nov 2016 03:09:55 GMT davidb [31088]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ClusterFileIO.java (modified)

	Shift to newInstance for FileSystem due to StackOverflow page ...

Wed, 02 Nov 2016 08:34:47 GMT davidb [31045]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/JSONClusterFileIO.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONMap.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	More careful treatment of what to do when a JSON file isn't there

Wed, 02 Nov 2016 07:07:40 GMT davidb [31041]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Test needs to be more careful if -read-only specified

Wed, 02 Nov 2016 01:28:39 GMT davidb [31030]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONFlatmap.java (modified)

	Tweak to some verbosity level 2 printing

Wed, 02 Nov 2016 01:17:45 GMT davidb [31028]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/_RUN.bash (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/ef-solr.properties (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONMap.java (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Support for randomly choosing Solr endpoints added in

Wed, 02 Nov 2016 00:06:15 GMT davidb [31027]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Mixed typo in property name used

Wed, 02 Nov 2016 00:01:16 GMT davidb [31026]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Corrected flag setting

Tue, 01 Nov 2016 22:59:37 GMT davidb [31025]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/ef-solr.properties (modified)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Use property process-json-mode to determine which sort of Spark ...
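	Revision [31028] adds random choice of Solr endpoints, and [31100] switches solr-cloud-nodes to entries that include a port number. A minimal sketch of that selection step follows, assuming solr-cloud-nodes holds a comma-separated list of host:port entries; the class name RandomSolrEndpointSketch, the example hostnames, and the collection name are hypothetical. The /solr/<collection>/update path is Solr's standard update handler, not a detail taken from the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: pick a Solr node at random so update traffic is spread
// across the cluster. Not the project's ProcessForSolrIngest code.
class RandomSolrEndpointSketch {
    private final List<String> nodes = new ArrayList<>();
    private final Random random;

    // nodesProperty is assumed to be comma-separated "host:port" entries.
    RandomSolrEndpointSketch(String nodesProperty, Random random) {
        for (String n : nodesProperty.split(",")) {
            String trimmed = n.trim();
            if (!trimmed.isEmpty()) {
                nodes.add(trimmed);
            }
        }
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no Solr nodes given");
        }
        this.random = random;
    }

    int size() {
        return nodes.size();
    }

    // Uniformly random choice among the configured nodes.
    String pickNode() {
        return nodes.get(random.nextInt(nodes.size()));
    }

    // Solr's update handler lives under /solr/<collection>/update.
    String updateUrl(String collection) {
        return "http://" + pickNode() + "/solr/" + collection + "/update";
    }
}
```

	Passing the Random in makes runs reproducible when a fixed seed is used, which helps when debugging uneven load across nodes.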
Tue, 01 Nov 2016 22:37:07 GMT davidb [31024]

	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/ef-solr.properties (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Support for Java properties file

Tue, 01 Nov 2016 01:06:05 GMT davidb [31015]

	* other-projects/hathitrust/wcsa (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk (added)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest (moved)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/vagrant-solr-cluster (moved)
	* other-projects/hathitrust/wcsa/extracted-features-solr/trunk/vagrant-spark-hdfs-cluster (moved)
	* other-projects/hathitrust/wcsa/extracted-features-solr/web-portal (moved)

	Restructuring of projects into one

Mon, 31 Oct 2016 07:51:39 GMT davidb [31013]

	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/PerPageJSONMap.java (modified)
	* other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures/ProcessForSolrIngest.java (modified)

	Accumulator for PerPageMap
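	Revision [31024] introduces ef-solr.properties, and later entries mention the num-partitions, process-json-mode, and solr-cloud-nodes properties. A minimal sketch of reading such keys with java.util.Properties follows; the key names come from this log, but the class name EfSolrPropertiesSketch, the default values, and the example values (such as "per-page") are assumptions, not the project's actual settings.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch of loading run settings from a file like ef-solr.properties.
// Key names are taken from the ChangeLog; the defaults below are assumptions.
class EfSolrPropertiesSketch {
    final int numPartitions;
    final String processJsonMode;
    final String solrCloudNodes;

    EfSolrPropertiesSketch(Properties props) {
        numPartitions = Integer.parseInt(props.getProperty("num-partitions", "0").trim());
        processJsonMode = props.getProperty("process-json-mode", "per-volume").trim();
        solrCloudNodes = props.getProperty("solr-cloud-nodes", "").trim();
    }

    // Properties.load accepts "key=value", "key: value" and "key value" lines.
    static EfSolrPropertiesSketch load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return new EfSolrPropertiesSketch(props);
    }

    // Convenience wrapper for in-memory text; StringReader does not throw in practice.
    static EfSolrPropertiesSketch fromString(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

	Keeping these settings in a properties file rather than on the command line lets the same jar be reconfigured per cluster, which fits the later entries that tune partitions and Solr nodes without code changes.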