extractedfeatures

	Rev	Age	Author	Log Message
(edit)	@31272	7 years	davidb	Use disk and memory to store main language RDD
(edit)	@31271	7 years	davidb	Updating of POS code to new files-per-partition parameter, plus some …
(edit)	@31270	7 years	davidb	Changed over to repartition approach
(edit)	@31269	7 years	davidb	Some variable name changes, and printing tidy up
(edit)	@31266	7 years	davidb	Rekindling of per-volume approach. Also some tweaking to verbosity …
(edit)	@31264	7 years	davidb	Switching to 'long' in counts to allow higher number representation
(edit)	@31263	7 years	davidb	Change to using long for higher word counts
(edit)	@31261	7 years	davidb	Overlooked changes from POS to lang
(edit)	@31260	7 years	davidb	Language counting
(edit)	@31259	7 years	davidb	Lambda sort had wrong boolean arg to sort descending. Now fixed
(edit)	@31258	7 years	davidb	POS Label count, similar to Whitelist word count
(edit)	@31257	7 years	davidb	Fixed typo
(edit)	@31256	7 years	davidb	Earlier check of output directory to prevent large scale processing, …
(edit)	@31255	7 years	davidb	Changed to using lambda functions
(edit)	@31254	7 years	davidb	Experimenting with Lucene lowercase filter
(edit)	@31252	7 years	davidb	Support for icu-tokenize property added, plus relevant refactoring.
(edit)	@31251	7 years	davidb	Code tidy up. Timed experiment showed sorting by key with …
(edit)	@31250	7 years	davidb	Minor mods
(edit)	@31247	7 years	davidb	Change sort order. Pick better output directory name
(edit)	@31246	7 years	davidb	Experimenting with sorting
(edit)	@31245	7 years	davidb	Refactored so processing of words from TokenPosCount now done by the …
(edit)	@31244	7 years	davidb	Tidy up
(edit)	@31243	7 years	davidb	Experimenting with Lucene/Solr's ICU tokenizer
(edit)	@31242	7 years	davidb	Method name refactor
(edit)	@31228	7 years	davidb	Change to see if code can be made more unified. If so, then …
(edit)	@31227	7 years	davidb	Code tidy up
(edit)	@31226	7 years	davidb	Fixed bloom test for init
(edit)	@31225	7 years	davidb	Relocated bloomfilter creation to within call() method, so done on the …
(edit)	@31224	7 years	davidb	Debug added
(edit)	@31223	7 years	davidb	Exception printStackTrace
(edit)	@31222	7 years	davidb	Changed to using ClusterFileIO supporting methods
(edit)	@31221	7 years	davidb	Missing argument added in
(edit)	@31220	7 years	davidb	Use of whitelist Bloom filter added to words going into Solr index
(edit)	@31215	7 years	davidb	Changed back to Guava 20 API, now mvn shading allows me to have this …
(edit)	@31211	7 years	davidb	Changing back to regular Guava classes. Looking to use maven shading …
(edit)	@31204	7 years	davidb	Splicing in Guava version 20 of BloomFilter into code as own class (now …
(edit)	@31203	7 years	davidb	Use class provided stringFunnel
(edit)	@31202	7 years	davidb	Turns out Spark uses Guava 14.0 not 20.0. Additional code to fill in …
(edit)	@31201	7 years	davidb	Trigger serialization of whitelist in main program
(edit)	@31200	7 years	davidb	Better output statement
(edit)	@31199	7 years	davidb	Renaming of classname to reflect filename rename
(edit)	@31198	7 years	davidb	File renaming to make way for newer version of classes needed in the …
(edit)	@31197	7 years	davidb	File renaming to make way for newer version of classes needed in the …
(edit)	@31196	7 years	davidb	File renaming to make way for newer version of classes needed in the …
(edit)	@31195	7 years	davidb	File renaming to make way for newer version of classes needed in the …
(edit)	@31194	7 years	davidb	Serialize in and out methods added
(edit)	@31176	7 years	davidb	Support added for producing whitelist word count
(edit)	@31175	7 years	davidb	Trial to find memory difference between HashMap and Bloom filters
(edit)	@31100	7 years	davidb	Change to using solr-cloud-nodes that include port number
(edit)	@31096	7 years	davidb	Only need to create a volume's pages output directory if _output_dir …
(edit)	@31095	7 years	davidb	Introduced num-partitions property
(edit)	@31091	7 years	davidb	Change of number of cores for 'gsliscluster1' machine; commented out …
(edit)	@31090	7 years	davidb	Memory monitor debugging code, commented out
(edit)	@31089	7 years	davidb	Change in way the JSON file is read in. Motivation was a …
(edit)	@31088	7 years	davidb	Shift to newInstance for FileSystem due to StackOverflow page …
(edit)	@31045	7 years	davidb	More careful treatment of what to do when a JSON file isn't there
(edit)	@31041	7 years	davidb	Test needs to be more careful if -read-only specified
(edit)	@31030	7 years	davidb	Tweak to some verbosity level 2 printing
(edit)	@31028	7 years	davidb	Support for randomly choosing Solr endpoints added in
(edit)	@31027	7 years	davidb	Mixed typo in property name used
(edit)	@31026	7 years	davidb	Corrected flag setting
(edit)	@31025	7 years	davidb	Use property process-json-mode to determine which sort of Spark …
(edit)	@31024	7 years	davidb	Support for Java properties file
(copy)	@31015	7 years	davidb	Restructuring of projects into one
copied from other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/extractedfeatures
(edit)	@31013	7 years	davidb	Accumulator for PerPageMap
