trunk

	Rev	Age	Author	Log Message
(edit)	@31278	7 years	davidb	To avoid null pointer on ids.iterator()
(edit)	@31277	7 years	davidb	Tweak to minimum value
(edit)	@31276	7 years	davidb	Min num partition guard put in
(edit)	@31275	7 years	davidb	Changes to allow gc slave nodes to work with local disk versions of …
(edit)	@31274	7 years	davidb	Need to use JSONArray not JSONObject for a multifield item
(edit)	@31273	7 years	davidb	Code moved to store fields for multilingual use using dynamic Solr …
(edit)	@31272	7 years	davidb	Use disk and memory to store main language RDD
(edit)	@31271	7 years	davidb	Updating of POS code to new files-per-partition parameter, plus some …
(edit)	@31270	7 years	davidb	Changed over to repartition approach
(edit)	@31269	7 years	davidb	Some variable name changes, and printing tidy up
(edit)	@31268	7 years	davidb	Adjustments to memory allocation in response to test runs on 10% of dataset
(edit)	@31267	7 years	davidb	Values trialed on gsliscluster1. Rekindling idea of per-vol processing
(edit)	@31266	7 years	davidb	Rekindling of per-volume approach. Also some tweaking to verbosity …
(edit)	@31264	7 years	davidb	Switching to 'long' in counts to allow higher number representation
(edit)	@31263	7 years	davidb	Change to using long for higher word counts
(edit)	@31261	7 years	davidb	Overlooked changes from POS to lang
(edit)	@31260	7 years	davidb	Language counting
(edit)	@31259	7 years	davidb	Lambda sort had wrong boolean arg to sort descending. Now fixed
(edit)	@31258	7 years	davidb	POS Label count, similar to Whitelist word count
(edit)	@31257	7 years	davidb	Fixed typo
(edit)	@31256	7 years	davidb	Earlier check of output directory to prevent large scale processing, …
(edit)	@31255	7 years	davidb	Changed to using lambda functions
(edit)	@31254	7 years	davidb	Experimenting with Lucene lowercase filter
(edit)	@31253	7 years	davidb	Identified a typo, and changed to being true anyway
(edit)	@31252	7 years	davidb	Support for icu-tokenize property added, plus relevant refactoring.
(edit)	@31251	7 years	davidb	Code tidy up. Timed experiment showed sorting by key with …
(edit)	@31250	7 years	davidb	Minor mods
(edit)	@31247	7 years	davidb	Change sort order. Pick better output directory name
(edit)	@31246	7 years	davidb	Experimenting with sorting
(edit)	@31245	7 years	davidb	Refactored so processing of words from TokenPosCount now done by the …
(edit)	@31244	7 years	davidb	Tidy up
(edit)	@31243	7 years	davidb	Experimenting with Lucene/Solr's ICU tokenizer
(edit)	@31242	7 years	davidb	Method name refactor
(edit)	@31235	7 years	davidb	More fine-grained testing to help nema setup
(edit)	@31234	7 years	davidb	More selective control of what to source/setup depending on hostname
(edit)	@31233	7 years	davidb	Changes to operate on nema as well as gsliscluster1 and gc0-9
(edit)	@31232	7 years	davidb	Hand edited version of state.json from gsliscluster1 suitable for …
(edit)	@31231	7 years	davidb	Changes to allow SOLR to run on nodes in /hdfsd05/dbbridge/solr-ef
(edit)	@31228	7 years	davidb	Change to see if code can be made more unified. If so, then …
(edit)	@31227	7 years	davidb	Code tidy up
(edit)	@31226	7 years	davidb	Fixed bloom test for init
(edit)	@31225	7 years	davidb	Relocated bloomfilter creation to within call() method, so done on the …
(edit)	@31224	7 years	davidb	Debug added
(edit)	@31223	7 years	davidb	Exception printStackTrace
(edit)	@31222	7 years	davidb	Changed to using ClusterFileIO supporting methods
(edit)	@31221	7 years	davidb	Missing argument added in
(edit)	@31220	7 years	davidb	Use of whitelist Bloom filter added to words going into Solr index
(edit)	@31215	7 years	davidb	Changed back to Guava 20 API, now mvn shading allows me to have this …
(edit)	@31214	7 years	davidb	Not needed now using mvn shading
(edit)	@31213	7 years	davidb	Tidy up
(edit)	@31212	7 years	davidb	Changed from mvn assembly to shading, which has more control
(edit)	@31211	7 years	davidb	Changing back to regular Guava classes. Looking to use maven shading …
(edit)	@31209	7 years	davidb	checkArgument added in
(edit)	@31207	7 years	davidb	And some more tweaking
(edit)	@31206	7 years	davidb	More tweaking of Guava cloned code
(edit)	@31205	7 years	davidb	Next added in part of new Guava code
(edit)	@31204	7 years	davidb	Splicing in Guava version 20 of BloomFilter into code as own class (now …
(edit)	@31203	7 years	davidb	Use class provided stringFunnel
(edit)	@31202	7 years	davidb	Turns out Spark uses Guava 14.0 not 20.0. Additional code to fill in …
(edit)	@31201	7 years	davidb	Trigger serialization of whitelist in main program
(edit)	@31200	7 years	davidb	Better output statement
(edit)	@31199	7 years	davidb	Renaming of classname to reflect filename rename
(edit)	@31198	7 years	davidb	File renaming to make way for newer version of classes needed in the …
(edit)	@31197	7 years	davidb	File renaming to make way for newer version of classes needed in the …
(edit)	@31196	7 years	davidb	File renaming to make way for newer version of classes needed in the …
(edit)	@31195	7 years	davidb	File renaming to make way for newer version of classes needed in the …
(edit)	@31194	7 years	davidb	Serialize in and out methods added
(edit)	@31193	7 years	davidb	Peter's white-list file
(edit)	@31184	7 years	davidb	New provision to run different main classes in _RUN.sh; New top-level …
(edit)	@31183	7 years	davidb	Bump up to project using Java 1.8
(edit)	@31177	7 years	davidb	Adding in Google jar that supports Bloom filters
(edit)	@31176	7 years	davidb	Support added for producing whitelist word count
(edit)	@31175	7 years	davidb	Trial to find memory difference between HashMap and Bloom filters
(edit)	@31174	7 years	davidb	One of the last scripts developed for getting ef dataset into HDFS
(edit)	@31173	7 years	davidb	individual file sizes per top-level folder
(edit)	@31172	7 years	davidb	to help track down missing files in HDFS copy
(edit)	@31171	7 years	davidb	Util to help find where missing files are
(edit)	@31170	7 years	davidb	Targeted sub-dir copy
(edit)	@31169	7 years	davidb	Improved logic
(edit)	@31161	7 years	davidb	Comparison of local disk version with HDFS version
(edit)	@31152	7 years	davidb	Development of script
(edit)	@31151	7 years	davidb	More nuanced version to help finish off the 'big put'
(edit)	@31128	7 years	davidb	Some scripts to help with pushing and monitoring the progress of the …
(edit)	@31112	7 years	davidb	To move out shards saved in /tmp on gsliscluster1 nodes to nema
(edit)	@31106	7 years	davidb	Scripts to help run an rsync'd copy of gsliscluster1 …
(edit)	@31105	7 years	davidb	Additional scripts to help with running solr locally out of /tmp area
(edit)	@31104	7 years	davidb	now configurable to be run from local disk (/tmp)
(edit)	@31103	7 years	davidb	Changes made after testing with 20 solr nodes
(edit)	@31102	7 years	davidb	Command line way of running a Solr test query
(edit)	@31101	7 years	davidb	Correction to collection name
(edit)	@31100	7 years	davidb	Change to using solr-cloud-nodes that include port number
(edit)	@31099	7 years	davidb	Changes resulting from test runs to get Zookeeper and Solr running on …
(edit)	@31098	7 years	davidb	Changes resulting from test runs to get Zookeeper and Solr running on …
(edit)	@31097	7 years	davidb	Changed to .in style name
(edit)	@31096	7 years	davidb	Only need to create a volume's pages output directory if _output_dir …
(edit)	@31095	7 years	davidb	Introduced num-partitions property
(edit)	@31094	7 years	davidb	Changes triggered by running on gsliscluster1
(edit)	@31093	7 years	davidb	Changes triggered by running on gsliscluster1
(edit)	@31092	7 years	davidb	Minor tweak to spark/hadoop combination downloaded
(edit)	@31091	7 years	davidb	Change of number of cores for 'gsliscluster1' machine; commented out …
