root/other-projects

Rev Chgset Date Author Log Message
(edit) @31246 [31246] 2 years davidb Experimenting with sorting
(edit) @31245 [31245] 2 years davidb Refactored so processing of words from TokenPosCount now done by the same …
(edit) @31244 [31244] 2 years davidb Tidy up
(edit) @31243 [31243] 2 years davidb Experimenting with Lucene/Solr's ICU tokenizer
(edit) @31242 [31242] 2 years davidb Method name refactor
(edit) @31235 [31235] 2 years davidb More fine-grained testing to help nema setup
(edit) @31234 [31234] 2 years davidb More selective control of what to source/setup depending on hostname
(edit) @31233 [31233] 2 years davidb Changes to operate on nema as well as gsliscluster1 and gc0-9
(edit) @31232 [31232] 2 years davidb Hand edited version of state.json from gsliscluster1 suitable for running …
(edit) @31231 [31231] 2 years davidb Changes to allow SOLR to run on nodes in /hdfsd05/dbbridge/solr-ef
(edit) @31228 [31228] 2 years davidb Change to see if code can be made more unified. If so, then …
(edit) @31227 [31227] 2 years davidb Code tidy up
(edit) @31226 [31226] 2 years davidb Fixed bloom test for init
(edit) @31225 [31225] 2 years davidb Relocated bloomfilter creation to within call() method, so done on the …
(edit) @31224 [31224] 2 years davidb Debug added
(edit) @31223 [31223] 2 years davidb Exception printStackTrace
(edit) @31222 [31222] 2 years davidb Changed to using ClusterFileIO supporting methods
(edit) @31221 [31221] 2 years davidb Missing argument added in
(edit) @31220 [31220] 2 years davidb Use of whitelist Bloom filter added to words going into Solr index
(edit) @31219 [31219] 2 years ak19 Forgot to add to model-collect with previous commit.
(edit) @31217 [31217] 2 years ak19 Adding the new oai-inf.db files, created by rebuilding the model …
(edit) @31215 [31215] 2 years davidb Changed back to Guava 20 API, now mvn shading allows me to have this in …
(edit) @31214 [31214] 2 years davidb Not needed now using mvn shading
(edit) @31213 [31213] 2 years davidb Tidy up
(edit) @31212 [31212] 2 years davidb Changed from mvn assembly to shadowing, which has more control
(edit) @31211 [31211] 2 years davidb Changing back to regular Guava classes. Looking to use maven shading to …
(edit) @31209 [31209] 2 years davidb checkArgument added in
(edit) @31207 [31207] 2 years davidb And some more tweaking
(edit) @31206 [31206] 2 years davidb More tweaking of Guava cloned code
(edit) @31205 [31205] 2 years davidb Next added in part of new Guava code
(edit) @31204 [31204] 2 years davidb Splicing in Guava version 20 of BloomFilter into code as own class (now …
(edit) @31203 [31203] 2 years davidb Use class provided stringFunnel
(edit) @31202 [31202] 2 years davidb Turns out Spark uses Guava 14.0 not 20.0. Additional code to fill in some …
(edit) @31201 [31201] 2 years davidb Trigger serialization of whitelist in main program
(edit) @31200 [31200] 2 years davidb Better output statement
(edit) @31199 [31199] 2 years davidb Renaming of classname to reflect filename rename
(edit) @31198 [31198] 2 years davidb File renaming to make way for newer version of classes needed in the main …
(edit) @31197 [31197] 2 years davidb File renaming to make way for newer version of classes needed in the main …
(edit) @31196 [31196] 2 years davidb File renaming to make way for newer version of classes needed in the main …
(edit) @31195 [31195] 2 years davidb File renaming to make way for newer version of classes needed in the main …
(edit) @31194 [31194] 2 years davidb Serialize in and out methods added
(edit) @31193 [31193] 2 years davidb Peter's white-list file
(edit) @31184 [31184] 2 years davidb New provision to run different main classes in _RUN.sh; New top-level …
(edit) @31183 [31183] 2 years davidb Bump up to project using Java 1.8
(edit) @31177 [31177] 2 years davidb Adding in Google jar that supports Bloom filters
(edit) @31176 [31176] 2 years davidb Support added for producing whitelist word count
(edit) @31175 [31175] 2 years davidb Trial to find memory difference between HashMap and Bloom filters
(edit) @31174 [31174] 2 years davidb One of the last scripts developed for getting ef dataset into HDFS
(edit) @31173 [31173] 2 years davidb individual file sizes per top-level folder
(edit) @31172 [31172] 2 years davidb to help track down missing files in HDFS copy
(edit) @31171 [31171] 2 years davidb Util to help find where missing files are
(edit) @31170 [31170] 2 years davidb Targeted sub-dir copy
(edit) @31169 [31169] 2 years davidb Improved logic
(edit) @31161 [31161] 2 years davidb Comparison of local disk version with HDFS version
(edit) @31152 [31152] 2 years davidb Development of script
(edit) @31151 [31151] 2 years davidb More nuanced version to help finish off the 'big put'
(edit) @31128 [31128] 2 years davidb Some scripts to help with pushing and monitoring the progress of the put …
(edit) @31112 [31112] 2 years davidb To move out shards saved in /tmp on gsliscluster1 nodes to nema
(edit) @31106 [31106] 2 years davidb Scripts to help run an rsync'd copy of gsliscluster1 /tmp/gcX-solr-shard …
(edit) @31105 [31105] 2 years davidb Additional scripts to help with running solr locally out of /tmp area
(edit) @31104 [31104] 2 years davidb now configurable to be run from local disk (/tmp)
(edit) @31103 [31103] 2 years davidb Changes made after testing with 20 solr nodes
(edit) @31102 [31102] 2 years davidb Command line way of running a Solr test query
(edit) @31101 [31101] 2 years davidb Correction to collection name
(edit) @31100 [31100] 2 years davidb Change to using solr-cloud-nodes that include port number
(edit) @31099 [31099] 2 years davidb Changes resulting from test runs to get Zookeeper and Solr running on …
(edit) @31098 [31098] 2 years davidb Changes resulting from test runs to get Zookeeper and Solr running on …
(edit) @31097 [31097] 2 years davidb Changed to .in style name
(edit) @31096 [31096] 2 years davidb Only need to create a volume's pages output directory if _output_dir has …
(edit) @31095 [31095] 2 years davidb Introduced num-partitions property
(edit) @31094 [31094] 2 years davidb Changes triggered by running on gsliscluster1
(edit) @31093 [31093] 2 years davidb Changes triggered by running on gsliscluster1
(edit) @31092 [31092] 2 years davidb Minor tweak to spark/hadoop combination downloaded
(edit) @31091 [31091] 2 years davidb Change of number of cores for 'gsliscluster1' machine; commented out …
(edit) @31090 [31090] 2 years davidb Memory monitor debugging code, commented out
(edit) @31089 [31089] 2 years davidb Change in way the JSON file is read in. Motivation was an out-of-memory …
(edit) @31088 [31088] 2 years davidb Shift to newInstance for FileSystem due to StackOverflow page describing …
(edit) @31082 [31082] 2 years davidb Changes in response to testing on gchead
(edit) @31081 [31081] 2 years davidb Going live with generation of spark slaves file
(edit) @31080 [31080] 2 years davidb echo formatting tidy up. Fixed some typos
(edit) @31079 [31079] 2 years davidb Useful get started scripts
(edit) @31078 [31078] 2 years davidb Some setup files and scripts to make running Spark and Solr easier on the …
(edit) @31077 [31077] 2 years davidb Move up to JDK1.8. Tidy up of Vagrant machine names. Support for YARN. …
(edit) @31065 [31065] 2 years davidb Additional echo output
(edit) @31062 [31062] 2 years davidb Added in -W option so check-sum calculation is skipped
(edit) @31058 [31058] 2 years davidb echo for additional information added
(edit) @31057 [31057] 2 years davidb Tweak to jps output formatting
(edit) @31053 [31053] 2 years davidb Addition of second argument, optional, for where to save the files
(edit) @31051 [31051] 2 years davidb Added in JDK to list of possible packages needed
(edit) @31046 [31046] 2 years davidb Added property to control how severe a JSON IO problem is
(edit) @31045 [31045] 2 years davidb More careful treatment of what to do when a JSON file isn't there
(edit) @31044 [31044] 2 years davidb Fixed up error when output_dir is empty
(edit) @31043 [31043] 2 years davidb Version for processing full EF set
(edit) @31042 [31042] 2 years davidb Name changes, preparing the way for FULL-RUN versions
(edit) @31041 [31041] 2 years davidb Test needs to be more careful if -read-only specified
(edit) @31036 [31036] 2 years davidb Renaming to prepare way for YARN version of script
(edit) @31035 [31035] 2 years davidb Changes after testing scripts
(edit) @31034 [31034] 2 years davidb Development of scripts for working with Full EF dataset
(edit) @31033 [31033] 2 years davidb Development of scripts for working with Full EF dataset
(edit) @31030 [31030] 2 years davidb Tweak to some verbosity level 2 printing