@31299 | 7 years | davidb | Additionally setup MongoDB
@31298 | 7 years | davidb | Initial cut at setup file for MongoDB
@31297 | 7 years | davidb |
@31296 | 7 years | davidb | Make loading in of ID file more portable
@31295 | 7 years | davidb | name change of webapp
@31294 | 7 years | davidb | Version for language counting the catalog assignment language …
@31283 | 7 years | davidb | Fixed typo
@31282 | 7 years | davidb | Jetty jar-runnable server
@31281 | 7 years | davidb |
@31280 | 7 years | davidb |
@31279 | 7 years | davidb | First cut at servlet
@31278 | 7 years | davidb | To avoid null pointer on ids.iterator()
@31277 | 7 years | davidb | Tweak to minimum value
@31276 | 7 years | davidb | Min num partition guard put in
@31275 | 7 years | davidb | Changes to allow gc slave nodes to work with local disk versions of …
@31274 | 7 years | davidb | Need to use JSONArray not JSONObject for a multifield item
@31273 | 7 years | davidb | Code moved to store fields for multilingual use using dynamic Solr …
@31272 | 7 years | davidb | Use disk and memory to store main language RDD
@31271 | 7 years | davidb | Updating of POS code to new files-per-partition parameter, plus some …
@31270 | 7 years | davidb | Changed over to repartition approach
@31269 | 7 years | davidb | Some variable name changes, and printing tidy up
@31268 | 7 years | davidb | Adjustments to memory allocation in response to test runs on 10% of dataset
@31267 | 7 years | davidb | Values trialed on gsliscluster1. Rekindling idea of per-vol processing
@31266 | 7 years | davidb | Rekindling of per-volume approach. Also some tweaking to verbosity …
@31264 | 7 years | davidb | Switching to 'long' in counts to allow higher number representation
@31263 | 7 years | davidb | Change to using long for higher word counts
@31261 | 7 years | davidb | Overlooked changes from POS to lang
@31260 | 7 years | davidb | Language counting
@31259 | 7 years | davidb | Lambda sort had wrong boolean arg to sort descending. Now fixed
@31258 | 7 years | davidb | POS Label count, similar to Whitelist word count
@31257 | 7 years | davidb | Fixed typo
@31256 | 7 years | davidb | Earlier check of output directory to prevent large scale processing, …
@31255 | 7 years | davidb | Changed to using lambda functions
@31254 | 7 years | davidb | Experimenting with Lucene lowercase filter
@31253 | 7 years | davidb | Identified a typo, and changed to being true anyway
@31252 | 7 years | davidb | Support for icu-tokenize property added, plus relevant refactoring.
@31251 | 7 years | davidb | Code tidy up. Timed experiment showed sorting by key with …
@31250 | 7 years | davidb | Minor mods
@31247 | 7 years | davidb | Change sort order. Pick better output directory name
@31246 | 7 years | davidb | Experimenting with sorting
@31245 | 7 years | davidb | Refactored so processing of words from TokenPosCount now done by the …
@31244 | 7 years | davidb | Tidy up
@31243 | 7 years | davidb | Experimenting with Lucene/Solr's ICU tokenizer
@31242 | 7 years | davidb | Method name refactor
@31235 | 7 years | davidb | More fine-grained testing to help nema setup
@31234 | 7 years | davidb | More selective control of what to source/setup depending on hostname
@31233 | 7 years | davidb | Changes to operate on nema as well as gsliscluster1 and gc0-9
@31232 | 7 years | davidb | Hand edited version of state.json from gsliscluster1 suitable for …
@31231 | 7 years | davidb | Changes to allow SOLR to run on nodes in /hdfsd05/dbbridge/solr-ef
@31228 | 7 years | davidb | Change to see if code can be made more unified. If so, then …
@31227 | 7 years | davidb | Code tidy up
@31226 | 7 years | davidb | Fixed bloom test for init
@31225 | 7 years | davidb | Relocated bloomfilter creation to within call() method, so done on the …
@31224 | 7 years | davidb | Debug added
@31223 | 7 years | davidb | Exception printStackTrace
@31222 | 7 years | davidb | Changed to using ClusterFileIO supporting methods
@31221 | 7 years | davidb | Missing argument added in
@31220 | 7 years | davidb | Use of whitelist Bloom filter added to words going into Solr index
@31215 | 7 years | davidb | Changed back to Guava 20 API, now mvn shading allows me to have this …
@31214 | 7 years | davidb | Not needed now using mvn shading
@31213 | 7 years | davidb | Tidy up
@31212 | 7 years | davidb | Changed from mvn assembly to shading, which has more control
@31211 | 7 years | davidb | Changing back to regular Guava classes. Looking to use maven shading …
@31209 | 7 years | davidb | checkArgument added in
@31207 | 7 years | davidb | And some more tweaking
@31206 | 7 years | davidb | More tweaking of Guava cloned code
@31205 | 7 years | davidb | Next added in part of new Guava code
@31204 | 7 years | davidb | Splicing in Guava version 20 of BloomFilter into code as own class (now …
@31203 | 7 years | davidb | Use class provided stringFunnel
@31202 | 7 years | davidb | Turns out Spark uses Guava 14.0 not 20.0. Additional code to fill in …
@31201 | 7 years | davidb | Trigger serialization of whitelist in main program
@31200 | 7 years | davidb | Better output statement
@31199 | 7 years | davidb | Renaming of classname to reflect filename rename
@31198 | 7 years | davidb | File renaming to make way for newer version of classes needed in the …
@31197 | 7 years | davidb | File renaming to make way for newer version of classes needed in the …
@31196 | 7 years | davidb | File renaming to make way for newer version of classes needed in the …
@31195 | 7 years | davidb | File renaming to make way for newer version of classes needed in the …
@31194 | 7 years | davidb | Serialize in and out methods added
@31193 | 7 years | davidb | Peter's white-list file
@31184 | 7 years | davidb | New provision to run different main classes in _RUN.sh; New top-level …
@31183 | 7 years | davidb | Bump up to project using Java 1.8
@31177 | 7 years | davidb | Adding in Google jar that supports Bloom filters
@31176 | 7 years | davidb | Support added for producing whitelist word count
@31175 | 7 years | davidb | Trial to find memory difference between HashMap and Bloom filters
@31174 | 7 years | davidb | One of the last scripts developed for getting ef dataset into HDFS
@31173 | 7 years | davidb | individual file sizes per top-level folder
@31172 | 7 years | davidb | to help track down missing files in HDFS copy
@31171 | 7 years | davidb | Util to help find where missing files are
@31170 | 7 years | davidb | Targeted sub-dir copy
@31169 | 7 years | davidb | Improved logic
@31161 | 7 years | davidb | Comparison of local disk version with HDFS version
@31152 | 7 years | davidb | Development of script
@31151 | 7 years | davidb | More nuanced version to help finish off the 'big put'
@31128 | 7 years | davidb | Some scripts to help with pushing and monitoring the progress of the …
@31112 | 7 years | davidb | To move out shards saved in /tmp on gsliscluster1 nodes to nema
@31106 | 7 years | davidb | Scripts to help run an rsync'd copy of gsliscluster1 …
@31105 | 7 years | davidb | Additional scripts to help with running solr locally out of /tmp area
@31104 | 7 years | davidb | now configurable to be run from local disk (/tmp)
@31103 | 7 years | davidb | Changes made after testing with 20 solr nodes
@31102 | 7 years | davidb | Command line way of running a Solr test query