|
|
@31044
|
7 years |
davidb |
Fixed up error when output_dir is empty
|
|
|
@31043
|
7 years |
davidb |
Version for processing full EF set
|
|
|
@31042
|
7 years |
davidb |
Name changes, preparing the way for FULL-RUN versions
|
|
|
@31041
|
7 years |
davidb |
Test needs to be more careful if -read-only specified
|
|
|
@31036
|
7 years |
davidb |
Renaming to prepare way for YARN version of script
|
|
|
@31035
|
7 years |
davidb |
Changes after testing scripts
|
|
|
@31034
|
7 years |
davidb |
Development of scripts for working with Full EF dataset
|
|
|
@31033
|
7 years |
davidb |
Development of scripts for working with Full EF dataset
|
|
|
@31030
|
7 years |
davidb |
Tweak to some verbosity level 2 printing
|
|
|
@31029
|
7 years |
davidb |
Newline at end of file added
|
|
|
@31028
|
7 years |
davidb |
Support for randomly choosing Solr endpoints added
|
|
|
@31027
|
7 years |
davidb |
Fixed typo in property name used
|
|
|
@31026
|
7 years |
davidb |
Corrected flag setting
|
|
|
@31025
|
7 years |
davidb |
Use property process-json-mode to determine which sort of Spark …
|
|
|
@31024
|
7 years |
davidb |
Support for Java properties file
|
|
|
@31022
|
7 years |
davidb |
No longer used
|
|
|
@31021
|
7 years |
davidb |
Folder restructure to remove 'trunk' part
|
|
|
@31020
|
7 years |
davidb |
No longer used
|
|
|
@31019
|
7 years |
davidb |
Part 2 of two-step folder restructure
|
|
|
@31018
|
7 years |
davidb |
Part 1 of two-step folder restructure
|
|
|
@31017
|
7 years |
davidb |
Moved to correct position
|
|
|
@31016
|
7 years |
davidb |
No longer used
|
|
|
@31015
|
7 years |
davidb |
Restructuring of projects into one
|
|
|
@31013
|
7 years |
davidb |
Accumulator for PerPageMap
|
|
|
@31011
|
7 years |
davidb |
Further RDD flatMap/map restructuring and refactoring, for per-page
|
|
|
@31010
|
7 years |
davidb |
Tidy up on generating Spark App name
|
|
|
@31009
|
7 years |
davidb |
Adjustments after latest fresh 'vagrant up' trial
|
|
|
@31008
|
7 years |
davidb |
Additional detail added into Spark app name
|
|
|
@31007
|
7 years |
davidb |
Class name refactoring
|
|
|
@31006
|
7 years |
davidb |
Further reversal of Base class. Switch to PerPage
|
|
|
@31005
|
7 years |
davidb |
Reversal of Base class in PerVolumeJSON
|
|
|
@31004
|
7 years |
davidb |
added debug
|
|
|
@31003
|
7 years |
davidb |
Explicit default constructors added
|
|
|
@31002
|
7 years |
davidb |
Need to separate flatMap and foreach calls in PagedJSON
|
|
|
@31001
|
7 years |
davidb |
Code to work per-volume and per-page
|
|
|
@31000
|
7 years |
davidb |
Class name refactoring
|
|
|
@30999
|
7 years |
davidb |
Class name refactoring
|
|
|
@30998
|
7 years |
davidb |
Class name refactoring
|
|
|
@30997
|
7 years |
davidb |
Verbosity control over printing
|
|
|
@30996
|
7 years |
davidb |
Code refactoring
|
|
|
@30995
|
7 years |
davidb |
Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
|
|
|
@30994
|
7 years |
davidb |
Additional useful links. Links open in new tab
|
|
|
@30993
|
7 years |
davidb |
Placeholder page to provide useful links to hadoop and solr cluster …
|
|
|
@30992
|
7 years |
davidb |
Additional adjustments after test run on cluster
|
|
|
@30991
|
7 years |
davidb |
Initial cut at README notes, and supporting links
|
|
|
@30990
|
7 years |
davidb |
opt name change
|
|
|
@30989
|
7 years |
davidb |
Changes to better suit EF set used with solr
|
|
|
@30988
|
7 years |
davidb |
Changed flag to 'read-only' and changed the field name full text saved …
|
|
|
@30986
|
7 years |
davidb |
Debugging for double accumulator added
|
|
|
@30985
|
7 years |
davidb |
Changed to run main processing method as action rather than transform. …
|
|
|
@30984
|
7 years |
davidb |
Introduction of Spark accumulator to measure progress. Output of POST …
|
|
|
@30983
|
7 years |
davidb |
Useful helper script
|
|
|
@30982
|
7 years |
davidb |
Fix to host_name for solr2 and solr3
|
|
|
@30981
|
7 years |
davidb |
Useful folder for 'on-the-side' packages
|
|
|
@30980
|
7 years |
davidb |
Code added to read response
|
|
|
@30979
|
7 years |
davidb |
_solr_url needs to be stored in class!
|
|
|
@30978
|
7 years |
davidb |
Additional debug statements
|
|
|
@30977
|
7 years |
davidb |
Only have RDD if an output directory was specified on the command-line …
|
|
|
@30976
|
7 years |
davidb |
Change to reflect changed order of command-line arguments
|
|
|
@30975
|
7 years |
davidb |
Introduction of new solr-url command line argument, leading to some …
|
|
|
@30974
|
7 years |
davidb |
update/add/doc JSON structure needed
|
|
|
@30973
|
7 years |
davidb |
Changed to saving Solr JSON file for debugging purposes
|
|
|
@30972
|
7 years |
davidb |
addition of useful command needed before re-running
|
|
|
@30971
|
7 years |
davidb |
Adding in post to Solr cloud. Changed text_t to _text_
|
|
|
@30970
|
7 years |
davidb |
Added in mapping of EF-JSON to Solr 'add' JSON format
|
|
|
@30969
|
7 years |
davidb |
Fine tuning resulting from testing the cloud/cluster
|
|
|
@30962
|
7 years |
davidb |
Corrections and improvements made after initial testing between …
|
|
|
@30960
|
7 years |
davidb |
Switch to using Puppet to provision machine. Strongly based on files …
|
|
|
@30957
|
8 years |
davidb |
No longer needed. (Local copy taken on Windows laptop.)
|
|
|
@30956
|
8 years |
davidb |
Initial commit of files for setting up with Vagrant a Solr cloud
|
|
|
@30953
|
8 years |
davidb |
Need to specify _output_dir as part of output JSON filename
|
|
|
@30952
|
8 years |
davidb |
Further text tidy up
|
|
|
@30951
|
8 years |
davidb |
Save a JSONObject as a file in the output directory
|
|
|
@30950
|
8 years |
davidb |
Tweak to text
|
|
|
@30949
|
8 years |
davidb |
Use better name than 'foo'. Further fix to JSON name generated
|
|
|
@30947
|
8 years |
davidb |
Correction to 'pages-' part of JSON.bz2 output filename used
|
|
|
@30946
|
8 years |
davidb |
Correction to output JSON.bz2 name generated
|
|
|
@30945
|
8 years |
davidb |
Getting closer to writing out JSON files
|
|
|
@30944
|
8 years |
davidb |
Force higher partition count (6) than default, which seems to be 2
|
|
|
@30943
|
8 years |
davidb |
Extra debug info
|
|
|
@30942
|
8 years |
davidb |
Improved output printing for slave node
|
|
|
@30941
|
8 years |
davidb |
Moved to getFileSystemInstance() method to play nice on cluster
|
|
|
@30940
|
8 years |
davidb |
Change to using URI not fileIn directly
|
|
|
@30939
|
8 years |
davidb |
Minor tweaks
|
|
|
@30938
|
8 years |
davidb |
Experiment with using Hadoop's FileSystem class for local file:// access
|
|
|
@30937
|
8 years |
davidb |
Expanded set of ClusterFileIO methods
|
|
|
@30936
|
8 years |
davidb |
Refinement of Spark Monitor echo statements
|
|
|
@30935
|
8 years |
davidb |
Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec
|
|
|
@30934
|
8 years |
davidb |
Providing json-filelist now a compulsory argument, rather than an option
|
|
|
@30933
|
8 years |
davidb |
More careful parsing of file prefix
|
|
|
@30932
|
8 years |
davidb |
Support both file:// and hdfs://
|
|
|
@30931
|
8 years |
davidb |
Version that runs using file:// tested
|
|
|
@30930
|
8 years |
davidb |
Expansion of useful alias commands for Hadoop and Spark
|
|
|
@30929
|
8 years |
davidb |
Tweaks made while testing the script
|
|
|
@30928
|
8 years |
davidb |
Forgot to set json_filelist
|
|
|
@30927
|
8 years |
davidb |
Fixed silly typo in stdout redirect
|
|
|
@30926
|
8 years |
davidb |
Restructuring of RUN scripts to be more flexible
|
|
|
@30925
|
8 years |
davidb |
Improved instructions
|
|
|
@30924
|
8 years |
davidb |
Tidy up of code. Removed commented out code
|
|
|
@30923
|
8 years |
davidb |
Rough cut version that reads in each JSON file over HDFS
|
|
|