|
|
@31011
|
8 years |
davidb |
Further RDD flatMap/map restructuring and refactoring, for per-page
|
|
|
@31010
|
8 years |
davidb |
Tidy up on generating Spark App name
|
|
|
@31009
|
8 years |
davidb |
Adjustments after latest fresh 'vagrant up' trial
|
|
|
@31008
|
8 years |
davidb |
Additional detail added into Spark app name
|
|
|
@31007
|
8 years |
davidb |
Class name refactoring
|
|
|
@31006
|
8 years |
davidb |
Further reversal of Base class. Switch to PerPage
|
|
|
@31005
|
8 years |
davidb |
Reversal of Base class in PerVolumeJSON
|
|
|
@31004
|
8 years |
davidb |
added debug
|
|
|
@31003
|
8 years |
davidb |
Explicit default constructors added
|
|
|
@31002
|
8 years |
davidb |
Need to separate flatMap and foreach calls in PagedJSON
|
|
|
@31001
|
8 years |
davidb |
Code to work per-volume and per-page
|
|
|
@31000
|
8 years |
davidb |
Class name refactoring
|
|
|
@30999
|
8 years |
davidb |
Class name refactoring
|
|
|
@30998
|
8 years |
davidb |
Class name refactoring
|
|
|
@30997
|
8 years |
davidb |
Verbosity control over printing
|
|
|
@30996
|
8 years |
davidb |
Code refactoring
|
|
|
@30995
|
8 years |
davidb |
Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
|
|
|
@30994
|
8 years |
davidb |
Additional useful links. Links open in new tab
|
|
|
@30993
|
8 years |
davidb |
Placeholder page to provide useful links to hadoop and solr cluster …
|
|
|
@30992
|
8 years |
davidb |
Additional adjustments after test run on cluster
|
|
|
@30991
|
8 years |
davidb |
Initial cut at README notes, and supporting links
|
|
|
@30990
|
8 years |
davidb |
opt name change
|
|
|
@30989
|
8 years |
davidb |
Changes to better suit EF set used with solr
|
|
|
@30988
|
8 years |
davidb |
Changed flag to 'read-only' and changed the field name full text saved …
|
|
|
@30986
|
8 years |
davidb |
Debugging for double accumulator added
|
|
|
@30985
|
8 years |
davidb |
Changed to run main processing method as action rather than transform. …
|
|
|
@30984
|
8 years |
davidb |
Introduction of Spark accumulator to measure progress. Output of POST …
|
|
|
@30983
|
8 years |
davidb |
Useful helper script
|
|
|
@30982
|
8 years |
davidb |
Fixed to host_name for solr2 and solr3
|
|
|
@30981
|
8 years |
davidb |
Useful folder for 'on-the-side' packages
|
|
|
@30980
|
8 years |
davidb |
Code added to read response
|
|
|
@30979
|
8 years |
davidb |
_solr_url needs to be stored in class!
|
|
|
@30978
|
8 years |
davidb |
Additional debug statements
|
|
|
@30977
|
8 years |
davidb |
Only have RDD if an output directory was specified on the command-line …
|
|
|
@30976
|
8 years |
davidb |
Change to reflect changed order of command-line arguments
|
|
|
@30975
|
8 years |
davidb |
Introduction of new solr-url command line argument, leading to some …
|
|
|
@30974
|
8 years |
davidb |
update/add/doc JSON structure needed
|
|
|
@30973
|
8 years |
davidb |
Changed to saving Solr JSON file for debugging purposes
|
|
|
@30972
|
8 years |
davidb |
addition of useful command needed before re-running
|
|
|
@30971
|
8 years |
davidb |
Adding in post to Solr cloud. Changed text_t to _text_
|
|
|
@30970
|
8 years |
davidb |
Added in mapping of EF-JSON to Solr 'add' JSON format
|
|
|
@30969
|
8 years |
davidb |
Fine tuning resulting from testing the cloud/cluster
|
|
|
@30962
|
8 years |
davidb |
Corrections and improvements made after initial testing between …
|
|
|
@30960
|
8 years |
davidb |
Switch to using Puppet to provision machine. Strongly based on files …
|
|
|
@30957
|
8 years |
davidb |
No longer needed. (Local copy taken on Windows laptop.)
|
|
|
@30956
|
8 years |
davidb |
Initial commit of files for setting up with Vagrant a Solr cloud
|
|
|
@30953
|
8 years |
davidb |
Need to specify _output_dir as part of output JSON filename
|
|
|
@30952
|
8 years |
davidb |
Further text tidy up
|
|
|
@30951
|
8 years |
davidb |
Save a JSONObject as a file in the output directory
|
|
|
@30950
|
8 years |
davidb |
Tweak to text
|
|
|
@30949
|
8 years |
davidb |
Use better name than 'foo'. Further fix to JSON name generated
|
|
|
@30947
|
8 years |
davidb |
Correction to 'pages-' part of JSON.bz2 output filename used
|
|
|
@30946
|
8 years |
davidb |
Correction to output JSON.bz2 name generated
|
|
|
@30945
|
8 years |
davidb |
Getting closer to writing out JSON files
|
|
|
@30944
|
8 years |
davidb |
Force higher partition count (6) than default, which seems to be 2
|
|
|
@30943
|
8 years |
davidb |
Extra debug info
|
|
|
@30942
|
8 years |
davidb |
Improved output printing for slave node
|
|
|
@30941
|
8 years |
davidb |
Moved to getFileSystemInstance() method to play nice on cluster
|
|
|
@30940
|
8 years |
davidb |
Change to using URI not fileIn directly
|
|
|
@30939
|
8 years |
davidb |
Minor tweaks
|
|
|
@30938
|
8 years |
davidb |
Experiment with using Hadoop's FileSystem class for local file:// access
|
|
|
@30937
|
8 years |
davidb |
Expanded set of ClusterFileIO methods
|
|
|
@30936
|
8 years |
davidb |
Refinement of Spark Monitor echo statements
|
|
|
@30935
|
8 years |
davidb |
Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec
|
|
|
@30934
|
8 years |
davidb |
json-filelist is now a compulsory argument, rather than an option
|
|
|
@30933
|
8 years |
davidb |
More careful parsing of file prefix
|
|
|
@30932
|
8 years |
davidb |
Support both file:// and hdfs://
|
|
|
@30931
|
8 years |
davidb |
Version that runs using file:// tested
|
|
|
@30930
|
8 years |
davidb |
Expansion of useful alias commands for Hadoop and Spark
|
|
|
@30929
|
8 years |
davidb |
Tweaks made while testing the script
|
|
|
@30928
|
8 years |
davidb |
Forgot to set json_filelist
|
|
|
@30927
|
8 years |
davidb |
Fixed silly typo in stdout redirect
|
|
|
@30926
|
8 years |
davidb |
Restructuring of RUN scripts to be more flexible
|
|
|
@30925
|
8 years |
davidb |
Improved instructions
|
|
|
@30924
|
8 years |
davidb |
Tidy up of code. Removed commented out code
|
|
|
@30923
|
8 years |
davidb |
Rough cut version that reads in each JSON file over HDFS
|
|
|
@30922
|
8 years |
davidb |
Additional rough-cut notes
|
|
|
@30921
|
8 years |
davidb |
Code change to read in JSON file over HDFS
|
|
|
@30919
|
8 years |
davidb |
More consistent naming of folders used
|
|
|
@30918
|
8 years |
davidb |
More flexible command-line args
|
|
|
@30917
|
8 years |
davidb |
Changes resulting from a fresh run at provisioning, which yielded the …
|
|
|
@30916
|
8 years |
davidb |
Some additional details -- note form
|
|
|
@30915
|
8 years |
davidb |
Initial cut at instructions to follow to get code set up and running
|
|
|
@30914
|
8 years |
davidb |
Tidy up of setup description
|
|
|
@30913
|
8 years |
davidb |
Renaming to better represent what the cluster is designed for
|
|
|
@30912
|
8 years |
davidb |
Changed to Unix style line-endings
|
|
|
@30911
|
8 years |
davidb |
Changed name of input directory
|
|
|
@30910
|
8 years |
davidb |
Additional finesse added in as a result of further testing on Vagrant …
|
|
|
@30909
|
8 years |
davidb |
Additional finesse added in as a result of further testing on Vagrant …
|
|
|
@30908
|
8 years |
davidb |
Additional finesse added in as a result of further testing on Vagrant …
|
|
|
@30907
|
8 years |
davidb |
Name change to reflect need for 'bash' not 'sh'
|
|
|
@30906
|
8 years |
davidb |
Bash version of BAT script
|
|
|
@30905
|
8 years |
davidb |
Additional resources
|
|
|
@30904
|
8 years |
davidb |
Extra resource/links added
|
|
|
@30903
|
8 years |
davidb |
Vagrant provisioning files for a 4-node Hadoop cluster. See …
|
|
|
@30902
|
8 years |
davidb |
Details of what packages are needed
|
|
|
@30901
|
8 years |
davidb |
Template setup file
|
|
|
@30900
|
8 years |
davidb |
For supporting Java packages
|
|
|
@30899
|
8 years |
davidb |
Files for compilation using Eclipse
|
|
|
@30898
|
8 years |
davidb |
Scripts for downloading sample JSON data from public domain extracted …
|
|
|