|
|
@31013
|
7 years |
davidb |
Accumulator for PerPageMap
|
|
|
@31011
|
7 years |
davidb |
Further RDD flatMap/map restructuring and refactoring, for per-page
|
|
|
@31010
|
7 years |
davidb |
Tidy up on generating Spark App name
|
|
|
@31008
|
7 years |
davidb |
Additional detail added into Spark app name
|
|
|
@31007
|
7 years |
davidb |
Class name refactoring
|
|
|
@31006
|
7 years |
davidb |
Further reversal of Base class. Switch to PerPage
|
|
|
@31005
|
7 years |
davidb |
Reversal of Base class in PerVolumeJSON
|
|
|
@31004
|
7 years |
davidb |
Added debug
|
|
|
@31003
|
7 years |
davidb |
Explicit default constructors added
|
|
|
@31002
|
7 years |
davidb |
Need to separate flatMap and foreach calls in PagedJSON
|
|
|
@31001
|
7 years |
davidb |
Code to work per-volume and per-page
|
|
|
@30998
|
7 years |
davidb |
Class name refactoring
|
|
|
@30997
|
7 years |
davidb |
Verbosity control over printing
|
|
|
@30996
|
7 years |
davidb |
Code refactoring
|
|
|
@30995
|
7 years |
davidb |
Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
|
|
|
@30990
|
7 years |
davidb |
Opt name change
|
|
|
@30988
|
7 years |
davidb |
Changed flag to 'read-only' and changed the field name full text saved …
|
|
|
@30986
|
7 years |
davidb |
Debugging for double accumulator added
|
|
|
@30985
|
7 years |
davidb |
Changed to run main processing method as action rather than transform. …
|
|
|
@30984
|
7 years |
davidb |
Introduction of Spark accumulator to measure progress. Output of POST …
|
|
|
@30980
|
7 years |
davidb |
Code added to read response
|
|
|
@30979
|
7 years |
davidb |
_solr_url needs to be stored in class!
|
|
|
@30978
|
7 years |
davidb |
Additional debug statements
|
|
|
@30977
|
7 years |
davidb |
Only have RDD if an output directory was specified on the command-line …
|
|
|
@30976
|
7 years |
davidb |
Change to reflect changed order of command-line arguments
|
|
|
@30975
|
7 years |
davidb |
Introduction of new solr-url command line argument, leading to some …
|
|
|
@30974
|
7 years |
davidb |
update/add/doc JSON structure needed
|
|
|
@30973
|
7 years |
davidb |
Changed to saving Solr JSON file for debugging purposes
|
|
|
@30971
|
7 years |
davidb |
Adding in post to Solr cloud. Changed text_t to _text_
|
|
|
@30970
|
7 years |
davidb |
Added in mapping of EF-JSON to Solr 'add' JSON format
|
|
|
@30953
|
7 years |
davidb |
Need to specify _output_dir as part of output JSON filename
|
|
|
@30951
|
7 years |
davidb |
Save a JSONObject as a file in the output directory
|
|
|
@30949
|
7 years |
davidb |
Use better name than 'foo'. Further fix to JSON name generated
|
|
|
@30947
|
7 years |
davidb |
Correction to 'pages-' part of JSON.bz2 output filename used
|
|
|
@30946
|
7 years |
davidb |
Correction to output JSON.bz2 name generated
|
|
|
@30945
|
7 years |
davidb |
Getting closer to writing out JSON files
|
|
|
@30944
|
7 years |
davidb |
Force higher partition count (6) than the default, which seems to be 2
|
|
|
@30943
|
7 years |
davidb |
Extra debug info
|
|
|
@30942
|
7 years |
davidb |
Improved output printing for slave node
|
|
|
@30941
|
7 years |
davidb |
Moved to getFileSystemInstance() method to play nice on cluster
|
|
|
@30940
|
7 years |
davidb |
Change to using URI not fileIn directly
|
|
|
@30938
|
7 years |
davidb |
Experiment with using Hadoop's FileSystem class for local file:// access
|
|
|
@30937
|
7 years |
davidb |
Expanded set of ClusterFileIO methods
|
|
|
@30934
|
7 years |
davidb |
Providing json-filelist is now a compulsory argument, rather than an option
|
|
|
@30933
|
7 years |
davidb |
More careful parsing of file prefix
|
|
|
@30932
|
7 years |
davidb |
Support both file:// and hdfs://
|
|
|
@30924
|
7 years |
davidb |
Tidy up of code. Removed commented out code
|
|
|
@30921
|
7 years |
davidb |
Code change to read in JSON file over HDFS
|
|
|
@30918
|
7 years |
davidb |
More flexible command-line args
|
|
|
@30898
|
7 years |
davidb |
Scripts for downloading sample JSON data from public domain extracted …
|