|
|
@30972
|
7 years |
davidb |
Addition of useful command needed before re-running
|
|
|
@30971
|
7 years |
davidb |
Adding in post to Solr cloud. Changed text_t to _text_
|
|
|
@30970
|
7 years |
davidb |
Added in mapping of EF-JSON to Solr 'add' JSON format
|
|
|
@30957
|
8 years |
davidb |
No longer needed. (Local copy taken on Windows laptop.)
|
|
|
@30953
|
8 years |
davidb |
Need to specify _output_dir as part of output JSON filename
|
|
|
@30952
|
8 years |
davidb |
Further text tidy up
|
|
|
@30951
|
8 years |
davidb |
Save a JSONObject as a file in the output directory
|
|
|
@30950
|
8 years |
davidb |
Tweak to text
|
|
|
@30949
|
8 years |
davidb |
Use better name than 'foo'. Further fix to JSON name generated
|
|
|
@30947
|
8 years |
davidb |
Correction to 'pages-' part of JSON.bz2 output filename used
|
|
|
@30946
|
8 years |
davidb |
Correction to output JSON.bz2 name generated
|
|
|
@30945
|
8 years |
davidb |
Getting closer to writing out JSON files
|
|
|
@30944
|
8 years |
davidb |
Force a higher partition count (6) than the default, which seems to be 2
|
|
|
@30943
|
8 years |
davidb |
Extra debug info
|
|
|
@30942
|
8 years |
davidb |
Improved output printing for slave node
|
|
|
@30941
|
8 years |
davidb |
Moved to getFileSystemInstance() method to play nice on cluster
|
|
|
@30940
|
8 years |
davidb |
Change to using URI not fileIn directly
|
|
|
@30939
|
8 years |
davidb |
Minor tweaks
|
|
|
@30938
|
8 years |
davidb |
Experiment with using Hadoop's FileSystem class for local file:// access
|
|
|
@30937
|
8 years |
davidb |
Expanded set of ClusterFileIO methods
|
|
|
@30936
|
8 years |
davidb |
Refinement of Spark Monitor echo statements
|
|
|
@30935
|
8 years |
davidb |
Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec
|
|
|
@30934
|
8 years |
davidb |
Providing json-filelist is now a compulsory argument, rather than an option
|
|
|
@30933
|
8 years |
davidb |
More careful parsing of file prefix
|
|
|
@30932
|
8 years |
davidb |
Support both file:// and hdfs://
|
|
|
@30931
|
8 years |
davidb |
Version that runs using file:// tested
|
|
|
@30929
|
8 years |
davidb |
Tweaks made while testing the script
|
|
|
@30928
|
8 years |
davidb |
Forgot to set json_filelist
|
|
|
@30927
|
8 years |
davidb |
Fixed silly typo in stdout redirect
|
|
|
@30926
|
8 years |
davidb |
Restructuring of RUN scripts to be more flexible
|
|
|
@30925
|
8 years |
davidb |
Improved instructions
|
|
|
@30924
|
8 years |
davidb |
Tidy up of code. Removed commented out code
|
|
|
@30923
|
8 years |
davidb |
Rough cut version that reads in each JSON file over HDFS
|
|
|
@30922
|
8 years |
davidb |
Additional rough-cut notes
|
|
|
@30921
|
8 years |
davidb |
Code change to read in JSON file over HDFS
|
|
|
@30919
|
8 years |
davidb |
More consistent naming of folders used
|
|
|
@30918
|
8 years |
davidb |
More flexible command-line args
|
|
|
@30916
|
8 years |
davidb |
Some additional details -- note form
|
|
|
@30915
|
8 years |
davidb |
Initial cut at instructions to follow to get code set up and running
|
|
|
@30912
|
8 years |
davidb |
Changed to Unix style line-endings
|
|
|
@30911
|
8 years |
davidb |
Changed name of input directory
|
|
|
@30910
|
8 years |
davidb |
Additional finesse added in as a result of further testing on Vagrant …
|
|
|
@30909
|
8 years |
davidb |
Additional finesse added in as a result of further testing on Vagrant …
|
|
|
@30908
|
8 years |
davidb |
Additional finesse added in as a result of further testing on Vagrant …
|
|
|
@30907
|
8 years |
davidb |
Name change to reflect need for 'bash' not 'sh'
|
|
|
@30906
|
8 years |
davidb |
Bash version of BAT script
|
|
|
@30902
|
8 years |
davidb |
Details of what packages are needed
|
|
|
@30901
|
8 years |
davidb |
Template setup file
|
|
|
@30900
|
8 years |
davidb |
For supporting Java packages
|
|
|
@30899
|
8 years |
davidb |
Files for compilation using Eclipse
|
|
|
@30898
|
8 years |
davidb |
Scripts for downloading sample JSON data from public domain extracted …
|
|
|
@30897
|
8 years |
davidb |
Sub-project for converting HTRC Extracted Features dataset into a form …
|