source: other-projects

Diff Rev Age Author Log Message
(edit) @31035   6 years davidb Changes after testing scripts
(edit) @31034   6 years davidb Development of scripts for working with Full EF dataset
(edit) @31033   6 years davidb Development of scripts for working with Full EF dataset
(edit) @31030   6 years davidb Tweak to some verbosity level 2 printing
(edit) @31029   6 years davidb Newline at end of file added
(edit) @31028   6 years davidb Support for randomly choosing Solr endpoints added in
(edit) @31027   6 years davidb Fixed typo in property name used
(edit) @31026   6 years davidb Corrected flag setting
(edit) @31025   6 years davidb Use property process-json-mode to determine which sort of Spark …
(edit) @31024   6 years davidb Support for Java properties file
(edit) @31022   6 years davidb No longer used
(edit) @31021   6 years davidb Folder restructure to remove 'trunk' part
(edit) @31020   6 years davidb No longer used
(edit) @31019   6 years davidb Part 2 of two-step folder restructure
(edit) @31018   6 years davidb Part 1 of two-step folder restructure
(edit) @31017   6 years davidb Moved to correct position
(edit) @31016   6 years davidb No longer used
(edit) @31015   6 years davidb Restructuring of projects into one
(edit) @31013   6 years davidb Accumulator for PerPageMap
(edit) @31011   6 years davidb Further RDD flatMap/map restructuring and refactoring, for per-page
(edit) @31010   6 years davidb Tidy up on generating Spark App name
(edit) @31009   6 years davidb Adjustments after latest fresh 'vagrant up' trial
(edit) @31008   6 years davidb Additional detail added into Spark app name
(edit) @31007   6 years davidb Class name refactoring
(edit) @31006   6 years davidb Further reversal of Base class. Switch to PerPage
(edit) @31005   6 years davidb Reversal of Base class in PerVolumeJSON
(edit) @31004   6 years davidb added debug
(edit) @31003   6 years davidb Explicit default constructors added
(edit) @31002   6 years davidb Need to separate flatMap and foreach calls in PagedJSON
(edit) @31001   6 years davidb Code to work per-volume and per-page
(edit) @31000   6 years davidb Class name refactoring
(edit) @30999   6 years davidb Class name refactoring
(edit) @30998   6 years davidb Class name refactoring
(edit) @30997   6 years davidb Verbosity control over printing
(edit) @30996   6 years davidb Code refactoring
(edit) @30995   6 years davidb Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
(edit) @30994   6 years davidb Additional useful links. Links open in new tab
(edit) @30993   6 years davidb Placeholder page to provide useful links to hadoop and solr cluster …
(edit) @30992   6 years davidb Additional adjustments after test run on cluster
(edit) @30991   6 years davidb Initial cut at README notes, and supporting links
(edit) @30990   6 years davidb opt name change
(edit) @30989   6 years davidb Changes to better suit EF set used with solr
(edit) @30988   6 years davidb Changed flag to 'read-only' and changed the field name full text saved …
(edit) @30986   6 years davidb Debugging for double accumulator added
(edit) @30985   6 years davidb Changed to run main processing method as action rather than transform. …
(edit) @30984   6 years davidb Introduction of Spark accumulator to measure progress. Output of POST …
(edit) @30983   6 years davidb Useful helper script
(edit) @30982   6 years davidb Fixed to host_name for solr2 and solr3
(edit) @30981   6 years davidb Useful folder for 'on-the-side' packages
(edit) @30980   6 years davidb Code added to read response
(edit) @30979   6 years davidb _solr_url needs to be stored in class!
(edit) @30978   6 years davidb Additional debug statements
(edit) @30977   6 years davidb Only have RDD if an output directory was specified on the command-line …
(edit) @30976   6 years davidb Change to reflect changed order of command-line arguments
(edit) @30975   6 years davidb Introduction of new solr-url command line argument, leading to some …
(edit) @30974   6 years davidb update/add/doc JSON structure needed
(edit) @30973   6 years davidb Changed to saving Solr JSON file for debugging purposes
(edit) @30972   6 years davidb addition of useful command needed before re-running
(edit) @30971   6 years davidb Adding in post to Solr cloud. Changed text_t to _text_
(edit) @30970   6 years davidb Added in mapping of EF-JSON to Solr 'add' JSON format
(edit) @30969   6 years davidb Fine tuning resulting from testing the cloud/cluster
(edit) @30962   6 years davidb Corrections and improvements made after initial testing between …
(edit) @30960   6 years davidb Switch to using Puppet to provision machine. Strongly based on files …
(edit) @30957   6 years davidb No longer needed. (Local copy taken on Windows laptop.)
(edit) @30956   6 years davidb Initial commit of files for setting up with Vagrant a Solr cloud
(edit) @30953   6 years davidb Need to specify _output_dir as part of output JSON filename
(edit) @30952   6 years davidb Further text tidy up
(edit) @30951   6 years davidb Save a JSONObject as a file in the output directory
(edit) @30950   6 years davidb Tweak to text
(edit) @30949   6 years davidb Use better name than 'foo'. Further fix to JSON name generated
(edit) @30947   6 years davidb Correction to 'pages-' part of JSON.bz2 output filename used
(edit) @30946   6 years davidb Correction to output JSON.bz2 name generated
(edit) @30945   6 years davidb Getting closer to writing out JSON files
(edit) @30944   6 years davidb Force higher partition (6) than default, which seems to be 2
(edit) @30943   6 years davidb Extra debug info
(edit) @30942   6 years davidb Improved output printing for slave node
(edit) @30941   6 years davidb Moved to getFileSystemInstance() method to play nice on cluster
(edit) @30940   6 years davidb Change to using URI not fileIn directly
(edit) @30939   6 years davidb Minor tweaks
(edit) @30938   6 years davidb Experiment with using Hadoop's FileSystem class for local file:// access
(edit) @30937   6 years davidb Expanded set of ClusterFileIO methods
(edit) @30936   6 years davidb Refinement of Spark Monitor echo statements
(edit) @30935   6 years davidb Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec
(edit) @30934   6 years davidb Providing json-filelist now a compulsory argument, rather than an option
(edit) @30933   6 years davidb More careful parsing of file prefix
(edit) @30932   6 years davidb Support both file:// and hdfs://
(edit) @30931   6 years davidb Version that runs using file:// tested
(edit) @30930   6 years davidb Expansion of useful alias commands for Hadoop and Spark
(edit) @30929   6 years davidb Tweaks made while testing the script
(edit) @30928   6 years davidb Forgot to set json_filelist
(edit) @30927   6 years davidb Fixed silly typo in stdout redirect
(edit) @30926   6 years davidb Restructuring of RUN scripts to be more flexible
(edit) @30925   6 years davidb Improved instructions
(edit) @30924   6 years davidb Tidy up of code. Removed commented out code
(edit) @30923   6 years davidb Rough cut version that reads in each JSON file over HDFS
(edit) @30922   6 years davidb Additional rough-cut notes
(edit) @30921   6 years davidb Code change to read in JSON file over HDFS
(edit) @30919   6 years davidb More consistent naming of folders used
(edit) @30918   6 years davidb More flexible command-line args
(edit) @30917   6 years davidb Changes resulting from a fresh run at provisioning, which yielded the …