source: other-projects

Rev	Age	Author	Log Message
@31044	7 years	davidb	Fixed up error when output_dir is empty
@31043	7 years	davidb	Version for processing full EF set
@31042	7 years	davidb	Name changes, preparing the way for FULL-RUN versions
@31041	7 years	davidb	Test needs to be more careful if -read-only specified
@31036	7 years	davidb	Renaming to prepare way for YARN version of script
@31035	7 years	davidb	Changes after testing scripts
@31034	7 years	davidb	Development of scripts for working with Full EF dataset
@31033	7 years	davidb	Development of scripts for working with Full EF dataset
@31030	7 years	davidb	Tweak to some verbosity level 2 printing
@31029	7 years	davidb	Newline at end of file added
@31028	7 years	davidb	Support for randonly choosing Solr endpoints added in
@31027	7 years	davidb	Mixed typo in property name used
@31026	7 years	davidb	Corrected flag setting
@31025	7 years	davidb	Use property process-json-mode to determine which sort of Spark …
@31024	7 years	davidb	Support for Java properties file
@31022	7 years	davidb	No longer used
@31021	7 years	davidb	Folder restructure to remove 'trunk' part
@31020	7 years	davidb	No longer used
@31019	7 years	davidb	Part 2 or two-step folder restructure
@31018	7 years	davidb	Part 1 or two-step folder restructure
@31017	7 years	davidb	Moved to correct position
@31016	7 years	davidb	No longer used
@31015	7 years	davidb	Restructuring of projects into one
@31013	7 years	davidb	Accumulator for PerPageMap
@31011	7 years	davidb	Further RDD flatMap/map restructuring and refactoring, for per-page
@31010	7 years	davidb	Tidy up on generating Spark App name
@31009	7 years	davidb	Adjustments after latest fresh 'vagrant up' trial
@31008	7 years	davidb	Additional detail added into Spark app name
@31007	7 years	davidb	Class name refactoring
@31006	7 years	davidb	Further reversal of Base class. Switch to PerPage
@31005	7 years	davidb	Reversal of Base class in PerVolumeJSON
@31004	7 years	davidb	added debug
@31003	7 years	davidb	Explicity default constructors added
@31002	7 years	davidb	Need to separate flatMap and foreach calls in PagedJSON
@31001	7 years	davidb	Code to work per-volume and per-page
@31000	7 years	davidb	Class name refactoring
@30999	7 years	davidb	Class name refactoring
@30998	7 years	davidb	Class name refactoring
@30997	7 years	davidb	Verbosity control over printing
@30996	7 years	davidb	Code refactoring
@30995	7 years	davidb	Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
@30994	7 years	davidb	Additional useful links. Links open in new tab
@30993	7 years	davidb	Placeholder page to provide useful links to hadoop and solr cluster …
@30992	7 years	davidb	Additional adjustments after test run on cluster
@30991	7 years	davidb	Inital cut at README notes, and supporting links
@30990	7 years	davidb	opt name change
@30989	7 years	davidb	Changes to better suit EF set used with solr
@30988	7 years	davidb	Changed flag to 'read-only' and changed the filed name full text saved …
@30986	7 years	davidb	Debugging for double accumulator added
@30985	7 years	davidb	Changed to run main processing method as action rather than transform. …
@30984	7 years	davidb	Introduction of Spark accumulator to measure progress. Output of POST …
@30983	7 years	davidb	Useful helper script
@30982	7 years	davidb	Fixed to host_name for solr2 and solr3
@30981	7 years	davidb	Useful folder for 'on-the-side' packages
@30980	7 years	davidb	Code added to read response
@30979	7 years	davidb	_solr_url needs to be stored in class!
@30978	7 years	davidb	Additional debug statements
@30977	7 years	davidb	Only have RDD if an output directory was specified on the command-line …
@30976	7 years	davidb	Change to reflect changed order of command-line arguments
@30975	7 years	davidb	Introduction of new solr-url command line argument, leading to some …
@30974	7 years	davidb	update/add/doc JSON structure needed
@30973	7 years	davidb	Changed to saving Solr JSON file for debugging purposes
@30972	7 years	davidb	addition of useful command needed before re-running
@30971	7 years	davidb	Adding in post to Solr cloud. Changed text_t to _text_
@30970	7 years	davidb	Added in mapping of EF-JSON to Solr 'add' JSON format
@30969	7 years	davidb	Fine tuning resulting from testing the cloud/cluster
@30962	7 years	davidb	Corrections and improvements made after initial testing between …
@30960	7 years	davidb	Switch to using Puppet to provision machine. Strongly based on files …
@30957	7 years	davidb	No longer needed. (Local copy taken on Windows laptop.)
@30956	7 years	davidb	Initial commit of files for setting up with Vagrant a Solr cloud
@30953	7 years	davidb	Need to specify _output_dir as part of output JSON filename
@30952	7 years	davidb	Further text tidy up
@30951	7 years	davidb	Save a JSONObject as a file in the output directory
@30950	7 years	davidb	Tweak to text
@30949	7 years	davidb	Use better name than 'foo'. Further fix to JSON name generated
@30947	7 years	davidb	Correction to 'pages-' part of JSON.bz2 output filename used
@30946	7 years	davidb	Correction to output JSON.bz2 name generated
@30945	7 years	davidb	Getting closer to writing out JSON files
@30944	7 years	davidb	Forcer higher partition (6) than default, which seems to be 2
@30943	7 years	davidb	Extra debug info
@30942	7 years	davidb	Improved output printing for slave node
@30941	7 years	davidb	Moved to getFileSystemInstance() method to play nice on cluster
@30940	7 years	davidb	Change to using URI not fileIn directly
@30939	7 years	davidb	Minor tweaks
@30938	7 years	davidb	Experiment with using Hadoop's FileSystem class for local file:// access
@30937	7 years	davidb	Expanded set of ClusterFileIO methods
@30936	7 years	davidb	Refinement of Spark Monitor echo statements
@30935	7 years	davidb	Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec
@30934	7 years	davidb	Providing json-filelist now a compulsory argument, rather than an option
@30933	7 years	davidb	More careful parsing of file prefix
@30932	7 years	davidb	Support both file:// and hdfs://
@30931	7 years	davidb	Version that runs using fil:// tested
@30930	7 years	davidb	Expansion of useful alias commands for Hadoop and Spark
@30929	7 years	davidb	Tweaks made while testing the script
@30928	7 years	davidb	Forgot to set json_filelist
@30927	7 years	davidb	Fixed silly typo in stdout redirect
@30926	7 years	davidb	Restructuring of RUN scripts to be more flexible
@30925	7 years	davidb	Improved instrutions
@30924	7 years	davidb	Tidy up of code. Removed commented out code
@30923	7 years	davidb	Rough cut version that reads in each JSON file over HDFS