source: other-projects/hathitrust

Rev	Age	Author	Log Message
@31200	7 years	davidb	Better output statement
@31199	7 years	davidb	Renaming of classname to reflect filename rename
@31198	7 years	davidb	File renaming to make way for newer version of classes needed in the …
@31197	7 years	davidb	File renaming to make way for newer version of classes needed in the …
@31196	7 years	davidb	File renaming to make way for newer version of classes needed in the …
@31195	7 years	davidb	File renaming to make way for newer version of classes needed in the …
@31194	7 years	davidb	Serialize in and out methods added
@31193	7 years	davidb	Peter's white-list file
@31184	7 years	davidb	New provision to run different main classes in _RUN.sh; New top-level …
@31183	7 years	davidb	Bump up to project using Java 1.8
@31177	7 years	davidb	Adding in Google jar that supports Bloom filters
@31176	7 years	davidb	Support added for producing whitelist word count
@31175	7 years	davidb	Trial to find memory difference betwen Hashmap and Bloom filters
@31174	7 years	davidb	One of the last scripts developed for getting ef dataset into HDFS
@31173	7 years	davidb	individual file sizes per top-level folder
@31172	7 years	davidb	to help track down missing files in HDFS copy
@31171	7 years	davidb	Util to help find where missing files are
@31170	7 years	davidb	Targetted sub-dir copy
@31169	7 years	davidb	Improved logic
@31161	7 years	davidb	Comparison of local disk version with HDFS version
@31152	7 years	davidb	Development of script
@31151	7 years	davidb	More nuanced version to help finish off the 'big put'
@31128	7 years	davidb	Some scripts to help with pushing and monitoring the progress of the …
@31112	7 years	davidb	To move out shards saved in /tmp on gsliscluter1 nodes to nema
@31106	7 years	davidb	Scripts to help run an rsync'd copy of gslistcluster1 …
@31105	7 years	davidb	Additional scripts to help with running solr locally out of /tmp area
@31104	7 years	davidb	now configurable to be run from local disk (/tmp)
@31103	7 years	davidb	Changes made after testing with 20 solr nodes
@31102	7 years	davidb	Command line way of running a Solr test query
@31101	7 years	davidb	Correction to collection name
@31100	7 years	davidb	Change to using solr-cloud-nodes that include port number
@31099	7 years	davidb	Changes resulting from test runs to get Zookeeper and Solr running on …
@31098	7 years	davidb	Changes resulting from test runs to get Zookeeper and Solr running on …
@31097	7 years	davidb	Changed to .in style namne
@31096	7 years	davidb	Only need to create a volume's pages output directory is _output_dir …
@31095	7 years	davidb	Introduced num-partitions property
@31094	7 years	davidb	Changes triggered by running on gsliscluster1
@31093	7 years	davidb	Changes triggered by running on gsliscluster1
@31092	7 years	davidb	Minor tweak to spark/hadoop combination downloaded
@31091	7 years	davidb	Change of number of core for 'gsliscluster1' machine; commmented out …
@31090	7 years	davidb	Memory monitor debugging code, commented out
@31089	7 years	davidb	Change in way the JSON file is read in. Motivation was a …
@31088	7 years	davidb	Shift to newIstance for FileSystem due to StackOverflow page …
@31082	7 years	davidb	Changes in response to testing on gchead
@31081	7 years	davidb	Going live with generation of spark slaves file
@31080	7 years	davidb	echo formatting tidy up. Fixed some typos
@31079	7 years	davidb	Useful get started scripts
@31078	7 years	davidb	Some setup files and scripts to make running Spark and Solr easier on …
@31077	7 years	davidb	Move up to JDK1.8. Tidy up of Vagrant machine names. Support for YARN. …
@31065	7 years	davidb	Additional echo output
@31062	7 years	davidb	Added in -W option so check-sum calculation is skipped
@31058	7 years	davidb	echo for additional information added
@31057	7 years	davidb	Tweak to jps output formatting
@31053	7 years	davidb	Addition of second argument, optional, for where to save the files
@31051	7 years	davidb	Added in JDK to list of possible packages needed
@31046	7 years	davidb	Added property to control how severe a JSON IO problem is
@31045	7 years	davidb	More careful treatment of what to do when a JSON file isn't there
@31044	7 years	davidb	Fixed up error when output_dir is empty
@31043	7 years	davidb	Version for processing full EF set
@31042	7 years	davidb	Name changes, preparing the way for FULL-RUN versions
@31041	7 years	davidb	Test needs to be more careful if -read-only specified
@31036	7 years	davidb	Renaming to prepare way for YARN version of script
@31035	7 years	davidb	Changes after testing scripts
@31034	7 years	davidb	Development of scripts for working with Full EF dataset
@31033	7 years	davidb	Development of scripts for working with Full EF dataset
@31030	7 years	davidb	Tweak to some verbosity level 2 printing
@31029	7 years	davidb	Newline at end of file added
@31028	7 years	davidb	Support for randonly choosing Solr endpoints added in
@31027	7 years	davidb	Mixed typo in property name used
@31026	7 years	davidb	Corrected flag setting
@31025	7 years	davidb	Use property process-json-mode to determine which sort of Spark …
@31024	7 years	davidb	Support for Java properties file
@31022	7 years	davidb	No longer used
@31021	7 years	davidb	Folder restructure to remove 'trunk' part
@31020	7 years	davidb	No longer used
@31019	7 years	davidb	Part 2 or two-step folder restructure
@31018	7 years	davidb	Part 1 or two-step folder restructure
@31017	7 years	davidb	Moved to correct position
@31016	7 years	davidb	No longer used
@31015	7 years	davidb	Restructuring of projects into one
@31013	7 years	davidb	Accumulator for PerPageMap
@31011	7 years	davidb	Further RDD flatMap/map restructuring and refactoring, for per-page
@31010	7 years	davidb	Tidy up on generating Spark App name
@31009	7 years	davidb	Adjustments after latest fresh 'vagrant up' trial
@31008	7 years	davidb	Additional detail added into Spark app name
@31007	7 years	davidb	Class name refactoring
@31006	7 years	davidb	Further reversal of Base class. Switch to PerPage
@31005	7 years	davidb	Reversal of Base class in PerVolumeJSON
@31004	7 years	davidb	added debug
@31003	7 years	davidb	Explicity default constructors added
@31002	7 years	davidb	Need to separate flatMap and foreach calls in PagedJSON
@31001	7 years	davidb	Code to work per-volume and per-page
@31000	7 years	davidb	Class name refactoring
@30999	7 years	davidb	Class name refactoring
@30998	7 years	davidb	Class name refactoring
@30997	7 years	davidb	Verbosity control over printing
@30996	7 years	davidb	Code refactoring
@30995	7 years	davidb	Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
@30994	7 years	davidb	Additional useful links. Links open in new tab
@30993	7 years	davidb	Placeholder page to provide useful links to hadoop and solr cluster …