root/other-projects/hathitrust

Rev Chgset Date Author Log Message
(edit) @31196 [31196] 3 years davidb File renaming to make way for newer version of classes needed in the main …
(edit) @31195 [31195] 3 years davidb File renaming to make way for newer version of classes needed in the main …
(edit) @31194 [31194] 3 years davidb Serialize in and out methods added
(edit) @31193 [31193] 3 years davidb Peter's white-list file
(edit) @31184 [31184] 3 years davidb New provision to run different main classes in _RUN.sh; New top-level …
(edit) @31183 [31183] 3 years davidb Bump up to project using Java 1.8
(edit) @31177 [31177] 3 years davidb Adding in Google jar that supports Bloom filters
(edit) @31176 [31176] 3 years davidb Support added for producing whitelist word count
(edit) @31175 [31175] 3 years davidb Trial to find memory difference between Hashmap and Bloom filters
(edit) @31174 [31174] 3 years davidb One of the last scripts developed for getting ef dataset into HDFS
(edit) @31173 [31173] 3 years davidb individual file sizes per top-level folder
(edit) @31172 [31172] 3 years davidb to help track down missing files in HDFS copy
(edit) @31171 [31171] 3 years davidb Util to help find where missing files are
(edit) @31170 [31170] 3 years davidb Targeted sub-dir copy
(edit) @31169 [31169] 3 years davidb Improved logic
(edit) @31161 [31161] 3 years davidb Comparison of local disk version with HDFS version
(edit) @31152 [31152] 3 years davidb Development of script
(edit) @31151 [31151] 3 years davidb More nuanced version to help finish off the 'big put'
(edit) @31128 [31128] 3 years davidb Some scripts to help with pushing and monitoring the progress of the put …
(edit) @31112 [31112] 3 years davidb To move out shards saved in /tmp on gsliscluster1 nodes to nema
(edit) @31106 [31106] 3 years davidb Scripts to help run an rsync'd copy of gsliscluster1 /tmp/gcX-solr-shard …
(edit) @31105 [31105] 3 years davidb Additional scripts to help with running solr locally out of /tmp area
(edit) @31104 [31104] 3 years davidb now configurable to be run from local disk (/tmp)
(edit) @31103 [31103] 3 years davidb Changes made after testing with 20 solr nodes
(edit) @31102 [31102] 3 years davidb Command line way of running a Solr test query
(edit) @31101 [31101] 3 years davidb Correction to collection name
(edit) @31100 [31100] 3 years davidb Change to using solr-cloud-nodes that include port number
(edit) @31099 [31099] 3 years davidb Changes resulting from test runs to get Zookeeper and Solr running on …
(edit) @31098 [31098] 3 years davidb Changes resulting from test runs to get Zookeeper and Solr running on …
(edit) @31097 [31097] 3 years davidb Changed to .in style name
(edit) @31096 [31096] 3 years davidb Only need to create a volume's pages output directory if _output_dir has …
(edit) @31095 [31095] 3 years davidb Introduced num-partitions property
(edit) @31094 [31094] 3 years davidb Changes triggered by running on gsliscluster1
(edit) @31093 [31093] 3 years davidb Changes triggered by running on gsliscluster1
(edit) @31092 [31092] 3 years davidb Minor tweak to spark/hadoop combination downloaded
(edit) @31091 [31091] 3 years davidb Change of number of cores for 'gsliscluster1' machine; commented out …
(edit) @31090 [31090] 3 years davidb Memory monitor debugging code, commented out
(edit) @31089 [31089] 3 years davidb Change in way the JSON file is read in. Motivation was an out-of-memory …
(edit) @31088 [31088] 3 years davidb Shift to newInstance for FileSystem due to StackOverflow page describing …
(edit) @31082 [31082] 3 years davidb Changes in response to testing on gchead
(edit) @31081 [31081] 3 years davidb Going live with generation of spark slaves file
(edit) @31080 [31080] 3 years davidb echo formatting tidy up. Fixed some typos
(edit) @31079 [31079] 3 years davidb Useful get started scripts
(edit) @31078 [31078] 3 years davidb Some setup files and scripts to make running Spark and Solr easier on the …
(edit) @31077 [31077] 3 years davidb Move up to JDK1.8. Tidy up of Vagrant machine names. Support for YARN. …
(edit) @31065 [31065] 3 years davidb Additional echo output
(edit) @31062 [31062] 3 years davidb Added in -W option so check-sum calculation is skipped
(edit) @31058 [31058] 3 years davidb echo for additional information added
(edit) @31057 [31057] 3 years davidb Tweak to jps output formatting
(edit) @31053 [31053] 3 years davidb Addition of second argument, optional, for where to save the files
(edit) @31051 [31051] 3 years davidb Added in JDK to list of possible packages needed
(edit) @31046 [31046] 3 years davidb Added property to control how severe a JSON IO problem is
(edit) @31045 [31045] 3 years davidb More careful treatment of what to do when a JSON file isn't there
(edit) @31044 [31044] 3 years davidb Fixed up error when output_dir is empty
(edit) @31043 [31043] 3 years davidb Version for processing full EF set
(edit) @31042 [31042] 3 years davidb Name changes, preparing the way for FULL-RUN versions
(edit) @31041 [31041] 3 years davidb Test needs to be more careful if -read-only specified
(edit) @31036 [31036] 3 years davidb Renaming to prepare way for YARN version of script
(edit) @31035 [31035] 3 years davidb Changes after testing scripts
(edit) @31034 [31034] 3 years davidb Development of scripts for working with Full EF dataset
(edit) @31033 [31033] 3 years davidb Development of scripts for working with Full EF dataset
(edit) @31030 [31030] 3 years davidb Tweak to some verbosity level 2 printing
(edit) @31029 [31029] 3 years davidb Newline at end of file added
(edit) @31028 [31028] 3 years davidb Support for randomly choosing Solr endpoints added in
(edit) @31027 [31027] 3 years davidb Fixed typo in property name used
(edit) @31026 [31026] 3 years davidb Corrected flag setting
(edit) @31025 [31025] 3 years davidb Use property process-json-mode to determine which sort of Spark mapping to …
(edit) @31024 [31024] 3 years davidb Support for Java properties file
(edit) @31022 [31022] 3 years davidb No longer used
(edit) @31021 [31021] 3 years davidb Folder restructure to remove 'trunk' part
(edit) @31020 [31020] 3 years davidb No longer used
(edit) @31019 [31019] 3 years davidb Part 2 of two-step folder restructure
(edit) @31018 [31018] 3 years davidb Part 1 of two-step folder restructure
(edit) @31017 [31017] 3 years davidb Moved to correct position
(edit) @31016 [31016] 3 years davidb No longer used
(edit) @31015 [31015] 3 years davidb Restructuring of projects into one
(edit) @31013 [31013] 3 years davidb Accumulator for PerPageMap
(edit) @31011 [31011] 3 years davidb Further RDD flatMap/map restructuring and refactoring, for per-page
(edit) @31010 [31010] 3 years davidb Tidy up on generating Spark App name
(edit) @31009 [31009] 3 years davidb Adjustments after latest fresh 'vagrant up' trial
(edit) @31008 [31008] 3 years davidb Additional detail added into Spark app name
(edit) @31007 [31007] 3 years davidb Class name refactoring
(edit) @31006 [31006] 3 years davidb Further reversal of Base class. Switch to PerPage
(edit) @31005 [31005] 3 years davidb Reversal of Base class in PerVolumeJSON
(edit) @31004 [31004] 3 years davidb added debug
(edit) @31003 [31003] 3 years davidb Explicit default constructors added
(edit) @31002 [31002] 3 years davidb Need to separate flatMap and foreach calls in PagedJSON
(edit) @31001 [31001] 3 years davidb Code to work per-volume and per-page
(edit) @31000 [31000] 3 years davidb Class name refactoring
(edit) @30999 [30999] 3 years davidb Class name refactoring
(edit) @30998 [30998] 3 years davidb Class name refactoring
(edit) @30997 [30997] 3 years davidb Verbosity control over printing
(edit) @30996 [30996] 3 years davidb Code refactoring
(edit) @30995 [30995] 3 years davidb Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
(edit) @30994 [30994] 3 years davidb Additional useful links. Links open in new tab
(edit) @30993 [30993] 3 years davidb Placeholder page to provide useful links to hadoop and solr cluster …
(edit) @30992 [30992] 3 years davidb Additional adjustments after test run on cluster
(edit) @30991 [30991] 3 years davidb Initial cut at README notes, and supporting links
(edit) @30990 [30990] 3 years davidb opt name change
(edit) @30989 [30989] 3 years davidb Changes to better suit EF set used with solr