source: other-projects/hathitrust

Legend:

Added
Modified
Copied or renamed
Change Rev Age Author Log Message
(edit) @30952   4 years davidb Further text tidy up
(edit) @30951   4 years davidb Save a JSONObject as a file in the output directory
(edit) @30950   4 years davidb Tweak to text
(edit) @30949   4 years davidb Use better name than 'foo'. Further fix to JSON name generated
(edit) @30947   4 years davidb Correction to 'pages-' part of JSON.bz2 output filename used
(edit) @30946   4 years davidb Correction to output JSON.bz2 name generated
(edit) @30945   4 years davidb Getting closer to writing out JSON files
(edit) @30944   4 years davidb Force a higher partition count (6) than the default, which seems to be 2
(edit) @30943   4 years davidb Extra debug info
(edit) @30942   4 years davidb Improved output printing for slave node
(edit) @30941   4 years davidb Moved to getFileSystemInstance() method to play nice on cluster
(edit) @30940   4 years davidb Change to using URI not fileIn directly
(edit) @30939   4 years davidb Minor tweaks
(edit) @30938   4 years davidb Experiment with using Hadoop's FileSystem class for local file:// access
(edit) @30937   4 years davidb Expanded set of ClusterFileIO methods
(edit) @30936   4 years davidb Refinement of Spark Monitor echo statements
(edit) @30935   4 years davidb Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec
(edit) @30934   4 years davidb json-filelist is now a compulsory argument, rather than an option
(edit) @30933   4 years davidb More careful parsing of file prefix
(edit) @30932   4 years davidb Support both file:// and hdfs://
(edit) @30931   4 years davidb Version that runs using file:// tested
(edit) @30930   4 years davidb Expansion of useful alias commands for Hadoop and Spark
(edit) @30929   4 years davidb Tweaks made while testing the script
(edit) @30928   4 years davidb Forgot to set json_filelist
(edit) @30927   4 years davidb Fixed silly typo in stdout redirect
(edit) @30926   4 years davidb Restructuring of RUN scripts to be more flexible
(edit) @30925   4 years davidb Improved instructions
(edit) @30924   4 years davidb Tidy up of code. Removed commented out code
(edit) @30923   4 years davidb Rough cut version that reads in each JSON file over HDFS
(edit) @30922   4 years davidb Additional rough-cut notes
(edit) @30921   4 years davidb Code change to read in JSON file over HDFS
(edit) @30919   4 years davidb More consistent naming of folders used
(edit) @30918   4 years davidb More flexible command-line args
(edit) @30917   4 years davidb Changes resulting from a fresh run at provisioning, which yielded the …
(edit) @30916   4 years davidb Some additional details -- note form
(edit) @30915   4 years davidb Initial cut at instructions to follow to get code set up and running
(edit) @30914   4 years davidb Tidy up of setup description
(edit) @30913   4 years davidb Renaming to better represent what the cluster is designed for
(edit) @30912   4 years davidb Changed to Unix style line-endings
(edit) @30911   4 years davidb Changed name of input directory
(edit) @30910   4 years davidb Additional finesse added in as a result of further testing on Vagrant …
(edit) @30909   4 years davidb Additional finesse added in as a result of further testing on Vagrant …
(edit) @30908   4 years davidb Additional finesse added in as a result of further testing on Vagrant …
(edit) @30907   4 years davidb Name change to reflect need for 'bash' not 'sh'
(edit) @30906   4 years davidb Bash version of BAT script
(edit) @30905   4 years davidb Additional resources
(edit) @30904   4 years davidb Extra resource/links added
(edit) @30903   4 years davidb Vagrant provisioning files for a 4-node Hadoop cluster. See …
(edit) @30902   4 years davidb Details of what packages are needed
(edit) @30901   4 years davidb Template setup file
(edit) @30900   4 years davidb For support Java packages
(edit) @30899   4 years davidb Files for compilation using Eclipse
(edit) @30898   4 years davidb Scripts for downloading sample JSON data from public domain extracted …
(edit) @30897   4 years davidb Sub-project for converting HTRC Extract Feature dataset into a form …
(add) @30890   4 years davidb folder to group together hathitrust related projects