source: other-projects

Diff Rev Age Author Log Message
(edit) @30994   7 years davidb Additional useful links. Links open in new tab
(edit) @30993   7 years davidb Placeholder page to provide useful links to hadoop and solr cluster …
(edit) @30992   7 years davidb Additional adjustments after test run on cluster
(edit) @30991   7 years davidb Initial cut at README notes, and supporting links
(edit) @30990   7 years davidb opt name change
(edit) @30989   7 years davidb Changes to better suit EF set used with solr
(edit) @30988   7 years davidb Changed flag to 'read-only' and changed the field name full text saved …
(edit) @30986   7 years davidb Debugging for double accumulator added
(edit) @30985   7 years davidb Changed to run main processing method as action rather than transform. …
(edit) @30984   7 years davidb Introduction of Spark accumulator to measure progress. Output of POST …
(edit) @30983   7 years davidb Useful helper script
(edit) @30982   7 years davidb Fixed to host_name for solr2 and solr3
(edit) @30981   7 years davidb Useful folder for 'on-the-side' packages
(edit) @30980   7 years davidb Code added to read response
(edit) @30979   7 years davidb _solr_url needs to be stored in class!
(edit) @30978   7 years davidb Additional debug statements
(edit) @30977   7 years davidb Only have RDD if an output directory was specified on the command-line …
(edit) @30976   7 years davidb Change to reflect changed order of command-line arguments
(edit) @30975   7 years davidb Introduction of new solr-url command line argument, leading to some …
(edit) @30974   7 years davidb update/add/doc JSON structure needed
(edit) @30973   7 years davidb Changed to saving Solr JSON file for debugging purposes
(edit) @30972   7 years davidb addition of useful command needed before re-running
(edit) @30971   7 years davidb Adding in post to Solr cloud. Changed text_t to _text_
(edit) @30970   7 years davidb Added in mapping of EF-JSON to Solr 'add' JSON format
(edit) @30969   7 years davidb Fine tuning resulting from testing the cloud/cluster
(edit) @30962   7 years davidb Corrections and improvements made after initial testing between …
(edit) @30960   7 years davidb Switch to using Puppet to provision machine. Strongly based on files …
(edit) @30957   7 years davidb No longer needed. (Local copy taken on Windows laptop.)
(edit) @30956   7 years davidb Initial commit of files for setting up with Vagrant a Solr cloud
(edit) @30953   7 years davidb Need to specify _output_dir as part of output JSON filename
(edit) @30952   7 years davidb Further text tidy up
(edit) @30951   7 years davidb Save a JSONObject as a file in the output directory
(edit) @30950   7 years davidb Tweak to text
(edit) @30949   7 years davidb Use better name than 'foo'. Further fix to JSON name generated
(edit) @30947   7 years davidb Correction to 'pages-' part of JSON.bz2 output filename used
(edit) @30946   7 years davidb Correction to output JSON.bz2 name generated
(edit) @30945   7 years davidb Getting closer to writing out JSON files
(edit) @30944   7 years davidb Force higher partition (6) than default, which seems to be 2
(edit) @30943   7 years davidb Extra debug info
(edit) @30942   7 years davidb Improved output printing for slave node
(edit) @30941   7 years davidb Moved to getFileSystemInstance() method to play nice on cluster
(edit) @30940   7 years davidb Change to using URI not fileIn directly
(edit) @30939   7 years davidb Minor tweaks
(edit) @30938   7 years davidb Experiment with using Hadoop's FileSystem class for local file:// access
(edit) @30937   7 years davidb Expanded set of ClusterFileIO methods
(edit) @30936   7 years davidb Refinement of Spark Monitor echo statements
(edit) @30935   7 years davidb Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec
(edit) @30934   7 years davidb Providing json-filelist now a compulsory argument, rather than an option
(edit) @30933   7 years davidb More careful parsing of file prefix
(edit) @30932   7 years davidb Support both file:// and hdfs://
(edit) @30931   7 years davidb Version that runs using file:// tested
(edit) @30930   7 years davidb Expansion of useful alias commands for Hadoop and Spark
(edit) @30929   7 years davidb Tweaks made while testing the script
(edit) @30928   7 years davidb Forgot to set json_filelist
(edit) @30927   7 years davidb Fixed silly typo in stdout redirect
(edit) @30926   7 years davidb Restructuring of RUN scripts to be more flexible
(edit) @30925   7 years davidb Improved instructions
(edit) @30924   7 years davidb Tidy up of code. Removed commented out code
(edit) @30923   7 years davidb Rough cut version that reads in each JSON file over HDFS
(edit) @30922   7 years davidb Additional rough-cut notes
(edit) @30921   7 years davidb Code change to read in JSON file over HDFS
(edit) @30919   7 years davidb More consistent naming of folders used
(edit) @30918   7 years davidb More flexible command-line args
(edit) @30917   7 years davidb Changes resulting from a fresh run at provisioning, which yielded the …
(edit) @30916   7 years davidb Some additional details -- note form
(edit) @30915   7 years davidb Initial cut at instructions to follow to get code set up and running
(edit) @30914   7 years davidb Tidy up of setup description
(edit) @30913   7 years davidb Renaming to better represent what the cluster is designed for
(edit) @30912   7 years davidb Changed to Unix style line-endings
(edit) @30911   7 years davidb Changed name of input directory
(edit) @30910   7 years davidb Additional finesse added in as a result of further testing on Vagrant …
(edit) @30909   7 years davidb Additional finesse added in as a result of further testing on Vagrant …
(edit) @30908   7 years davidb Additional finesse added in as a result of further testing on Vagrant …
(edit) @30907   7 years davidb Name change to reflect need for 'bash' not 'sh'
(edit) @30906   7 years davidb Bash version of BAT script
(edit) @30905   7 years davidb Additional resources
(edit) @30904   7 years davidb Extra resource/links added
(edit) @30903   7 years davidb Vagrant provisioning files for a 4-node Hadoop cluster. See …
(edit) @30902   7 years davidb Details of what packages are needed
(edit) @30901   7 years davidb Template setup file
(edit) @30900   7 years davidb For support Java packages
(edit) @30899   7 years davidb Files for compilation using Eclipse
(edit) @30898   7 years davidb Scripts for downloading sample JSON data from public domain extracted …
(edit) @30897   7 years davidb Sub-project for converting HTRC Extracted Features dataset into a form …
(edit) @30890   7 years davidb folder to group together hathitrust related projects
(edit) @30846   8 years ak19 Wrong module in script name.
(edit) @30818   8 years ak19 Script needs to get rid of another intermediate file.
(edit) @30722   8 years ak19 Remove repeated empty lines, leaving just a single blank line between …
(edit) @30720   8 years ak19 Getting the nightly gti email to be sent again on the new machine …
(edit) @30652   8 years ak19 Committing outstanding files for diffcol supporting jdb for GS3 …
(edit) @30613   8 years ak19 Don't send nightly email messages about updates to the test language …
(edit) @30611   8 years ak19 Modified version of remove_extra_lines script to handle the Updated …
(edit) @30605   8 years ak19 Committing the changes necessary to get the GTI crons to work on the …
(edit) @30594   8 years kjdon modifying the input and results areas to get it looking the same in …
(edit) @30590   8 years kjdon this change was made on puka, and when I tried this on commdev, it …
(edit) @30581   8 years ak19 GTI related changes to add gs3 demo collection config files' …
(edit) @30425   8 years davidb Save metadata as JSON file. Create sub-directories to spread out the …
(edit) @30424   8 years ak19 Dr Bainbridge improved the code so that the script always gets the …
(edit) @30422   8 years davidb Tidier treatment of 'bin' and 'audio' directories from an SVN point of view
(edit) @30421   8 years davidb Removed from SVN tree