root/other-projects/hathitrust

Rev Chgset Date Author Log Message
(edit) @31004 [31004] 3 years davidb added debug
(edit) @31003 [31003] 3 years davidb Explicit default constructors added
(edit) @31002 [31002] 3 years davidb Need to separate flatMap and foreach calls in PagedJSON
(edit) @31001 [31001] 3 years davidb Code to work per-volume and per-page
(edit) @31000 [31000] 3 years davidb Class name refactoring
(edit) @30999 [30999] 3 years davidb Class name refactoring
(edit) @30998 [30998] 3 years davidb Class name refactoring
(edit) @30997 [30997] 3 years davidb Verbosity control over printing
(edit) @30996 [30996] 3 years davidb Code refactoring
(edit) @30995 [30995] 3 years davidb Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
(edit) @30994 [30994] 3 years davidb Additional useful links. Links open in new tab
(edit) @30993 [30993] 3 years davidb Placeholder page to provide useful links to hadoop and solr cluster …
(edit) @30992 [30992] 3 years davidb Additional adjustments after test run on cluster
(edit) @30991 [30991] 3 years davidb Initial cut at README notes, and supporting links
(edit) @30990 [30990] 3 years davidb opt name change
(edit) @30989 [30989] 3 years davidb Changes to better suit EF set used with solr
(edit) @30988 [30988] 3 years davidb Changed flag to 'read-only' and changed the field name full text saved …
(edit) @30986 [30986] 3 years davidb Debugging for double accumulator added
(edit) @30985 [30985] 3 years davidb Changed to run main processing method as action rather than transform. …
(edit) @30984 [30984] 3 years davidb Introduction of Spark accumulator to measure progress. Output of POST …
(edit) @30983 [30983] 3 years davidb Useful helper script
(edit) @30982 [30982] 3 years davidb Fixed to host_name for solr2 and solr3
(edit) @30981 [30981] 3 years davidb Useful folder for 'on-the-side' packages
(edit) @30980 [30980] 3 years davidb Code added to read response
(edit) @30979 [30979] 3 years davidb _solr_url needs to be stored in class!
(edit) @30978 [30978] 3 years davidb Additional debug statements
(edit) @30977 [30977] 3 years davidb Only have RDD if an output directory was specified on the command-line …
(edit) @30976 [30976] 3 years davidb Change to reflect changed order of command-line arguments
(edit) @30975 [30975] 3 years davidb Introduction of new solr-url command line argument, leading to some other …
(edit) @30974 [30974] 3 years davidb update/add/doc JSON structure needed
(edit) @30973 [30973] 3 years davidb Changed to saving Solr JSON file for debugging purposes
(edit) @30972 [30972] 3 years davidb addition of useful command needed before re-running
(edit) @30971 [30971] 3 years davidb Adding in post to Solr cloud. Changed text_t to _text_
(edit) @30970 [30970] 3 years davidb Added in mapping of EF-JSON to Solr 'add' JSON format
(edit) @30969 [30969] 3 years davidb Fine tuning resulting from testing the cloud/cluster
(edit) @30962 [30962] 3 years davidb Corrections and improvements made after initial testing between zookeeper …
(edit) @30960 [30960] 3 years davidb Switch to using Puppet to provision machine. Strongly based on files …
(edit) @30957 [30957] 3 years davidb No longer needed. (Local copy taken on Windows laptop.)
(edit) @30956 [30956] 3 years davidb Initial commit of files for setting up with Vagrant a Solr cloud
(edit) @30953 [30953] 3 years davidb Need to specify _output_dir as part of output JSON filename
(edit) @30952 [30952] 3 years davidb Further text tidy up
(edit) @30951 [30951] 3 years davidb Save a JSONObject as a file in the output directory
(edit) @30950 [30950] 3 years davidb Tweak to text
(edit) @30949 [30949] 3 years davidb Use better name than 'foo'. Further fix to JSON name generated
(edit) @30947 [30947] 3 years davidb Correction to 'pages-' part of JSON.bz2 output filename used
(edit) @30946 [30946] 3 years davidb Correction to output JSON.bz2 name generated
(edit) @30945 [30945] 3 years davidb Getting closer to writing out JSON files
(edit) @30944 [30944] 3 years davidb Force higher partition count (6) than default, which seems to be 2
(edit) @30943 [30943] 3 years davidb Extra debug info
(edit) @30942 [30942] 3 years davidb Improved output printing for slave node
(edit) @30941 [30941] 3 years davidb Moved to getFileSystemInstance() method to play nice on cluster
(edit) @30940 [30940] 3 years davidb Change to using URI not fileIn directly
(edit) @30939 [30939] 3 years davidb Minor tweaks
(edit) @30938 [30938] 3 years davidb Experiment with using Hadoop's FileSystem class for local file:// access
(edit) @30937 [30937] 3 years davidb Expanded set of ClusterFileIO methods
(edit) @30936 [30936] 3 years davidb Refinement of Spark Monitor echo statements
(edit) @30935 [30935] 3 years davidb Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec
(edit) @30934 [30934] 3 years davidb Providing json-filelist now a compulsory argument, rather than an option
(edit) @30933 [30933] 3 years davidb More careful parsing of file prefix
(edit) @30932 [30932] 3 years davidb Support both file:// and hdfs://
(edit) @30931 [30931] 3 years davidb Version that runs using file:// tested
(edit) @30930 [30930] 3 years davidb Expansion of useful alias commands for Hadoop and Spark
(edit) @30929 [30929] 3 years davidb Tweaks made while testing the script
(edit) @30928 [30928] 3 years davidb Forgot to set json_filelist
(edit) @30927 [30927] 3 years davidb Fixed silly typo in stdout redirect
(edit) @30926 [30926] 3 years davidb Restructuring of RUN scripts to be more flexible
(edit) @30925 [30925] 3 years davidb Improved instructions
(edit) @30924 [30924] 3 years davidb Tidy up of code. Removed commented out code
(edit) @30923 [30923] 3 years davidb Rough cut version that reads in each JSON file over HDFS
(edit) @30922 [30922] 3 years davidb Additional rough-cut notes
(edit) @30921 [30921] 3 years davidb Code change to read in JSON file over HDFS
(edit) @30919 [30919] 3 years davidb More consistent naming of folders used
(edit) @30918 [30918] 3 years davidb More flexible command-line args
(edit) @30917 [30917] 3 years davidb Changes resulting from a fresh run at provisioning, which yielded the …
(edit) @30916 [30916] 3 years davidb Some additional details -- note form
(edit) @30915 [30915] 3 years davidb Initial cut at instructions to follow to get code set up and running
(edit) @30914 [30914] 3 years davidb Tidy up of setup description
(edit) @30913 [30913] 3 years davidb Renaming to better represent what the cluster is designed for
(edit) @30912 [30912] 3 years davidb Changed to Unix style line-endings
(edit) @30911 [30911] 3 years davidb Changed name of input directory
(edit) @30910 [30910] 3 years davidb Additional finesse added in as a result of further testing on Vagrant …
(edit) @30909 [30909] 3 years davidb Additional finesse added in as a result of further testing on Vagrant …
(edit) @30908 [30908] 3 years davidb Additional finesse added in as a result of further testing on Vagrant …
(edit) @30907 [30907] 3 years davidb Name change to reflect need for 'bash' not 'sh'
(edit) @30906 [30906] 3 years davidb Bash version of BAT script
(edit) @30905 [30905] 3 years davidb Additional resources
(edit) @30904 [30904] 3 years davidb Extra resource/links added
(edit) @30903 [30903] 3 years davidb Vagrant provisioning files for a 4-node Hadoop cluster. See README.txt …
(edit) @30902 [30902] 3 years davidb Details of what packages are needed
(edit) @30901 [30901] 3 years davidb Template setup file
(edit) @30900 [30900] 3 years davidb For support Java packages
(edit) @30899 [30899] 3 years davidb Files for compilation using Eclipse
(edit) @30898 [30898] 3 years davidb Scripts for downloading sample JSON data from public domain extracted …
(edit) @30897 [30897] 3 years davidb Sub-project for converting HTRC Extracted Features dataset into a form that …
(add) @30890 [30890] 3 years davidb folder to group together hathitrust related projects