source: other-projects/hathitrust/solr-extracted-features/trunk/src

Change Rev Age Author Log Message
(edit) @31013   7 years davidb Accumulator for PerPageMap
(edit) @31011   7 years davidb Further RDD flatMap/map restructuring and refactoring, for per-page
(edit) @31010   7 years davidb Tidy up on generating Spark App name
(edit) @31008   7 years davidb Additional detail added into Spark app name
(edit) @31007   7 years davidb Class name refactoring
(edit) @31006   7 years davidb Further reversal of Base class. Switch to PerPage
(edit) @31005   7 years davidb Reversal of Base class in PerVolumeJSON
(edit) @31004   7 years davidb added debug
(edit) @31003   7 years davidb Explicit default constructors added
(edit) @31002   7 years davidb Need to separate flatMap and foreach calls in PagedJSON
(edit) @31001   7 years davidb Code to work per-volume and per-page
(edit) @30998   7 years davidb Class name refactoring
(edit) @30997   7 years davidb Verbosity control over printing
(edit) @30996   7 years davidb Code refactoring
(edit) @30995   7 years davidb Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
(edit) @30990   7 years davidb opt name change
(edit) @30988   7 years davidb Changed flag to 'read-only' and changed the field name full text saved …
(edit) @30986   7 years davidb Debugging for double accumulator added
(edit) @30985   7 years davidb Changed to run main processing method as action rather than transform. …
(edit) @30984   7 years davidb Introduction of Spark accumulator to measure progress. Output of POST …
(edit) @30980   7 years davidb Code added to read response
(edit) @30979   7 years davidb _solr_url needs to be stored in class!
(edit) @30978   7 years davidb Additional debug statements
(edit) @30977   7 years davidb Only have RDD if an output directory was specified on the command-line …
(edit) @30976   7 years davidb Change to reflect changed order of command-line arguments
(edit) @30975   7 years davidb Introduction of new solr-url command line argument, leading to some …
(edit) @30974   7 years davidb update/add/doc JSON structure needed
(edit) @30973   7 years davidb Changed to saving Solr JSON file for debugging purposes
(edit) @30971   7 years davidb Adding in post to Solr cloud. Changed text_t to _text_
(edit) @30970   7 years davidb Added in mapping of EF-JSON to Solr 'add' JSON format
(edit) @30953   7 years davidb Need to specify _output_dir as part of output JSON filename
(edit) @30951   7 years davidb Save a JSONObject as a file in the output directory
(edit) @30949   7 years davidb Use better name than 'foo'. Further fix to JSON name generated
(edit) @30947   7 years davidb Correction to 'pages-' part of JSON.bz2 output filename used
(edit) @30946   7 years davidb Correction to output JSON.bz2 name generated
(edit) @30945   7 years davidb Getting closer to writing out JSON files
(edit) @30944   7 years davidb Force higher partition count (6) than default, which seems to be 2
(edit) @30943   7 years davidb Extra debug info
(edit) @30942   7 years davidb Improved output printing for slave node
(edit) @30941   7 years davidb Moved to getFileSystemInstance() method to play nice on cluster
(edit) @30940   7 years davidb Change to using URI not fileIn directly
(edit) @30938   7 years davidb Experiment with using Hadoop's FileSystem class for local file:// access
(edit) @30937   7 years davidb Expanded set of ClusterFileIO methods
(edit) @30934   7 years davidb Providing json-filelist now a compulsory argument, rather than an option
(edit) @30933   7 years davidb More careful parsing of file prefix
(edit) @30932   7 years davidb Support both file:// and hdfs://
(edit) @30924   7 years davidb Tidy up of code. Removed commented out code
(edit) @30921   7 years davidb Code change to read in JSON file over HDFS
(edit) @30918   7 years davidb More flexible command-line args
(add) @30898   7 years davidb Scripts for downloading sample JSON data from public domain extracted …