root/other-projects

Rev Chgset Date Author Log Message
(edit) @31044 [31044] 3 years davidb Fixed up error when output_dir is empty
(edit) @31043 [31043] 3 years davidb Version for processing full EF set
(edit) @31042 [31042] 3 years davidb Name changes, preparing the way for FULL-RUN versions
(edit) @31041 [31041] 3 years davidb Test needs to be more careful if -read-only specified
(edit) @31036 [31036] 3 years davidb Renaming to prepare way for YARN version of script
(edit) @31035 [31035] 3 years davidb Changes after testing scripts
(edit) @31034 [31034] 3 years davidb Development of scripts for working with Full EF dataset
(edit) @31033 [31033] 3 years davidb Development of scripts for working with Full EF dataset
(edit) @31030 [31030] 3 years davidb Tweak to some verbosity level 2 printing
(edit) @31029 [31029] 3 years davidb Newline at end of file added
(edit) @31028 [31028] 3 years davidb Support for randomly choosing Solr endpoints added in
(edit) @31027 [31027] 3 years davidb Fixed typo in property name used
(edit) @31026 [31026] 3 years davidb Corrected flag setting
(edit) @31025 [31025] 3 years davidb Use property process-json-mode to determine which sort of Spark mapping to …
(edit) @31024 [31024] 3 years davidb Support for Java properties file
(edit) @31022 [31022] 3 years davidb No longer used
(edit) @31021 [31021] 3 years davidb Folder restructure to remove 'trunk' part
(edit) @31020 [31020] 3 years davidb No longer used
(edit) @31019 [31019] 3 years davidb Part 2 of two-step folder restructure
(edit) @31018 [31018] 3 years davidb Part 1 of two-step folder restructure
(edit) @31017 [31017] 3 years davidb Moved to correct position
(edit) @31016 [31016] 3 years davidb No longer used
(edit) @31015 [31015] 3 years davidb Restructuring of projects into one
(edit) @31013 [31013] 3 years davidb Accumulator for PerPageMap
(edit) @31011 [31011] 3 years davidb Further RDD flatMap/map restructuring and refactoring, for per-page
(edit) @31010 [31010] 3 years davidb Tidy up on generating Spark App name
(edit) @31009 [31009] 3 years davidb Adjustments after latest fresh 'vagrant up' trial
(edit) @31008 [31008] 3 years davidb Additional detail added into Spark app name
(edit) @31007 [31007] 3 years davidb Class name refactoring
(edit) @31006 [31006] 3 years davidb Further reversal of Base class. Switch to PerPage
(edit) @31005 [31005] 3 years davidb Reversal of Base class in PerVolumeJSON
(edit) @31004 [31004] 3 years davidb added debug
(edit) @31003 [31003] 3 years davidb Explicit default constructors added
(edit) @31002 [31002] 3 years davidb Need to separate flatMap and foreach calls in PagedJSON
(edit) @31001 [31001] 3 years davidb Code to work per-volume and per-page
(edit) @31000 [31000] 3 years davidb Class name refactoring
(edit) @30999 [30999] 3 years davidb Class name refactoring
(edit) @30998 [30998] 3 years davidb Class name refactoring
(edit) @30997 [30997] 3 years davidb Verbosity control over printing
(edit) @30996 [30996] 3 years davidb Code refactoring
(edit) @30995 [30995] 3 years davidb Adjustment of NUM_PARTITIONS to be based on Spark recommended calculation
(edit) @30994 [30994] 3 years davidb Additional useful links. Links open in new tab
(edit) @30993 [30993] 3 years davidb Placeholder page to provide useful links to hadoop and solr cluster …
(edit) @30992 [30992] 3 years davidb Additional adjustments after test run on cluster
(edit) @30991 [30991] 3 years davidb Initial cut at README notes, and supporting links
(edit) @30990 [30990] 3 years davidb opt name change
(edit) @30989 [30989] 3 years davidb Changes to better suit EF set used with solr
(edit) @30988 [30988] 3 years davidb Changed flag to 'read-only' and changed the field name full text saved …
(edit) @30986 [30986] 3 years davidb Debugging for double accumulator added
(edit) @30985 [30985] 3 years davidb Changed to run main processing method as action rather than transform. …
(edit) @30984 [30984] 3 years davidb Introduction of Spark accumulator to measure progress. Output of POST …
(edit) @30983 [30983] 3 years davidb Useful helper script
(edit) @30982 [30982] 3 years davidb Fixed to host_name for solr2 and solr3
(edit) @30981 [30981] 3 years davidb Useful folder for 'on-the-side' packages
(edit) @30980 [30980] 3 years davidb Code added to read response
(edit) @30979 [30979] 3 years davidb _solr_url needs to be stored in class!
(edit) @30978 [30978] 3 years davidb Additional debug statements
(edit) @30977 [30977] 3 years davidb Only have RDD if an output directory was specified on the command-line …
(edit) @30976 [30976] 3 years davidb Change to reflect changed order of command-line arguments
(edit) @30975 [30975] 3 years davidb Introduction of new solr-url command line argument, leading to some other …
(edit) @30974 [30974] 3 years davidb update/add/doc JSON structure needed
(edit) @30973 [30973] 3 years davidb Changed to saving Solr JSON file for debugging purposes
(edit) @30972 [30972] 3 years davidb addition of useful command needed before re-running
(edit) @30971 [30971] 3 years davidb Adding in post to Solr cloud. Changed text_t to _text_
(edit) @30970 [30970] 3 years davidb Added in mapping of EF-JSON to Solr 'add' JSON format
(edit) @30969 [30969] 3 years davidb Fine tuning resulting from testing the cloud/cluster
(edit) @30962 [30962] 3 years davidb Corrections and improvements made after initial testing between zookeeper …
(edit) @30960 [30960] 3 years davidb Switch to using Puppet to provision machine. Strongly based on files …
(edit) @30957 [30957] 3 years davidb No longer needed. (Local copy taken on Windows laptop.)
(edit) @30956 [30956] 3 years davidb Initial commit of files for setting up with Vagrant a Solr cloud
(edit) @30953 [30953] 3 years davidb Need to specify _output_dir as part of output JSON filename
(edit) @30952 [30952] 3 years davidb Further text tidy up
(edit) @30951 [30951] 3 years davidb Save a JSONObject as a file in the output directory
(edit) @30950 [30950] 3 years davidb Tweak to text
(edit) @30949 [30949] 3 years davidb Use better name than 'foo'. Further fix to JSON name generated
(edit) @30947 [30947] 3 years davidb Correction to 'pages-' part of JSON.bz2 output filename used
(edit) @30946 [30946] 3 years davidb Correction to output JSON.bz2 name generated
(edit) @30945 [30945] 3 years davidb Getting closer to writing out JSON files
(edit) @30944 [30944] 3 years davidb Force higher partition count (6) than default, which seems to be 2
(edit) @30943 [30943] 3 years davidb Extra debug info
(edit) @30942 [30942] 3 years davidb Improved output printing for slave node
(edit) @30941 [30941] 3 years davidb Moved to getFileSystemInstance() method to play nice on cluster
(edit) @30940 [30940] 3 years davidb Change to using URI not fileIn directly
(edit) @30939 [30939] 3 years davidb Minor tweaks
(edit) @30938 [30938] 3 years davidb Experiment with using Hadoop's FileSystem class for local file:// access
(edit) @30937 [30937] 3 years davidb Expanded set of ClusterFileIO methods
(edit) @30936 [30936] 3 years davidb Refinement of Spark Monitor echo statements
(edit) @30935 [30935] 3 years davidb Fixed variable name typo, plus added a couple of 'sleep' pauses of 1 sec
(edit) @30934 [30934] 3 years davidb Providing json-filelist now a compulsory argument, rather than an option
(edit) @30933 [30933] 3 years davidb More careful parsing of file prefix
(edit) @30932 [30932] 3 years davidb Support both file:// and hdfs://
(edit) @30931 [30931] 3 years davidb Version that runs using file:// tested
(edit) @30930 [30930] 3 years davidb Expansion of useful alias commands for Hadoop and Spark
(edit) @30929 [30929] 3 years davidb Tweaks made while testing the script
(edit) @30928 [30928] 3 years davidb Forgot to set json_filelist
(edit) @30927 [30927] 3 years davidb Fixed silly typo in stdout redirect
(edit) @30926 [30926] 3 years davidb Restructuring of RUN scripts to be more flexible
(edit) @30925 [30925] 3 years davidb Improved instructions
(edit) @30924 [30924] 3 years davidb Tidy up of code. Removed commented out code
(edit) @30923 [30923] 3 years davidb Rough cut version that reads in each JSON file over HDFS