source: gs2-extensions/parallel-building

Diff Rev Age Author Log Message
(edit) @29276   10 years jmt12 I need to measure the time spent on generating the initial manifest, …
(edit) @29261   10 years jmt12 Removing some of the extraneous IO from high cpu importing... altering …
(edit) @29260   10 years jmt12 Replacing the obsolete call to util::file_lastmodified() with the …
(edit) @29259   10 years jmt12 Kea override allowing for fixed processor affinity if necessary …
(edit) @29258   10 years jmt12 Initial checkin of a new TDB infodb that allows each worker thread in …
(edit) @29257   10 years jmt12 Allow for collection configuration to be passed down to parallel …
(edit) @29243   10 years jmt12 Allowing for file linking to be disabled
(edit) @29162   10 years jmt12 The Lingua module for detecting syllables - used when determining …
(edit) @29161   10 years jmt12 Some modules aren't available on cluster... add test and include path …
(edit) @29160   10 years jmt12 Adding blowfish encryption package to give text processing some work to do
(edit) @29158   10 years jmt12 Initial checkin of script to convert a number of Greenstone|| logs …
(edit) @29106   10 years jmt12 Check-in of script to symlink lorem files to matching files in another …
(edit) @29104   10 years jmt12 A script for extracting textual metrics from a collection of text …
(edit) @29103   10 years jmt12 updated - not any more efficient (Schlemiel the painter performance) …
(edit) @28779   10 years jmt12 Making timing message all sorts of purty
(edit) @28778   10 years jmt12 Typo - underscore where I meant hyphen
(edit) @28777   10 years jmt12 Need to include path to mpiimport on Medusa
(edit) @28771   10 years jmt12 A version of BasePlugout where the RSS feed update attempts to write …
(edit) @28770   10 years jmt12 Adding microtiming... a little tricky what with TDBServer taking …
(edit) @28769   10 years jmt12 No longer used. import.pl now smart enough to dynamically load …
(edit) @28768   10 years jmt12 Initially added microtime to this script, but then remembered it isn't …
(edit) @28767   10 years jmt12 Drastically increased the script to allow 1) battery of imports backed …
(edit) @28766   10 years jmt12 Removing an occasional few characters of garbage that turn up in the …
(edit) @28764   10 years jmt12 Adding microsecond timing messages
(edit) @28666   10 years jmt12 A script to transform a strace.out into a Tab separated file worthy of …
(edit) @28665   10 years jmt12 Latest changes to workaround resumed syscalls massive duration problem
(edit) @28654   10 years jmt12 Removed recordEarliestDatestamp() function as that now lurks in the …
(edit) @28653   10 years jmt12 Changed the way a require was 'eval'd - but I have no idea why
(edit) @28652   10 years jmt12 Changes to support running the reports over logs produced from …
(edit) @28649   10 years jmt12 A version of a Textfile reading plugin that has a configurable load …
(edit) @28648   10 years jmt12 Adding a short delay after writing to the flush_cache file just to …
(edit) @28647   10 years jmt12 Adding progress messages and making a debug message optional
(edit) @28646   10 years jmt12 A script that uses strace to produce IO metrics of a Greenstone import
(edit) @28645   10 years jmt12 Script to generate a report on data locality from GreenstoneHadoop logs
(edit) @28358   11 years jmt12 Replacing my earlier decision to only have data locality information …
(edit) @28357   11 years jmt12 used to update the data_locality.csv file in the case where other …
(edit) @28356   11 years jmt12 Support the legacy version of taskno in the data_locality.csv file (we …
(edit) @28312   11 years jmt12 Working on finer control over data locality - so I can configure a run …
(edit) @28192   11 years jmt12 Need to still output Greenstone messages to log otherwise I can't …
(edit) @28191   11 years jmt12 Removing redundant error stream redirect - this wasn't causing the …
(edit) @28190   11 years jmt12 Had accidentally hardcoded the max replication number - allow it to be …
(edit) @28189   11 years jmt12 Replace the newer (and faster) while(@file) loop with the older (and …
(edit) @28188   11 years jmt12 Minor fix to allow for tasks that start in the same second (now each …
(edit) @28187   11 years jmt12 A customized version of Kea.pm that looks in the correct place for …
(edit) @28186   11 years jmt12 A (failed) attempt to use the unix iotop tool to determine IO percentage
(edit) @28018   11 years jmt12 Try really hard to capture the output from 'time' function as Medusa …
(edit) @28017   11 years jmt12 Forgot to add processing comment before call to hadoop_import.pl
(edit) @28016   11 years jmt12 Allow the hadoop report generator to parse start and end times …
(edit) @28015   11 years jmt12 Add an extra option that allows me to pass in the directory to write …
(edit) @28014   11 years jmt12 Remove tasks that have had data locality established from the array of …
(edit) @28013   11 years jmt12 A new script to run a battery of Hadoop ingests at varying replication …
(edit) @28012   11 years jmt12 Express start time as a double as well
(edit) @28011   11 years jmt12 Turn off debugging in the copy in SVN
(edit) @28010   11 years jmt12 Correctly set up the environment for calls to txt2tdb and also replace …
(edit) @28001   11 years jmt12 Write datestamp using dbutil if applicable
(edit) @27996   11 years jmt12 A new version of the archive with minor changes to log4j configuration
(edit) @27995   11 years jmt12 Just adding some code comments
(edit) @27915   11 years jmt12 A new PlugOut that doesn't write any intermediate files (bar those …
(edit) @27914   11 years jmt12 Trying to get around a couple of divide-by-zero issues when generating …
(edit) @27913   11 years jmt12 Made the ingester to be used (version 1 without reduce phase, or …
(edit) @27912   11 years jmt12 Modified the compilation to include the new ingester and its co-requisites.
(edit) @27911   11 years jmt12 Modified the compilation to include the new ingester and its co-requisites
(edit) @27910   11 years jmt12 Extended the existing HadoopGreenstoneIngest with proper Reduce phase …
(edit) @27753   11 years jmt12 Adding Handbrake's percentage complete to report - although this is …
(edit) @27752   11 years jmt12 Data locality file not being found is no longer fatal (HDFS-NFS-Proxy …
(edit) @27732   11 years jmt12 Nice the copy itself too
(edit) @27686   11 years jmt12 A little more progress comments
(edit) @27685   11 years jmt12 in the case of multiple attempts you need to retain the information …
(edit) @27684   11 years jmt12 Adding natural sorting into report generation - so also needed to add …
(edit) @27683   11 years jmt12 moving a few more headings around to help with information block layout
(edit) @27682   11 years jmt12 Copying makeAllDirectories() from vanilla FileUtils.pm
(edit) @27669   11 years jmt12 Sort compute nodes naturally before labelling them with incremental …
(edit) @27654   11 years jmt12 Add the ability to stagger the starting of Mappers by placing a …
(edit) @27653   11 years jmt12 Forgot to pull self off the head of arguments
(edit) @27652   11 years jmt12 Changing buffer to 128K (slightly faster) and adding a comment …
(edit) @27651   11 years jmt12
(edit) @27650   11 years jmt12
(edit) @27649   11 years jmt12 No longer in SVN control
(edit) @27648   11 years jmt12 Template for setup.bash - a user will have to populate Hadoop fields
(edit) @27645   11 years jmt12
(edit) @27644   11 years jmt12 Extended to support HDFS-access via NFS. This applies to both the call …
(edit) @27643   11 years jmt12 Changed the script generator so it can recurse through directories and …
(edit) @27642   11 years jmt12 A script I downloaded that successfully splits video files - something …
(edit) @27641   11 years jmt12 Altered order of arguments and allow archives dir to be passed as …
(edit) @27640   11 years jmt12
(edit) @27638   11 years jmt12 Change it so failure to open a filehandle isn't fatal - leave it up to …
(edit) @27631   11 years jmt12 A proxy to allow NFS access to HDFS
(edit) @27595   11 years jmt12 Updating list of untarred directories to ignore
(edit) @27594   11 years jmt12 Extend hadoop_import.pl to be able to start and stop the Thrift server(s)
(edit) @27593   11 years jmt12 Need Class Accessor for Thrift client under Rocks
(edit) @27592   11 years jmt12 Adding in a script to allow a daemon version of Thrift to be started …
(edit) @27591   11 years jmt12 Ensure Thrift will, by default, attempt to connect to the local …
(edit) @27590   11 years jmt12 Adding statistics about data locality, and highlighting tasks where …
(edit) @27589   11 years jmt12 Fixing up some minor bugs in regex's
(edit) @27588   11 years jmt12 Extend parser to support jobs that are split over several logs. Also …
(edit) @27587   11 years jmt12 Allow debug mode to be enabled from the command line
(edit) @27586   11 years jmt12 Updating script to take date of hadoop job into account when searching …
(edit) @27585   11 years jmt12 The perl on Medusa won't let you immediately treat a returned array in …
(edit) @27584   11 years jmt12 I wasn't doing -r when attempting to clear directories left in /tmp by …
(edit) @27583   11 years jmt12 Adding code to differentiate between workers in a cluster - all of …