source: gs2-extensions/parallel-building/trunk/src/bin

Diff Rev Age Author Log Message
(edit) @30354   8 years jmt12 Extending manifest v2 support to allow for directories to be listed in …
(edit) @30306   9 years jmt12 Making the setup of CPAN path more robust based on the better control …
(edit) @29663   9 years jmt12 Supporting grayscale printing, fixing mismatched tags and speechmarks, …
(edit) @29662   9 years jmt12 Now removes building and index directories if found
(edit) @29661   9 years jmt12 A helper script to clean-up the bogus directories sometimes created by …
(edit) @29158   10 years jmt12 Initial checkin of script to convert a number of Greenstone|| logs …
(edit) @29106   10 years jmt12 Check-in of script to symlink lorem files to matching files in another …
(edit) @29104   10 years jmt12 A script for extracting textual metrics from a collection of text …
(edit) @29103   10 years jmt12 updated - not any more efficient (Schlemiel the painter performance) …
(edit) @28769   10 years jmt12 No longer used. import.pl now smart enough to dynamically load …
(edit) @28768   10 years jmt12 Initially added microtime to this script, but then remembered it isn't …
(edit) @28767   10 years jmt12 Drastically increased the script to allow 1) battery of imports backed …
(edit) @28766   10 years jmt12 Removing an occasional few characters of garbage that turn up in the …
(edit) @28764   10 years jmt12 Adding microsecond timing messages
(edit) @28666   10 years jmt12 A script to transform a strace.out into a Tab separated file worthy of …
(edit) @28665   10 years jmt12 Latest changes to workaround resumed syscalls massive duration problem
(edit) @28652   10 years jmt12 Changes to support running the reports over logs produced from …
(edit) @28648   10 years jmt12 Adding a short delay after writing to the flush_cache file just to …
(edit) @28647   10 years jmt12 Adding progress messages and making a debug message optional
(edit) @28646   10 years jmt12 A script that uses strace to produce IO metrics of a Greenstone import
(edit) @28645   10 years jmt12 Script to generate a report on data locality from GreenstoneHadoop logs
(edit) @28358   11 years jmt12 Replacing my earlier decision to only have data locality information …
(edit) @28357   11 years jmt12 used to update the data_locality.csv file in the case where other …
(edit) @28356   11 years jmt12 Support the legacy version of taskno in the data_locality.csv file (we …
(edit) @28191   11 years jmt12 Removing redundant error stream redirect - this wasn't causing the …
(edit) @28190   11 years jmt12 Had accidentally hardcoded the max replication number - allow it to be …
(edit) @28189   11 years jmt12 Replace the newer (and faster) while(@file) loop with the older (and …
(edit) @28188   11 years jmt12 Minor fix to allow for tasks that start in the same second (now each …
(edit) @28186   11 years jmt12 A (failed) attempt to use the unix iotop tool to determine IO percentage
(edit) @28018   11 years jmt12 Try really hard to capture the output from 'time' function as Medusa …
(edit) @28017   11 years jmt12 Forgot to add processing comment before call to hadoop_import.pl
(edit) @28016   11 years jmt12 Allow the hadoop report generator to parse start and end times …
(edit) @28015   11 years jmt12 Add an extra option that allows me to pass in the directory to write …
(edit) @28014   11 years jmt12 Remove tasks that have had data locality established from the array of …
(edit) @28013   11 years jmt12 A new script to run a battery of Hadoop ingests at varying replication …
(edit) @27914   11 years jmt12 Trying to get around a couple of divide-by-zero issues when generating …
(edit) @27913   11 years jmt12 Made the ingester to be used (version 1 without reduce phase, or …
(edit) @27753   11 years jmt12 Adding Handbrake's percentage complete to report - although this is …
(edit) @27752   11 years jmt12 Data locality file not being found is no longer fatal (HDFS-NFS-Proxy …
(edit) @27732   11 years jmt12 Nice the copy itself too
(edit) @27686   11 years jmt12 A little more progress comments
(edit) @27685   11 years jmt12 in the case of multiple attempts you need to retain the information …
(edit) @27684   11 years jmt12 Adding natural sorting into report generation - so also needed to add …
(edit) @27683   11 years jmt12 moving a few more headings around to help with information block layout
(edit) @27669   11 years jmt12 Sort compute nodes naturally before labelling them with incremental …
(edit) @27654   11 years jmt12 Add the ability to stagger the starting of Mappers by placing a …
(edit) @27644   11 years jmt12 Extended to support HDFS-access via NFS. This applies to both the call …
(edit) @27643   11 years jmt12 Changed the script generator so it can recurse through directories and …
(edit) @27642   11 years jmt12 A script I downloaded that successfully splits video files - something …
(edit) @27594   11 years jmt12 Extend hadoop_import.pl to be able to start and stop the Thrift server(s)
(edit) @27590   11 years jmt12 Adding statistics about data locality, and highlighting tasks where …
(edit) @27589   11 years jmt12 Fixing up some minor bugs in regex's
(edit) @27588   11 years jmt12 Extend parser to support jobs that are split over several logs. Also …
(edit) @27587   11 years jmt12 Allow debug mode to be enabled from the command line
(edit) @27586   11 years jmt12 Updating script to take date of hadoop job into account when searching …
(edit) @27585   11 years jmt12 The perl on Medusa won't let you immediately treat a returned array in …
(edit) @27584   11 years jmt12 I wasn't doing -r when attempting to clear directories left in /tmp by …
(edit) @27583   11 years jmt12 Adding code to differentiate between workers in a cluster - all of …
(edit) @27560   11 years jmt12 Fixing typo in regexp that meant filenames sometimes ignored
(edit) @27559   11 years jmt12 Changed mime-type away from binary - I hope. Meanwhile, generate …
(edit) @27551   11 years jmt12 Altered so that it expects to be given a CSV containing parallel …
(edit) @27550   11 years jmt12 Ensure the hostname is added to the Hadoop logs so we can identify the …
(edit) @27549   11 years jmt12 Extract information from the logs generated by parallel Greenstone …
(edit) @27548   11 years jmt12 Extract information from the logs generated by parallel Greenstone …
(edit) @27543   11 years jmt12 Adding generate_gantt.pl script in its original form - i.e. directly …
(edit) @27530   11 years jmt12 Clear out old logs, and adding more comments about what the script is …
(edit) @27515   11 years jmt12 Making the file used during buffer tests be configurable
(edit) @27512   11 years jmt12 Adding in a special test for measuring the effect of altering ThriftFS …
(edit) @27495   11 years jmt12 removing doubled up debug comments and putting some paths in …
(edit) @27481   11 years jmt12 Adding makeAllDirectories() (which I'd only implemented in LocalFS) to …
(edit) @27480   11 years jmt12 Removing DateTime dependency (so HDFSShell will always fail …
(edit) @27436   11 years jmt12 Adding the actual script - rather than a symlink to my dropbox. doh
(edit) @27435   11 years jmt12 Gah - only a symbolic link
(edit) @27414   11 years jmt12 Allowing more processing arguments to be configured at the call, and …
(edit) @27412   11 years jmt12 I obviously hadn't run this script on Karearea before - assumed all …
(edit) @27409   11 years jmt12 Unit test like testing for the FileUtils class and LocalFS, HDFSShell, …
(edit) @27408   11 years jmt12 A symbolic link to the actual script in the packages directory
(edit) @27378   11 years jmt12 Parallel processing support now added (via buildcolutil subclass) to …
(edit) @27126   11 years jmt12 Extra clean up commands (like removing cached versions of video …
(edit) @27125   11 years jmt12 A script to try and flush all caches - I'm certain it's flushing disk …
(edit) @27124   11 years jmt12 Use the new perl version script to extract the version number - so as …
(edit) @27119   11 years jmt12 Merging version finder from Medusa with the one lurking on Karearea
(edit) @27058   11 years jmt12 Adding data locality report generation to Hadoop greenstone imports
(edit) @27052   11 years jmt12 Turns out the Perl on Medusa doesn't support $V, so I've had to …
(edit) @27041   11 years jmt12 INC path now includes the installed extensions perl path (including …
(edit) @27040   11 years jmt12 A simple script that returns just the version number of Perl
(edit) @27036   11 years jmt12 A script to extract data locality and other task information from the …
(edit) @27006   11 years jmt12 A companion script to stop-hadoop-processes that just reports running …
(edit) @27005   11 years jmt12 Similar to stop-impt.pl, this script uses kill to stop runaway Hadoop …
(edit) @27004   11 years jmt12 A script to stop (using kill) a runaway import process and any related …
(edit) @27001   11 years jmt12 Passing more environment variables (HADOOPPREFIX, HDFSHOST, HDFSPORT) …
(edit) @26999   11 years jmt12 Ensuring MPI binds to correct interface, and passing through …
(edit) @26998   11 years jmt12 Adding maxdocs variable, lots of debug comments, added some tests for …
(edit) @26953   11 years jmt12 Checking in the script rather than a symbolic link to the script :P
(edit) @26952   11 years jmt12 Accidentally checked in symbolic link rather than script
(edit) @26949   11 years jmt12 Parallel import using Hadoop
(edit) @26930   11 years jmt12 Randomized order of files, and added the ability to specify a maximum …
(edit) @26929   11 years jmt12 A script to comprehensively clean up a collection between imports... …
(edit) @26923   11 years jmt12 Generates a specified-size subset of a larger import directory
(edit) @26242   12 years jmt12 Modifications to progress messages to improve extracting information …