root/gs2-extensions/parallel-building

Rev Chgset Date Author Log Message
@29661 [29661] 5 years jmt12 A helper script to clean-up the bogus directories sometimes created by …
@29660 [29660] 5 years jmt12 making the debug variable global... can't remember why though
@29649 [29649] 5 years jmt12 Perseus was an attempt to add functionality to automatically and remote …
@29276 [29276] 5 years jmt12 I need to measure the time spent on generating the initial manifest, as …
@29261 [29261] 5 years jmt12 Removing some of the extraneous IO from high cpu importing... altering the …
@29260 [29260] 5 years jmt12 Replacing the obsolete call to util::file_lastmodified() with the newer …
@29259 [29259] 5 years jmt12 Kea override allowing for fixed processor affinity if necessary (commented …
@29258 [29258] 5 years jmt12 Initial checkin of a new TDB infodb that allows each worker thread in a …
@29257 [29257] 5 years jmt12 Allow for collection configuration to be passed down to parallel import …
@29243 [29243] 5 years jmt12 Allowing for file linking to be disabled
@29162 [29162] 5 years jmt12 The Lingua module for detecting syllables - used when determining …
@29161 [29161] 5 years jmt12 Some modules aren't available on cluster... add test and include path to …
@29160 [29160] 5 years jmt12 Adding blowfish encryption package to give text processing some work to do
@29158 [29158] 5 years jmt12 Initial checkin of script to convert a number of Greenstone|| logs into a …
@29106 [29106] 5 years jmt12 Check-in of script to symlink lorem files to matching files in another …
@29104 [29104] 5 years jmt12 A script for extracting textual metrics from a collection of text files …
@29103 [29103] 5 years jmt12 updated - not any more efficient (Schlemiel the painter performance) but …
@28779 [28779] 6 years jmt12 Making timing message all sorts of purty
@28778 [28778] 6 years jmt12 Typo - underscore where I meant hyphen
@28777 [28777] 6 years jmt12 Need to include path to mpiimport on Medusa
@28771 [28771] 6 years jmt12 A version of BasePlugout where the RSS feed update attempts to write …
@28770 [28770] 6 years jmt12 Adding microtiming... a little tricky what with TDBServer taking forever …
@28769 [28769] 6 years jmt12 No longer used. import.pl now smart enough to dynamically load …
@28768 [28768] 6 years jmt12 Initially added microtime to this script, but then remembered it isn't …
@28767 [28767] 6 years jmt12 Drastically increased the script to allow 1) battery of imports backed by …
@28766 [28766] 6 years jmt12 Removing an occasional few characters of garbage that turn up in the log …
@28764 [28764] 6 years jmt12 Adding microsecond timing messages
@28666 [28666] 6 years jmt12 A script to transform a strace.out into a Tab separated file worthy of …
@28665 [28665] 6 years jmt12 Latest changes to workaround resumed syscalls massive duration problem
@28654 [28654] 6 years jmt12 Removed recordEarliestDatestamp() function as that now lurks in the …
@28653 [28653] 6 years jmt12 Changed the way a require was 'eval'd - but I have no idea why
@28652 [28652] 6 years jmt12 Changes to support running the reports over logs produced from multicore …
@28649 [28649] 6 years jmt12 A version of a Textfile reading plugin that has a configurable load …
@28648 [28648] 6 years jmt12 Adding a short delay after writing to the flush_cache file just to ensure …
@28647 [28647] 6 years jmt12 Adding progress messages and making a debug message optional
@28646 [28646] 6 years jmt12 A script that uses strace to produce IO metrics of a Greenstone import
@28645 [28645] 6 years jmt12 Script to generate a report on data locality from GreenstoneHadoop logs
@28358 [28358] 6 years jmt12 Replacing my earlier decision to only have data locality information …
@28357 [28357] 6 years jmt12 used to update the data_locality.csv file in the case where other …
@28356 [28356] 6 years jmt12 Support the legacy version of taskno in the data_locality.csv file (we now …
@28312 [28312] 6 years jmt12 Working on finer control over data locality - so I can configure a run …
@28192 [28192] 6 years jmt12 Need to still output Greenstone messages to log otherwise I can't …
@28191 [28191] 6 years jmt12 Removing redundant error stream redirect - this wasn't causing the issue I …
@28190 [28190] 6 years jmt12 Had accidentally hardcoded the max replication number - allow it to be …
@28189 [28189] 6 years jmt12 Replace the newer (and faster) while(@file) loop with the older (and more …
@28188 [28188] 6 years jmt12 Minor fix to allow for tasks that start in the same second (now each …
@28187 [28187] 6 years jmt12 A customized version of Kea.pm that looks in the correct place for newer …
@28186 [28186] 6 years jmt12 A (failed) attempt to use the unix iotop tool to determine IO percentage
@28018 [28018] 6 years jmt12 Try really hard to capture the output from 'time' function as Medusa lets …
@28017 [28017] 6 years jmt12 Forgot to add processing comment before call to hadoop_import.pl
@28016 [28016] 6 years jmt12 Allow the hadoop report generator to parse start and end times expressed …
@28015 [28015] 6 years jmt12 Add an extra option that allows me to pass in the directory to write log …
@28014 [28014] 6 years jmt12 Remove tasks that have had data locality established from the array of …
@28013 [28013] 6 years jmt12 A new script to run a battery of Hadoop ingests at varying replication …
@28012 [28012] 6 years jmt12 Express start time as a double as well
@28011 [28011] 6 years jmt12 Turn off debugging in the copy in SVN
@28010 [28010] 6 years jmt12 Correctly set up the environment for calls to txt2tdb and also replace …
@28001 [28001] 6 years jmt12 Write datestamp using dbutil if applicable
@27996 [27996] 6 years jmt12 A new version of the archive with minor changes to log4j configuration
@27995 [27995] 6 years jmt12 Just adding some code comments
@27915 [27915] 6 years jmt12 A new PlugOut that doesn't write any intermediate files (bar those …
@27914 [27914] 6 years jmt12 Trying to get around a couple of divide-by-zero issues when generating …
@27913 [27913] 6 years jmt12 Made the ingester to be used (version 1 without reduce phase, or version 2 …
@27912 [27912] 6 years jmt12 Modified the compilation to include the new ingester and its …
@27911 [27911] 6 years jmt12 Modified the compilation to include the new ingester and its co-requisites
@27910 [27910] 6 years jmt12 Extended the existing HadoopGreenstoneIngest with proper Reduce phase - …
@27753 [27753] 6 years jmt12 Adding Handbrake's percentage complete to report - although this is …
@27752 [27752] 6 years jmt12 Data locality file not being found is no longer fatal (HDFS-NFS-Proxy …
@27732 [27732] 6 years jmt12 Nice the copy itself too
@27686 [27686] 6 years jmt12 A few more progress comments
@27685 [27685] 6 years jmt12 in the case of multiple attempts you need to retain the information about …
@27684 [27684] 6 years jmt12 Adding natural sorting into report generation - so also needed to add INC …
@27683 [27683] 6 years jmt12 moving a few more headings around to help with information block layout
@27682 [27682] 6 years jmt12 Copying makeAllDirectories() from vanilla FileUtils.pm
@27669 [27669] 6 years jmt12 Sort compute nodes naturally before labelling them with incremental worker …
@27654 [27654] 6 years jmt12 Add the ability to stagger the starting of Mappers by placing a 'delay.me' …
@27653 [27653] 6 years jmt12 Forgot to pull self off the head of arguments
@27652 [27652] 6 years jmt12 Changing buffer to 128K (slightly faster) and adding a comment explaining …
@27651 [27651] 6 years jmt12
@27650 [27650] 6 years jmt12
@27649 [27649] 6 years jmt12 No longer in SVN control
@27648 [27648] 6 years jmt12 Template for setup.bash - a user will have to populate Hadoop fields
@27645 [27645] 6 years jmt12
@27644 [27644] 6 years jmt12 Extended to support HDFS-access via NFS. This applies to both the call to …
@27643 [27643] 6 years jmt12 Changed the script generator so it can recurse through directories and …
@27642 [27642] 6 years jmt12 A script I downloaded that successfully splits video files - something I …
@27641 [27641] 6 years jmt12 Altered order of arguments and allow archives dir to be passed as argument …
@27640 [27640] 6 years jmt12
@27638 [27638] 6 years jmt12 Change it so failure to open a filehandle isn't fatal - leave it up to the …
@27631 [27631] 6 years jmt12 A proxy to allow NFS access to HDFS
@27595 [27595] 7 years jmt12 Updating list of untarred directories to ignore
@27594 [27594] 7 years jmt12 Extend hadoop_import.pl to be able to start and stop the Thrift server(s)
@27593 [27593] 7 years jmt12 Need Class Accessor for Thrift client under Rocks
@27592 [27592] 7 years jmt12 Adding in a script to allow a daemon version of Thrift to be started (and …
@27591 [27591] 7 years jmt12 Ensure Thrift will, by default, attempt to connect to the local machine …
@27590 [27590] 7 years jmt12 Adding statistics about data locality, and highlighting tasks where file …
@27589 [27589] 7 years jmt12 Fixing up some minor bugs in regexes
@27588 [27588] 7 years jmt12 Extend parser to support jobs that are split over several logs. Also …
@27587 [27587] 7 years jmt12 Allow debug mode to be enabled from the command line
@27586 [27586] 7 years jmt12 Updating script to take date of hadoop job into account when searching for …