#
# ChangeLog for gs2-extensions/parallel-building
#
# Generated by Trac 1.4.2
# 2024-04-20T05:40:09+12:00

Thu, 29 Aug 2013 21:08:04 GMT jmt12 [28190]
    * gs2-extensions/parallel-building/trunk/src/bin/script/replication_tests.pl (modified)

    Had accidentally hardcoded the max replication number - allow it to be ...

Thu, 29 Aug 2013 21:06:56 GMT jmt12 [28189]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)

    Replace the newer (and faster) while(@file) loop with the older (and ...

Thu, 29 Aug 2013 20:58:33 GMT jmt12 [28188]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)

    Minor fix to allow for tasks that start in the same second (now each ...

Thu, 29 Aug 2013 20:56:57 GMT jmt12 [28187]
    * gs2-extensions/parallel-building/trunk/src/perllib/Kea.pm (added)

    A customized version of Kea.pm that looks in the correct place for ...

Thu, 29 Aug 2013 20:55:57 GMT jmt12 [28186]
    * gs2-extensions/parallel-building/trunk/src/bin/script/iotop_report.pl (added)

    A (failed) attempt to use the unix iotop tool to determine IO percentage

Fri, 09 Aug 2013 01:30:35 GMT jmt12 [28018]
    * gs2-extensions/parallel-building/trunk/src/bin/script/replication_tests.pl (modified)

    Try really hard to capture the output from 'time' function as Medusa ...

Fri, 09 Aug 2013 01:26:02 GMT jmt12 [28017]
    * gs2-extensions/parallel-building/trunk/src/bin/script/replication_tests.pl (modified)

    Forgot to add processing comment before call to hadoop_import.pl

Fri, 09 Aug 2013 01:16:44 GMT jmt12 [28016]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)

    Allow the hadoop report generator to parse start and end times ...

Fri, 09 Aug 2013 01:16:06 GMT jmt12 [28015]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    Add an extra option that allows me to pass in the directory to write ...

Fri, 09 Aug 2013 01:15:02 GMT jmt12 [28014]
    * gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)

    Remove tasks that have had data locality established from the array ...

Fri, 09 Aug 2013 01:14:22 GMT jmt12 [28013]
    * gs2-extensions/parallel-building/trunk/src/bin/script/replication_tests.pl (added)

    A new script to run a battery of Hadoop ingests at varying ...

Fri, 09 Aug 2013 01:13:50 GMT jmt12 [28012]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest2.java (modified)

    Express start time as a double as well

Fri, 09 Aug 2013 01:13:01 GMT jmt12 [28011]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSInfoDB.java (modified)

    Turn off debugging in the copy in SVN

Fri, 09 Aug 2013 01:11:46 GMT jmt12 [28010]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSInfoDB.java (modified)

    Correctly set up the environment for calls to txt2tdb and also ...

Thu, 08 Aug 2013 00:46:06 GMT jmt12 [28001]
    * gs2-extensions/parallel-building/trunk/src/perllib/parallelbuildinginexport.pm (modified)

    Write datestamp using dbutil if applicable

Wed, 07 Aug 2013 22:13:59 GMT jmt12 [27996]
    * gs2-extensions/parallel-building/trunk/src/packages/hdfs-nfs-proxy-release-0.8.1.tar.gz (modified)

    A new version of the archive with minor changes to log4j configuration

Wed, 07 Aug 2013 22:12:52 GMT jmt12 [27995]
    * gs2-extensions/parallel-building/trunk/src/perllib/parallelbuildinginexport.pm (modified)

    Just adding some code comments

Sun, 21 Jul 2013 22:40:02 GMT jmt12 [27915]
    * gs2-extensions/parallel-building/trunk/src/perllib/dbutil/stdoutxml.pm (added)

    A new PlugOut that doesn't write any intermediate files (bar those ...

Sun, 21 Jul 2013 22:38:06 GMT jmt12 [27914]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)

    Trying to get around a couple of divide-by-zero issues when ...

Sun, 21 Jul 2013 22:37:02 GMT jmt12 [27913]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    Made the ingester to be used (version 1 without reduce phase, or ...

Sun, 21 Jul 2013 22:36:02 GMT jmt12 [27912]
    * gs2-extensions/parallel-building/trunk/src/src/CASCADE-MAKE/HADOOPGREENSTONEINGEST.sh (modified)

    Modified the compilation to include the new ingester and its co- ...

Sun, 21 Jul 2013 22:35:43 GMT jmt12 [27911]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/compile.sh (modified)

    Modified the compilation to include the new ingester and its co- ...

Sun, 21 Jul 2013 22:35:04 GMT jmt12 [27910]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSGroupingComparator.java (added)
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSInfoDB.java (added)
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSPartitioner.java (added)
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest2.java (added)

    Extended the existing HadoopGreenstoneIngest with proper Reduce phase ...

Thu, 04 Jul 2013 01:45:08 GMT jmt12 [27753]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)

    Adding Handbrake's percentage complete to report - although this is ...

Thu, 04 Jul 2013 01:44:22 GMT jmt12 [27752]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)

    Data locality file not being found is no longer fatal (HDFS-NFS-Proxy ...

Tue, 02 Jul 2013 02:35:42 GMT jmt12 [27732]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    Nice the copy itself too

Fri, 21 Jun 2013 00:25:32 GMT jmt12 [27686]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    A little more progress comments

Fri, 21 Jun 2013 00:24:54 GMT jmt12 [27685]
    * gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)

    in the case of multiple attempts you need to retain the information ...

Fri, 21 Jun 2013 00:22:25 GMT jmt12 [27684]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)

    Adding natural sorting into report generation - so also needed to add ...

Fri, 21 Jun 2013 00:20:27 GMT jmt12 [27683]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)

    moving a few more headings around to help with information block layout

Fri, 21 Jun 2013 00:19:57 GMT jmt12 [27682]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)

    Copying makeAllDirectories() from vanilla FileUtils.pm

Wed, 19 Jun 2013 21:26:05 GMT jmt12 [27669]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)

    Sort compute nodes naturally before labelling them with incremental ...

Mon, 17 Jun 2013 22:59:52 GMT jmt12 [27654]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)

    Add the ability to stagger the starting of Mappers by placing a ...

Mon, 17 Jun 2013 22:52:36 GMT jmt12 [27653]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS/ThriftFH.pm (modified)

    Forgot to pull self off the head of arguments

Mon, 17 Jun 2013 22:51:56 GMT jmt12 [27652]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)

    Changing buffer to 128K (slightly faster) and adding a comment ...

Mon, 17 Jun 2013 22:50:04 GMT jmt12 [27651]
    * gs2-extensions/parallel-building/trunk/src (modified)

Mon, 17 Jun 2013 22:49:15 GMT jmt12 [27650]
    * gs2-extensions/parallel-building/trunk/src/.svnignore (modified)

Mon, 17 Jun 2013 22:48:47 GMT jmt12 [27649]
    * gs2-extensions/parallel-building/trunk/src/setup.bash (deleted)

    No longer in SVN control

Mon, 17 Jun 2013 22:48:22 GMT jmt12 [27648]
    * gs2-extensions/parallel-building/trunk/src/setup.bash.in (added)

    Template for setup.bash - a user will have to populate Hadoop fields

Mon, 17 Jun 2013 22:31:51 GMT jmt12 [27645]
    * gs2-extensions/parallel-building/trunk/src/packages (modified)

Mon, 17 Jun 2013 22:31:34 GMT jmt12 [27644]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    Extended to support HDFS-access via NFS. This applies to both the ...

Mon, 17 Jun 2013 22:30:13 GMT jmt12 [27643]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)

    Changed the script generator so it can recurse through directories ...

Mon, 17 Jun 2013 22:28:53 GMT jmt12 [27642]
    * gs2-extensions/parallel-building/trunk/src/bin/script/ffsplit.sh (added)

    A script I downloaded that successfully splits video files - ...

Mon, 17 Jun 2013 22:12:53 GMT jmt12 [27641]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)

    Altered order of arguments and allow archives dir to be passed as ...

Mon, 17 Jun 2013 22:11:58 GMT jmt12 [27640]
    * gs2-extensions/parallel-building/trunk/src/packages/.svnignore (modified)

Mon, 17 Jun 2013 22:09:38 GMT jmt12 [27638]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)

    Change it so failure to open a filehandle isn't fatal - leave it up ...

Mon, 17 Jun 2013 00:36:54 GMT jmt12 [27631]
    * gs2-extensions/parallel-building/trunk/src/packages/hdfs-nfs-proxy-release-0.8.1.tar.gz (added)

    A proxy to allow NFS access to HDFS

Mon, 10 Jun 2013 05:10:48 GMT jmt12 [27595]
    * gs2-extensions/parallel-building/trunk/src/packages/cpan (modified)
    * gs2-extensions/parallel-building/trunk/src/packages/cpan/.svnignore (modified)

    Updating list of untarred directories to ignore

Mon, 10 Jun 2013 05:09:36 GMT jmt12 [27594]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    Extend hadoop_import.pl to be able to start and stop the Thrift server(s)

Mon, 10 Jun 2013 04:50:33 GMT jmt12 [27593]
    * gs2-extensions/parallel-building/trunk/src/packages/cpan/Class-Accessor-0.34.tar.gz (added)

    Need Class Accessor for Thrift client under Rocks

Mon, 10 Jun 2013 04:34:38 GMT jmt12 [27592]
    * gs2-extensions/parallel-building/trunk/src/packages/ThriftFS-0.9.0.tar.gz (modified)

    Adding in a script to allow a daemon version of Thrift to be started ...

Mon, 10 Jun 2013 04:32:41 GMT jmt12 [27591]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)

    Ensure Thrift will, by default, attempt to connect to the local ...

Mon, 10 Jun 2013 04:27:49 GMT jmt12 [27590]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)

    Adding statistics about data locality, and highlighting tasks where ...

Mon, 10 Jun 2013 02:19:21 GMT jmt12 [27589]
    * gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)

    Fixing up some minor bugs in regexes

Mon, 10 Jun 2013 02:12:28 GMT jmt12 [27588]
    * gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)

    Extend parser to support jobs that are split over several logs. Also ...

Sun, 09 Jun 2013 23:29:03 GMT jmt12 [27587]
    * gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)

    Allow debug mode to be enabled from the command line

Sun, 09 Jun 2013 23:15:36 GMT jmt12 [27586]
    * gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)

    Updating script to take date of hadoop job into account when ...

Sun, 09 Jun 2013 22:25:10 GMT jmt12 [27585]
    * gs2-extensions/parallel-building/trunk/src/bin/script/test_fileutils.pl (modified)

    The perl on Medusa won't let you immediately treat a returned array ...

Sun, 09 Jun 2013 22:23:46 GMT jmt12 [27584]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    I wasn't doing -r when attempting to clear directories left in /tmp ...

Sun, 09 Jun 2013 22:22:19 GMT jmt12 [27583]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)

    Adding code to differentiate between workers in a cluster - all of ...

Thu, 06 Jun 2013 23:27:08 GMT jmt12 [27571]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)

    increase timeout to 4 hours per map

Thu, 06 Jun 2013 22:53:10 GMT jmt12 [27570]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS/ThriftFH.pm (modified)

    Make the warning about binmode() not being applicable more ...

Thu, 06 Jun 2013 22:48:39 GMT jmt12 [27569]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)

    Trying to streamline the error messages from failing to link ...

Thu, 06 Jun 2013 22:24:29 GMT jmt12 [27568]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)

    Testing on Medusa suggests optimal buffer size around 128K

Thu, 06 Jun 2013 22:20:38 GMT jmt12 [27567]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)

    Found a printWarning that I hadn't changed to use the FileUtils version

Thu, 06 Jun 2013 04:21:10 GMT jmt12 [27566]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)

    Making the getcpu optional - as it isn't available on Medusa (but ...

Wed, 05 Jun 2013 23:23:49 GMT jmt12 [27561]
    * gs2-extensions/parallel-building/trunk/src/src/CASCADE-MAKE.sh (modified)
    * gs2-extensions/parallel-building/trunk/src/src/CASCADE-MAKE/GETCPU.sh (added)

    Adding very basic compile file for getcpu - can't be bothered going ...

Wed, 05 Jun 2013 23:16:31 GMT jmt12 [27560]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)

    Fixing typo in regexp that meant filenames were sometimes ignored

Wed, 05 Jun 2013 23:15:28 GMT jmt12 [27559]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)

    Changed mime-type away from binary - I hope. Meanwhile, generate ...

Wed, 05 Jun 2013 23:11:10 GMT jmt12 [27558]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)

    Forgot that Hadoop Map processes no longer have the environment ...

Wed, 05 Jun 2013 01:07:43 GMT jmt12 [27551]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)

    Altered so that it expects to be given a CSV containing parallel ...

Wed, 05 Jun 2013 01:06:32 GMT jmt12 [27550]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    Ensure the hostname is added to the Hadoop logs so we can identify ...

Wed, 05 Jun 2013 01:04:58 GMT jmt12 [27549]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (added)

    Extract information from the logs generated by parallel Greenstone ...

Wed, 05 Jun 2013 01:04:30 GMT jmt12 [27548]
    * gs2-extensions/parallel-building/trunk/src/bin/script/openmpi_report.pl (added)

    Extract information from the logs generated by parallel Greenstone ...

Wed, 05 Jun 2013 01:03:03 GMT jmt12 [27547]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)

    Rejigging some processing comments

Wed, 05 Jun 2013 01:02:06 GMT jmt12 [27546]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)

    Adding the ability for the Hadoop Mapper to determine what CPU number ...

Wed, 05 Jun 2013 01:00:33 GMT jmt12 [27545]
    * gs2-extensions/parallel-building/trunk/src/src/getcpu-src/.svnignore (added)

    Ignoring just the compiled file (for now)

Wed, 05 Jun 2013 01:00:05 GMT jmt12 [27544]
    * gs2-extensions/parallel-building/trunk/src/src/getcpu-src (added)
    * gs2-extensions/parallel-building/trunk/src/src/getcpu-src/getcpu.cpp (added)

    A tiny C script to guesstimate the CPU the calling Process is on

Tue, 04 Jun 2013 23:53:16 GMT jmt12 [27543]
    * gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (added)

    Adding generate_gantt.pl script in its original form - i.e. directly ...

Mon, 03 Jun 2013 23:12:36 GMT jmt12 [27532]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)

    Add the ability to configure the Thrift connector using a ...

Mon, 03 Jun 2013 23:11:39 GMT jmt12 [27531]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils.pm (modified)

    Only output the message about using copy instead of hard/soft link once

Mon, 03 Jun 2013 23:08:37 GMT jmt12 [27530]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    Clear out old logs, and adding more comments about what the script is ...

Mon, 03 Jun 2013 21:28:48 GMT jmt12 [27526]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils.pm (modified)

    Adding in a 'isHDFS()' function so that some plugins ...

Mon, 03 Jun 2013 21:27:12 GMT jmt12 [27525]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDFSShell.pm (modified)
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)

    Adding in a 'isHDFS()' function so that some plugins ...

Thu, 30 May 2013 00:15:06 GMT jmt12 [27515]
    * gs2-extensions/parallel-building/trunk/src/bin/script/test_fileutils.pl (modified)

    Making the file used during buffer tests be configurable

Wed, 29 May 2013 23:08:57 GMT jmt12 [27514]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)

    Altering code to allow configurable length of read/write buffer when ...

Wed, 29 May 2013 22:16:22 GMT jmt12 [27512]
    * gs2-extensions/parallel-building/trunk/src/bin/script/test_fileutils.pl (modified)

    Adding in a special test for measuring the effect of altering ...

Mon, 27 May 2013 23:39:12 GMT jmt12 [27496]
    * gs2-extensions/parallel-building/trunk/src/perllib/plugins/DirectoryPlugin.pm (modified)

    Replacing a smelly old util::file_exists() with a snazzy new ...

Mon, 27 May 2013 23:38:08 GMT jmt12 [27495]
    * gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)

    removing doubled up debug comments and putting some paths in ...

Mon, 27 May 2013 23:36:00 GMT jmt12 [27494]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)

    Fixing a truncated comment - or maybe I never wrote an end to it...

Mon, 27 May 2013 23:35:07 GMT jmt12 [27493]
    * gs2-extensions/parallel-building/trunk/src/perllib/plugouts/GreenstoneXMLPlugout.pm (deleted)

    No longer required - not that sure why it was required in the first place

Mon, 27 May 2013 23:34:27 GMT jmt12 [27492]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)

    Some versions of Hadoop add host and protocol information into paths ...

Mon, 27 May 2013 23:33:26 GMT jmt12 [27491]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils.pm (modified)

    Repairing three bugs in makeAllDirectories - incorrect pattern meant ...

Mon, 27 May 2013 22:53:27 GMT jmt12 [27490]
    * gs2-extensions/parallel-building/trunk/src/perllib/plugouts/BasePlugout.pm (deleted)

    No longer required

Mon, 27 May 2013 21:39:07 GMT jmt12 [27489]
    * gs2-extensions/parallel-building/trunk/src/packages/cpan/Thrift-0.9.0.tar.gz (deleted)

    Shouldn't have been here

Mon, 27 May 2013 21:37:48 GMT jmt12 [27488]
    * gs2-extensions/parallel-building/trunk/src/packages/CASCADE-MAKE/CPAN.sh (modified)

    Since I've got rid of the thousand DateTime prereq modules, I can ...

Mon, 27 May 2013 21:33:33 GMT jmt12 [27487]
    * gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)

    Ensure Parallel Building path in environment (for ThriftFS) and that ...

Mon, 27 May 2013 00:27:31 GMT jmt12 [27481]
    * gs2-extensions/parallel-building/trunk/src/bin/script/test_fileutils.pl (modified)
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils.pm (modified)
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)

    Adding makeAllDirectories() (which I'd only implemented in LocalFS) ...

Mon, 27 May 2013 00:22:03 GMT jmt12 [27480]
    * gs2-extensions/parallel-building/trunk/src/bin/script/test_fileutils.pl (modified)

    Removing DateTime dependency (so HDFSShell will always fail ...

Mon, 27 May 2013 00:14:51 GMT jmt12 [27479]
    * gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDFSShell.pm (modified)

    Remove time parsing as DateTime is a fricking nightmare to install ...