#
# ChangeLog for gs2-extensions/parallel-building
#
# Generated by Trac 1.4.2
# 2024-04-17T01:37:01+12:00

Thu, 24 Jul 2014 01:56:47 GMT jmt12 [29160]
	* gs2-extensions/parallel-building/trunk/src/packages/cpan/Crypt-Blowfish_PP-1.12.tar.gz (added)
	Adding blowfish encryption package to give text processing some work ...

Mon, 21 Jul 2014 22:46:42 GMT jmt12 [29158]
	* gs2-extensions/parallel-building/trunk/src/bin/script/logreportinator.pl (added)
	Initial checkin of script to convert a number of Greenstone|| logs ...

Thu, 19 Jun 2014 05:28:20 GMT jmt12 [29106]
	* gs2-extensions/parallel-building/trunk/src/bin/script/linkinator.pl (added)
	Check-in of script to symlink lorem files to matching files in ...

Wed, 18 Jun 2014 23:26:28 GMT jmt12 [29104]
	* gs2-extensions/parallel-building/trunk/src/bin/script/text_metricinator.pl (added)
	A script for extracting textual metrics from a collection of text ...

Wed, 18 Jun 2014 23:26:01 GMT jmt12 [29103]
	* gs2-extensions/parallel-building/trunk/src/bin/script/importsubsetinator.pl (modified)
	updated - not any more efficient (Schlemiel the painter performance) ...

Wed, 18 Dec 2013 00:02:19 GMT jmt12 [28779]
	* gs2-extensions/parallel-building/trunk/src/perllib/parallelbuildinginexport.pm (modified)
	Making timing message all sorts of purty

Tue, 17 Dec 2013 23:58:04 GMT jmt12 [28778]
	* gs2-extensions/parallel-building/trunk/src/perllib/parallelbuildinginexport.pm (modified)
	Typo - underscore where I meant hyphen

Tue, 17 Dec 2013 23:56:49 GMT jmt12 [28777]
	* gs2-extensions/parallel-building/trunk/src/perllib/parallelbuildinginexport.pm (modified)
	Need to include path to mpiimport on Medusa

Tue, 17 Dec 2013 22:11:16 GMT jmt12 [28771]
	* gs2-extensions/parallel-building/trunk/src/perllib/plugouts/BasePlugout.pm (added)
	A version of BasePlugout where the RSS feed update attempts to write ...

Tue, 17 Dec 2013 22:08:13 GMT jmt12 [28770]
	* gs2-extensions/parallel-building/trunk/src/perllib/parallelbuildinginexport.pm (modified)
	Adding microtiming... a little tricky what with TDBServer taking ...

Tue, 17 Dec 2013 21:53:57 GMT jmt12 [28769]
	* gs2-extensions/parallel-building/trunk/src/bin/script/parallel_import.pl (deleted)
	No longer used. import.pl now smart enough to dynamically load ...

Tue, 17 Dec 2013 21:53:15 GMT jmt12 [28768]
	* gs2-extensions/parallel-building/trunk/src/bin/script/parallel_import.pl (modified)
	Initially added microtime to this script, but then remembered it ...

Tue, 17 Dec 2013 21:21:53 GMT jmt12 [28767]
	* gs2-extensions/parallel-building/trunk/src/bin/script/import_with_io_metric.pl (modified)
	Drastically increased the script to allow 1) battery of imports ...

Tue, 17 Dec 2013 21:20:09 GMT jmt12 [28766]
	* gs2-extensions/parallel-building/trunk/src/bin/script/strace_to_tsv.pl (modified)
	Removing an occasional few characters of garbage that turn up in the ...

Mon, 16 Dec 2013 23:08:10 GMT jmt12 [28764]
	* gs2-extensions/parallel-building/trunk/src/bin/script/parallel_dspace_filtermedia.pl (modified)
	Adding microsecond timing messages

Thu, 21 Nov 2013 00:36:40 GMT jmt12 [28666]
	* gs2-extensions/parallel-building/trunk/src/bin/script/strace_to_tsv.pl (added)
	A script to transform a strace.out into a Tab separated file worthy ...

Thu, 21 Nov 2013 00:35:52 GMT jmt12 [28665]
	* gs2-extensions/parallel-building/trunk/src/bin/script/import_with_io_metric.pl (modified)
	Latest changes to workaround resumed syscalls massive duration problem

Wed, 20 Nov 2013 00:00:09 GMT jmt12 [28654]
	* gs2-extensions/parallel-building/trunk/src/perllib/parallelbuildinginexport.pm (modified)
	Removed recordEarliestDatestamp() function as that no lurks in the ...

Tue, 19 Nov 2013 23:58:26 GMT jmt12 [28653]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils.pm (modified)
	Changed the way a require was 'eval'd - but I have no idea why

Tue, 19 Nov 2013 23:57:27 GMT jmt12 [28652]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)
	Changes to support running the reports over logs produced from ...

Tue, 19 Nov 2013 23:53:02 GMT jmt12 [28649]
	* gs2-extensions/parallel-building/trunk/src/perllib/plugins/CPULoadTextPlugin.pm (added)
	A version of a Textfile reading plugin that has a configurable load ...

Tue, 19 Nov 2013 23:51:45 GMT jmt12 [28648]
	* gs2-extensions/parallel-building/trunk/src/bin/script/flush_caches.pl (modified)
	Adding a short delay after writing to the flush_cache file just to ...

Tue, 19 Nov 2013 23:49:26 GMT jmt12 [28647]
	* gs2-extensions/parallel-building/trunk/src/bin/script/update_data_locality.pl (modified)
	Adding progress messages and making a debug message optional

Tue, 19 Nov 2013 22:31:31 GMT jmt12 [28646]
	* gs2-extensions/parallel-building/trunk/src/bin/script/import_with_io_metric.pl (added)
	A script that uses strace to produce IO metrics of a Greenstone import

Tue, 19 Nov 2013 22:31:07 GMT jmt12 [28645]
	* gs2-extensions/parallel-building/trunk/src/bin/script/dlreport.pl (added)
	Script to generate a report on data locality from GreenstoneHadoop logs

Sun, 06 Oct 2013 21:04:32 GMT jmt12 [28358]
	* gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)
	Replacing my earlier decision to only have data locality information ...

Sun, 06 Oct 2013 21:02:54 GMT jmt12 [28357]
	* gs2-extensions/parallel-building/trunk/src/bin/script/update_data_locality.pl (added)
	used to update the data_locality.csv file in the case where other ...

Sun, 06 Oct 2013 21:01:39 GMT jmt12 [28356]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)
	Support the legacy version of taskno in the data_locality.csv file ...

Wed, 25 Sep 2013 23:13:14 GMT jmt12 [28312]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest2.java (modified)
	Working on finer control over data locality - so I can configure a ...

Thu, 29 Aug 2013 21:21:30 GMT jmt12 [28192]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest2.java (modified)
	Need to still output Greenstone messages to log otherwise I can't ...

Thu, 29 Aug 2013 21:18:21 GMT jmt12 [28191]
	* gs2-extensions/parallel-building/trunk/src/bin/script/replication_tests.pl (modified)
	Removing redundant error stream redirect - this wasn't causing the ...

Thu, 29 Aug 2013 21:08:04 GMT jmt12 [28190]
	* gs2-extensions/parallel-building/trunk/src/bin/script/replication_tests.pl (modified)
	Had accidently hardcoded the max replication number - allow it to be ...

Thu, 29 Aug 2013 21:06:56 GMT jmt12 [28189]
	* gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)
	Replace the newer (and faster) while(@file) loop with the older (and ...

Thu, 29 Aug 2013 20:58:33 GMT jmt12 [28188]
	* gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)
	Minor fix to allow for tasks that start in the same second (now each ...

Thu, 29 Aug 2013 20:56:57 GMT jmt12 [28187]
	* gs2-extensions/parallel-building/trunk/src/perllib/Kea.pm (added)
	A customized version of Kea.pm that looks in the correct place for ...

Thu, 29 Aug 2013 20:55:57 GMT jmt12 [28186]
	* gs2-extensions/parallel-building/trunk/src/bin/script/iotop_report.pl (added)
	A (failed) attempt to use the unix iotop tool to determine IO percentage

Fri, 09 Aug 2013 01:30:35 GMT jmt12 [28018]
	* gs2-extensions/parallel-building/trunk/src/bin/script/replication_tests.pl (modified)
	Try really hard to capture the output from 'time' function as Medusa ...

Fri, 09 Aug 2013 01:26:02 GMT jmt12 [28017]
	* gs2-extensions/parallel-building/trunk/src/bin/script/replication_tests.pl (modified)
	Forgot to add processing comment before call to hadoop_import.pl

Fri, 09 Aug 2013 01:16:44 GMT jmt12 [28016]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)
	Allow the hadoop report generator to parse start and end times ...

Fri, 09 Aug 2013 01:16:06 GMT jmt12 [28015]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)
	Add an extra option that allows me to pass in the directory to write ...

Fri, 09 Aug 2013 01:15:02 GMT jmt12 [28014]
	* gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)
	Remove tasks that have had data locality established from the array ...

Fri, 09 Aug 2013 01:14:22 GMT jmt12 [28013]
	* gs2-extensions/parallel-building/trunk/src/bin/script/replication_tests.pl (added)
	A new script to run a battery of Hadoop ingests at varying ...

Fri, 09 Aug 2013 01:13:50 GMT jmt12 [28012]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest2.java (modified)
	Express start time as a double as well

Fri, 09 Aug 2013 01:13:01 GMT jmt12 [28011]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSInfoDB.java (modified)
	Turn off debugging in the copy in SVN

Fri, 09 Aug 2013 01:11:46 GMT jmt12 [28010]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSInfoDB.java (modified)
	Correctly set up the environment for calls to txt2tdb and also ...

Thu, 08 Aug 2013 00:46:06 GMT jmt12 [28001]
	* gs2-extensions/parallel-building/trunk/src/perllib/parallelbuildinginexport.pm (modified)
	Write datestamp using dbutil if applicable

Wed, 07 Aug 2013 22:13:59 GMT jmt12 [27996]
	* gs2-extensions/parallel-building/trunk/src/packages/hdfs-nfs-proxy-release-0.8.1.tar.gz (modified)
	A new version of the archive with minor changes to log4j configuration

Wed, 07 Aug 2013 22:12:52 GMT jmt12 [27995]
	* gs2-extensions/parallel-building/trunk/src/perllib/parallelbuildinginexport.pm (modified)
	Just adding some code comments

Sun, 21 Jul 2013 22:40:02 GMT jmt12 [27915]
	* gs2-extensions/parallel-building/trunk/src/perllib/dbutil/stdoutxml.pm (added)
	A new PlugOut that doesn't write any intermediate files (bar those ...

Sun, 21 Jul 2013 22:38:06 GMT jmt12 [27914]
	* gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)
	Trying to get around a couple of divide-by-zero issues when ...

Sun, 21 Jul 2013 22:37:02 GMT jmt12 [27913]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)
	Made the ingester to be used (version 1 without reduce phase, or ...

Sun, 21 Jul 2013 22:36:02 GMT jmt12 [27912]
	* gs2-extensions/parallel-building/trunk/src/src/CASCADE-MAKE/HADOOPGREENSTONEINGEST.sh (modified)
	Modified the compilation to include the new ingester and its co- ...

Sun, 21 Jul 2013 22:35:43 GMT jmt12 [27911]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/compile.sh (modified)
	Modified the compilation to include the new ingester and its co- ...

Sun, 21 Jul 2013 22:35:04 GMT jmt12 [27910]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSGroupingComparator.java (added)
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSInfoDB.java (added)
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/GSPartitioner.java (added)
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest2.java (added)
	Extended the existing HadoopGreenstoneIngest with proper Reduce phase ...

Thu, 04 Jul 2013 01:45:08 GMT jmt12 [27753]
	* gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)
	Adding Handbrake's percentage complete to report - although this is ...

Thu, 04 Jul 2013 01:44:22 GMT jmt12 [27752]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)
	Data locality file not being found is no longer fatal (HDFS-NFS-Proxy ...

Tue, 02 Jul 2013 02:35:42 GMT jmt12 [27732]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)
	Nice the copy itself too

Fri, 21 Jun 2013 00:25:32 GMT jmt12 [27686]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)
	A little more progress comments

Fri, 21 Jun 2013 00:24:54 GMT jmt12 [27685]
	* gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)
	in the case of multiple attempts you need to retain the information ...

Fri, 21 Jun 2013 00:22:25 GMT jmt12 [27684]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)
	Adding natural sorting into report generation - so also needed to add ...

Fri, 21 Jun 2013 00:20:27 GMT jmt12 [27683]
	* gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)
	moving a few more headings around to help with information block layout

Fri, 21 Jun 2013 00:19:57 GMT jmt12 [27682]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)
	Copying makeAllDirectories() from vanilla FileUtils.pm

Wed, 19 Jun 2013 21:26:05 GMT jmt12 [27669]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)
	Sort compute nodes naturally before labelling them with incremental ...

Mon, 17 Jun 2013 22:59:52 GMT jmt12 [27654]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)
	Add the ability to stagger the starting of Mappers by placing a ...

Mon, 17 Jun 2013 22:52:36 GMT jmt12 [27653]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS/ThriftFH.pm (modified)
	Forgot to pull self off the head of arguments

Mon, 17 Jun 2013 22:51:56 GMT jmt12 [27652]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)
	Changing buffer to 128K (slightly faster) and adding a comment ...

Mon, 17 Jun 2013 22:50:04 GMT jmt12 [27651]
	* gs2-extensions/parallel-building/trunk/src (modified)

Mon, 17 Jun 2013 22:49:15 GMT jmt12 [27650]
	* gs2-extensions/parallel-building/trunk/src/.svnignore (modified)

Mon, 17 Jun 2013 22:48:47 GMT jmt12 [27649]
	* gs2-extensions/parallel-building/trunk/src/setup.bash (deleted)
	No longer in SVN control

Mon, 17 Jun 2013 22:48:22 GMT jmt12 [27648]
	* gs2-extensions/parallel-building/trunk/src/setup.bash.in (added)
	Template for setup.bash - a user will have to populate Hadoop fields

Mon, 17 Jun 2013 22:31:51 GMT jmt12 [27645]
	* gs2-extensions/parallel-building/trunk/src/packages (modified)

Mon, 17 Jun 2013 22:31:34 GMT jmt12 [27644]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)
	Extended to support HDFS-access via NFS. This applies to both the ...

Mon, 17 Jun 2013 22:30:13 GMT jmt12 [27643]
	* gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)
	Changed the script generator so it can recurse through directories ...

Mon, 17 Jun 2013 22:28:53 GMT jmt12 [27642]
	* gs2-extensions/parallel-building/trunk/src/bin/script/ffsplit.sh (added)
	A script I downloaded that successfully splits video files - ...

Mon, 17 Jun 2013 22:12:53 GMT jmt12 [27641]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)
	Altered order of arguments and allow archives dir to be passed as ...

Mon, 17 Jun 2013 22:11:58 GMT jmt12 [27640]
	* gs2-extensions/parallel-building/trunk/src/packages/.svnignore (modified)

Mon, 17 Jun 2013 22:09:38 GMT jmt12 [27638]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)
	Change it so failure to open a filehandle isn't fatal - leave it up ...

Mon, 17 Jun 2013 00:36:54 GMT jmt12 [27631]
	* gs2-extensions/parallel-building/trunk/src/packages/hdfs-nfs-proxy-release-0.8.1.tar.gz (added)
	A proxy to allow NFS access to HDFS

Mon, 10 Jun 2013 05:10:48 GMT jmt12 [27595]
	* gs2-extensions/parallel-building/trunk/src/packages/cpan (modified)
	* gs2-extensions/parallel-building/trunk/src/packages/cpan/.svnignore (modified)
	Updating list of untarred directories to ignore

Mon, 10 Jun 2013 05:09:36 GMT jmt12 [27594]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)
	Extend hadoop_import.pl to be able to start and stop the Thrift server(s)

Mon, 10 Jun 2013 04:50:33 GMT jmt12 [27593]
	* gs2-extensions/parallel-building/trunk/src/packages/cpan/Class-Accessor-0.34.tar.gz (added)
	Need Class Accessor for Thrift client under Rocks

Mon, 10 Jun 2013 04:34:38 GMT jmt12 [27592]
	* gs2-extensions/parallel-building/trunk/src/packages/ThriftFS-0.9.0.tar.gz (modified)
	Adding in a script to allow a daemon version of Thrift to be started ...

Mon, 10 Jun 2013 04:32:41 GMT jmt12 [27591]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)
	Ensure Thrift will, be default, attempt to connect to the local ...

Mon, 10 Jun 2013 04:27:49 GMT jmt12 [27590]
	* gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)
	Adding statistics about data locality, and highlighting tasks where ...

Mon, 10 Jun 2013 02:19:21 GMT jmt12 [27589]
	* gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)
	Fixing up some minor bugs in regex's

Mon, 10 Jun 2013 02:12:28 GMT jmt12 [27588]
	* gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)
	Extend parser to support jobs that are split over several logs. Also ...

Sun, 09 Jun 2013 23:29:03 GMT jmt12 [27587]
	* gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)
	Allow debug mode to be enabled from the command line

Sun, 09 Jun 2013 23:15:36 GMT jmt12 [27586]
	* gs2-extensions/parallel-building/trunk/src/bin/script/parse_task_info_from_hadoop_log.pl (modified)
	Updating script to date date of hadoop job into account when ...

Sun, 09 Jun 2013 22:25:10 GMT jmt12 [27585]
	* gs2-extensions/parallel-building/trunk/src/bin/script/test_fileutils.pl (modified)
	The perl on Medusa won't let you immediately treat a returned array ...

Sun, 09 Jun 2013 22:23:46 GMT jmt12 [27584]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_import.pl (modified)
	I wasn't doing -r when attempting to clear directories left in /tmp ...

Sun, 09 Jun 2013 22:22:19 GMT jmt12 [27583]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)
	Adding code to differentiate between workers in a cluster - all of ...

Thu, 06 Jun 2013 23:27:08 GMT jmt12 [27571]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)
	increase timeout to 4 hours per map

Thu, 06 Jun 2013 22:53:10 GMT jmt12 [27570]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS/ThriftFH.pm (modified)
	Make the warning about binmode() not being applicable more ...

Thu, 06 Jun 2013 22:48:39 GMT jmt12 [27569]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)
	Trying to streamline the error messages from failing to link ...

Thu, 06 Jun 2013 22:24:29 GMT jmt12 [27568]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/HDThriftFS.pm (modified)
	Testing on Medusa suggests optimal buffer size around 128K

Thu, 06 Jun 2013 22:20:38 GMT jmt12 [27567]
	* gs2-extensions/parallel-building/trunk/src/perllib/FileUtils/LocalFS.pm (modified)
	Found a printWarning that I handed changed to use the FileUtils version

Thu, 06 Jun 2013 04:21:10 GMT jmt12 [27566]
	* gs2-extensions/parallel-building/trunk/src/src/java/org/nzdl/gsdl/HadoopGreenstoneIngest.java (modified)
	Making the getcpu optional - as it isn't available on Medusa (but ...

Wed, 05 Jun 2013 23:23:49 GMT jmt12 [27561]
	* gs2-extensions/parallel-building/trunk/src/src/CASCADE-MAKE.sh (modified)
	* gs2-extensions/parallel-building/trunk/src/src/CASCADE-MAKE/GETCPU.sh (added)
	Adding very basic compile file for getcpu - can't be bothered going ...

Wed, 05 Jun 2013 23:16:31 GMT jmt12 [27560]
	* gs2-extensions/parallel-building/trunk/src/bin/script/hadoop_report.pl (modified)
	Fixing typo in regexp that meant filenames sometimes ignored

Wed, 05 Jun 2013 23:15:28 GMT jmt12 [27559]
	* gs2-extensions/parallel-building/trunk/src/bin/script/generate_gantt.pl (modified)
	Changed mime-type away from binary - I hope. Meanwhile, generate ...