05/24/13 09:34:37

Updated to reflect changes in the perllib customization (most of it removed thanks to changes in GS2 and proper inheritance/overriding), and added a listing of the packages in this extension and (hopefully) why they are there.



  • gs2-extensions/parallel-building/trunk/src/README.txt

    r24675 r27420  
===== bin/script and perllib =====

**Note:** The following is included for historic reasons: these changes have since been merged (or otherwise dealt with) as part of major changes to the way the import and build scripts are run. The customized files in Parallel Building's perllib are now fewer in number, and tend to depend upon proper class inheritance and overriding.

In order to make this compatible with the latest advances in the main trunk (i.e. not the 64-bit version I've been testing on), I've implemented the parallel building using an SVN head version of import.pl, buildcol.pl and perllib. I'll try to keep a list of the files I've changed here to aid in merging this code back into Greenstone:
* perllib/plugin.pm: see IncrementalBuildTools
* perllib/util.pm: made it complain about periods (.) in the Identifier only once, rather than once per document (which is a PITA when building one million documents).
* perllib/dbutil/gdbm.pm: changed to call lock-enabled versions of txt2db and db2txt.
* perllib/dbutil/sqlite.pm: added the WAL pragma (for all the good it did). Also needed to redirect output (as for db_fast), because the WAL reports each type of action ("add", "update", and "delete") that it has queued, which very quickly becomes annoying.
* perllib/plugins/DirectoryPlugin.pm: made the "Global file scan..." comment obey verbosity.
* perllib/plugins/MARCPlugin.pm: see IncrementalBuildTools (in this case the path to cpan)
* perllib/plugins/OAIMetadataXMLPlugin.pm: see IncrementalBuildTools (in this case the path to cpan)
* perllib/plugins/ReadXMLPlugin.pm: see IncrementalBuildTools (in this case the path to cpan)
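
The sqlite.pm change itself is Perl and is not shown here. The sketch below illustrates what enabling the WAL pragma involves. It uses Python's standard `sqlite3` module rather than the code in perllib/dbutil/sqlite.pm, so treat it as an illustration under that assumption. It switches a file-backed database into write-ahead-log mode and checks the mode SQLite reports back. The helper `open_wal_db` and the `docs.db` file name are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_wal_db(path):
    """Open (or create) an SQLite database file and switch it to WAL mode.

    Returns the connection and the journal mode SQLite reports back.
    (Hypothetical helper for illustration; not part of Greenstone.)
    """
    conn = sqlite3.connect(path)
    # PRAGMA journal_mode returns one row holding the mode now in effect;
    # "wal" confirms the switch (an in-memory database would report
    # "memory" instead, because WAL needs a real file).
    (mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
    return conn, mode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn, mode = open_wal_db(os.path.join(tmp, "docs.db"))
        print("journal mode:", mode)
        conn.close()
```

Checking the returned mode matters because SQLite does not raise an error when it cannot switch modes. It simply answers with the mode that is still in effect.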

===== Packages =====

==== Bit-Vector-7.2 ====

Required by Thrift's Perl API.

==== Hadoop-1.1.0 ====

Provides Hadoop capabilities to the extension. You can either run Greenstone in parallel (using OpenMPI as the parallel framework), pulling the files out of HDFS, or run the alternate Hadoop framework import (and maybe build, if I can be bothered) to make even better use of HDFS.

==== IPC-Run-0.90 ====

Used in the server daemons (GDBM and TDB) to provide a handle to running applications that allows bi-directional piping and better process control (getting child PIDs, etc.).
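
IPC::Run is a Perl module, and its API is not reproduced here. The sketch below is a rough analogue of the bi-directional piping described above, built on Python's standard `subprocess` module (a swapped-in technique, not what the daemons use). It keeps a child process running, writes requests to its stdin, reads one reply per request from its stdout, and reports the child's PID and exit status. The upper-casing child and the `converse` helper are hypothetical stand-ins for a database server daemon and its client.

```python
import subprocess
import sys

# Hypothetical child: echoes each input line back in upper case and flushes
# immediately, standing in for a long-running server daemon.
CHILD_SRC = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def converse(requests):
    """Send each request to the child in turn and read back one reply line.

    Returns (child PID, child exit status, list of replies).
    """
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    try:
        for req in requests:
            child.stdin.write(req + "\n")
            child.stdin.flush()  # push the request through the pipe now
            replies.append(child.stdout.readline().rstrip("\n"))
    finally:
        child.stdin.close()  # EOF tells the child to leave its loop
        returncode = child.wait()
        child.stdout.close()
    return child.pid, returncode, replies


if __name__ == "__main__":
    pid, rc, replies = converse(["add doc1", "delete doc2"])
    print(pid, rc, replies)
```

Flushing after each write and reading exactly one reply per request keeps the two sides in lock-step. Writing many requests without draining the replies could fill the pipe buffers and deadlock both processes, which is the kind of plumbing a dedicated process-control module takes care of.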

==== OpenMPI-1.4.3 ====

Provides a framework within which to run Greenstone in parallel.

==== Proc-Daemon-0.14 ====

Perl module to allow proper daemonization of Perl processes.

==== Sort-Key-1.32 ====

Perl module providing better sorting algorithms, including natural sorting of keys.
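
Sort::Key's own interface is not reproduced here. The Python below sketches what a natural sort of keys does. It splits each key into text runs and digit runs, then compares the digit runs as integers, so `doc2` sorts before `doc10` rather than after it. The `natural_key` helper is hypothetical, and the case-insensitive comparison of text runs is an assumption of this sketch.

```python
import re


def natural_key(s):
    """Sort key that compares runs of digits numerically ("doc2" < "doc10")
    and everything else case-insensitively. (Hypothetical helper.)"""
    # Splitting on a capturing group keeps the digit runs, and they always
    # land at odd indices, so text only ever compares with text and
    # integers only with integers.
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


if __name__ == "__main__":
    ids = ["doc10", "doc2", "Doc1", "doc1a"]
    print(sorted(ids, key=natural_key))
```

With this key, the identifiers sort as `['Doc1', 'doc1a', 'doc2', 'doc10']`. A plain `sorted(ids)` gives `['Doc1', 'doc10', 'doc1a', 'doc2']` instead.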

==== ThriftFS-0.9.0 ====

A custom collection of files extracted from a source install of Hadoop and Thrift, providing a persistent Hadoop-Thrift server (in Java) and an API for communicating with the server from Perl.

Includes a Java file providing slightly more space-efficient Base91 encoding/decoding (compared to Base64). This is required by tweaks to Thrift that allow binary data to be passed around as Java Strings without UTF-8 encoding accidentally mangling things (if only they'd used Java byte[]s instead).
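
The Java file mentioned above is not reproduced here, so this is not the ThriftFS code. The Python below sketches the published basE91 algorithm that the encoding refers to. Input bits are taken 13 at a time, or 14 when the low 13 bits are at most 88. Since 91 * 91 = 8281, a 14-bit value whose low 13 bits are at most 88 is no greater than 8280 and still fits in two base-91 digits. Each chunk is written as two characters from a 91-symbol ASCII alphabet, so two characters carry 13 or 14 bits. That is an expansion of roughly 14% to 23%, versus Base64's fixed 33% (6 bits per character). The names `b91encode` and `b91decode` are hypothetical.

```python
# basE91 alphabet: every printable non-space ASCII character except
# the apostrophe, hyphen and backslash (91 symbols).
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            "0123456789!#$%&()*+,./:;<=>?@[]^_`{|}~\"")
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}


def b91encode(data: bytes) -> str:
    """Encode bytes as basE91 text (hypothetical helper name)."""
    out = []
    queue = nbits = 0  # bit queue, consumed from the low end, and its length
    for byte in data:
        queue |= byte << nbits
        nbits += 8
        if nbits > 13:
            value = queue & 8191  # low 13 bits
            if value > 88:
                queue >>= 13
                nbits -= 13
            else:
                # Small enough that a 14th bit still fits in two digits.
                value = queue & 16383
                queue >>= 14
                nbits -= 14
            out.append(ALPHABET[value % 91])
            out.append(ALPHABET[value // 91])
    if nbits:  # flush the remaining 1..13 bits
        out.append(ALPHABET[queue % 91])
        if nbits > 7 or queue > 90:
            out.append(ALPHABET[queue // 91])
    return "".join(out)


def b91decode(text: str) -> bytes:
    """Decode basE91 text, ignoring characters outside the alphabet."""
    out = bytearray()
    queue = nbits = 0
    value = -1  # -1 means "no first digit pending"
    for ch in text:
        digit = DECODE.get(ch)
        if digit is None:
            continue  # e.g. line breaks inserted by a transport
        if value < 0:
            value = digit
            continue
        value += digit * 91
        queue |= value << nbits
        nbits += 13 if (value & 8191) > 88 else 14
        while nbits > 7:  # emit every complete byte
            out.append(queue & 255)
            queue >>= 8
            nbits -= 8
        value = -1
    if value != -1:  # a lone trailing digit holds the last byte
        out.append((queue | value << nbits) & 255)
    return bytes(out)


if __name__ == "__main__":
    sample = bytes(range(256))
    encoded = b91encode(sample)
    assert b91decode(encoded) == sample
    print(len(sample), "bytes ->", len(encoded), "chars")
```

The encoded output is plain ASCII, so it passes through a UTF-8 encoded Java String unchanged. That appears to be the property the Thrift tweaks described above depend on.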

==== Tinyxml-gs-2.6.2 ====

Used to parse XML 'build recipes' in the parallel version of buildcol.