Ticket #855 (closed enhancement: fixed)

Opened 6 years ago

Last modified 6 years ago

Dynamic InExport/BuildCol, Extended FileSystem, and other cool stuff

Reported by: jmt12
Owned by: jmt12
Priority: high
Milestone:
Component: Collection Building
Severity: major
Keywords: inexport buildcol fileutils wal environment extensions
Cc:

Description

Checking in a swath of changes that I made to support the parallel building extension, the three major changes being:

1. More care when appending to the INC and PATH environment variables, so that the order of values can be used as precedence for overriding versions of plugins, plugouts, and classifiers

2. Proper OO and tricksie runtime loading for the import.pl/inexport.pm and buildcol.pl/buildcolutils.pm pairs so as to allow extensions to provide overriding functionality during importing and building

3. Encapsulation of local file system functions into a new Perl module with the intention of allowing a later extension to provide access to several file systems (including HDFS and Thrift as needed by Hadoop)

I'll attempt to include details of each changeset below.

Attachments

battle_cry.jpg (27.8 KB) - added by jmt12 6 years ago.
inspiration

Change History

Changed 6 years ago by jmt12

Changeset: http://trac.greenstone.org/changeset/26206

Comment: Add hidden support for the new experimental multiple reader/writer functionality available in SQLite.

Files:

  • perllib/dbutils/sqlite.pm

Changed 6 years ago by jmt12

Changeset: http://trac.greenstone.org/changeset/27299

Comment: Extending manifest parsing so that a manifest can carry a version number as an attribute on the <Manifest> element. Manifest version 2 is expected to be followed verbatim - i.e. the global file scan and metadata search are disabled.

Files:

  • perllib/manifest.pm

(p.s. The changeset number for the sqlite change above should've been http://trac.greenstone.org/changeset/27298)
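
The manifest version check described in this change can be illustrated with a short sketch. It is written in Python for brevity and is not the Perl code from manifest.pm. The attribute name `version`, the default of 1 for manifests without the attribute, and both function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def manifest_version(xml_text, default=1):
    """Return the version declared on the <Manifest> root element.

    Assumption: the attribute is called "version" and older manifests
    without it are treated as version 1.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "Manifest":
        raise ValueError("root element is not <Manifest>")
    return int(root.get("version", default))


def follow_verbatim(xml_text):
    """True when the manifest should be processed exactly as written,
    i.e. without a global file scan or metadata search."""
    return manifest_version(xml_text) >= 2
```

A caller would check `follow_verbatim()` once, before deciding whether to run the wider file scan.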

Changed 6 years ago by jmt12

Changeset: http://trac.greenstone.org/changeset/27300

Comment: Some strings for the new options to buildcol.pl, allowing for finer control over which indexes are built.

Files:

  • perllib/strings.properties

Changed 6 years ago by jmt12

Changeset: http://trac.greenstone.org/changeset/27301

Comment: You can now use the indexname and indexlevel options to buildcol.pl to selectively build Lucene indexes.

Files:

  • perllib/lucenebuilder.pm

Changed 6 years ago by jmt12

Changeset: http://trac.greenstone.org/changeset/27302

Comment: Removed parallel processing stuff as that now lives in an extension. Restructured to better support overriding by extensions. Checks for manifest version, and processes files accordingly. Conditional addition to INC and PATH environment variables (explained elsewhere). Replaced deprecated util.pm calls with FileUtils.pm ones.

Files:

  • perllib/inexport.pm

Changed 6 years ago by jmt12

Changeset: http://trac.greenstone.org/changeset/27303

Comment: Replacing hard-coded additions to INC and PATH environment variables with conditional ones - this allows us to use the order of values in these variables for precedence, and thus allows better support for extensions that override classifiers, plugins etc. ENV and PATH functions already exist in util, but augmentINC() is a new function.

Files:

  • perllib/IncrementalBuildUtils.pm
  • perllib/classify.pm
  • perllib/inexport.pm
  • perllib/parse2.pm
  • perllib/parse3.pm
  • perllib/plugin.pm
  • perllib/util.pm
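
The idea of a conditional addition can be sketched as follows. This is a Python illustration, not the Perl augmentINC() from util.pm. It assumes that earlier entries take precedence and that a new directory goes to the front only when it is not already listed. Both function names are hypothetical.

```python
import os


def augment_search_path(paths, new_dir):
    """Add new_dir to the front of paths unless it is already present.

    Leaving an existing entry where it is preserves the precedence that
    an extension may have set up earlier.
    """
    if new_dir not in paths:
        paths.insert(0, new_dir)
    return paths


def augment_env_path(env, name, new_dir):
    """Apply the same rule to a separator-delimited variable such as PATH."""
    parts = [p for p in env.get(name, "").split(os.pathsep) if p]
    env[name] = os.pathsep.join(augment_search_path(parts, new_dir))
    return env[name]
```

With a hard-coded addition, a core directory could be pushed in front of an extension's directory every time a module is loaded. The conditional form leaves the established order alone.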

Changed 6 years ago by jmt12

Changeset: http://trac.greenstone.org/changeset/27304

Comment: Moved the heavy lifting from the buildcol.pl script into a proper Perl class, buildcolutils.pm, thus allowing parts of the build process to be overridden by versions loaded by extensions. Replaced deprecated util.pm calls with FileUtils.pm ones.

Files:

  • perllib/buildcolutils.pm

Changed 6 years ago by jmt12

Changeset: http://trac.greenstone.org/changeset/27305

Comment: Add code to allow importing and building to load overriding versions of inexport.pm and buildcolutils.pm from extensions at runtime. When an extension provides a possible override, Greenstone will dynamically detect and add additional options (visible via "--help"). When a user specifies one of these options the appropriate inexport/buildcolutils subclass will be loaded.

Files:

  • bin/script/buildcol.pl
  • bin/script/import.pl
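
The runtime selection described in this change might look roughly like the sketch below. It is a Python illustration rather than the Perl in import.pl or buildcol.pl. The class names, the `parallel` option, and the dictionary layout used to describe an extension are all invented for the example.

```python
class InExport:
    """Stand-in for the core import driver."""

    def describe(self):
        return "core import"


class ParallelInExport(InExport):
    """Hypothetical override supplied by an extension."""

    def describe(self):
        return "parallel import"


def discover_overrides(extensions):
    """Collect option-name -> class mappings offered by extensions.

    The keys are also what would be listed as extra options in --help.
    """
    overrides = {}
    for ext in extensions:
        overrides.update(ext.get("overrides", {}))
    return overrides


def select_driver(argv, overrides, default=InExport):
    """Instantiate the override named on the command line, else the default."""
    for arg in argv:
        cls = overrides.get(arg.lstrip("-"))
        if cls is not None:
            return cls()
    return default()
```

Because the override is a subclass of the core driver, the calling script can use it without knowing which implementation it was given.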

Changed 6 years ago by jmt12

Changeset: http://trac.greenstone.org/changeset/27306

Comments: Moving the critical file-related functions (copy, rm, etc) out of util.pm into their own proper class, FileUtils. Use of the old functions in util.pm will prompt deprecated warning messages. There may be further functions that could be moved across in the future, but these are the critical ones when considering supporting other filesystems (HTTP, HDFS, WebDAV, etc). Updated some key files to use the new functions so that no deprecated messages are thrown when importing/building the demo collection 'out of the box'.

Files:

  • bin/script/import.pl
  • bin/script/buildcol.pl
  • perllib/FileUtils.pm
  • perllib/basebuilder.pm
  • perllib/basebuildproc.pm
  • perllib/classify.pm
  • perllib/colcfg.pm
  • perllib/doc.pm
  • perllib/gsprintf.pm
  • perllib/mgppbuilder.pm
  • perllib/plugin.pm
  • perllib/plugout.pm
  • perllib/unicode.pm
  • perllib/util.pm
  • perllib/dbutil/gdbm.pm
  • perllib/plugins/BasePlugin.pm
  • perllib/plugins/DirectoryPlugin.pm
  • perllib/plugins/HTMLPlugin.pm
  • perllib/plugins/MetadataXMLPlugin.pm
  • perllib/plugins/ArchivesInfPlugin.pm
  • perllib/plugins/GreenstoneXMLPlugin.pm
  • perllib/plugouts/BasePlugout.pm
  • perllib/plugouts/GreenstoneXMLPlugout.pm

Note: Deprecated warning on util::filename_cat() currently commented out as otherwise you drown in a sea of deprecated warnings as every single Perl script/module in Greenstone appears to use it.
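
The deprecation pattern described above, where an old function warns and then forwards to its replacement, can be sketched like this. It is a Python illustration, not the Perl in util.pm or FileUtils.pm. The name `remove_file` and the wording of the warning are assumptions.

```python
import os
import warnings


class FileUtils:
    """Stand-in for the new home of the file-system functions."""

    @staticmethod
    def remove_file(path):
        """Delete path if it exists; report whether anything was removed."""
        if os.path.exists(path):
            os.remove(path)
            return True
        return False


def rm(path):
    """Old entry point: warn the caller, then forward to FileUtils."""
    warnings.warn(
        "util rm() is deprecated; use FileUtils instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return FileUtils.remove_file(path)
```

Keeping the old function as a thin wrapper means existing callers keep working while the warning points them at the new module.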

Changed 6 years ago by jmt12

  • status changed from new to closed
  • resolution set to fixed

So, yeah, that's a scary bunch of changes - although some of them come from my parallel processing code that has been running for a year or so, so they at least should be stable. It's a little unfortunate that I didn't submit the import.pl/buildcol.pl cascade of changes prior to the file system stuff - still, nothing that a determined person and a global search and replace can't roll back. Please contact me (jmt12@…) should any bugs arise during testing.

Changed 6 years ago by jmt12

inspiration

Changed 6 years ago by jmt12

  • priority changed from high to very low
  • severity changed from major to minor

Changeset: http://trac.greenstone.org/changeset/27392

Comments: Resolving a bug reported by Anu, in that Greenstone no longer copies across the RSS feeds file. She discovered this was due to there being no $self->{'archivedir'} set at that part of the code. I investigated and found that the code that actually assigned a default value to archivesdir was lurking in the section that handles building from a cached version - and consequently was only acting on a local copy of the variable. Moved this code into a more logical place in the function that configures arguments/options, and also ensured RSS copying works and doesn't throw deprecated errors (due to its use of older util.pm file actions).

Files:

  • perllib/buildcolutils.pm
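
The shape of this fix, storing the default on the object during configuration instead of on a local variable in a later branch, can be sketched as below. It is a Python illustration, not the Perl in buildcolutils.pm. The class name, the `archives` default directory, and the RSS file name are assumptions made for the example.

```python
import os


class BuildColSketch:
    """Minimal stand-in for the build driver's option handling."""

    def __init__(self, collectdir):
        self.collectdir = collectdir
        self.archivedir = None

    def configure(self, archivedir=None):
        # The default is stored on the object at configuration time,
        # so every later step sees the same value.
        self.archivedir = archivedir or os.path.join(self.collectdir, "archives")

    def rss_source(self):
        # Hypothetical file name; a later step that needs archivedir.
        return os.path.join(self.archivedir, "rss-items.rdf")
```

Had the default been assigned only to a local variable, `rss_source()` would have been called with `self.archivedir` still unset.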

Changed 6 years ago by jmt12

  • priority changed from very low to high
  • severity changed from minor to major