#855 closed enhancement (fixed)
Dynamic InExport/BuildCol, Extended FileSystem, and other cool stuff
Reported by: | jmt12 | Owned by: | jmt12 |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Collection Building | Severity: | major |
Keywords: | inexport buildcol fileutils wal environment extensions | Cc: | |
Description
Checking in a swath of changes that I made to support the parallel building extension, the three major changes being:
- More care when appending to INC and PATH environment variables, so as to allow the order of values to be used as precedence for overriding versions of plugins, plugouts, and classifiers
- Proper OO and tricksie runtime loading for the import.pl/inexport.pm and buildcol.pl/buildcolutils.pm pairs so as to allow extensions to provide overriding functionality during importing and building
- Encapsulation of local file system functions into a new Perl module with the intention of allowing a later extension to provide access to several file systems (including HDFS and Thrift as needed by Hadoop)
I'll attempt to include details of each changeset below.
Attachments (1)
Change History (13)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Changeset: http://trac.greenstone.org/changeset/27299
Comment: Extending manifest parsing with the idea of a manifest having a version number as an attribute on the <Manifest> element. A version 2 manifest is expected to be followed verbatim, i.e. the global file scan and metadata search are disabled.
Files:
- perllib/manifest.pm
(p.s. The changeset number for the sqlite change above should've been http://trac.greenstone.org/changeset/27298)
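To illustrate the version check, here is a minimal sketch in Python rather than the Perl of perllib/manifest.pm. It assumes two things: a missing `version` attribute means version 1, and files to import are listed as `<Index><Filename>` entries. `Manifest` and `parse_manifest` are hypothetical names, not Greenstone's API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifest:
    version: int
    to_index: List[str] = field(default_factory=list)
    # Version 2 manifests are followed verbatim, so both of these get switched off.
    global_file_scan: bool = True
    metadata_search: bool = True


def parse_manifest(xml_text: str) -> Manifest:
    root = ET.fromstring(xml_text)
    # Assumption: a <Manifest> with no version attribute keeps the old (version 1) behaviour.
    version = int(root.get("version", "1"))
    manifest = Manifest(version=version)
    # Assumption: files to import are listed as <Index><Filename>...</Filename></Index>.
    for filename in root.iterfind("./Index/Filename"):
        manifest.to_index.append((filename.text or "").strip())
    if version >= 2:
        manifest.global_file_scan = False
        manifest.metadata_search = False
    return manifest
```

For example, `parse_manifest('<Manifest version="2">...</Manifest>')` yields a manifest whose file list is used as given, with no extra scanning.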
comment:3 by , 11 years ago
Changeset: http://trac.greenstone.org/changeset/27300
Comment: Some strings for the new options to buildcol.pl, allowing finer control over which indexes are built.
Files:
- perllib/strings.properties
comment:4 by , 11 years ago
Changeset: http://trac.greenstone.org/changeset/27301
Comment: You can now use the indexname and indexlevel options to buildcol to selectively build lucene indexes.
Files:
- perllib/lucenebuilder.pm
comment:5 by , 11 years ago
Changeset: http://trac.greenstone.org/changeset/27302
Comment: Removed parallel processing stuff as that now lives in an extension. Restructured to better support overriding by extensions. Checks for manifest version, and processes files accordingly. Conditional addition to INC and PATH environment variables (explained in comment:6). Replace deprecated util.pm calls with FileUtils.pm ones.
Files:
- perllib/inexport.pm
comment:6 by , 11 years ago
Changeset: http://trac.greenstone.org/changeset/27303
Comment: Replacing hard-coded additions to the INC and PATH environment variables with conditional ones. This allows us to use the order of values in these variables for precedence, and thus better supports extensions that override classifiers, plugins, etc. ENV and PATH functions already exist in util, but augmentINC() is a new function.
Files:
- perllib/IncrementalBuildUtils.pm
- perllib/classify.pm
- perllib/inexport.pm
- perllib/parse2.pm
- perllib/parse3.pm
- perllib/plugin.pm
- perllib/util.pm
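The precedence idea can be sketched in Python. This is not Greenstone's util.pm: `augment_search_path`, `augment_env_path` and `resolve` are hypothetical names, and the directories in the usage note are invented. A directory is added only when it is not already present, so a later call cannot push an existing entry around, and whichever copy of a module appears first on the path wins.

```python
import os
from typing import Dict, List, Optional, Set


def augment_search_path(search_path: List[str], directory: str, prepend: bool = False) -> List[str]:
    """Add directory only if it is missing, so the existing order (precedence) survives."""
    if directory not in search_path:
        if prepend:
            search_path.insert(0, directory)
        else:
            search_path.append(directory)
    return search_path


def augment_env_path(env: Dict[str, str], var: str, directory: str, prepend: bool = False) -> str:
    """Same idea for an environment variable such as PATH (os.pathsep separated)."""
    parts = [p for p in env.get(var, "").split(os.pathsep) if p]
    augment_search_path(parts, directory, prepend)
    env[var] = os.pathsep.join(parts)
    return env[var]


def resolve(module: str, search_path: List[str], available: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first directory on the search path that provides module: earlier entries win."""
    for directory in search_path:
        if module in available.get(directory, set()):
            return directory
    return None
```

Usage: start with `inc = ["/gs/perllib/plugins"]`, prepend an extension's `"/ext/parallel/perllib/plugins"`, then re-add the core directory. The second call is a no-op, so an `HTMLPlugin` provided by both directories resolves to the extension's copy.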
comment:7 by , 11 years ago
Changeset: http://trac.greenstone.org/changeset/27304
Comment: Moved the heavy lifting from the buildcol.pl script into a proper Perl class, buildcolutils.pm, thus allowing parts of the build process to be overridden by versions loaded by extensions. Replace deprecated util.pm calls with FileUtils.pm ones.
Files:
- perllib/buildcolutils.pm
comment:8 by , 11 years ago
Changeset: http://trac.greenstone.org/changeset/27305
Comment: Add code to allow importing and building to load overriding versions of inexport.pm and buildcolutils.pm from extensions at runtime. When an extension provides a possible override, Greenstone will dynamically detect it and add the corresponding options (visible via "--help"). When a user specifies one of these options, the appropriate inexport/buildcolutils subclass will be loaded.
Files:
- bin/script/buildcol.pl
- bin/script/import.pl
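Here is a minimal sketch of this runtime override mechanism, written in Python with `argparse` rather than the Perl used by import.pl. The `ParallelInExport` subclass and its `--workers` option are hypothetical. "Discovery" is also simplified: it uses Python's `__subclasses__()` rather than scanning extension directories.

```python
import argparse
from typing import List, Type


class InExport:
    """Stand-in for the core import logic (inexport.pm)."""
    # Each entry: (flag, keyword arguments for argparse.add_argument).
    extra_options: list = []

    def name(self) -> str:
        return "core"


class ParallelInExport(InExport):
    """Hypothetical subclass that an extension might provide."""
    extra_options = [("--workers", {"type": int, "help": "number of parallel workers (hypothetical)"})]

    def name(self) -> str:
        return "parallel"


def discover_overrides(base: Type[InExport]) -> List[Type[InExport]]:
    # In this sketch "loaded by an extension" simply means "defined as a direct subclass".
    return base.__subclasses__()


def build_parser(overrides) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="import.pl")
    parser.add_argument("collection")
    for cls in overrides:
        for flag, kwargs in cls.extra_options:
            parser.add_argument(flag, **kwargs)  # so the option shows up in --help
    return parser


def select_class(args, overrides, base=InExport):
    """Use the first override whose option the user actually supplied, else the base class."""
    for cls in overrides:
        for flag, _ in cls.extra_options:
            dest = flag.lstrip("-").replace("-", "_")
            if getattr(args, dest, None) is not None:
                return cls
    return base
```

Running with `demo --workers 4` selects `ParallelInExport`, while plain `demo` falls back to `InExport`. Users who never touch the extension's options therefore get the core behaviour unchanged.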
comment:9 by , 11 years ago
Changeset: http://trac.greenstone.org/changeset/27306
Comment: Moving the critical file-related functions (copy, rm, etc.) out of util.pm into their own proper class, FileUtils. Use of the old functions in util.pm will trigger deprecation warnings. Further functions could be moved across in the future, but these are the critical ones when considering support for other file systems (HTTP, HDFS, WebDAV, etc.). Updated some key files to use the new functions, so no deprecation messages are now thrown when importing/building the demo collection 'out of the box'.
Files:
- bin/script/import.pl
- bin/script/buildcol.pl
- perllib/FileUtils.pm
- perllib/basebuilder.pm
- perllib/basebuildproc.pm
- perllib/classify.pm
- perllib/colcfg.pm
- perllib/doc.pm
- perllib/gsprintf.pm
- perllib/mgppbuilder.pm
- perllib/plugin.pm
- perllib/plugout.pm
- perllib/unicode.pm
- perllib/util.pm
- perllib/dbutil/gdbm.pm
- perllib/plugins/BasePlugin.pm
- perllib/plugins/DirectoryPlugin.pm
- perllib/plugins/HTMLPlugin.pm
- perllib/plugins/MetadataXMLPlugin.pm
- perllib/plugins/ArchivesInfPlugin.pm
- perllib/plugins/GreenstoneXMLPlugin.pm
- perllib/plugouts/BasePlugout.pm
- perllib/plugouts/GreenstoneXMLPlugout.pm
Note: The deprecation warning on util::filename_cat() is currently commented out, as otherwise you drown in a sea of deprecation warnings: every single Perl script/module in Greenstone appears to use it.
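The shape of the FileUtils split can be sketched in Python. The sketch puts a per-scheme driver registry behind plain functions and adds a legacy wrapper that still works but emits a deprecation warning. The function names, the scheme-based dispatch and the registry are all assumptions for illustration, not FileUtils.pm's actual interface. Only the local driver is real; other file systems such as HDFS would need drivers of their own.

```python
import os
import shutil
import warnings
from urllib.parse import urlparse


class LocalFileSystem:
    """Driver for the local disk; other drivers (e.g. HDFS) would offer the same methods."""

    def copy(self, src: str, dst: str) -> None:
        shutil.copy2(src, dst)

    def rm(self, path: str) -> None:
        os.remove(path)

    def exists(self, path: str) -> bool:
        return os.path.exists(path)


# Hypothetical registry: URL scheme -> driver. Plain paths have an empty scheme.
_DRIVERS = {"": LocalFileSystem()}


def register_driver(scheme: str, driver) -> None:
    _DRIVERS[scheme] = driver


def _driver_for(path: str):
    scheme = urlparse(path).scheme
    if len(scheme) == 1:  # a Windows drive letter such as "C:", not a URL scheme
        scheme = ""
    if scheme not in _DRIVERS:
        raise ValueError(f"no file system driver registered for {scheme!r}")
    return _DRIVERS[scheme]


def copy_file(src: str, dst: str) -> None:
    # Simplification: the source path alone picks the driver.
    _driver_for(src).copy(src, dst)


def remove_file(path: str) -> None:
    _driver_for(path).rm(path)


def file_exists(path: str) -> bool:
    return _driver_for(path).exists(path)


def legacy_cp(src: str, dst: str) -> None:
    """Old util-style entry point, kept only so that existing callers keep working."""
    warnings.warn("legacy_cp is deprecated; use copy_file", DeprecationWarning, stacklevel=2)
    copy_file(src, dst)
```

With this structure, adding a new file system means registering a driver (for example under `"hdfs"`). Callers of `copy_file` or `file_exists` do not change.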
comment:10 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
So, yeah, that's a scary bunch of changes, although some of them come from my parallel processing code that has been running for a year or so, so they should at least be stable. It's a little unfortunate that I didn't submit the import.pl/buildcol.pl cascade of changes prior to the file system stuff; still, nothing that a determined person and a global search and replace can't roll back. Please contact me (jmt12@…) should any bugs arise during testing.
comment:11 by , 11 years ago
Priority: | high → very low |
---|---|
Severity: | major → minor |
Changeset: http://trac.greenstone.org/changeset/27392
Comment: Resolving a bug reported by Anu, in that Greenstone no longer copies across the RSS feeds file. She discovered this was due to there being no $self->{'archivedir'} set at that part of the code. I investigated and found that the code that actually assigned a default value to archivedir was lurking in the section that handles building from a cached version, and consequently was only acting on a local copy of the variable. Moved this code into a more logical place, the function that configures arguments/options, and also ensured RSS copying works and doesn't throw deprecation errors (due to its use of older util.pm file actions).
Files:
- perllib/buildcolutils.pm
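The fix can be illustrated with a minimal Python sketch. It assumes the default archive directory is `archives` under the collection directory. `BuildColUtils`, `configure_options` and `rss_feed_path` are hypothetical names, and `rss-items.rdf` is an assumed file name.

```python
import os


class BuildColUtils:
    """Hypothetical stand-in for the option handling in buildcolutils.pm."""

    def __init__(self, collection_dir: str, options: dict):
        self.collection_dir = collection_dir
        self.options = dict(options)

    def configure_options(self) -> None:
        # The default is written to the object itself, outside any branch, so
        # every later step sees it. The bug was the equivalent of
        #     archivedir = self.options.get("archivedir") or default
        # inside the "build from cached version" branch: only a local variable
        # received the default, and self.options never did.
        if not self.options.get("archivedir"):
            self.options["archivedir"] = os.path.join(self.collection_dir, "archives")

    def rss_feed_path(self) -> str:
        # Hypothetical file name; the point is that archivedir is always set by now.
        return os.path.join(self.options["archivedir"], "rss-items.rdf")
```

An explicitly supplied `archivedir` is left untouched. Only a missing or empty value receives the default.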
comment:12 by , 11 years ago
Priority: | very low → high |
---|---|
Severity: | minor → major |
Changeset: http://trac.greenstone.org/changeset/26206
Comment: Add hidden support for the new experimental multiple reader/writer functionality available in SQLite.
Files: