Changeset 34130 for main


Timestamp:
2020-05-30T01:27:03+12:00
Author:
ak19
Message:

Some more tidying up while isMRI filtered collection rebuilding

File:
1 edited

  • main/trunk/greenstone2/perllib/plugins/NutchTextDumpPlugin.pm

    r34129 r34130  
    576576       
    577577        # add meta to docObject if both metaname and metavalue are non-empty strings
    578         if($metaname ne "" && $metavalue ne "") { # && $metaname ne "rs" && $metaname ne "csh") {
    579              # when no namespace is provided as here, adds as ex. meta.
     578        if($metaname ne "" && $metavalue ne "") {
     579            # when no namespace is provided as here, adds as ex. meta.
    580580            # Don't explicitly prefix ex., as things becomes convoluted when retrieving meta
    581581            $doc_obj->add_utf8_metadata ($cursection, $metaname, $metavalue);
     
    679679}
    680680
     681# returns siteID when file in import of form siteID.txt
     682# returns siteID when import contains siteID/dump.txt (as happens when OIDtype=dirname)
     683# Returns whatever baseOID in other situations, not sure if meaningful, but shouldn't have
     684# passed can_process_this_file() test for anything other than siteID/dump.txt and siteID.txt anyway
    681685sub get_siteID {
    682686    my $self = shift(@_);
     
    687691    # file name without extension is site ID, e.g. 00001.txt
    688692    $siteID = $1;
    689     #$siteID = $file;
    690     #$siteID =~ s@\.txt$@@;
    691693    }
    692694    else { # if($doc_obj->{'OIDtype'} eq "dirname") or even otherwise, just use baseOID
     
    704706
    705707# SplitTextFile::get_base_OID() has the side-effect of calling SUPER::add_OID()
    706 # inorder to initialise it. This then ultimately results in calling util::tidy_up_OID() to print warning messages
    707 # about all-numeric baseOID requiring the D prefix prepended.
    708 # When the base_OID is already set and we want to get the baseOID without that side-effect, because siteID = baseOID
    709 # in cases where OIDtype=dirname.
    710 # We don't want to recalculate baseOID for each segment, only once per dump.txt file as the superclass SplitTextFile
    711 # did it. However, we need access to the baseOID from this plugin
    712 # So we override this method to store the calculated baseOID in a variable for use and check if it's set before
    713 # calling this method.
    714 # CANNOT override this method in the usual way though: to calculate baseOID once per dump.txt, store it and return
    715 # the stored value for each segment because the superclass version of get_base_OID has a side-effect and needs to
    716 # continue doing everything it usually does each time the superclass calls this method.
     708# in order to initialise segment IDs.
     709# This then ultimately results in calling util::tidy_up_OID() to print warning messages
     710# about siteIDs forming all-numeric baseOIDs that require the D prefix prepended.
     711# In cases where site ID is the same as baseOID and is needed to set siteID meta, we want to avoid
     712# the warning messages but don't want to prevent the important side-effects of SplitTextFile::get_base_OID()
     713# So instead of overriding this method to calculate and store baseOID the first time and return
     714# the stored value subsequent times (which has the undesirable result of losing the side-effect of
     715# ALWAYS calling super's get_base_OID() even when there's a stored value), we just always store
     716# the return value before returning it. The check that first tests for a stored value to use,
     717# else forces it to be computed by calling this get_base_OID(), is pushed onto a separate function
     718# that calls this one: get_siteID(). Problem solved.
    717719sub get_base_OID {
    718720    my $self = shift(@_);
    719721    my ($doc_obj) = @_;
    720722
    721 
    722     # Let this method do what it always did, as it does more than return a value and has important side-effects!
    723     # SplitTextPlugin calls this method once for every segment, not just for the base document, with the side-effect
    724     # of calculating and adding the OID for each segment.
    725     # Therefore, do not return the stored dirname_siteID if already set, as otherwise this method will have
    726     # the ominous side-effect of "Warning: D00001s1 already exists with index status I" messages for every segment!
    727     # Instead, when trying to work out $siteID (when OIDtype=dirname), check if $self->{'dirname_siteID'} already set
    728     # and use that else call this method.
    729     #if(!defined $self->{'dirname_siteID'}) {
     723    #if(!defined $self->{'dirname_siteID'}) { # DON'T DO THIS: loses essential side-effect of always calling super's get_base_OID()
     724    # this method is overridden, so it's not just called by this NutchTextDumpPlugin
     725
    730726    $self->{'dirname_siteID'} = $self->SUPER::get_base_OID($doc_obj); # store for NutchTextDumpPlugin's internal use
    731727    #}
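
The pattern described in the comments above can be sketched in isolation. This is a minimal illustration, not the plugin's actual code: the packages DemoSplitter and DemoDumpPlugin, the call counter, and the file-name regex are all assumptions made for the example. Only the method names get_base_OID and get_siteID and the dirname_siteID field come from the changeset. The counter stands in for the superclass side-effect that must run on every call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the superclass (SplitTextFile in the changeset).
package DemoSplitter;

sub new {
    my ($class) = @_;
    return bless { calls => 0 }, $class;
}

# Returns a base OID; the counter stands in for the per-segment side-effect.
sub get_base_OID {
    my ($self, $doc_obj) = @_;
    $self->{calls}++;
    return $doc_obj->{base};
}

# Hypothetical stand-in for NutchTextDumpPlugin.
package DemoDumpPlugin;
our @ISA = ('DemoSplitter');

# Always delegate to the superclass so its side-effect runs on every call,
# and store the result for this class's internal use.
sub get_base_OID {
    my ($self, $doc_obj) = @_;
    $self->{dirname_siteID} = $self->SUPER::get_base_OID($doc_obj);
    return $self->{dirname_siteID};
}

# The "use the stored value if there is one" check lives here, not in
# get_base_OID(). The regex is an assumed form of the siteID.txt test.
sub get_siteID {
    my ($self, $file, $doc_obj) = @_;
    return $1 if $file =~ m{^([^/]+)\.txt$};                       # e.g. 00001.txt
    return $self->{dirname_siteID} if defined $self->{dirname_siteID};
    return $self->get_base_OID($doc_obj);                          # e.g. 00001/dump.txt
}

package main;

my $plugin = DemoDumpPlugin->new();
my $doc    = { base => '00001' };
print $plugin->get_siteID('00001/dump.txt', $doc), "\n";  # computes and stores
print $plugin->get_siteID('00001/dump.txt', $doc), "\n";  # reuses stored value
print "superclass calls: $plugin->{calls}\n";
```

Keeping the cache check in get_siteID() means callers inside the plugin avoid recomputation, while anything that calls get_base_OID() directly, such as the superclass doing so once per segment, still triggers the superclass behaviour every time.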