- Timestamp: 2020-05-30T01:27:03+12:00
- File: 1 edited
Index: main/trunk/greenstone2/perllib/plugins/NutchTextDumpPlugin.pm
--- main/trunk/greenstone2/perllib/plugins/NutchTextDumpPlugin.pm (r34129)
+++ main/trunk/greenstone2/perllib/plugins/NutchTextDumpPlugin.pm (r34130)
@@ -576,6 +576,6 @@
 
         # add meta to docObject if both metaname and metavalue are non-empty strings
-        if($metaname ne "" && $metavalue ne "") { # && $metaname ne "rs" && $metaname ne "csh") {
-
+        if($metaname ne "" && $metavalue ne "") {
+            # when no namespace is provided as here, adds as ex. meta.
             # Don't explicitly prefix ex., as things becomes convoluted when retrieving meta
             $doc_obj->add_utf8_metadata ($cursection, $metaname, $metavalue);
@@ -679,4 +679,8 @@
 }
 
+# returns siteID when file in import of form siteID.txt
+# returns siteID when import contains siteID/dump.txt (as happens when OIDtype=dirname)
+# Returns whatever baseOID in other situations, not sure if meaningful, but shouldn't have
+# passed can_process_this_file() test for anything other than siteID/dump.txt and siteID.txt anyway
 sub get_siteID {
     my $self = shift(@_);
@@ -687,6 +691,4 @@
         # file name without extension is site ID, e.g. 00001.txt
         $siteID = $1;
-        #$siteID = $file;
-        #$siteID =~ s@\.txt$@@;
     }
     else { # if($doc_obj->{'OIDtype'} eq "dirname") or even otherwise, just use baseOID
@@ -704,28 +706,22 @@
 
 # SplitTextFile::get_base_OID() has the side-effect of calling SUPER::add_OID()
-# in order to initialise it. This then ultimately results in calling util::tidy_up_OID() to print warning messages
-# about all-numeric baseOID requiring the D prefix prepended.
-# When the base_OID is already set and we want to get the baseOID without that side-effect, because siteID = baseOID
-# in cases where OIDtype=dirname.
-# We don't want to recalculate baseOID for each segment, only once per dump.txt file as the superclass SplitTextFile
-# did it. However, we need access to the baseOID from this plugin
-# So we override this method to store the calculated baseOID in a variable for use and check if it's set before
-# calling this method.
-# CANNOT override this method in the usual way though: to calculate baseOID once per dump.txt, store it and return
-# the stored value for each segment because the superclass version of get_base_OID has a side-effect and needs to
-# continue doing everything it usually does each time the superclass calls this method.
+# in order to initialise segment IDs.
+# This then ultimately results in calling util::tidy_up_OID() to print warning messages
+# about siteIDs forming all-numeric baseOIDs that require the D prefix prepended.
+# In cases where site ID is the same as baseOID and is needed to set siteID meta, we want to avoid
+# the warning messages but don't want to prevent the important side-effects of SplitTextFile::get_base_OID()
+# So instead of overriding this method to calculate and store baseOID the first time and return
+# the stored value subsequent times (which has the undesirable result that the side-effect from
+# ALWAYS calling super's get_base_OID() even when there's a stored value), we just always store
+# the return value before returning it. Next, we push the check for first testing for a stored value
+# to use, else forcing it to be computed by calling this get_base_OID(), onto a separate function that
+# calls this one, get_siteID(). Problem solved.
 sub get_base_OID {
     my $self = shift(@_);
     my ($doc_obj) = @_;
 
-
-    # Let this method do what it always did, as it does more than return a value and has important side-effects!
-    # SplitTextPlugin calls this method once for every segment, not just for the base document, with the side-effect
-    # of calculating and adding the OID for each segment.
-    # Therefore, do not return the stored dirname_siteID if already set, as otherwise this method will have
-    # the ominous side-effect of "Warning: D00001s1 already exists with index status I" messages for every segment!
-    # Instead, when trying to work out $siteID (when OIDtype=dirname), check if $self->{'dirname_siteID'} already set
-    # and use that else call this method.
-    #if(!defined $self->{'dirname_siteID'}) {
+    #if(!defined $self->{'dirname_siteID'}) { # DON'T DO THIS: loses essential side-effect of always calling super's get_base_OID()
+    # this method is overridden, so it's not just called by this NutchTextDumpPlugin
+
     $self->{'dirname_siteID'} = $self->SUPER::get_base_OID($doc_obj); # store for NutchTextDumpPlugin's internal use
     #}
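The comment block on get_base_OID() describes a pattern that may be easier to follow in isolation: the override always delegates to the superclass so its side effects still happen, stores the returned value, and leaves the "use the stored value if there is one" check to a separate accessor. The sketch below is a minimal illustration of that pattern only. DemoSplitter, DemoDumpPlugin, the stored_base_OID field and the calls counter are hypothetical stand-ins invented for this example, not Greenstone code, and the counter merely represents whatever per-segment work the real superclass does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the superclass (SplitTextFile in the changeset).
package DemoSplitter;

sub new {
    my ($class) = @_;
    return bless { calls => 0 }, $class;
}

# Returns a base OID and, as a side effect, counts the call.
# The counter is an assumption standing in for the real per-segment setup.
sub get_base_OID {
    my ($self, $doc) = @_;
    $self->{calls}++;
    return $doc->{dirname};
}

# Hypothetical stand-in for the plugin that overrides get_base_OID().
package DemoDumpPlugin;
our @ISA = ('DemoSplitter');

# Always delegate, so the superclass side effect is never skipped,
# and keep a copy of the result for this class's own use.
sub get_base_OID {
    my ($self, $doc) = @_;
    $self->{stored_base_OID} = $self->SUPER::get_base_OID($doc);
    return $self->{stored_base_OID};
}

# Prefer the stored value; compute it only if nothing is stored yet.
sub get_siteID {
    my ($self, $doc) = @_;
    return $self->{stored_base_OID} if defined $self->{stored_base_OID};
    return $self->get_base_OID($doc);
}

package main;

my $plugin = DemoDumpPlugin->new();
my $doc    = { dirname => "00001" };

$plugin->get_base_OID($doc) for 1 .. 3;   # e.g. one call per segment
my $site_id = $plugin->get_siteID($doc);  # reads the stored value, no extra super call

print "siteID=$site_id calls=$plugin->{calls}\n";   # prints "siteID=00001 calls=3"
```

Putting the stored-value check in get_siteID() rather than in the override means callers that rely on the superclass behaviour see no change, while the plugin can read the value without triggering that behaviour again.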