Changeset 32643

30.11.2018 22:02:15

1. The previous commit (r32640) reintroduced an earlier bug while attempting to fix another. The reintroduced bug: a reindex operation upon a doc rename (implemented as delete and add) would result in the deleted doc turning up on browsing. Despite being deleted from the SQL db, a ref to its oid remained in the index. This bug had been fixed by a call to doc_obj.set_OID(), which presumably helped to identify the OID of any doc marked for deletion, after which the indexing part of the process would proceed to delete it. The recent commit had tried to prevent the assignment of 2 OIDs to renamed documents (the deleted oid and the new oid) by selectively calling set_OID(), but this reintroduced the older bug. The solution is to call set_OID() at the end, AFTER reading into the doc_obj from the SQL db, which both prevents 2 OIDs for a renamed doc and properly gets the doc deleted.

2. A further modification reintroduces an improvement that existed in earlier uncommitted attempts at the GS SQL Plugin. When a doc is marked for deletion, its oid no longer exists in the MySQL db, yet the code still attempted to read back in from the MySQL db all the records for any and all docoids, including those marked for deletion. Now, if an oid is for a doc marked for deletion, the code just calls doc_obj.set_OID() (as required to get the doc actually deleted from the index), calls the superclass method to let the indexing part process the doc and delete it, and then returns, skipping the attempt to read info on that oid from the SQL db when nothing exists for it. For any non-deleted oid, the code continues to read in the entries from the MySQL db for that oid to reconstruct the doc_obj.
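The ordering argument in point 1 can be illustrated with a small, self-contained sketch. ToyDoc below is a hypothetical stand-in, not Greenstone's doc class: it models only the behaviour this changeset's code comments describe, namely that add-style calls accumulate values while set_OID() goes through a set-style call that overwrites Identifier. The rows and OID strings are invented for illustration.

```perl
use strict;
use warnings;

# ToyDoc: hypothetical stand-in for a doc object (NOT Greenstone's doc.pm).
package ToyDoc;

sub new { my ($class) = @_; return bless { meta => [] }, $class; }

# "add" semantics: always appends, so repeated names accumulate.
sub add_metadata_element {
    my ($self, $name, $value) = @_;
    push @{ $self->{meta} }, [$name, $value];
}

# "set" semantics: drops any existing values for the name, then stores one.
sub set_metadata_element {
    my ($self, $name, $value) = @_;
    @{ $self->{meta} } = grep { $_->[0] ne $name } @{ $self->{meta} };
    push @{ $self->{meta} }, [$name, $value];
}

# Assumption taken from the changeset's comments: set_OID() sets the
# Identifier metadata through the "set" call above.
sub set_OID {
    my ($self, $oid) = @_;
    $self->{OID} = $oid;
    $self->set_metadata_element('Identifier', $oid);
}

sub values_for {
    my ($self, $name) = @_;
    return map { $_->[1] } grep { $_->[0] eq $name } @{ $self->{meta} };
}

package main;

# Rebuild a doc from (name, value) rows, calling set_OID() either before
# or after the rows are added.
sub load_doc {
    my ($oid, $set_oid_first, @rows) = @_;
    my $doc = ToyDoc->new();
    $doc->set_OID($oid) if $set_oid_first;
    $doc->add_metadata_element(@$_) for @rows;
    $doc->set_OID($oid) unless $set_oid_first;
    return $doc;
}

# Invented rows: a stale Identifier left over from before a rename.
my @rows = (['Identifier', 'HASHold'], ['Title', 'Report']);

for my $first (1, 0) {
    my @ids = load_doc('HASHnew', $first, @rows)->values_for('Identifier');
    printf "set_OID %s reading: Identifier = %s\n",
        ($first ? 'before' : 'after'), join(', ', @ids);
}
```

Calling set_OID() last means the set-style call runs after every add-style call, so whatever Identifier values were read in are replaced by the current one. This is only a model of the reasoning in the commit message, not of the plugin's actual code path.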

1 modified


  • main/trunk/greenstone2/perllib/plugins/

    r32640 r32643
       374    374   print STDERR "   GreenstoneSQLPlugin processing doc $oid (reading into docobj from SQL db)\n"
       375        - if $self->{'verbosity'};
              375 +     if $self->{'verbosity'};
              377 + my $build_proc_mode = $self->{'processor'}->get_mode(); # can be "text" as per basebuildproc or infodb
              378 + if($build_proc_mode =~ m/(delete)$/) {
              379 + # build_proc_mode could be "(infodb|text)(delete|reindex)"
              380 + # "...delete" or "...reindex" as per ArchivesInfPlugin
              381 + # But reindex is implemented as delete for GreenstoneSQLPlugs, so that's all we see here?
              382 + print STDERR "   DOC $oid WAS MARKED FOR DELETION. Won't attempt to retrieve from SQL db.\n" if $self->{'verbosity'};
              383 + $self->{'doc_obj'}->set_OID($oid); # oid is all we care about for a doc marked for deletion
              384 + $self->SUPER::close_document(@_); # at the end of this method, doc will have been deleted
              385 + return;  # oid of doc marked for deletion is not in the SQL db, don't bother looking it up
              386 + }
              388 + # else, doc denoted by oid was not marked for deletion, look up its oid in db and read it into doc obj
       377    390   if($proc_mode eq "all" || $proc_mode eq "meta_only") {
       384    397   foreach my $row (@$records) {
       385    398       my ($primary_key, $did, $sid, $metaname, $metaval) = @$row;
       387        -     # don't allow duplicate toplevel OID, as can happen when a doc has been renamed and requires reindexing
       388        -     # TODO: any other meta that should not be duplicated, but can have been changed between rebuilds so that we need to call set_meta instead of add_meta?
       389        -     # e.g. FileSize, SourceFile. But Language can be multiple, so gs meta is not guaranteed to be unique either. Whereas of dc metadata we know
       390        -     # that some if not all should allow multiple entires for the same meta name, e.g. dc.Author/dc.Creator
       391        -     if($sid =~ m@^root@ && $metaname eq "Identifier") {
       392        -         # doc_obj's toplevel Identifier metadata is a special case:
       393        -         # it should have only one value, so instead of add_meta() that will allow old Identifier meta to linger
       394        -         # Need to do set_meta(). We then break out of the loop, to prevent duplicates (older values from DB) to be inserted for Identifier into doc_obj
       395        -         # Handles the case where a doc was renamed and rebuilding triggers re-indexing case: old identifier is now overwritten with new one
       396        -         $self->{'doc_obj'}->set_OID($oid); # calls doc_obj->set_metadata_element(top_section, Identifier, $oid). Sets OID if one doesn't exist.
       397        -         next; # ensures Identifier set only once, and ensure Identifier is set to current docOID for the doc, a.o.t. allowing it to be set to any expired docOID from before a doc got renamed.
       398        -     }
       400        -     # process all other metadata the normal way:
       402    400       # get rid of the artificial "root" introduced in section id when saving to sql db
       450    448   }
              451 + # setting OID here instead of before reading from SQL db into docobj, will prevent duplicate values for Identifier
              452 + # since doc::set_OID() calls doc::set_metadata_element() for metadata that can't occur more than once
              453 + $self->{'doc_obj'}->set_OID($oid); # may only be necessary if doc was marked for deletion so that SUPER::close_document knows
              454 +                                    # the oid of marked doc to remove from index
       452    456   # done reading into docobj from SQL db
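The early-return branch in the diff hinges on the pattern m/(delete)$/ applied to the build processor mode. The following sketch shows which mode strings that pattern selects. The helper name is_delete_mode is hypothetical, and the list of mode strings is an assumption assembled from the "(infodb|text)(delete|reindex)" comment in the diff, not a confirmed list of values returned by get_mode().

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same pattern the diff uses.
# Returns 1 when the mode string ends in "delete", 0 otherwise.
sub is_delete_mode {
    my ($mode) = @_;
    return $mode =~ m/(delete)$/ ? 1 : 0;
}

# Assumed mode strings, built from the comment in the diff.
my @modes = qw(text infodb textdelete infodbdelete textreindex infodbreindex);

for my $mode (@modes) {
    printf "%-14s %s\n", $mode,
        is_delete_mode($mode)
            ? "set OID, hand to superclass, skip SQL lookup"
            : "read doc from SQL db";
}
```

Note that a mode ending in "reindex" does not match this pattern. The comment in the diff suggests this is acceptable because reindex is implemented as delete for this plugin, so only the "...delete" modes are expected to arrive here, though the comment itself ends in a question mark.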