Changeset 3369


Timestamp: 2002-08-20T17:09:03+12:00
Author: sjboddie
Message:

When the -description_tags option is set, HTMLPlug no longer suppresses
metadata extraction unless it actually finds at least one <Section> tag
in the document. This allows documents with and without <Section> tags
to be built in the same collection without the latter unexpectedly
ending up with no metadata.
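The rule introduced by this change can be summarised as a small predicate. This is a hypothetical sketch, not HTMLPlug's code: `should_extract_metadata` and its option hash are illustrative names, and only the option names -description_tags and -no_metadata come from the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTMLPlug: models the decision made
# after r3369 about whether to extract metadata for one document.
# $opts mirrors the plugin options as a hash ('no_metadata',
# 'description_tags'); $found_section_tag says whether at least one
# <Section> tag was found in the document.
sub should_extract_metadata {
    my ($opts, $found_section_tag) = @_;

    # -no_metadata always suppresses extraction.
    return 0 if $opts->{'no_metadata'};

    # -description_tags suppresses extraction only for documents that
    # really contain <Section> tags (all metadata belongs inside them).
    return 0 if $opts->{'description_tags'} && $found_section_tag;

    # Otherwise, including documents with no <Section> tags, extract.
    return 1;
}

# Usage: a collection mixing both kinds of documents.
print should_extract_metadata({ description_tags => 1 }, 1), "\n";  # 0
print should_extract_metadata({ description_tags => 1 }, 0), "\n";  # 1
```

Before this change the second call would effectively have returned 0 as well, which is the "unexpected lack of metadata" the message describes.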

Files: 1 edited

  • trunk/gsdl/perllib/plugins/HTMLPlug.pm

    --- trunk/gsdl/perllib/plugins/HTMLPlug.pm (r3349)
    +++ trunk/gsdl/perllib/plugins/HTMLPlug.pm (r3369)
    @@ -81,7 +81,9 @@
                             occur. Note that by setting this option you
                             implicitly set -no_metadata, as all metadata should
    -                        be included within the <Section> tags. Also,
    -                        '-keep_head' will have no effect when this option
    -                        is set.\n";
    +                        be included within the <Section> tags (this is only
    +                        true for documents that actually contain <Section> tags
    +                        however). Also, '-keep_head' will have no effect when
    +                        this option is set, regardless of whether a document
    +                        contains Section tags.\n";
     }
     
    @@ -186,4 +188,7 @@
             my $comment = $2;
             if (defined $text) {
    +            # text before a comment - note that getting to here
    +            # doesn't necessarily mean there are Section tags in
    +            # the document
                 $self->process_section(\$text, $base_dir, $file, $doc_obj, $cursection);
             }
    @@ -231,5 +236,14 @@
                 print $outhandle "HTMLPlug: WARNING: $file appears to contain no Section tags so\n";
                 print $outhandle "          will be processed as a single section document\n";
    +
    +            # go ahead and process single-section document
                 $self->process_section($textref, $base_dir, $file, $doc_obj, $cursection);
    +
    +            # if document contains no Section tags we'll go ahead
    +            # and extract metadata (this won't have been done
    +            # above as the -description_tags option prevents it)
    +            $self->extract_metadata (\$doc_obj->get_text($cursection), $metadata, $doc_obj, $cursection)
    +                unless $self->{'no_metadata'};
    +
             } else {
                 print $outhandle "HTMLPlug: WARNING: $file contains the following text outside\n";
    @@ -245,4 +259,15 @@
                 print $outhandle " ($text)\n";
             }
    +    } elsif (!$found_something) {
    +
    +        # may get to here if document contained no valid Section
    +        # tags but did contain some comments. The text will have
    +        # been processed already but we should print the warning
    +        # as above and extract metadata
    +        print $outhandle "HTMLPlug: WARNING: $file appears to contain no Section tags so\n";
    +        print $outhandle "          will be processed as a single section document\n";
    +
    +        $self->extract_metadata (\$doc_obj->get_text($cursection), $metadata, $doc_obj, $cursection)
    +            unless $self->{'no_metadata'};
         }
     
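The new `elsif (!$found_something)` branch handles documents that contain HTML comments but no valid <Section> tags. The sketch below is a simplified, hypothetical model of that control flow, not HTMLPlug's implementation: the function name `scan_sections`, its returned hash, and the assumption that Section markers sit inside HTML comments such as `<!-- <Section> -->` are all illustrative.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of the -description_tags loop.
# Assumption: <Section> markers appear inside HTML comments,
# e.g. <!-- <Section> -->.
sub scan_sections {
    my ($text, $no_metadata) = @_;
    my $found_something = 0;   # set once a real <Section> tag is seen
    my @processed;             # text chunks that would go to process_section

    # Repeatedly strip "text-before <!--comment-->" off the front.
    while ($text =~ s/^(.*?)<!--(.*?)-->//s) {
        my ($before, $comment) = ($1, $2);
        # Text before a comment is processed whether or not the
        # comment turns out to be a Section tag.
        push @processed, $before if $before =~ /\S/;
        $found_something = 1 if $comment =~ /<Section>/;
    }
    push @processed, $text if $text =~ /\S/;   # trailing text

    # Mirrors the elsif (!$found_something) branch: no valid Section
    # tags, so the document is one section and metadata is allowed.
    my $extract = (!$found_something && !$no_metadata) ? 1 : 0;

    return { found_section    => $found_something,
             chunks           => \@processed,
             extract_metadata => $extract };
}

# Usage: comments present, but none of them is a <Section> tag.
my $r = scan_sections("<p>Intro</p><!-- plain comment --><p>Body</p>", 0);
print "$r->{found_section} $r->{extract_metadata}\n";   # 0 1
```

The metadata decision is deferred until the whole document has been scanned because, as the comment added in the second hunk notes, reaching the text-before-a-comment case does not by itself mean the document has Section tags.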