Changeset 6143


Timestamp: 2003-12-08T14:17:38+13:00
Author: jmt12
Message:

Archive parser now notices the presence of a SourceSegment piece of metadata - a sure sign that the doc.xml has been generated from a bibliographic source, and also a very good reason not to extract metadata from this file given that it will not only not make sense, but will also end up with thirty-thousand bits of metadata attached to one file

File:
1 edited

  • trunk/gli/src/org/greenstone/gatherer/msm/GreenstoneArchiveParser.java

    r6051 r6143  
@@ -97,4 +97,6 @@
     // Retrieve the DOM of the file.
     Document document = Utility.parse(file, false);
+
+    Gatherer.println("Parsed greenstone archive document: " + file.getAbsolutePath());
     // If we successfully parsed the document, then it is time to search through the DOM for the Metadata tags.
     if(document != null) {
@@ -103,9 +105,17 @@
         // Retrieve all of the Metadata sections.
         NodeList metadata_elements = archive_element.getElementsByTagName("Metadata");
+        // We first zip through the retrieved metadata, and if we encounter the element 'SourceSegment' - a sure sign this collection came from a bibliographic type file - we break out of extracted metadata parsing as no sense could be made of the data extracted anyway (plus we suffer a death of thirty-thousand pointy bits of metadata!)
+        for(int i = 0; i < metadata_elements.getLength(); i++) {
+            Element metadata_element = (Element) metadata_elements.item(i);
+            String name = metadata_element.getAttribute("name");
+            if(name.equalsIgnoreCase(StaticStrings.SOURCESEGMENT_VALUE)) {
+                return 0;
+            }
+        }
         // Now for each Metadata entry retrieved...
         for(int i = 0; i < metadata_elements.getLength(); i++) {
             Element metadata_element = (Element) metadata_elements.item(i);
             String name = metadata_element.getAttribute("name");
-            // There is a special case when the metadata name is gsdlsourcefilename, as we use this to find the FileRecord we want to add metadata to.
+            // There is also a special case when the metadata name is gsdlsourcefilename, as we use this to find the FileRecord we want to add metadata to.
             if(name.equals("gsdlsourcefilename")) {
                 file_path = MSMUtils.getValue(metadata_element);
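
The pre-scan added in this changeset can be tried in isolation with the sketch below. It is only an illustration under stated assumptions, not the GLI implementation: the class SourceSegmentCheck, the helpers hasSourceSegment and parse, and the two tiny XML strings are hypothetical, and the literal "SourceSegment" is assumed to be the value of StaticStrings.SOURCESEGMENT_VALUE (the changeset shows only the constant's name). It uses the standard javax.xml.parsers API rather than Utility.parse.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SourceSegmentCheck {
    // Assumption: stands in for StaticStrings.SOURCESEGMENT_VALUE, whose value is not shown in the diff.
    static final String SOURCESEGMENT = "SourceSegment";

    // Returns true if any Metadata element carries the SourceSegment name,
    // compared case-insensitively as in the changeset.
    static boolean hasSourceSegment(Document document) {
        NodeList metadataElements = document.getElementsByTagName("Metadata");
        for (int i = 0; i < metadataElements.getLength(); i++) {
            Element metadataElement = (Element) metadataElements.item(i);
            if (metadataElement.getAttribute("name").equalsIgnoreCase(SOURCESEGMENT)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: parses an XML string into a DOM, rethrowing failures unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up minimal documents; real archive files contain far more structure.
        String bibliographic = "<Archive><Metadata name=\"SourceSegment\">12</Metadata></Archive>";
        String ordinary = "<Archive><Metadata name=\"gsdlsourcefilename\">a.txt</Metadata></Archive>";
        System.out.println(hasSourceSegment(parse(bibliographic))); // prints "true"
        System.out.println(hasSourceSegment(parse(ordinary)));      // prints "false"
    }
}
```

A caller would skip metadata extraction whenever hasSourceSegment returns true, which corresponds to the early return 0 in the diff above.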