Timestamp:
2014-02-12T18:03:06+13:00
Author:
ak19
Message:

A question on the mailing list involved accented characters in custom metadata set names (not metadata set values). This exposed a bug: Greenstone could not cope with UTF-8 characters in metadata set names. The cause was the sub Char { use bytes; ... } lines used when reading XML. These needed to be commented out in both MetadataXMLPlugin and ReadXMLFile (as GreenstoneXMLPlugin inherits from ReadXMLFile). Doing so showed that the extra Encode::decode() calls, which decoded strings read in from XML as UTF-8, were no longer needed. As a result, MetadataXMLPlugin and GreenstoneXMLPlugin no longer call decode on the metadata name and value read in from XML, or on the full text, since nothing in GreenstoneXMLPlugin runs under 'use bytes' any more. Tested with text and HTML collections in which the metadata set names created in custom .mds files, their assigned metadata values, and a document's full text all contained the non-ASCII character a-macron (ā).
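The difference between byte and character semantics that the message relies on can be sketched as follows. This is a minimal illustration, not code from the changeset: the sample string and the helper names char_length and byte_length are hypothetical. It only shows that 'use bytes' makes a built-in such as length() count bytes rather than characters, and that Encode::decode() turns UTF-8 octets back into a character string.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample value: "Maori" spelled with a-macron (U+0101).
my $chars  = "M\x{101}ori";
# The same text as raw UTF-8 octets; a-macron occupies two bytes.
my $octets = encode('UTF-8', $chars);

# Character semantics: length() counts characters.
sub char_length {
    my ($s) = @_;
    return length($s);
}

# Byte semantics: inside the scope of 'use bytes', length() counts bytes.
sub byte_length {
    my ($s) = @_;
    use bytes;
    return length($s);
}

printf "characters: %d\n", char_length($chars);
printf "bytes:      %d\n", byte_length($chars);

# Decoding the octets recovers the original character string.
printf "round trip: %s\n",
    decode('UTF-8', $octets) eq $chars ? 'ok' : 'mismatch';
```

A string handled with character semantics throughout needs no later decode step, which is consistent with the decode calls becoming redundant once 'use bytes' was commented out.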

File:
1 edited

  • main/trunk/greenstone2/perllib/plugins/ReadXMLFile.pm

    r24348 → r28836
    @@ -291,5 +291,5 @@
     # things down significantly in some cases.
     sub Char {
    -    use bytes;  # Necessary to prevent encoding issues with XML::Parser 2.31+
    +#    use bytes;  # Necessary to prevent encoding issues with XML::Parser 2.31+
         $_[0]->{'Text'} .= $_[1];
         return undef;