Timestamp: 2014-02-12T18:03:06+13:00
Author: ak19
Message:

A question on the mailing list involved accented characters in custom metadata set names (not metadata set values). This exposed a bug: Greenstone could not cope with UTF-8 characters in metadata set names. The cause was the sub Char { use bytes; ... lines used when reading XML. These needed to be commented out in both MetadataXMLPlugin and ReadXMLFile (GreenstoneXMLPlugin inherits from ReadXMLFile). Doing so showed that the extra Encode::decode() calls that decoded strings read in from XML into UTF-8 were no longer needed. As a result, MetadataXMLPlugin and GreenstoneXMLPlugin no longer call decode on the metadata name and value read in from XML, or on the full text, since GreenstoneXMLPlugin as a whole no longer does the 'use bytes' part. Tested with text and HTML collections in which the metadata set names created in custom .mds files, their assigned metadata values and a document's full text all contained the UTF-8-specific character a-macron.
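
The difference between octets and decoded characters that the message describes can be sketched in a few lines of Perl. This is only an illustration and is not taken from the Greenstone sources: the helper names byte_length and ensure_chars are hypothetical, and it assumes that once 'use bytes' is gone the XML handler hands back Perl character strings, as the message reports.

```perl
use strict;
use warnings;
use Encode qw(decode);

# a-macron is U+0101; encoded as UTF-8 it is the two bytes 0xC4 0x81.
my $octets = "\xC4\x81";
my $chars  = decode("utf-8", $octets);   # one character, U+0101

# Hypothetical helper (not from Greenstone): the length a string
# appears to have when code runs under 'use bytes'.
sub byte_length {
    my ($s) = @_;
    use bytes;
    return length($s);
}

# Hypothetical helper: decode only when the string is still raw octets.
# utf8::is_utf8 inspects Perl's internal flag, so this is a heuristic,
# not a reliable general test for "already decoded".
sub ensure_chars {
    my ($s) = @_;
    return utf8::is_utf8($s) ? $s : decode("utf-8", $s);
}

printf "octets: %d, chars: %d, bytes view of chars: %d\n",
    length($octets), length($chars), byte_length($chars);
# prints "octets: 2, chars: 1, bytes view of chars: 2"
```

Under 'use bytes' the single a-macron character is seen as two bytes, which is why code in that mode needed a later decode() and why a string that is already a character string should not be decoded a second time.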

File:
1 edited

  • main/trunk/greenstone2/perllib/plugins/GreenstoneXMLPlugin.pm

    r28267  r28836
    224     224
    225     225
    226             -    my $metadata_name = decode("utf-8",$self->{'metadata_name'});
    227             -    my $metadata_value = decode("utf-8",$self->{'metadata_value'});
            226     +    my $metadata_name = $self->{'metadata_name'};
            227     +    my $metadata_value = $self->{'metadata_value'};
            228     +    #my $metadata_name = decode("utf-8",$self->{'metadata_name'});
            229     +    #my $metadata_value = decode("utf-8",$self->{'metadata_value'});
    228     230
    229     231          $self->{'doc_obj'}->add_utf8_metadata($self->{'section'},

    271     273          # text read in by XML::Parser is in Perl's binary byte value
    272     274          # form ... need to explicitly make it UTF-8
    273             -    my $content = decode("utf-8",$self->{'content'});
            275     +    #my $content = decode("utf-8",$self->{'content'});
            276     +    my $content = $self->{'content'};
    274     277
    275     278          $self->{'doc_obj'}->add_utf8_text($self->{'section'}, $content);