Changeset 28803


Timestamp:
2014-01-30T15:14:48+13:00
Author:
ak19
Message:

Testing with accented characters in MARC data showed up problems in how text strings were being handled in the XML-Parsing MARCXMLPlugin. These changes fix this.

File:
1 edited

  • main/trunk/greenstone2/perllib/plugins/MARCXMLPlugin.pm

    --- main/trunk/greenstone2/perllib/plugins/MARCXMLPlugin.pm (r28783)
    +++ main/trunk/greenstone2/perllib/plugins/MARCXMLPlugin.pm (r28803)
    @@ -81,5 +81,5 @@
                                      'PluginObj' => $self,
                      'Namespaces' => 1, # strip out namespaces
    -                 'Handlers' => {'Char' => \&ReadXMLFile::Char,
    +                 'Handlers' => {'Char' => \&Char,
                             'XMLDecl' => \&ReadXMLFile::XMLDecl,
                             'Entity'  => \&ReadXMLFile::Entity,
    @@ -157,4 +157,17 @@
     }
     
    +
    +sub Char {
    +    # ReadXMLPlugin currently has 'use bytes' here, apparently to sort out
    +    # an encoding issue.  Possible that the time that 'use bytes' was
    +    # added in (to fix a problem) our understanding of Unicode in Perl
    +    # wasn't completely correct
    +
    +    # Trialing out this new version (without 'use bytes') here for MarcXML data
    +
    +    $_[0]->{'Text'} .= $_[1];
    +    return undef;
    +}
    +
     # Called for DOCTYPE declarations - use die to bail out if this doctype
     # is not meant for this plugin
    @@ -319,4 +332,5 @@
         my $tmp_marcxml_filename = &util::get_tmp_filename("xml");
         if (open (XMLOUT,">$tmp_marcxml_filename")) {
    +        binmode(XMLOUT,":utf8");
     
             print XMLOUT "<?xml-stylesheet type=\"text/xsl\" href=\"MARC21slim2English.xsl\"?>\n";
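
The `binmode(XMLOUT,":utf8")` line in the last hunk can be illustrated with a small standalone sketch. It is not taken from MARCXMLPlugin: the helper `written_bytes` and the sample string are hypothetical, and an in-memory filehandle stands in for the temporary XML file. It assumes the text has already been accumulated as a Perl character string, as the new `Char` handler appears to do, and compares the bytes written with and without the `:utf8` layer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARCXMLPlugin): print $text to an
# in-memory filehandle, optionally applying a :utf8 layer first, and
# return the raw bytes that were written.
sub written_bytes {
    my ($text, $use_utf8_layer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "Cannot open in-memory file: $!";
    binmode($fh, ':utf8') if $use_utf8_layer;
    print $fh $text;
    close($fh);
    return $buffer;
}

# Sample accented text as a Perl character string;
# \x{e9} is U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
my $text = "caf\x{e9}";

# Without the layer, U+00E9 is written as the single byte 0xE9,
# which is not valid UTF-8 on its own.
my $without_layer = written_bytes($text, 0);

# With the layer, U+00E9 is written as the two-byte UTF-8 sequence C3 A9.
my $with_layer = written_bytes($text, 1);

printf "without layer: %d bytes\n", length($without_layer);
printf "with :utf8:    %d bytes\n", length($with_layer);
```

If the output file is later read as UTF-8, as an XML file without an encoding declaration would be, only the second form round-trips the accented character. That is presumably why the layer was added before the XML is printed.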