Changeset 28803

Timestamp: 30.01.2014 15:14:48
Author: ak19
Message:
Testing with accented characters in MARC data exposed problems in how the XML-parsing MARCXMLPlugin handled text strings. These changes fix them.
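
The kind of mismatch this message describes can be illustrated with a minimal sketch. It assumes the text handed to the handler is already a decoded Perl character string, and it uses a plain hash (`$expat`, a hypothetical stand-in) in place of the real parser object. It only shows the general difference between character and byte semantics under `use bytes`; it does not reproduce what `ReadXMLFile::Char` actually does.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the parser object the handler receives as $_[0].
my $expat = { 'Text' => '' };

# Same body as the new handler in this changeset: append the characters as-is.
sub Char {
    $_[0]->{'Text'} .= $_[1];
    return undef;
}

# "café" decoded from its UTF-8 octets: 4 characters, 5 bytes internally.
my $chunk = decode('UTF-8', "caf\xC3\xA9");
Char($expat, $chunk);
Char($expat, " au lait");

# Character semantics count é once.
my $char_len = length($expat->{'Text'});

# Under 'use bytes' the same string is measured by its internal byte count.
my $byte_len = do { use bytes; length($expat->{'Text'}) };

printf "%d characters, %d bytes\n", $char_len, $byte_len;
```

The two lengths differ because é occupies two bytes internally, which is why mixing byte-oriented and character-oriented operations on the same string tends to corrupt accented text.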

Files:
1 modified

  • main/trunk/greenstone2/perllib/plugins/MARCXMLPlugin.pm

--- main/trunk/greenstone2/perllib/plugins/MARCXMLPlugin.pm (r28783)
+++ main/trunk/greenstone2/perllib/plugins/MARCXMLPlugin.pm (r28803)
@@ -81,5 +81,5 @@
                                  'PluginObj' => $self,
                  'Namespaces' => 1, # strip out namespaces
-                 'Handlers' => {'Char' => \&ReadXMLFile::Char,
+                 'Handlers' => {'Char' => \&Char,
                         'XMLDecl' => \&ReadXMLFile::XMLDecl,
                         'Entity'  => \&ReadXMLFile::Entity,
@@ -157,4 +157,17 @@
 }
 
+
+sub Char {
+    # ReadXMLPlugin currently has 'use bytes' here, apparently to sort out
+    # an encoding issue.  Possible that the time that 'use bytes' was
+    # added in (to fix a problem) our understanding of Unicode in Perl
+    # wasn't completely correct
+
+    # Trialing out this new version (without 'use bytes') here for MarcXML data
+
+    $_[0]->{'Text'} .= $_[1];
+    return undef;
+}
+
 # Called for DOCTYPE declarations - use die to bail out if this doctype
 # is not meant for this plugin
@@ -319,4 +332,5 @@
     my $tmp_marcxml_filename = &util::get_tmp_filename("xml");
     if (open (XMLOUT,">$tmp_marcxml_filename")) {
+        binmode(XMLOUT,":utf8");
 
         print XMLOUT "<?xml-stylesheet type=\"text/xsl\" href=\"MARC21slim2English.xsl\"?>\n";
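
The effect of the added `binmode(XMLOUT,":utf8")` call can be sketched in isolation. The example below is an illustration under assumptions, not the plugin's code: `write_text` is a hypothetical helper, and `File::Temp` stands in for `&util::get_tmp_filename`. It writes the same character string once with the `:utf8` layer and once without, then compares the sizes on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to $path, optionally applying an I/O layer.
sub write_text {
    my ($path, $text, $layer) = @_;
    open(my $out, '>', $path) or die "Cannot open $path: $!";
    binmode($out, $layer) if defined $layer;
    print $out $text;
    close($out) or die "Cannot close $path: $!";
    return -s $path;    # size in bytes on disk
}

# Temporary files stand in for the plugin's temporary XML file.
my ($fh1, $utf8_file) = tempfile(SUFFIX => '.xml', UNLINK => 1);
my ($fh2, $raw_file)  = tempfile(SUFFIX => '.xml', UNLINK => 1);
close($fh1);
close($fh2);

# A character string containing é (U+00E9).
my $text = "caf\x{E9}";

my $utf8_size = write_text($utf8_file, $text, ':utf8');  # é encoded as 2 bytes
my $raw_size  = write_text($raw_file,  $text, undef);    # é written as 1 Latin-1 byte

printf "with :utf8: %d bytes, without: %d bytes\n", $utf8_size, $raw_size;
```

Without the layer, Perl writes characters below U+0100 as single Latin-1 bytes, so a file that is later read as UTF-8 XML would contain invalid byte sequences for accented characters.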