Changeset 30593


Timestamp: 06/27/16 18:11:27
Author: ak19
Message:

Dr Bainbridge found another place in the code path used by HTMLPlugin where UTF-16 surrogate pairs, which lead to malformed UTF-8 character errors, are encountered. This code is reached when PDFPlugin has the pdfbox_conversion option set. PDFBox produces HTML containing numeric entities for code points that are not valid in UTF-8 (the UTF-16 surrogate range, 0xD800 to 0xDFFF), and this caused processing of Diego's test PDF to fail until Dr Bainbridge's bugfix.
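As a rough illustration of the replacement this commit makes, the following standalone Perl sketch decodes numeric HTML entities and substitutes '?' for any code point in the surrogate range. The helper names replace_surrogate_code and decode_numeric_entities are hypothetical and do not exist in ghtml.pm; the real check lives in ghtml::getcharequiv() and also prints a warning to STDERR.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ghtml.pm): return the code point to use
# for a numeric entity, substituting '?' for a UTF-16 surrogate half.
sub replace_surrogate_code {
    my ($code) = @_;
    # 0xD800..0xDFFF is reserved for UTF-16 surrogate pairs; these values
    # are not valid code points on their own and cannot be encoded as UTF-8.
    if ($code >= 0xD800 && $code <= 0xDFFF) {
        return ord("?");
    }
    return $code;
}

# Hypothetical helper: decode decimal entities such as &#233; in a string.
sub decode_numeric_entities {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr(replace_surrogate_code($1))/ge;
    return $text;
}

binmode(STDOUT, ":encoding(UTF-8)");
# &#55357; (0xD83D) and &#56832; (0xDE00) are the two halves of a surrogate pair.
print decode_numeric_entities("a&#55357;&#56832;b&#233;"), "\n";
```

In this sketch each half of the surrogate pair becomes its own '?', while an ordinary entity such as &#233; is decoded unchanged.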

File:
1 edited
  • main/trunk/greenstone2/perllib/ghtml.pm

    r23371  r30593
    219     219         }
    220     220
            221     +
    221     222         if (defined $code) {
            223     +
            224     +       # malformed UTF-8 character used in UTF-16
            225     +       if($code >= 0xD800 && $code <= 0xDFFF) {
            226     +           print STDERR "Warning: encountered the HTML entity \&#$code; which represents part of a UTF-16 surrogate pair, which is not supported in ghtml::getcharequiv(). Replacing with '?'.\n";
            227     +           $code = ord("?");
            228     +       }
            229     +
    222     230             # non-standard Microsoft breakage, as usual
    223     231             if ($code < 0x9f) { # code page 1252 uses reserved bytes