Changeset 30491


Timestamp:
2016-05-04T15:48:49+12:00
Author:
Georgiy Litvinov
Message:

Removed high and low surrogates from converted html

File:
1 edited

  • main/trunk/greenstone2/perllib/plugins/PDFPlugin.pm

    r29102 r30491
       314    314    # be useful to give an indication of document length in browser through setting
       315    315    # num_pages as metadata.
              316    # Clean HTML of high and low surrogates U+D800–U+DFFF
              317    $text =~ s@[\N{U+D800}-\N{U+DFFF}]@\ @g;
       316    318    my @pages = ($text =~ m/\<[Aa] name=\"?\w+\"?>/ig); #<div style=\"?page-break-before:always; page-break-after:always\"?>
       317    319    my $num_pages = scalar(@pages);
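
The substitution added in this changeset can be tried in isolation. The following is a minimal sketch, not code from PDFPlugin.pm: the helper name `strip_surrogates` and the sample string are assumptions made for illustration, and the range is written with `\x{...}` escapes rather than `\N{U+...}`, which should match the same code points.

```perl
use strict;
use warnings;
no warnings 'utf8';    # surrogate code points can trigger utf8-category warnings

# Hypothetical helper (not part of PDFPlugin.pm): replace every code point in
# the surrogate range U+D800..U+DFFF with a single space.
sub strip_surrogates {
    my ($text) = @_;
    $text =~ s/[\x{D800}-\x{DFFF}]/ /g;
    return $text;
}

# Made-up sample: a high and a low surrogate embedded in ordinary text.
my $sample = "page" . chr(0xD83D) . "one" . chr(0xDE00) . "end";

# Only the cleaned string is printed, so no surrogate is ever written out.
print strip_surrogates($sample), "\n";
```

Replacing each surrogate with a space, rather than deleting it, keeps neighbouring words from running together in the converted text.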