Timestamp: 04.05.2016 15:48:49
Author: litvinovg
Message: Removed high and low surrogates from converted html
Files: 1 modified

  • main/trunk/greenstone2/perllib/plugins/PDFPlugin.pm

    r29102  r30491
    314     314       # be useful to give an indication of document length in browser through setting
    315     315       # num_pages as metadata.
            316       # Clean html from low and high surrogates D800–DFFF
            317       $text =~ s@[\N{U+D800}-\N{U+DFFF}]@\ @g;
    316     318       my @pages = ($text =~ m/\<[Aa] name=\"?\w+\"?>/ig); #<div style=\"?page-break-before:always; page-break-after:always\"?>
    317     319       my $num_pages = scalar(@pages);
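The two steps in this hunk, stripping surrogates and then counting page anchors, can be tried in isolation with a minimal Perl sketch. It is an illustration under assumptions, not code from PDFPlugin.pm: the helper names strip_surrogates and count_pages are hypothetical, and the sample input is made up. It writes both ends of the range as \N{U+...}; a bare \N{DFFF} is looked up as a character name, which Perl should reject as unknown when compiling the regex.

```perl
use strict;
use warnings;

# Replace every UTF-16 surrogate code point (U+D800..U+DFFF, the range the
# r30491 comment describes) with a single space.
# Hypothetical helper: PDFPlugin.pm performs this substitution inline on $text.
sub strip_surrogates {
    my ($text) = @_;
    $text =~ s/[\N{U+D800}-\N{U+DFFF}]/ /g;
    return $text;
}

# Count page anchors the way the unchanged context lines do:
# each <a name=...> tag is taken to mark one page.
# Hypothetical helper: PDFPlugin.pm keeps this count in $num_pages.
sub count_pages {
    my ($text) = @_;
    my @pages = ($text =~ m/\<[Aa] name=\"?\w+\"?>/ig);
    return scalar(@pages);
}

# Made-up input: two page anchors with a surrogate pair stored as two
# separate code points between "A" and "B".
my $html  = "<a name=\"p1\">A\x{D83D}\x{DE00}B</a><A name=p2>C";
my $clean = strip_surrogates($html);
print count_pages($clean), "\n";    # prints 2
```

In $clean each surrogate becomes one space, so the pair turns into two spaces ("A  B"). Characters outside the range, such as U+00E9 or a properly encoded U+1F600, pass through unchanged because they are single code points above or below D800–DFFF.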