Timestamp:
2020-06-30T00:19:32+12:00
Author:
ak19
Message:

Reverted the change that converted tabs to their entity references in docprint.pm (which now strips them again) and moved the conversion into TextPlugin.pm instead. This keeps the change as narrow as possible in case it has unforeseen effects. Only pre tags need tabs preserved; other html can safely have them removed. TextPlugin always adds pre tags when converting txt to html, so it makes sense to preserve tabs there, whereas html produced elsewhere may not contain pre tags, so the same need cannot be assumed in every case.
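
A minimal sketch of the division of labour described in this message, not the actual TextPlugin.pm code. The sub names text_to_pre_html and strip_control_chars are hypothetical, and the escaping of &, < and > is an assumption about what a txt-to-html step would do. Only the two substitutions on tabs and control characters are taken from the changeset itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the txt-to-html step in a text plugin.
# Assumption: the plugin escapes markup characters and wraps the text in <pre>.
sub text_to_pre_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    # Turn each tab into its numeric entity so that a later
    # control-character filter no longer sees a raw \x09.
    $text =~ s/\x09/&#09;/g;
    return "<pre>\n$text\n</pre>";
}

# Same character class as the r34221 filter in docprint.pm:
# removes control characters, raw tabs included, keeps \n and \r.
sub strip_control_chars {
    my ($all_text) = @_;
    $all_text =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F]//g;
    return $all_text;
}

my $html = strip_control_chars(text_to_pre_html("col1\tcol2"));
print "$html\n";
```

Under these assumptions, a tab in a text source reaches the filter already encoded as an entity and survives it, while a raw tab arriving from any other source is still removed.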

File:
1 edited

  • main/trunk/greenstone2/perllib/docprint.pm

--- docprint.pm (r34220)
+++ docprint.pm (r34221)
@@ -103,20 +103,8 @@
     # (XML::Parser will barf on anything it doesn't consider to be
     # valid UTF-8 text, including things like \c@, \cC etc.)
-    # Will treat tab chars, \x09, as a special case right after this
-    $all_text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;
+    # and the tab character too (x09)
 
-    # $all_text gets written out into an xml context and represents the html version of a doc,
-    # allowing the use of html entities for the tab character (&#09;)
-    # Tabs (ASCII \x09) may be meaningful spacing in such cases whether the html emanated from a
-    # text file, original html or other doc. Particularly when tabs are nested in <pre> tags.
-    # Instead of removing tabs, replacing tabs with their entity reference will allow <pre> tags
-    # to continue preserving any tabs in the final html display.
-    # Hopefully with this, XML::Parser will not choke on tabs, and we get tab stop spaces preserved
-    # in the html output.
-    # This may be the best location to do this replacement and not in TextPlugin, because an html
-    # source doc may contain <pre> elements with tab stops, so then HTMLPlugin would have to do the
-    # replacement too.
-    $all_text =~ s/\x09/&#09;/g;
-
+    $all_text =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F]//g;
+
     return $all_text;
 }