Changeset 34221 for main


Timestamp: 2020-06-30T00:19:32+12:00
Author: ak19
Message:

Reverted the change in docprint.pm that converted tab characters to their entity references (docprint.pm has gone back to removing tabs) and moved this conversion into TextPlugin.pm after all. The aim is to break as little as possible in case the conversion has an unforeseen effect. Also, only pre tags need to preserve tabs; other html can be cleaned of them. TextPlugin definitely adds pre tags when converting txt to html, so it makes sense to always preserve tabs there, whereas it doesn't make sense to assume the same need in every case where html is produced, since that html may not contain pre tags.

Location: main/trunk/greenstone2/perllib
Files: 2 edited

  • main/trunk/greenstone2/perllib/docprint.pm

    --- docprint.pm (r34220)
    +++ docprint.pm (r34221)
    @@ -103,20 +103,8 @@
         # (XML::Parser will barf on anything it doesn't consider to be
         # valid UTF-8 text, including things like \c@, \cC etc.)
    -    # Will treat tab chars, \x09, as a special case right after this
    -    $all_text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;
    +    # and the tab character too (x09)
     
    -    # $all_text gets written out into an xml context and represents the html version of a doc,
    -    # allowing the use of html entities for the tab character (&#09;)
    -    # Tabs (ASCII \x09) may be meaningful spacing in such cases whether the html emanated from a
    -    # text file, original html or other doc. Particularly when tabs are nested in <pre> tags.
    -    # Instead of removing tabs, replacing tabs with their entity reference will allow <pre> tags
    -    # to continue preserving any tabs in the final html display.
    -    # Hopefully with this, XML::Parser will not choke on tabs, and we get tab stop spaces preserved
    -    # in the html output.
    -    # This may be the best location to do this replacement and not in TextPlugin, because an html
    -    # source doc may contain <pre> elements with tab stops, so then HTMLPlugin would have to do the
    -    # replacement too.
    -    $all_text =~ s/\x09/&#09;/g;
    -
    +    $all_text =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F]//g;
    +
         return $all_text;
     }
  • main/trunk/greenstone2/perllib/plugins/TextPlugin.pm

    --- plugins/TextPlugin.pm (r34220)
    +++ plugins/TextPlugin.pm (r34221)
    @@ -138,4 +138,15 @@
         $$textref =~ s/</&lt;/g;
         $$textref =~ s/>/&gt;/g;
    +
    +
    +    # $all_text gets written out into an xml context and represents the html version of a doc,
    +    # allowing the use of html entities for the tab character (&#09;)
    +    # But docprint.pm, which writes the doc_obj into doc.xml, removes tabs for XMLParser reasons
    +    # Tabs (ASCII \x09) may be meaningful spacing in text files to preserve whitespace formatting
    +    # as we're trying to do by nesting tabs in <pre> tags.
    +    # So before docprint.pm removes tabs stops, replacing them here with their entity reference
    +    # to allow <pre> tags to continue preserving any tabs in the final html display.
    +    $$textref =~ s/\x09/&#09;/g;
    +
     
         # insert preformat tags and add text to document object
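
The combined effect of the two edits can be illustrated with a small standalone sketch. It is not code from the Greenstone tree: the subroutine names escape_text_for_pre and strip_control_chars and the sample string are hypothetical, and only the substitution expressions are copied from the diff above. It assumes the TextPlugin-style escaping runs before the docprint-style clean-up, as the commit message describes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the escaping step in TextPlugin.pm (r34221):
# only the three substitutions visible in the diff are reproduced here.
sub escape_text_for_pre {
    my ($text) = @_;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/\x09/&#09;/g;    # keep tabs as entity references
    return $text;
}

# Hypothetical stand-in for the clean-up step in docprint.pm (r34221):
# removes control characters, including a literal tab (\x09),
# but leaves \x0A (LF) and \x0D (CR) alone.
sub strip_control_chars {
    my ($text) = @_;
    $text =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F]//g;
    return $text;
}

# Sample input: a tab, some markup characters and a stray control char.
my $raw = "a\tb <c>\x01";

# Text that went through the TextPlugin-style step keeps its tab as &#09;.
my $via_text_plugin = strip_control_chars(escape_text_for_pre($raw));

# Text that reaches the clean-up step directly loses the tab entirely.
my $direct = strip_control_chars($raw);

print "$via_text_plugin\n";    # a&#09;b &lt;c&gt;
print "$direct\n";             # ab <c>
```

In this sketch the entity reference survives the clean-up because it consists only of printable ASCII characters, which matches the reasoning in the commit message: tabs are preserved where pre tags are known to be added, and are still stripped everywhere else.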