Changeset 34221

Timestamp:
30.06.2020 00:19:32
Author:
ak19
Message:

Undid the change that converted tab stops to their entities in docprint.pm (which has now gone back to removing them) and moved this conversion into TextPlugin.pm after all. In case the earlier change has an unforeseen effect, I wanted to break as little as possible. Also, only pre tags need to preserve tabs; other HTML can be cleaned of them. TextPlugin definitely adds pre tags when converting txt to HTML, so it makes sense to always preserve tab stops there, whereas it doesn't make sense to assume the same need in all cases where HTML is produced, since that HTML may not contain pre tags.

Location:
main/trunk/greenstone2/perllib
Files:
2 modified

  • main/trunk/greenstone2/perllib/docprint.pm

    --- main/trunk/greenstone2/perllib/docprint.pm (r34220)
    +++ main/trunk/greenstone2/perllib/docprint.pm (r34221)
    @@ -103,20 +103,8 @@
         # (XML::Parser will barf on anything it doesn't consider to be
         # valid UTF-8 text, including things like \c@, \cC etc.)
    -    # Will treat tab chars, \x09, as a special case right after this
    -    $all_text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;
    +    # and the tab character too (x09)
     
    -    # $all_text gets written out into an xml context and represents the html version of a doc,
    -    # allowing the use of html entities for the tab character (&#09;)
    -    # Tabs (ASCII \x09) may be meaningful spacing in such cases whether the html emanated from a
    -    # text file, original html or other doc. Particularly when tabs are nested in <pre> tags.
    -    # Instead of removing tabs, replacing tabs with their entity reference will allow <pre> tags
    -    # to continue preserving any tabs in the final html display.
    -    # Hopefully with this, XML::Parser will not choke on tabs, and we get tab stop spaces preserved
    -    # in the html output.
    -    # This may be the best location to do this replacement and not in TextPlugin, because an html
    -    # source doc may contain <pre> elements with tab stops, so then HTMLPlugin would have to do the
    -    # replacement too.
    -    $all_text =~ s/\x09/&#09;/g;
    -
    +    $all_text =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F]//g;
    +
         return $all_text;
     }
  • main/trunk/greenstone2/perllib/plugins/TextPlugin.pm

    --- main/trunk/greenstone2/perllib/plugins/TextPlugin.pm (r34220)
    +++ main/trunk/greenstone2/perllib/plugins/TextPlugin.pm (r34221)
    @@ -138,4 +138,15 @@
         $$textref =~ s/</&lt;/g;
         $$textref =~ s/>/&gt;/g;
    +
    +
    +    # $all_text gets written out into an xml context and represents the html version of a doc,
    +    # allowing the use of html entities for the tab character (&#09;)
    +    # But docprint.pm, which writes the doc_obj into doc.xml, removes tabs for XMLParser reasons
    +    # Tabs (ASCII \x09) may be meaningful spacing in text files to preserve whitespace formatting
    +    # as we're trying to do by nesting tabs in <pre> tags.
    +    # So before docprint.pm removes tabs stops, replacing them here with their entity reference
    +    # to allow <pre> tags to continue preserving any tabs in the final html display.
    +    $$textref =~ s/\x09/&#09;/g;
    +
     
         # insert preformat tags and add text to document object
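
The ordering described in the changeset message can be illustrated with a small standalone Perl sketch. It is not the Greenstone code: the sub names `text_to_pre_html` and `strip_control_chars` are hypothetical, and the exact way the `<pre>` tags are inserted is an assumption. Only the three substitutions and the final character class are taken from the diff.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the TextPlugin.pm step: escape angle brackets,
# turn each tab into its entity reference, then wrap the text in <pre>.
# (How the real plugin inserts the <pre> tags is assumed here.)
sub text_to_pre_html {
    my ($text) = @_;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/\x09/&#09;/g;    # done before docprint.pm gets a chance to strip tabs
    return "<pre>\n" . $text . "</pre>";
}

# Hypothetical stand-in for the docprint.pm step: remove control characters,
# now including the literal tab (\x09), as in r34221.
sub strip_control_chars {
    my ($all_text) = @_;
    $all_text =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F]//g;
    return $all_text;
}

my $raw  = "name\tvalue <b>\n";
my $html = strip_control_chars(text_to_pre_html($raw));
print $html, "\n";
```

Because the entity `&#09;` consists only of printable characters, it passes through the stripping step untouched, whereas a literal tab in text that never went through the first step would simply be removed.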