- Timestamp: 2020-06-29T23:54:16+12:00
- Location: main/trunk/greenstone2/perllib
- Files: 2 edited
main/trunk/greenstone2/perllib/docprint.pm
--- r32575
+++ r34220
@@ -103,6 +103,20 @@
     # (XML::Parser will barf on anything it doesn't consider to be
     # valid UTF-8 text, including things like \c@, \cC etc.)
-    $all_text =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F]//g;
-
+    # Will treat tab chars, \x09, as a special case right after this
+    $all_text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;
+
+    # $all_text gets written out into an xml context and represents the html version of a doc,
+    # allowing the use of html entities for the tab character (&#9;)
+    # Tabs (ASCII \x09) may be meaningful spacing in such cases whether the html emanated from a
+    # text file, original html or other doc. Particularly when tabs are nested in <pre> tags.
+    # Instead of removing tabs, replacing tabs with their entity reference will allow <pre> tags
+    # to continue preserving any tabs in the final html display.
+    # Hopefully with this, XML::Parser will not choke on tabs, and we get tab stop spaces preserved
+    # in the html output.
+    # This may be the best location to do this replacement and not in TextPlugin, because an html
+    # source doc may contain <pre> elements with tab stops, so then HTMLPlugin would have to do the
+    # replacement too.
+    $all_text =~ s/\x09/&#9;/g;
+
     return $all_text;
 }
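The effect of the docprint.pm change can be sketched in isolation. The helper name `escape_control_chars` and the sample input are hypothetical, and the sketch assumes the tab entity is written as `&#9;`; in docprint.pm the two substitutions run inline on `$all_text`.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not a function in docprint.pm.
sub escape_control_chars {
    my ($all_text) = @_;

    # Strip control characters, but leave tab (\x09), line feed (\x0A)
    # and carriage return (\x0D) alone.
    $all_text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;

    # Replace each tab with a numeric character reference
    # (assumed here to be &#9;) so the spacing survives inside <pre>.
    $all_text =~ s/\x09/&#9;/g;

    return $all_text;
}

# Made-up sample: a tab, a NUL and a vertical tab inside a <pre> element.
my $sample = "<pre>a\tb\x00c\x0Bd\ne\r\n</pre>";
print escape_control_chars($sample);
```

With this input the NUL and vertical tab are dropped, the tab becomes `&#9;`, and the line feed and carriage return pass through unchanged.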
main/trunk/greenstone2/perllib/plugins/TextPlugin.pm
--- r31492
+++ r34220
@@ -111,5 +111,7 @@
         $title =~ s/$self->{'title_sub'}//;
     }
-    $title =~ /^\s*([^\n]*)/s; $title=$1;
+    # A series of spaces and/or punctuation too can be skipped to get at a meaningful title?
+    # https://www.geeksforgeeks.org/perl-special-character-classes-in-regular-expressions/
+    $title =~ /^[\s|[:punct:]]*([^\n]*)/s; $title=$1;
     $title =~ s/\t/ /g;
     $title =~ s/\r?\n?$//s; # remove any carriage returns and/or line feeds at line end,
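A minimal sketch of how the revised title match in TextPlugin.pm behaves on sample input. The helper name `first_line_title` and the sample strings are hypothetical; TextPlugin.pm applies these statements inline to `$title`.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not a function in TextPlugin.pm.
sub first_line_title {
    my ($title) = @_;

    # Skip leading whitespace and punctuation, then capture up to the
    # first newline. The "|" inside the bracketed class is a literal pipe,
    # which [:punct:] already covers, so it is redundant but harmless.
    $title =~ /^[\s|[:punct:]]*([^\n]*)/s;
    $title = $1;

    $title =~ s/\t/ /g;       # tabs inside the title become spaces
    $title =~ s/\r?\n?$//s;   # drop a trailing CR and/or LF

    return $title;
}

print first_line_title("  *** Chapter 1: Intro ***\nBody text\n"), "\n";
```

Because `\s` also matches newlines, a first line made up only of punctuation is skipped entirely and the title is taken from the next line that has other content.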