Changeset 33309 for main/trunk


Timestamp:
2019-07-08T15:46:18+12:00
Author:
ak19
Message:

More workarounds for HTML conversion results from Word's windows_scripting. If the original Word doc had newlines between headings that accidentally carried heading formatting, the windows_scripting conversion creates a skeleton heading containing no actual text, only space. As a result, when doc.xml is generated, it contains an empty subsection reflecting that empty heading. This adds further cleanup to StructuredHTMLPlugin to find and remove these empty headings, which result from common Word user errors.

File:
1 edited

  • main/trunk/greenstone2/perllib/plugins/StructuredHTMLPlugin.pm

    r33301  r33309
    165     165     #$body_text =~ s/(<div.*)/$top_section_tag$1/i;
    166     166     my $body = "<body".$body_text;
            167
            168     # remove empty headings that Word's windows_scripting may insert for multiple new lines around headings
            169     # have heading markup, e.g. <h2><o:p>&nbsp;</o:p></h2>
            170     $body =~ s@<h[1-6]>(<o:p>)?(&nbsp;)+(</o:p>)?</h[1-6]>@@gis;
    167     171
    168     172     my $section_text = $head;
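
The effect of the added substitution can be tried in isolation with a minimal sketch like the one below. The helper name `strip_empty_headings` and the sample HTML are hypothetical and used only for illustration; they are not part of StructuredHTMLPlugin. Only the regular expression is taken from the changeset.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of StructuredHTMLPlugin: applies the same
# substitution as r33309 to an HTML string and returns the cleaned copy.
sub strip_empty_headings {
    my ($html) = @_;
    # Remove headings whose only content is one or more &nbsp; entities,
    # optionally wrapped in Word's <o:p>...</o:p> element.
    $html =~ s@<h[1-6]>(<o:p>)?(&nbsp;)+(</o:p>)?</h[1-6]>@@gis;
    return $html;
}

# Made-up sample input: two real headings and two empty ones.
my $sample = '<body><h1>Intro</h1><h2><o:p>&nbsp;</o:p></h2>'
           . '<h2>Details</h2><h3>&nbsp;&nbsp;</h3><p>Text</p></body>';

print strip_empty_headings($sample), "\n";
```

As written, the pattern only matches heading tags without attributes and headings whose content is `&nbsp;` entities, so an empty heading such as `<h2 class="x">&nbsp;</h2>` or one containing a literal space would be left in place.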