Changeset 33309 for main

Timestamp:
08.07.2019 15:46:18
Author:
ak19
Message:

More workarounds for the HTML produced by Word's windows_scripting conversion. Sometimes the original Word document has empty lines between headings that were accidentally given heading formatting. In that case, the windows_scripting conversion produces skeleton headings that contain no actual text, only space. As a result, the generated doc.xml contains an empty subsection for each such heading. This changeset therefore adds further cleanup to StructuredHTMLPlugin to find and remove these empty headings, which stem from a common Word user error.

Files:
1 modified

  • main/trunk/greenstone2/perllib/plugins/StructuredHTMLPlugin.pm

--- StructuredHTMLPlugin.pm (r33301)
+++ StructuredHTMLPlugin.pm (r33309)
@@ -165,4 +165,8 @@
     #$body_text =~ s/(<div.*)/$top_section_tag$1/i;
     my $body = "<body".$body_text;
+
+    # remove empty headings that Word's windows_scripting may insert for multiple new lines around headings
+    # have heading markup, e.g. <h2><o:p>&nbsp;</o:p></h2>
+    $body =~ s@<h[1-6]>(<o:p>)?(&nbsp;)+(</o:p>)?</h[1-6]>@@gis;
 
     my $section_text = $head;
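The substitution added in r33309 can be tried in isolation. What follows is a minimal sketch that assumes a standalone script. `remove_empty_headings` is a hypothetical wrapper written only for this illustration and is not a function in StructuredHTMLPlugin.pm. The sample HTML is invented. Only the regular expression is copied from the diff.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of StructuredHTMLPlugin.pm): applies the same
# substitution that r33309 adds, so its effect can be checked on sample input.
sub remove_empty_headings {
    my ($body) = @_;
    # Delete <h1>..<h6> elements whose only content is one or more &nbsp;
    # entities, optionally wrapped in Word's <o:p>...</o:p> element.
    # /g removes every occurrence, /i also matches <H3> etc.;
    # /s has no effect here because the pattern contains no '.'.
    $body =~ s@<h[1-6]>(<o:p>)?(&nbsp;)+(</o:p>)?</h[1-6]>@@gis;
    return $body;
}

# Invented sample: one real heading, two skeleton headings, one paragraph.
my $sample = '<body><h1>Intro</h1><h2><o:p>&nbsp;</o:p></h2><H3>&nbsp;&nbsp;</H3><p>Text</p></body>';
print remove_empty_headings($sample), "\n";
# prints: <body><h1>Intro</h1><p>Text</p></body>
```

Headings that contain real text are left alone. The pattern also only matches bare `<hN>` opening tags. A heading with attributes (for example `<h2 class="x">&nbsp;</h2>`) or one containing a literal space instead of `&nbsp;` is not removed.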