Changeset 33301 for main

Timestamp:
05.07.2019 20:12:10
Author:
ak19
Message:

Incorporating Dr Bainbridge's suggested fix for Word docs processed with windows scripting. Word's doc-to-HTML conversion artificially adds an all-encompassing div inside the body element, before any heading elements. With this div present, the HTML generated for each section of the document was broken, since each section has to be valid HTML on its own. This had the side effect of breaking ckeditor's interaction with Greenstone's online document editing facilities, namely metadata and map editing. The map and meta editing buttons would disappear or (if the ckeditor initialisation code was shifted around) become non-interactive, since ckeditor would treat the div's entire contents, including the meta and map edit buttons, as content to be marked up for editing.
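
To make the failure concrete, here is a minimal sketch (not the plugin's code) of what happens when a body wrapped in one global div is cut at its headings. The sample HTML, the count_tag and div_balance helpers, and the simplistic split before every <h1> are assumptions for illustration only; StructuredHTMLPlugin's real sectioning logic is more involved.

```perl
use strict;
use warnings;

# Hypothetical sample resembling Word's output: one <div> wraps the whole body.
my $html = '<body><div class="WordSection1">'
         . '<h1>One</h1><p>first</p>'
         . '<h1>Two</h1><p>second</p>'
         . '</div></body>';

# Naive sectioning, for illustration only: cut before every <h1>.
my @chunks = split(/(?=<h1)/i, $html);

# Hypothetical helper: count how often a pattern occurs in a string.
sub count_tag {
    my ($text, $re) = @_;
    my $n = () = $text =~ /$re/gi;
    return $n;
}

# Opening divs minus closing divs; 0 means the chunk is balanced.
sub div_balance {
    my ($text) = @_;
    return count_tag($text, qr/<div\b/) - count_tag($text, qr/<\/div>/);
}

my @balance = map { div_balance($_) } @chunks;
print join(",", @balance), "\n";
```

In this sample the text before the first heading carries one unclosed div and the last chunk carries one unmatched closing div, so neither is valid HTML on its own.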

Files:
1 modified

  • main/trunk/greenstone2/perllib/plugins/StructuredHTMLPlugin.pm

--- main/trunk/greenstone2/perllib/plugins/StructuredHTMLPlugin.pm (r33299)
+++ main/trunk/greenstone2/perllib/plugins/StructuredHTMLPlugin.pm (r33301)
@@ -105,5 +105,5 @@
     my @head_and_body = split(/<body/i,$$textref);
     my $head = shift(@head_and_body);
-    my $body_text = join("<body", @head_and_body);
+    my $body_text = join("<body", @head_and_body); # won't actually work to prefix "<body" to just the body remaining in @head_and_body array, since only 1 element (the body) remains in @head_and_body
     $head =~ m/<title>(.+)<\/title>/i;
     my $doctitle = $1 if defined $1;
@@ -158,6 +158,6 @@
     $body_text =~ s/(<p[^>]*><span[^>]*><o:p>&nbsp;<\/o:p><\/span><\/p>)//isg;
     $body_text =~ s/(<p[^>]*><o:p>&nbsp;<\/o:p><\/p>)//isg;
-
-    # what was the following line for. effectively unused. do we need it??
+
+    # what was the following line for. effectively unused. do we need it??
     #$section_text .= "<!--\n<Section>\n-->\n";
     #my $top_section_tag = "<!--\n<Section>\n-->\n";
@@ -165,5 +165,5 @@
     #$body_text =~ s/(<div.*)/$top_section_tag$1/i;
     my $body = "<body".$body_text;
-
+
     my $section_text = $head;
 
@@ -175,5 +175,20 @@
     my $sectionh1 = 0;
     $section_text .= shift(@h_split);
-
+
+    # When windows_scripting is on, WordPlugin invokes Word to convert the doc(x) file to HTML which is then
+    # processed by this StructuredHTMLPlugin. However, Word will embed the entire HTML body content inside a <div>
+    # This <div> becomes problematic, since in sectioned documents, the first section would end up starting with a div
+    # but not contain a matching closing div, while the final section will end with an unmatched closing div.
+    # So, as a hack, we remove any opening <div> appearing immediately after the <body> before the first <h>eading.
+    # And we'll set a flag to remember to remove any corresponding closing </div> before the closing </body>.
+    # So, now we look for any unclosed <div> elements in the preamble (pre-Headings) html that is in $section_text
+    my $remove_global_div = 0;
+    if($section_text =~ m/^(.*?)\s*<div[^>]*>\s*$/is) {
+        $section_text = $1;
+        $remove_global_div = 1;
+        print $outhandle "********** Found and removed a global opening <div> at start of html body, will monitor for closing div too.\n"
+            if $self->{'verbosity'} > 2;
+    }
+
     my $hc;
     foreach $hc ( @h_split )
@@ -260,6 +275,13 @@
         }
     }
-
-    while ($hnum >= 1)
+
+    if($remove_global_div) { # then need to also handle a closing </div> tag for the global div too, and if one is present, remove it
+        $section_text =~ s@\s*</div[^>]*>(\s*</body>\s*</html>\s*)$@$1@is;
+        print $outhandle "********** Removing any matching closing global divider element\n"
+            if $self->{'verbosity'} > 2;
+    }
+
+
+    while ($hnum >= 1)
     {
     my $spacing = "  " x $hnum;
@@ -271,5 +293,5 @@
 
     $section_text .= "<!--\n</Section>\n-->\n";
-
+
     $$textref = $section_text;
 
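
The two regular expressions added in this changeset can be tried outside the plugin. The following is a minimal sketch under stated assumptions: strip_global_div is a hypothetical helper (it does not exist in StructuredHTMLPlugin), the sample strings are invented, and the preamble and tail are passed as separate arguments, whereas the plugin applies both patterns to $section_text at different points of its processing.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of StructuredHTMLPlugin. It applies the
# changeset's two patterns to a preamble (text before the first heading)
# and a tail (text ending with </body></html>).
sub strip_global_div {
    my ($preamble, $tail) = @_;
    my $removed = 0;

    # Same pattern as the changeset: a <div ...> that ends the preamble.
    if ($preamble =~ m/^(.*?)\s*<div[^>]*>\s*$/is) {
        $preamble = $1;
        $removed  = 1;
    }

    # Only strip the closing </div> if an opening one was removed above.
    if ($removed) {
        $tail =~ s@\s*</div[^>]*>(\s*</body>\s*</html>\s*)$@$1@is;
    }
    return ($preamble, $tail, $removed);
}

# Invented sample input resembling Word's converted HTML.
my ($pre, $tail, $removed) = strip_global_div(
    "<html><head></head><body>\n<div class=\"WordSection1\">\n",
    "<p>last paragraph</p>\n</div>\n</body>\n</html>\n",
);
print $pre, "\n", $tail;
```

Guarding the second substitution with the flag mirrors the changeset's $remove_global_div: a closing div just before </body> is left alone unless an opening div was actually stripped from the preamble.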