Changeset 27990 for other-projects


Timestamp:
2013-08-06T22:20:41+12:00
Author:
ak19
Message:

2 fixes:
1. The Tudor collections' HTML source documents contain stray carriage returns (\r) that HTML Tidy does not clean up, so they make it into the Linux doc.xml, which otherwise uses only linefeed characters. The Windows doc.xml, in contrast, was already explicitly processed to convert every carriage-return-linefeed pair (\r\n) into a linefeed by removing the carriage returns. The stray carriage returns therefore survived in the Linux doc.xml but had been removed from the Windows doc.xml, resulting in differences. Carriage returns are now stripped from the Linux doc.xml as well.
2. Partly truncated ampersand entities in the XML report are now completed, so that things don't break when the XSLT is applied during the summarise command that generates the report.
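
The two fixes described above can be illustrated in isolation. What follows is only a minimal sketch, not the code from diffcol.pl: the helper names `strip_carriage_returns` and `complete_trailing_entity` are hypothetical, and the sketch assumes that only `&amp;`, `&lt;` and `&gt;` need completing and that a truncated entity can only occur at the very end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every carriage return, so that CRLF pairs
# become LF and solitary stray CRs disappear (they are not turned into LF).
sub strip_carriage_returns {
    my ($text) = @_;
    $text =~ s/\r//g;
    return $text;
}

# Hypothetical helper: complete an entity that was cut off at the very end
# of the string. Assumption: only amp, lt and gt are candidates.
sub complete_trailing_entity {
    my ($text) = @_;
    if ($text =~ /&([a-z]{1,3})\z/) {
        my $partial = $1;
        for my $entity (qw(amp lt gt)) {
            # The truncated text must be a prefix of the entity name.
            if (index($entity, $partial) == 0) {
                substr($text, -length($partial)) = "$entity;";
                last;
            }
        }
    }
    # A bare trailing "&" is left untouched in this sketch.
    return $text;
}

print complete_trailing_entity("1 &l"), "\n";      # prints "1 &lt;"
print complete_trailing_entity("x &am"), "\n";     # prints "x &amp;"
print complete_trailing_entity("done &gt;"), "\n"; # prints "done &gt;" (unchanged)
```

Matching the partial text as a prefix of each known entity name keeps the sketch to a single branch, at the cost of handling only the three entities listed.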

File:
1 edited

Legend:

  (no marker) Unmodified
  + Added
  - Removed

  • other-projects/nightly-tasks/diffcol/trunk/diffcol/diffcol.pl

    r27971  r27990
       654     654
       655     655            my $win_contents = $testIsWin ? \$test_contents : \$model_contents;
       656          -
               656  +         my $lin_contents = $testIsWin ? \$model_contents : \$test_contents;
               657  +
       657     658            # remove all carriage returns \r - introduced into doc.xml by multiread after pdf converted to html
       658     659            $$win_contents =~ s@[\r]@@g;
         …       …
       665     666            #FOR MAC: old macs use CR carriage return (see http://www.perlmonks.org/?node_id=745018), so replace with \n?)
       666     667            # $$win_contents =~ s@\r@\n@mg;
               668  +
               669  +         # remove solitary, stray carriage returns \r in the linux doc.xml, as occurs in the tudor collection owing to the source material
               670  +         # containing solitary carriage returns instead of linefeed
               671  +         $$lin_contents =~ s@[\r]@@g; #$$lin_contents =~ s@[\r][^\n]@@g;
       667     672        }
       668     673
         …       …
       862     867
       863     868    # make sure there are no stray ampersands/partial ampersands that need to be completed as < or >
       864          - if($strOutput =~ m/&(.{1,2})?$/) { # &lt => < or &g => >
               869  + if($strOutput =~ m/&(.{1,2})?$/ || $strOutput =~ m/&amp$/) { # &lt => < or &g => > or &a(m)=> & or &amp => &
       865     870    if(defined $1 && $1) {
       866     871        my $rest = $1;
       867          -     if($rest eq "g" || $rest eq "l") {
               872  +     if($rest =~ m/^a/) {
               873  +         $strOutput =~ s@am?p?$@amp;@;
               874  +     }
               875  +     elsif($rest eq "g" || $rest eq "l") {
       868     876        $strOutput .= "t;"; # close the known tag
       869     877        }
       870          -     elsif($1 eq "gt" || $1 eq "lt") {
               878  +     elsif($rest eq "gt" || $rest eq "lt") {
       871     879        $strOutput .= ";";
       872          -     }
               880  +     }
       873     881    } else { # & on its own
       874     882        #$strOutput = substr( $strOutput, 0, 977); # lop off the &