Timestamp:
2022-10-16T18:51:49+13:00
Author:
anupama
Message:

We used to run diffcol as a nightly task for GS2 only. Commit 36655 was the first stage of getting diffcol to work for GS3, but it skipped many important code branches (such as comparing the index\text\j/gdb files) in order to fix the easier parts of the code first. I think the remaining diffcol scripts now work for GS3, including the comparison of the index\text\flatdb files, so I can commit the important changes that get the full diffcol code working for GS3, along with the commented-out debugging statements. I will commit again after removing the debugging statements. I still need to do another full local diffcol run. I also need to test whether diffcol still works after locally undoing my sort-field changes to some GS3 model collections (the recent commits to the Tudor, Word-PDF, Images-GPS and Multimedia collections), to see whether Dr Bainbridge's PERL_HASH_SEED env var addition fixes all of those collections' diffcol failures and makes the extra sorting redundant. If it does, I will recommit those model collections after updating their collection configurations to drop the extra sorting.

File:
1 edited

  • other-projects/nightly-tasks/diffcol/trunk/diffcol/gdbdiff.pm

    r35231 r36807  
    106106    $test_text =~ s@(tmp[\\\/])(\d*[\\\/])@$1@g;
    107107
     108
     109#print STDERR "@@@@ DEBUGGING: $debugging\n";
     110#print STDERR "******** full_modeldb: $full_modeldb\n$model_text\n\n";
     111#print STDERR "******** full_testdb: $full_testdb\n$test_text\n\n";
     112   
    108113    # if the OS doesn't match and one of them is windows, extra work needs to be done to bring the db files
    109114    # in test and model collection to an even base for comparison
     
    231236    } # end of equalising differences between a windows collection's db file and linux coll's db file
    232237   
     238   
     239    # Windows or linux: if index is a flat db file, then ensure the docIDs listed in the <contains> field of
     240    # both the test and model flat db files are alphabetically sorted. So too the numbers in the <mdoffset> field.
     241    # Despite the PERL_PERTURB_KEYS envvar being set to 0 on both machines, when generating model collections
     242    # and when test collections were generated on the test machine, collections like Images-GPS and some
     243    # other colls still list items in <contains> and <mdoffset> in different orders. So reordering alphabetically.
     244    #if($dbname =~ m/$strColName/) {   
     245        # regex modifiers mge: multi-line, global (replace as many as match), e allows function call in substitution
     246        ##$model_text =~ s@^<contains>(.*)@sort_contains_field($1, "MODEL", $debugging)@mge;
     247        ##$test_text =~ s@^<contains>(.*)@sort_contains_field($1, "TEST", $debugging)@mge;
     248    #   $model_text =~ s@^<(contains|mdoffset)>(.*)@sort_field_value($1, $2, "MODEL", $debugging)@mge;
     249    #   $test_text =~ s@^<(contains|mdoffset)>(.*)@sort_field_value($1, $2, "TEST", $debugging)@mge;       
     250    #}
     251   
    233252    # The following block of code is necessary to deal with tmp (html) source files generated when using PDFBox
    234253    # These tmpdirs are located inside the toplevel *greenstone* directory
     
    302321    # Call diff?
    303322}
     323
     324# Unused, but may come in handy when debugging again: regex substitution helper function
     325sub sort_field_value {
     326    my($fieldname, $fieldvalue, $displayStr, $debugging) = @_;
     327   
     328    print STDERR "\n$displayStr BEFORE sort: <$fieldname>$fieldvalue\n" if($debugging);
     329   
     330    $fieldvalue =~ s@(\r|\n|\\n)*$@@; # get rid of trailing newlines/carriage returns
     331    my @values_list = split(';', $fieldvalue);
     332    @values_list = sort @values_list;
     333    $fieldvalue = "<$fieldname>".join(';', @values_list). "\n";
     334   
     335    print STDERR "$displayStr AFTER  sort: $fieldvalue\n" if($debugging);
     336   
     337    return $fieldvalue;
     338}
     339
    304340
    305341# returns true if the contents are windows AND it matters for the diffing on the db that it's windows
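
The commented-out substitutions in the diff rely on the m, g and e regex modifiers to rewrite each <contains> and <mdoffset> line through a helper. A minimal standalone sketch of that idea follows. It is not the code in gdbdiff.pm: the sample flat db text and the docIDs in it are made up for illustration, and the helper here is a simplified stand-in that omits the debugging arguments. Unlike the helper in the diff, it does not append "\n" to its return value, because `.*` stops before the line's own newline and that newline is therefore left in place by the substitution.

```perl
use strict;
use warnings;

# Simplified stand-in for the helper in the diff (assumption: no debug output needed).
# Takes a field name and its ';'-separated value, returns the rebuilt field line
# with the values in default (string) sort order.
sub sort_field_value {
    my ($fieldname, $fieldvalue) = @_;

    $fieldvalue =~ s/[\r\n]+\z//;                  # drop any trailing CR/LF
    my @values_list = sort split(/;/, $fieldvalue);

    return "<$fieldname>" . join(';', @values_list);
}

# Made-up sample resembling a flat db entry; not taken from a real collection.
my $text = "<doctype>classify\n"
         . "<contains>HASH02;HASH01;HASH03\n"
         . "<mdoffset>30;10;20\n";

# m: ^ matches at the start of every line
# g: replace every matching line
# e: evaluate the replacement as Perl code, so the helper can be called
(my $sorted = $text) =~ s@^<(contains|mdoffset)>(.*)@sort_field_value($1, $2)@mge;

print $sorted;
```

Applying the same substitution to both the model text and the test text means the later comparison no longer depends on the order in which the values were written out. Note that the sort is a string sort, so numeric <mdoffset> values of different lengths would be ordered as strings (for example "100" before "20"); that is harmless for comparison purposes as long as both sides are sorted the same way.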