Timestamp:
2013-06-12T18:43:04+12:00 (11 years ago)
Author:
ak19
Message:

Fixing up the diffcol process so it works better. In its current state it finds no errors in the Small-HTML model-collection.

1. Better handling of gdb database files (.idh files are ignored): fields that are expected to differ, such as dates, are filtered out before doing the diff. This handles archiveinf-doc.gdb and archiveinf-src.gdb files. With the sort flag Dr Bainbridge added to db2txt and the sorting of keys in perllib/dbutil/gdbmtxtgz, the ordering of keys in the database no longer affects the outcome.
2. Better handling of doc.xml files. Once again, date fields that will differ are filtered out before performing the diff. The EarliestDatestamp file is ignored.
3. The task script now ensures that model-collect is up to date with the svn version before performing the diffcol testing.
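The field filtering described in the message can be illustrated with a minimal sketch. The helper name `strip_volatile_fields` and the sample records are assumptions made up for illustration; only the four field names come from the changeset itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gdbdiff.pm): drop whole lines for fields
# that are expected to differ between two builds of the same collection.
sub strip_volatile_fields {
    my ($text) = @_;
    my $fields = join '|', qw(lastmodified lastmodifieddate
                              oailastmodified oailastmodifieddate);
    # Remove each "<field>value" line together with the newline before it.
    $text =~ s/\n<(?:$fields)>[^\n]*//g;
    return $text;
}

# Made-up sample records in a db2txt-like layout; only the date differs.
my $model = "[HASH01]\n<Title>Sample\n<lastmodified>1371019384\n" . ('-' x 70) . "\n";
my $test  = "[HASH01]\n<Title>Sample\n<lastmodified>1371105784\n" . ('-' x 70) . "\n";

my $same = strip_volatile_fields($model) eq strip_volatile_fields($test);
print $same ? "identical after filtering\n" : "still different\n";
```

With the date lines removed from both texts, the two sample records compare equal, so a subsequent diff would report nothing for them.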

File:
1 edited

  • other-projects/nightly-tasks/diffcol/trunk/diffcol/gdbdiff.pm

--- other-projects/nightly-tasks/diffcol/trunk/diffcol/gdbdiff.pm (r27579)
+++ other-projects/nightly-tasks/diffcol/trunk/diffcol/gdbdiff.pm (r27604)
@@ -32,17 +32,43 @@
 sub test_gdb
 {
-    my ($full_modeldb, $full_testdb) = @_;
+    my ($full_modeldb, $full_testdb,$strColName) = @_;
 
 
     # print "Now is testing database\n";
 
-    my $model_cmd = "db2txt $full_modeldb 2>&1";
-    my $test_cmd  = "db2txt $full_testdb 2>&1";
+    # need to sort text output of both test and model col database files, to normalise them for the comparison
+    # the -sort option to db2txt was added specifically to support diffcol
+    my $model_cmd = "db2txt -sort $full_modeldb 2>&1";
+    my $test_cmd  = "db2txt -sort $full_testdb 2>&1";
 
     my $model_text = readin_gdb($model_cmd);
     my $test_text = readin_gdb($test_cmd);
+
+
+    # filter out the fields that can be ignored in the two database files
+    my $ignore_line_re = "\n<(lastmodified|lastmodifieddate|oailastmodified|oailastmodifieddate)>([^\n])*";
+    $model_text =~ s/$ignore_line_re//g;
+    $test_text =~ s/$ignore_line_re//g;
+
+
+    # ignore absolute path prefixes in modelcol and testcol (necessary for archiveinf-doc and -src.gdb files)
+
+    # Remember the original model col on SVN could have been built anywhere,
+    # and in the gdb files, absolute paths are stored to the collection location.
+    # Crop these paths to the collect/<colname> point.
+
+    # Entries are of the form [Entry] or <Entry>. In order to do a sensible diff,
+    # need to remove the prefix to the collect/colname folder in any (absolute) path that occurs in Entry
+    # E.g. [/full/path/collect/colname/import/file.ext] should become [collect/colname/import/file.ext]
+    # Better regex is of the form /BEGIN((?:(?!BEGIN).)*)END/, see http://docstore.mik.ua/orelly/perl/cookbook/ch06_16.htm
+
+    $model_text =~ s@^([^\\//]*).*(\\|/)(collect(\\|/)$strColName)(.*)$@$1$3$5@mg;
+    $test_text =~ s@^([^\\//]*).*(\\|/)(collect(\\|/)$strColName)(.*)$@$1$3$5@mg;
+
+
     my $report_type = "OldStyle"; # Can not change this type.
     my $diff_gdb = diff \$model_text, \$test_text, { STYLE => $report_type };
 
+    # leaving the ignore regex as it used to be in the following, in case it helps with single line comparisons
     $diff_gdb = &diffutil::GenerateOutput($diff_gdb,"^<(lastmodified|lastmodifieddate|oailastmodified|oailastmodifieddate)>.*");
 
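The path cropping added in this changeset can be tried in isolation. The following is a rough sketch, not the code from gdbdiff.pm: the helper name `crop_collect_prefix`, the sample paths, and the `<assocfilepath>` line are all made up for illustration. It also wraps the collection name in `\Q...\E`, which the changeset does not do, so that any regex metacharacters in the name are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: on every line of $text, crop whatever lies between
# the start of the entry (e.g. its opening bracket) and the
# collect/<colname> folder of an absolute path.
sub crop_collect_prefix {
    my ($text, $colname) = @_;
    # $1 = leading non-path characters such as "[", $2 = collect/<colname>,
    # $3 = the rest of the line. \Q..\E quotes the collection name.
    $text =~ s{^([^\\/]*).*[\\/](collect[\\/]\Q$colname\E)(.*)$}{$1$2$3}mg;
    return $text;
}

# The same entry as it might look when built in two different locations.
my $model = "[/home/alice/greenstone/collect/colname/import/file.ext]\n<assocfilepath>HASH01\n";
my $test  = "[/opt/nightly/collect/colname/import/file.ext]\n<assocfilepath>HASH01\n";

print crop_collect_prefix($model, 'colname');
print crop_collect_prefix($test,  'colname');
```

Both calls print `[collect/colname/import/file.ext]` followed by the untouched `<assocfilepath>HASH01` line, so the two texts no longer differ by build location. Because `.*` is greedy, the crop happens at the last `collect/colname` occurrence on a line.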