trunk

Timestamp:

2013-08-15T22:09:07+12:00

Author:

ak19

Message:

1. If the tutorial collection involves tmp folders (such as timestamped ones), the equalised txt version of the test and model GDB files is written out to a gdb file and read back into txt, sorted, so that the now-relative paths to the tmp folders appear in the same order in both.
2. Square brackets are now placed around the random.html filenames that replace the random paths to GS-generated html files.
3. diffcol.pl's processing of doc.xml also had greedy matching where there should have been none.
4. The tmp folders generated for the Multimedia collection contain further subfolders that contain the actual file to be renamed to random; this wasn't handled properly before in diffcol.pl's test collection case for the OrigSource field.

Location:

other-projects/nightly-tasks/diffcol/trunk/diffcol

Files:

2 edited

diffcol.pl (modified) (2 diffs)
gdbdiff.pm (modified) (1 diff)

other-projects/nightly-tasks/diffcol/trunk/diffcol/diffcol.pl

-              r28019
+              r28071
         $gsdlhome_re = ".*" unless $$ENV{'GSDLHOME'};
         my $tmpfile_regex = "<Metadata name=\"URL\">http://$gsdlhome_re/tmp/([^\.]*)(\..{3,4})</Metadata>"; # $gsdlhome/tmp/randomfilename.html, file ext can be 3 or 4 chars long
         if($test_contents =~ m@$tmpfile_regex@) {
             # found a match, replace the tmp file name with "random", keeping the original file extension
 …
             my $new_tmp_filename = "random";
-            $tmpfile_regex = "(<Metadata name=\"(URL|UTF8URL|gsdlconvertedfilename|OrigSource)\">(http://)?)($gsdlhome_re)?(/tmp/)?$old_tmp_filename($ext</Metadata>)";
+            ## The following does not work in the Multimedia collection, since there's a subfolder to tmp (the timestamp folder) which contains the output file.
+            #$tmpfile_regex = "(<Metadata name=\"(URL|UTF8URL|gsdlconvertedfilename|OrigSource)\">(http://)?)($gsdlhome_re)?(/tmp/)?$old_tmp_filename($ext</Metadata>)";
+            $tmpfile_regex = "(<Metadata name=\"(URL|UTF8URL|gsdlconvertedfilename|OrigSource)\">(http://)?)($gsdlhome_re)?(/tmp/)?.*?($ext</Metadata>)";
             if($5) {
-                $test_contents =~ s@$tmpfile_regex@$1$5$new_tmp_filename$6@g;
+                $test_contents =~ s@$tmpfile_regex@$1$5$new_tmp_filename$6@mg;
             } else { # OrigSource contains only the filename
-                $test_contents =~ s@$tmpfile_regex@$1$new_tmp_filename$6@g;
+                $test_contents =~ s@$tmpfile_regex@$1$new_tmp_filename$6@mg;
+            }
             # modelcol used a different gsdlhome, but also a tmp dir, so make the same changes to its random filename
-            $tmpfile_regex = "(<Metadata name=\"(URL|UTF8URL|gsdlconvertedfilename|OrigSource)\">(http://)?)(.*)?(/tmp/)?.*($ext</Metadata>)";
+            $tmpfile_regex = "(<Metadata name=\"(URL|UTF8URL|gsdlconvertedfilename|OrigSource)\">(http://)?)(.*)?(/tmp/)?.*?($ext</Metadata>)";
             if($5) {
-                $model_contents =~ s@$tmpfile_regex@$1$5$new_tmp_filename$6@g;
+                $model_contents =~ s@$tmpfile_regex@$1$5$new_tmp_filename$6@mg;
             } else { # OrigSource contains only the filename
-                $model_contents =~ s@$tmpfile_regex@$1$new_tmp_filename$6@g;
+                $model_contents =~ s@$tmpfile_regex@$1$new_tmp_filename$6@mg;
+            }
+        }
 #       my $savepath = &getcwd."/../"; # TASK_HOME env var does not exist at this stage, but it's one level up from current directory
+#       &gdbdiff::print_string_to_file($model_contents, $savepath."model_doc.xml");
+#       &gdbdiff::print_string_to_file($test_contents, $savepath."test_doc.xml");
+#       if($strModel =~ m/(HASH010d.dir)/) { # list the HASH dirs for which you want the doc.xml file generated
+#       &gdbdiff::print_string_to_file($model_contents, $savepath."$1_model_doc.xml");
+#       &gdbdiff::print_string_to_file($test_contents, $savepath."$1_test_doc.xml");
+#       }
         $strResult = diff \$model_contents, \$test_contents, { STYLE => "OldStyle" };

other-projects/nightly-tasks/diffcol/trunk/diffcol/gdbdiff.pm

-              r28067
+              r28071
         #$model_text =~ s@\[http://.*/tmp/.*(\..{3,4})\]@tmp/random$1@mg;
-        $test_text  =~ s@\[http://[^\n]*?/tmp/.*?(\..{3,4})\]\n<section>([^\n]*?)\n@tmp/random$1\n<section>$2\n@sg;
-        $model_text =~ s@\[http://[^\n]*?/tmp/.*?(\..{3,4})\]\n<section>([^\n]*?)\n@tmp/random$1\n<section>$2\n@sg;
+        $test_text  =~ s@\[http://[^\n]*?/tmp/.*?(\..{3,4})\]\n<section>([^\n]*?)\n@[tmp/random$1\n<section>$2]\n@sg;
+        $model_text =~ s@\[http://[^\n]*?/tmp/.*?(\..{3,4})\]\n<section>([^\n]*?)\n@[tmp/random$1\n<section>$2]\n@sg;
+        # need to re- sort the keys, now that the absolute paths to tmp locations has been removed
+        # so that we get the tmp files in the same order in both model and test collections
+        # http://stackoverflow.com/questions/1909262/how-can-i-pipe-input-into-a-java-command-from-perl
+        open PIPE, "| txt2db model.gdb";
+        print PIPE "$model_text";
+        close(PIPE);
+        open PIPE, "| txt2db test.gdb";
+        print PIPE "$test_text";
+        close(PIPE);
+        $model_cmd = " db2txt -sort model.gdb 2>&1";
+        $test_cmd  = "db2txt -sort test.gdb 2>&1";
+        $model_text = readin_gdb($model_cmd);
+        $test_text = readin_gdb($test_cmd);
+    }
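
The greedy-matching problem named in the commit message can be illustrated with a minimal, standalone Perl sketch. It is not part of the changeset: the input line, the simplified patterns and the variable names ($line, $open, $close, $greedy, $lazy) are hypothetical and chosen only to show how `.*` and `.*?` behave differently when more than one Metadata element can match.

```perl
use strict;
use warnings;

# Hypothetical input: two Metadata elements on a single line. The values are
# made up for illustration and are not taken from a real doc.xml.
my $line = '<Metadata name="URL">http://host/tmp/1376561347/abc.html</Metadata>'
         . '<Metadata name="OrigSource">def.html</Metadata>';

# Simplified stand-ins for the opening and closing parts of the pattern.
my $open  = '(<Metadata name="(?:URL|OrigSource)">)';
my $close = '(\.html</Metadata>)';

# Greedy: .* extends to the LAST ".html</Metadata>" on the line,
# so everything between the first opening tag and the last closing
# tag is replaced and the two elements collapse into one.
(my $greedy = $line) =~ s@$open.*$close@${1}random${2}@g;

# Non-greedy: .*? stops at the FIRST ".html</Metadata>",
# so each element is matched and rewritten on its own.
(my $lazy = $line) =~ s@$open.*?$close@${1}random${2}@g;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

In this sketch the greedy substitution leaves a single element, while the non-greedy one keeps both elements and renames each file to random, which is the behaviour a per-element rewrite needs.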

Changeset 28071 for other-projects/nightly-tasks/diffcol/trunk
