Changeset 7586


Timestamp:
2004-06-11T11:13:50+12:00
Author:
kjdon
Message:

If we remove the title because it matches a filename, we now add a meta tag with Orig-title. This makes sure that the generated HTML files are not identical even when the PDFs have no text (if they are identical, they all get the same hash ID and end up overwriting each other in the archives dir).

File:
1 edited

  • trunk/gsdl/bin/script/pdftohtml.pl

    r7120  r7586
     167    167       if ($line =~ m@<title>(.*?)</title>@) {
     168    168       my $title=$1;
            169   +
     169    170       # is this title the name of a filename?
     170    171       if (-r "$title.pdf" || -r "$title.html") {
     171    172           # remove the title
     172          -       $line =~ s@<title>.*?</title>@<title></title>@;
            173   +       $line =~ s@<title>.*?</title>@<title></title><META NAME=\"Orig-title\" CONTENT=\"$title\">@;
     173    174       }
     174    175       }
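
The effect of the change can be sketched in isolation. This is a minimal illustration, not the actual code path in pdftohtml.pl: `blank_title` and the `$is_filename` callback are hypothetical names standing in for the `-r "$title.pdf" || -r "$title.html"` check, and MD5 is only an assumed stand-in for whatever hash the archives directory really uses.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper mirroring the r7586 substitution: blank the title
# but keep its text in an Orig-title meta tag.
sub blank_title {
    my ($line, $is_filename) = @_;
    if ($line =~ m@<title>(.*?)</title>@) {
        my $title = $1;
        # $is_filename stands in for the -r file tests in the real script
        if ($is_filename->($title)) {
            $line =~ s@<title>.*?</title>@<title></title><META NAME="Orig-title" CONTENT="$title">@;
        }
    }
    return $line;
}

# Two otherwise empty documents whose titles are just their filenames.
my $always = sub { 1 };
my @out = map { blank_title("<head><title>$_</title></head>", $always) }
          qw(report1 report2);

print "$_\n" for @out;
# Assumed stand-in digest: the outputs now differ, so the digests differ too.
print md5_hex($out[0]) eq md5_hex($out[1]) ? "same\n" : "different\n";
```

With the r7120 behaviour (replacing the title with an empty `<title></title>` only), both lines would come out byte-for-byte identical, which is the collision the commit message describes. Note that the sketch, like the original substitution, inserts `$title` without HTML-escaping it.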