ConvertBinaryFile.pm

Timestamp:

2018-06-21T21:41:12+12:00

Author:

ak19

Message:

First set of commits implementing the new 'paged_html' output option of PDFPlugin, which uses xpdftools' new pdftohtml. So far tested only on Linux (64 bit); things work there, so I'm optimistically committing the changes.

1. Committing the pre-built Linux binaries of XPDFtools for both 32 and 64 bit, built by the XPDF group.
2. To use the correct bitness variant of xpdftools, setup.bash now exports the BITNESS env var, which gsConvert.pl consults.
3. All the perl code changes for using xpdftools' pdftohtml to generate paged_html and feed it in the desired form into GS(3): gsConvert.pl, PDFPlugin.pm and its parent ConvertBinaryFile.pm have been modified to make it all work.

xpdftools' pdftohtml generates a folder containing an html file and a screenshot for each page in a PDF (as well as an index.html linking to each page's html). However, we want a single html file that contains each individual 'page' html's content in a div, and we need some further HTML style, attribute and structure modifications to massage the xpdftools output into what we want for GS. To parse and manipulate the HTML 'DOM' for this, we're using the Mojo::DOM package that Dr Bainbridge found and compiled up. Mojo::DOM is therefore also committed in this revision. Some further changes and display fixes are still required, but I need to check with the others about those.
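The BITNESS lookup mentioned in the message could look roughly like the sketch below. It is not the code in gsConvert.pl: the helper name `xpdf_bin_dir` and the `bin32`/`bin64` directory layout are assumptions made for illustration, and only the BITNESS environment variable itself comes from the commit message.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: choose the xpdftools binary directory matching the
# BITNESS value exported by setup.bash. The "bin32"/"bin64" naming is an
# assumption, not taken from the changeset.
sub xpdf_bin_dir {
    my ($xpdf_home) = @_;
    my $bitness = $ENV{'BITNESS'};

    # Return nothing if BITNESS is unset or is neither 32 nor 64,
    # leaving the fallback decision to the caller.
    return unless defined $bitness && $bitness =~ /\A(?:32|64)\z/;

    return File::Spec->catdir($xpdf_home, "bin$bitness");
}

my $dir = xpdf_bin_dir("xpdf-tools");
print defined $dir ? "$dir\n" : "BITNESS not set\n";
```

Returning nothing for an unset or unexpected value keeps the choice of default (for example, a system-installed pdftohtml) out of the helper.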

File:

1 edited

main/trunk/greenstone2/perllib/plugins/ConvertBinaryFile.pm (modified) (3 diffs)

main/trunk/greenstone2/perllib/plugins/ConvertBinaryFile.pm

-              r31766
+              r32205
+    }
-    if ($convert_to =~ /^html/) { # may be html or html_multi
+    if ($convert_to =~ /^html/ || $convert_to eq "paged_html") { # may be html or html_multi, or paged_html with the new Xpdf's own pdftohtml
     $self->{'convert_to_plugin'} = "HTMLPlugin";
     $self->{'convert_to_ext'} = "html";
 …
         $output_filename = $tmp_dirname . "\/$utf8_tailname\/" . $utf8_tailname . ".$output_type";
+    }
+    } elsif ($output_type eq "paged_html") {
+    $output_filename =~ s/$lc_suffix$/.html/;
     } else {
     $output_filename =~ s/$lc_suffix$/.$output_type/;
 …
     if ("$conv_filename" eq "") {return -1;} # had an error, will be passed down pipeline
-    if (! -e "$conv_filename") {return -1;}
+    # We used to return -1 here if $conv_filename didn't exist at this stage
+    # However, for "paged_html" convert_to mode, the converted HTML file $conv_filename
+    # will only be created from conversion products *after* convert_post_process() returns
+    my $output_type=$self->{'convert_to'};
+    if ($output_type ne "paged_html" && ! -e "$conv_filename") {return -1;}
     $self->{'conv_filename'} = $conv_filename;
     $self->convert_post_process($conv_filename);
+    if ($output_type eq "paged_html" && ! -e "$conv_filename") {return -1;}
     # Run the "fribidi" (http://fribidi.org) Unicode Bidirectional Algorithm program over the converted file
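
The last two hunks above can be read as two small rules: "paged_html" output gets a plain .html filename, and the existence check on the converted file moves to after post-processing in that mode. The following is a minimal standalone sketch of that logic, not the plugin code itself. The function names `output_filename_for` and `check_converted_file`, the callback argument, and the return value 0 for success are all hypothetical. The sketch also quotes the suffix with `\Q...\E` and anchors with `\z`, which the original substitution does not do, and it omits the branch that builds a path under the temporary directory.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the converted file's name from the input name.
# "paged_html" maps to a ".html" extension; other types use the type name as
# the extension, mirroring the else branch in the diff.
sub output_filename_for {
    my ($filename, $suffix, $output_type) = @_;
    my $lc_suffix = lc($suffix);
    my $ext = ($output_type eq "paged_html") ? "html" : $output_type;
    (my $output_filename = $filename) =~ s/\Q$lc_suffix\E\z/.$ext/;
    return $output_filename;
}

# Hypothetical stand-in for the checks around convert_post_process().
# $post_process is a code ref playing the role of that method.
sub check_converted_file {
    my ($output_type, $conv_filename, $post_process) = @_;

    return -1 if $conv_filename eq "";    # conversion reported an error

    # In every mode except paged_html the file must already exist here.
    return -1 if $output_type ne "paged_html" && !-e $conv_filename;

    $post_process->($conv_filename);

    # In paged_html mode the single HTML file is only assembled during
    # post-processing, so the existence check happens afterwards.
    return -1 if $output_type eq "paged_html" && !-e $conv_filename;

    return 0;    # assumed success value for this sketch
}

print output_filename_for("report.pdf", ".pdf", "paged_html"), "\n";    # prints "report.html"
```

Deferring the check, rather than dropping it, means a failed paged_html conversion is still reported with -1 before any later step touches the file.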

Changeset 32205 for main/trunk/greenstone2/perllib/plugins/ConvertBinaryFile.pm
