Changeset 24290 for main/trunk/greenstone2/perllib/plugins/PDFPlugin.pm
- Timestamp: 2011-07-19T14:02:17+12:00
- Files: 1 edited
main/trunk/greenstone2/perllib/plugins/PDFPlugin.pm
--- PDFPlugin.pm (r24199)
+++ PDFPlugin.pm (r24290)
@@ -173,5 +173,5 @@
     if ($secondary_plugin_name eq "HTMLPlugin") {
         # pdftohtml always produces utf8 - What about pdfbox???
-        push(@$specific_options, "-input_encoding", "utf8");
+        # push(@$specific_options, "-input_encoding", "utf8");
         push(@$specific_options, "-extract_language") if $self->{'extract_language'};
         push(@$specific_options, "-processing_tmp_files");
@@ -238,5 +238,23 @@
 }
 
 # By setting hashing to be on ga xml this ensures that two
+# PDF files that are identical except for the metadata
+# to hash to different values. Without this, when each PDF
+# file is converted to HTML there is a chance that they
+# will both be *identical* if the conversion utility does
+# not embed the metadata in the generated HTML. This is
+# certainly the case when PDFBOX is being used.
+
+# This change makes this convert to based plugin more
+# consistent with the original vision that the same document
+# with different metadata should
+# be seen as different.
+
+sub get_oid_hash_type {
+    my $self = shift (@_);
+    return "hash_on_ga_xml";
+}
+
+
 sub tmp_area_convert_file {
 
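The comment in the second hunk explains why PDFPlugin now overrides `get_oid_hash_type`. If the OID were computed from the converted HTML alone, two PDFs that differ only in their metadata could receive the same identifier, because PDFBox (and possibly other converters) does not embed that metadata in its output. The Perl sketch below models that reasoning under stated assumptions and is not Greenstone's code. `SketchBasePlugin`, `SketchPDFPlugin` and `compute_oid` are hypothetical names. The base class's default return value `"hash"` is an assumption. `Digest::MD5` stands in for whatever digest Greenstone really uses, and the `<Archive>`/`<Metadata>` string only imitates an archive XML record that contains both text and metadata.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a plugin base class; only the choice of
# what to hash when deriving a document OID is modelled.
package SketchBasePlugin;
use Digest::MD5 qw(md5_hex);

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Assumed default: hash the converted document content only.
sub get_oid_hash_type {
    my $self = shift(@_);
    return "hash";
}

# Hypothetical helper: $doc is a hashref with 'content' (converted HTML)
# and 'metadata' (a hashref of name => value).
sub compute_oid {
    my ($self, $doc) = @_;
    my $text = $doc->{'content'};
    if ($self->get_oid_hash_type() eq "hash_on_ga_xml") {
        # Imitate hashing an archive XML record that holds both the
        # metadata and the text; names are sorted for a stable byte order.
        my $meta = "";
        foreach my $name (sort keys %{ $doc->{'metadata'} }) {
            $meta .= "<Metadata name=\"$name\">" . $doc->{'metadata'}{$name} . "</Metadata>";
        }
        $text = "<Archive>$meta<Content>$text</Content></Archive>";
    }
    return md5_hex($text);
}

# Mirrors the override added in r24290.
package SketchPDFPlugin;
our @ISA = ('SketchBasePlugin');

sub get_oid_hash_type {
    my $self = shift(@_);
    return "hash_on_ga_xml";
}

package main;

# Two PDFs whose converted HTML is byte-identical but whose metadata differs.
my $doc_a = { content => "<html>same text</html>", metadata => { Title => "Report A" } };
my $doc_b = { content => "<html>same text</html>", metadata => { Title => "Report B" } };

my $base = SketchBasePlugin->new();
my $pdf  = SketchPDFPlugin->new();
print(($base->compute_oid($doc_a) eq $base->compute_oid($doc_b)) ? "collide\n" : "distinct\n");  # prints "collide"
print(($pdf->compute_oid($doc_a) eq $pdf->compute_oid($doc_b)) ? "collide\n" : "distinct\n");    # prints "distinct"
```

The metadata names are sorted before serialisation so that the same metadata always yields the same bytes, and therefore the same OID. Without the sort, Perl's randomised hash ordering could make two identical documents hash differently.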