Opened 11 years ago
Last modified 11 years ago
#841 new defect
Upgrade to PDFBox 1.7 as it can convert text pages to images
Reported by: | ak19 | Owned by: | nobody |
---|---|---|---|
Priority: | moderate | Milestone: | Possible 2.88 Release |
Component: | Collection Building | Severity: | major |
Keywords: | PDFBox extension, PDFToImage | Cc: |
Description
The -pagedimg_FORMAT option is now supported when using the PDFBox extension. However, our pdfbox jar file is at version 1.5, which only "generates pages as images" when the PDF pages are themselves images.
The pdfbox jar version 1.7 is able to generate pages as images from PDFs containing text. However, the output images aren't always clean: sometimes the columns of multi-column documents overlap. This may be because the PDFToImage command of PDFBox is still in beta.
In other respects, including line spacing (an issue we had in the past), the 1.7 pdfbox jar file appears to behave like the 1.5 version.
Should we upgrade now, or wait until the PDFToImage command works reliably, given that not much is gained at present?
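For anyone wanting to compare the two jars on the same problem PDF, here is a minimal sketch of a helper that builds the PDFToImage command line. It rests on assumptions not stated in this ticket: that the standalone PDFBox app jar is used (the jar paths in the usage note are hypothetical), and that the PDFToImage tool accepts the -imageType, -outputPrefix, -startPage and -endPage options, as I believe it does in the 1.x releases. The helper name `pdftoimage_command` is made up for illustration and is not part of Greenstone or PDFBox.

```python
def pdftoimage_command(jar, pdf, image_type="jpg", output_prefix=None,
                       start_page=None, end_page=None):
    """Build the argument list for PDFBox's PDFToImage command-line tool.

    Assumption: `jar` is a standalone PDFBox app jar whose main class
    dispatches on the tool name given as the first argument.
    """
    cmd = ["java", "-jar", str(jar), "PDFToImage", "-imageType", image_type]
    # Optional flags are only added when a value was supplied.
    if output_prefix is not None:
        cmd += ["-outputPrefix", str(output_prefix)]
    if start_page is not None:
        cmd += ["-startPage", str(start_page)]
    if end_page is not None:
        cmd += ["-endPage", str(end_page)]
    # The input PDF goes last, after all options.
    cmd.append(str(pdf))
    return cmd
```

Running the same PDF through both jars with different output prefixes would let the resulting page images be compared side by side, for example:

```python
import subprocess

# Hypothetical jar locations; adjust to wherever the jars actually live.
for version in ("1.5.0", "1.7.0"):
    cmd = pdftoimage_command("pdfbox-app-%s.jar" % version, "sample.pdf",
                             image_type="png", output_prefix="out-%s-" % version)
    subprocess.call(cmd)
```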