Opened 16 years ago
Last modified 13 years ago
#390 new defect
pdf conversion to text
Reported by: | kjdon | Owned by: | nobody |
---|---|---|---|
Priority: | moderate | Milestone: | Possible 2.88 Release |
Component: | Collection Building: Plugins | Severity: | minor |
Keywords: | Cc: |
Description
If you set the convert_to option to text for PDFPlugin, it tries to run pdftotext. But we don't supply pdftotext, so the conversion fails.
Should we supply it?
Should we try a different format?
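To illustrate the failure mode described above, here is a minimal Python sketch of checking for the external tool before running it, so a missing pdftotext is reported clearly rather than surfacing as a failed conversion. This is not Greenstone code: the function names `find_tool` and `pdf_to_text` are hypothetical, and the sketch assumes pdftotext is invoked as `pdftotext input.pdf output.txt`.

```python
import shutil
import subprocess
from typing import Callable, Optional


def find_tool(name: str,
              which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the full path of an external tool, or None if it is not on PATH."""
    return which(name)


def pdf_to_text(pdf_path: str, txt_path: str,
                which: Callable[[str], Optional[str]] = shutil.which) -> None:
    """Convert a PDF to text with pdftotext, failing early if the tool is absent."""
    tool = find_tool("pdftotext", which)
    if tool is None:
        # Hypothetical error message; the point is to fail before attempting to run.
        raise RuntimeError("pdftotext not found on PATH; cannot convert PDF to text")
    # check=True raises CalledProcessError if pdftotext exits with a non-zero status.
    subprocess.run([tool, pdf_path, txt_path], check=True)
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use, `pdf_to_text("doc.pdf", "doc.txt")` is enough.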
Change History (7)
comment:1 by , 16 years ago
Milestone: | Release 2.81 → Release 2.82 |
---|
comment:2 by , 15 years ago
Milestone: | Release 2.82 → Release 2.83 |
---|
comment:3 by , 15 years ago
Component: | Collection Building → Collection Building: Plugins |
---|---|
Milestone: | Greenstone 2 wishlist → 2.84 Release |
comment:4 by , 14 years ago
Milestone: | 2.84 Release → 2.85 Release |
---|
comment:5 by , 14 years ago
If you are using the new PDFBox extension, it can convert to both HTML and text.
comment:6 by , 13 years ago
Milestone: | 2.85 Release → 2.86 Release |
---|
comment:7 by , 13 years ago
Just committed (r24199 and r24200) some minor changes that allow PDFBox to convert to text.
The following Perl module is described as being capable of PDF to text conversion:
http://search.cpan.org/~cdolan/CAM-PDF-1.55/lib/CAM/PDF.pm
Don't know yet how it deals with the latest PDF version. Can also see:
The original problem was detected on Linux. It is probably true for Windows and Mac as well. Look into using Ghostscript?
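As a rough sketch of the Ghostscript idea, the snippet below builds a command line that uses Ghostscript's `txtwrite` output device to extract text. It is an illustration under assumptions, not tested against Greenstone: the function name `ghostscript_text_command` is hypothetical, the executable is assumed to be `gs` on Linux and Mac and `gswin32c` on Windows (64-bit installs typically ship `gswin64c` instead), and `txtwrite` is only available in sufficiently recent Ghostscript releases.

```python
import sys
from typing import List


def ghostscript_text_command(pdf_path: str, txt_path: str,
                             platform: str = sys.platform) -> List[str]:
    """Build (but do not run) a Ghostscript command that writes PDF text to a file."""
    # Assumed executable names; adjust for the local installation.
    exe = "gswin32c" if platform.startswith("win") else "gs"
    return [
        exe,
        "-q",                 # suppress startup banner
        "-dNOPAUSE",          # do not prompt between pages
        "-dBATCH",            # exit once the input has been processed
        "-sDEVICE=txtwrite",  # text extraction device
        f"-sOutputFile={txt_path}",
        pdf_path,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Building the command separately keeps the platform-specific part easy to inspect.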