Opened 16 years ago

Last modified 13 years ago

#390 new defect

PDF conversion to text

Reported by: kjdon
Owned by: nobody
Priority: moderate
Milestone: Possible 2.88 Release
Component: Collection Building: Plugins
Severity: minor
Keywords:
Cc:

Description

If you select convert_to text for PDFPlugin, it tries to run pdftotext. But we don't supply pdftotext, so the conversion fails.

Should we supply it?

Should we try a different format?
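As an illustration of the failure mode described above, here is a minimal sketch of checking for the external converter before calling it. It is written in Python for brevity and is not Greenstone's plugin code; the function name `pdf_to_text` and its `tool` parameter are hypothetical. It assumes a `pdftotext` binary that accepts `-` as the output file to write text to stdout.

```python
import shutil
import subprocess


def pdf_to_text(pdf_path, tool="pdftotext"):
    """Return the extracted text, or None if the converter is not installed.

    Hypothetical helper for illustration only; not part of Greenstone.
    """
    exe = shutil.which(tool)
    if exe is None:
        # The converter is not on PATH, so signal that to the caller
        # instead of failing part-way through a conversion.
        return None
    # Passing "-" as the output file asks pdftotext to write to stdout.
    result = subprocess.run(
        [exe, pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout
```

A caller could treat a `None` result as the cue to report the missing tool or fall back to another conversion format.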

Change History (7)

comment:1 by davidb, 16 years ago

Milestone: Release 2.81 → Release 2.82

Original problem was detected on Linux. Probably true for Windows and Mac as well. Should we look at using Ghostscript?

comment:2 by kjdon, 15 years ago

Milestone: Release 2.82 → Release 2.83

comment:3 by kjdon, 15 years ago

Component: Collection Building → Collection Building: Plugins
Milestone: Greenstone 2 wishlist → 2.84 Release

comment:4 by kjdon, 14 years ago

Milestone: 2.84 Release → 2.85 Release

comment:5 by kjdon, 14 years ago

If you are using the new PDFBox extension, then it can do both HTML and text.

comment:6 by sjm84, 13 years ago

Milestone: 2.85 Release → 2.86 Release

comment:7 by ak19, 13 years ago

Just committed (r24199 and r24200) some minor changes that allow PDFBox to convert to text.

The following Perl module is described as being capable of PDF-to-text conversion:

http://search.cpan.org/~cdolan/CAM-PDF-1.55/lib/CAM/PDF.pm

I don't know yet how it deals with the latest PDF version. See also:

http://search.cpan.org/~antro/PDF-111/PDF.pm
