Opened 11 years ago

Last modified 10 years ago

#699 new enhancement

handling sections in PDF

Reported by: kjdon Owned by: sjm84
Priority: moderate Milestone: Possible 2.88 Release
Component: Collection Building: Plugins Severity: major
Keywords: Cc:

Description

Users want the ability to extract section info from PDFs, as we already can from HTML or Word documents.

Does the -complex option work for this?

Will new converters handle this better?

Change History (4)

comment:1 by kjdon, 11 years ago

The Adobe PDF Reference manual apparently has sections and a TOC. Test on that.

comment:2 by kjdon, 11 years ago

Milestone: 2.84 Release → 2.85 Release

The PDFBox API has some hooks that will let us get section information out of a PDF (assuming that information is actually present in the PDF). This goes beyond the default PDFtoHTML/txt utility provided by Apache, but should be doable with a bit of programming effort on our part.
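One likely source of "section information" is the PDF's bookmark tree (document outline). In PDFBox 2.x this is reached through `PDDocumentCatalog.getDocumentOutline()`, and each `PDOutlineItem` is linked to its children and siblings via `getFirstChild()` and `getNextSibling()`, with the heading text available from `getTitle()`. The following is a minimal sketch that assumes the sections we want are exactly those bookmarks. To keep it self-contained without the PDFBox jar, a hypothetical `OutlineNode` class stands in for `PDOutlineItem`, and the hypothetical `flatten` method turns the tree into a numbered table of contents that a plugin could then map onto document sections.

```java
import java.util.ArrayList;
import java.util.List;

public final class OutlineSections {

    /** Hypothetical stand-in for PDFBox's PDOutlineItem: a bookmark with a
     *  title, a link to its first child and a link to its next sibling. */
    static final class OutlineNode {
        final String title;
        OutlineNode firstChild;
        OutlineNode nextSibling;

        OutlineNode(String title) { this.title = title; }

        /** Appends a child at the end of this node's child list and returns it. */
        OutlineNode addChild(String childTitle) {
            OutlineNode child = new OutlineNode(childTitle);
            if (firstChild == null) {
                firstChild = child;
            } else {
                OutlineNode last = firstChild;
                while (last.nextSibling != null) last = last.nextSibling;
                last.nextSibling = child;
            }
            return child;
        }
    }

    /** Walks the outline depth-first and returns one "number title" entry per
     *  bookmark, e.g. "1.2 Scope". A root with no children (a PDF without
     *  bookmarks) yields an empty list. */
    static List<String> flatten(OutlineNode root) {
        List<String> out = new ArrayList<>();
        walk(root.firstChild, "", out);
        return out;
    }

    private static void walk(OutlineNode first, String prefix, List<String> out) {
        int i = 1;
        for (OutlineNode n = first; n != null; n = n.nextSibling, i++) {
            String number = prefix.isEmpty() ? String.valueOf(i) : prefix + "." + i;
            out.add(number + " " + n.title);
            walk(n.firstChild, number, out); // recurse into subsections
        }
    }

    public static void main(String[] args) {
        // Build a small outline by hand; with PDFBox the tree would come from the PDF.
        OutlineNode root = new OutlineNode("(outline root)");
        OutlineNode intro = root.addChild("Introduction");
        intro.addChild("Background");
        intro.addChild("Scope");
        root.addChild("Syntax");
        for (String line : flatten(root)) System.out.println(line);
    }
}
```

Running `main` prints `1 Introduction`, `1.1 Background`, `1.2 Scope` and `2 Syntax`, one per line. Depth-first order matches reading order, which is what a section-splitting plugin would need. Note that bookmarks only give titles and page destinations, so mapping them back onto the extracted text would still require extra work.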

comment:3 by sjm84, 10 years ago

Milestone: 2.85 Release → 2.86 Release

comment:4 by ak19, 10 years ago

PDFBox now works with PDFPlugin's use_sections flag (http://trac.greenstone.org/ticket/753)

Regarding this ticket, what sort of section info is to be extracted (is it metadata embedded in the PDF)?
