
#699 new enhancement

handling sections in PDF

Reported by:	kjdon	Owned by:	sjm84
Priority:	moderate	Milestone:	Possible 2.88 Release
Component:	Collection Building: Plugins	Severity:	major
Keywords:		Cc:

Description

Users want to be able to extract section information from PDF files, as we already can from HTML and Word documents.

Does -complex work for this?

Will new converters handle this better?

Change History (4)

comment:1 by kjdon, 14 years ago

The Adobe PDF reference manual apparently has sections and a table of contents. Test on that.

comment:2 by kjdon, 14 years ago

Milestone:	2.84 Release → 2.85 Release

The PDFBox API has some hooks that will let us get section information out of a PDF (assuming the information is actually present in the PDF). This goes beyond the default PDF-to-HTML/text utility provided by Apache, but should be doable with a bit of programming effort on our part.
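
A rough sketch of the idea, for illustration only: PDFBox exposes a document's bookmarks as an outline tree, and a plugin could walk that tree depth-first to produce a flat list of sections with nesting levels. The code below does not call PDFBox. `OutlineNode`, `Section`, and `flatten` are hypothetical stand-ins invented for this example, and it assumes the PDF actually carries a bookmark outline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutlineSections {

    /** Hypothetical stand-in for one PDF bookmark (outline item). */
    static final class OutlineNode {
        final String title;
        final List<OutlineNode> children = new ArrayList<>();

        OutlineNode(String title) {
            this.title = title;
        }

        /** Appends a child bookmark and returns this node for chaining. */
        OutlineNode add(OutlineNode child) {
            children.add(child);
            return this;
        }
    }

    /** One flattened section: nesting depth (1 = top level) and title. */
    static final class Section {
        final int depth;
        final String title;

        Section(int depth, String title) {
            this.depth = depth;
            this.title = title;
        }

        @Override
        public String toString() {
            return depth + ":" + title;
        }
    }

    /** Walks the outline depth-first, in document order. */
    static List<Section> flatten(List<OutlineNode> roots) {
        List<Section> out = new ArrayList<>();
        for (OutlineNode root : roots) {
            walk(root, 1, out);
        }
        return out;
    }

    private static void walk(OutlineNode node, int depth, List<Section> out) {
        out.add(new Section(depth, node.title));
        for (OutlineNode child : node.children) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Made-up outline, purely to exercise the traversal.
        OutlineNode intro = new OutlineNode("Introduction")
                .add(new OutlineNode("Scope"));
        OutlineNode syntax = new OutlineNode("Syntax");
        for (Section s : flatten(Arrays.asList(intro, syntax))) {
            System.out.println(s);
        }
    }
}
```

In a real plugin the tree would be read from the PDF rather than built by hand, and each section would also need a start page or destination so that the extracted text can be split at the right places.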

comment:3 by sjm84, 13 years ago

Milestone:	2.85 Release → 2.86 Release

comment:4 by ak19, 13 years ago

PDFBox now works with PDFPlugin's use_sections flag (http://trac.greenstone.org/ticket/753)

Regarding this ticket, what sort of section information is to be extracted? Is it metadata embedded in the PDF?
