Opened 14 years ago
Last modified 13 years ago
#699 new enhancement
handling sections in PDF
Reported by: | kjdon | Owned by: | sjm84 |
---|---|---|---|
Priority: | moderate | Milestone: | Possible 2.88 Release |
Component: | Collection Building: Plugins | Severity: | major |
Keywords: | Cc: |
Description
Users want to be able to extract section information from PDFs, as we already can from HTML or Word documents.
Does -complex work for this?
Will new converters handle this better?
Change History (4)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Milestone: | 2.84 Release → 2.85 Release |
---|
The PDFBox API has some hooks that will let us get section information out of a PDF (assuming the info is there in the PDF). This goes beyond the default PDFtoHTML/txt utility provided by Apache, but should be doable with a bit of programming effort on our part.
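As a rough illustration of what that programming effort might look like, here is a minimal sketch of walking a bookmark (outline) tree and emitting one indented line per section title. It assumes the section info lives in the PDF's document outline, which the ticket does not confirm. The `Item` class is a hypothetical stand-in so the sketch runs without PDFBox on the classpath; PDFBox's `PDOutlineItem` offers `getTitle()`, `getFirstChild()` and `getNextSibling()` in the same shape, with the root obtained from the document catalog's `getDocumentOutline()` (which may be null if the PDF has no bookmarks).

```java
import java.util.ArrayList;
import java.util.List;

public class OutlineSections {
    /** Hypothetical stand-in for a PDF bookmark node (not a PDFBox class). */
    public static final class Item {
        public final String title;
        public Item firstChild;   // first nested bookmark, or null
        public Item nextSibling;  // next bookmark at the same level, or null

        public Item(String title) { this.title = title; }
        public String getTitle() { return title; }
        public Item getFirstChild() { return firstChild; }
        public Item getNextSibling() { return nextSibling; }
    }

    /** Depth-first walk: one entry per bookmark, indented two spaces per level. */
    public static List<String> flatten(Item first) {
        List<String> out = new ArrayList<>();
        walk(first, 0, out);
        return out;
    }

    private static void walk(Item item, int depth, List<String> out) {
        // Iterate over siblings, recursing into each one's children.
        for (Item cur = item; cur != null; cur = cur.getNextSibling()) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            line.append(cur.getTitle());
            out.add(line.toString());
            walk(cur.getFirstChild(), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Made-up titles for illustration only.
        Item intro = new Item("1 Introduction");
        Item syntax = new Item("2 Syntax");
        Item objects = new Item("2.1 Objects");
        intro.nextSibling = syntax;
        syntax.firstChild = objects;
        for (String line : flatten(intro)) {
            System.out.println(line);
        }
    }
}
```

A null root simply yields an empty list, which matches the "assuming the info is there" caveat: a PDF without bookmarks would produce no sections this way.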
comment:3 by , 13 years ago
Milestone: | 2.85 Release → 2.86 Release |
---|
comment:4 by , 13 years ago
PDFBox now works with PDFPlugin's use_sections flag (http://trac.greenstone.org/ticket/753)
Regarding this ticket, what sort of section info is to be extracted (is it metadata embedded in the PDF)?
The Adobe PDF Reference manual apparently has sections and a table of contents, so test on that.