Ticket #940 (new defect)

Opened 3 years ago

Apache Tika - see if Sam's GS2 extension works and write up tutorial

Reported by: ak19
Owned by: nobody
Priority: moderate
Milestone: 3.10 Release
Component: Collection Building
Severity: enhancement
Keywords:
Cc:


One of the questions Tom Ip asked on the mailing list was whether GS supports Apache Tika's comprehensive document format conversion tool.

It turns out that Sam had written an extension for Tika that includes a document conversion plugin (pm file). See http://trac.greenstone.org/changeset/22690

1. Try to download his jar and see if the existing version works: http://trac.greenstone.org/browser/gs2-extensions/tika/trunk/tika-java.tar.gz

2. If it does not work as is, try to get it working.

3. Maybe upgrade to the latest version of Tika and ensure it still works.

4. Write up a tutorial, or at least a wiki page, on how to use this extension with GLI.
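
As a first sanity check for step 1, something like the following could confirm that the downloaded archive actually contains a jar and a plugin (pm file) before trying it in GS. This is only a rough sketch: it assumes the tarball has already been saved locally, it says nothing about the real layout of Sam's archive, and the function names are made up for illustration.

```python
import tarfile


def list_extension_files(archive_path):
    """Return the sorted names of regular files inside a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def find_by_suffix(names, suffix):
    """Keep only the member names ending in the given extension, e.g. '.jar'."""
    return [n for n in names if n.lower().endswith(suffix)]


# Example use (the local path is an assumption, adjust as needed):
#   names = list_extension_files("tika-java.tar.gz")
#   print("jars:", find_by_suffix(names, ".jar"))
#   print("plugins:", find_by_suffix(names, ".pm"))
```

If no .jar or .pm turns up in the listing, the download probably did not fetch the raw file, and it is worth checking the saved file before moving on to step 2.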
