Opened 14 years ago
Last modified 13 years ago
#664 new feature
Apache Tika for document conversion
Reported by: | kjdon | Owned by: | nobody |
---|---|---|---|
Priority: | moderate | Milestone: | Possible 2.88 Release |
Component: | Collection Building: Plugins | Severity: | major |
Keywords: | Cc: |
Description
We have talked about using Open Office to do document conversion. Doug Carter suggested Apache Tika as an alternative:
*
I had a lot of trouble messing with Open Office for conversion, but found a better solution using Apache Tika:
I put a shell wrapper around a Java command line and created an ooxmltohtml.pl script that handles nearly all of the Office 2007 document formats. I hacked gsConvert.pl to include the new doc types, and added a single new plugin, OOXMLPlug, to handle the importing.
It works great, but performance is a bit of an issue. If the performance problem were addressed, you could probably dump nearly all of the proprietary document converters.
*
See also #426.
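For anyone picking this up, the wrapper described above (a script around a Java command line that turns an Office 2007 file into HTML) might look roughly like the sketch below. It is not Doug's ooxmltohtml.pl, which is not attached to this ticket. It assumes the Tika application jar is available locally and relies on tika-app's `--html` option, which writes the converted document to standard output. The jar path, the extension list, and the function names are all hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: location of the Tika application jar; adjust for the local install.
TIKA_JAR = Path("tika-app.jar")

# Illustrative subset of Office 2007 (OOXML) extensions, not a complete list.
OOXML_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def is_ooxml(path):
    """Return True if the file extension is one this sketch accepts."""
    return Path(path).suffix.lower() in OOXML_EXTENSIONS


def build_tika_command(input_path, jar=TIKA_JAR, java="java"):
    """Build the command line that asks tika-app to emit HTML on stdout."""
    return [java, "-jar", str(jar), "--html", str(input_path)]


def convert_to_html(input_path, output_path, jar=TIKA_JAR):
    """Run Tika on one document and save the HTML it prints to output_path."""
    if not is_ooxml(input_path):
        raise ValueError(f"not an Office 2007 document: {input_path}")
    result = subprocess.run(
        build_tika_command(input_path, jar),
        capture_output=True,
        check=True,  # raise CalledProcessError if Java or Tika fails
    )
    Path(output_path).write_bytes(result.stdout)
    return Path(output_path)
```

A call such as `convert_to_html("report.docx", "report.html")` starts a new Java process for each document. Whether that per-file startup is the performance problem mentioned above is not stated in this ticket, so it would need measuring before drawing conclusions.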
Change History (3)
comment:1, 14 years ago
Milestone: | Collection building wishlist → 2.84 Release |
---|
comment:2, 14 years ago
Milestone: | 2.84 Release → 2.85 Release |
---|
comment:3, 13 years ago
Milestone: | 2.85 Release → 2.86 Release |
---|