Opened 13 years ago

Closed 11 years ago

#426 closed defect (fixed)

Investigate new conversion tools for Word and other Office documents

Reported by: ak19
Owned by: nobody
Priority: moderate
Milestone: 2.84 Release
Component: Collection Building
Severity: major
Keywords: Word, office, conversion, wvware
Cc:

Description

The wvWare page's latest news dates from 2006. Office 2007 has been released since then, and wvWare cannot convert Word 2007 documents. There has also been at least one email saying that PPT conversion in GLI is not going too smoothly either.
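
For background on the incompatibility: Word 2007 saves .docx files as ZIP-based Office Open XML packages, while wvWare reads the older binary .doc format, which is stored in an OLE2 compound file. The two can be told apart from their leading bytes. The following is a minimal sketch of such a check, not part of Greenstone or wvWare; the function name `sniff_office_container` and its return labels are hypothetical.

```python
import io
import zipfile

# Signature of an OLE2 compound file (legacy binary .doc, .ppt, .xls).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP local file header (used by .docx, .pptx, .xlsx).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_office_container(data: bytes) -> str:
    """Classify a document by its container format (hypothetical helper).

    Returns "ole2", "ooxml", "zip" (some other ZIP archive) or "unknown".
    """
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                # Office Open XML packages carry this part at the archive root.
                if "[Content_Types].xml" in zf.namelist():
                    return "ooxml"
        except zipfile.BadZipFile:
            pass  # ZIP signature present but archive unreadable
        return "zip"
    return "unknown"
```

A collection-building step could use a check like this to decide whether a file is worth passing to wvWare at all or should go to a different converter.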

Dr Nichols thinks it is time we try to find alternative conversion tools.

Here are some of the URLs he found:

http://www.nativewinds.montana.com/software/docx2rtf.html "NW Docx Converter, Docx2Rtf v3.2"
http://swik.net/Word+conversion Some links to conversion software, including for Word
http://www.xml.com/pub/a/2003/12/31/qa.html "From Word to XML"
http://poi.apache.org/ "Apache POI - Java API To Access Microsoft Format Files"
http://drupal.org/node/139851 "Word Doc to HTML Converters?"
http://sourceforge.net/projects/wordhtml/ "WordHTML CV"
http://pastcounts.wordpress.com/2008/01/30/word-to-latex/ "Word to LaTeX"
http://m.linuxjournal.com/article/9493 "Cooking with Linux - Words, Words, Words..."

wvWare-related pages:
http://www.abisource.com/ "AbiWord"
http://wvware.sourceforge.net/

Change History (5)

comment:1 by kjdon, 12 years ago

Milestone: Release 2.82 → Release 2.83

comment:2 by kjdon, 11 years ago

Milestone: Greenstone 2 wishlist → Collection building wishlist

comment:3 by kjdon, 11 years ago

Milestone: Collection building wishlist → 2.84 Release

comment:4 by kjdon, 11 years ago

Will probably use OpenOffice and/or Tika. See #430, #664.

comment:5 by davidb, 11 years ago

Resolution: fixed
Status: new → closed

Katherine has developed OpenOfficeConverter and OpenOfficePlugin, which are available as an extension to Greenstone. At 200 MB installed, OpenOffice is large enough that we do not bundle it with the extension; we expect users to install it themselves. Apache Tika (100% Java) is of a more modest size, and Sam has started a comparable extension based on it. The difference is that the relevant JAR file is included in the extension.
