Ticket #426 (closed defect: fixed)

Opened 12 years ago

Last modified 10 years ago

Investigate new conversion tools for Word and other Office documents

Reported by: ak19
Owned by: nobody
Priority: moderate
Milestone: 2.84 Release
Component: Collection Building
Severity: major
Keywords: Word, office, conversion, wvware
Cc:


The latest news on the wvWare page dates from 2006. Since then Office 2007 has been released, and wvWare cannot convert Word 2007 documents. There has also been at least one email reporting that PPT conversion in GLI is not going smoothly either.

Dr Nichols thinks it is time we looked for alternative conversion tools.

Here are some of the URLs he found:

  • http://www.nativewinds.montana.com/software/docx2rtf.html "NW Docx Converter, Docx2Rtf v3.2"
  • http://swik.net/Word+conversion (links to conversion software, including for Word)
  • http://www.xml.com/pub/a/2003/12/31/qa.html "From Word to XML"
  • http://poi.apache.org/ "Apache POI - Java API To Access Microsoft Format Files"
  • http://drupal.org/node/139851 "Word Doc to HTML Converters?"
  • http://sourceforge.net/projects/wordhtml/ "WordHTML CV"
  • http://pastcounts.wordpress.com/2008/01/30/word-to-latex/ "Word to LaTeX"
  • http://m.linuxjournal.com/article/9493 "Cooking with Linux - Words, Words, Words..."

wvWare-related pages:

  • http://www.abisource.com/ "AbiWord"
  • http://wvware.sourceforge.net/

Change History

Changed 11 years ago by kjdon

  • milestone changed from Release 2.82 to Release 2.83

Changed 10 years ago by kjdon

  • milestone changed from Greenstone 2 wishlist to Collection building wishlist

Changed 10 years ago by kjdon

  • milestone changed from Collection building wishlist to 2.84 Release

Changed 10 years ago by kjdon

Will probably use OpenOffice and/or Tika. See #430 and #664.

Changed 10 years ago by davidb

  • status changed from new to closed
  • resolution set to fixed

Katherine has developed OpenOfficeConverter and OpenOfficePlugin, which are available as an extension to Greenstone. OpenOffice takes up about 200 MB once installed, so we do not bundle it with the extension; instead, we expect users to install it themselves. Apache Tika (100% Java) is more modest in size, and Sam has started a comparable extension based on it. The difference is that the relevant Jar file is included in that extension.
