Ticket #758 (closed enhancement: fixed)

Opened 6 years ago

Last modified 6 years ago

Windows Scripting should possibly be able to handle docx

Reported by: ak19 Owned by: nobody
Priority: moderate Milestone:
Component: Collection Building: Plugins Severity: major
Keywords: docx Cc:

Description

Windows Scripting should possibly be able to handle docx. There are word2html tools out there that generate HTML from docx files but require you to be on a Windows computer, so the assumption is that they use native Windows scripting to accomplish the conversion.

There's a Maven plugin jar file that seems to do something useful in this respect. If it's open source, we could perhaps modify it so that it doesn't require Maven itself. http://maven-plugins.sourceforge.net/maven-word2html-plugin/index.html

The following is from a message sent when investigating the above:

Sadly, trying to get the docx processed with Windows Scripting (when one doesn't have Open Office) doesn't work. MS Word 2007 launches okay, but it looks for a file with a doc extension. The path to the document in the command that the Perl code runs is correct, in that it still has the docx extension. This behaviour appears to be internal to the word2html executable that Perl launches, which in turn uses Windows Scripting to do the conversion.

We might be able to change this so that it tries to open a "docx" file rather than a "doc" file in MS Word. The question remains whether native Windows scripting can handle docx from there onward.

If that fails, there appears to be other third-party software for converting docx files to HTML. Even if it isn't much better at conversion than the Open Office plugin, the advantage would still be that users won't be required to install the entire OO suite to process files.

If someone doesn't have the Open Office suite installed but is on a Windows machine that has Word 2007, they can convert docx files to HTML in Word and then use the resulting document during importing.

Finally, it may be possible to modify the vbs script that does the word2html conversion (executable and script are in bin/windows) and tell it to launch docx and not doc when appropriate. Maybe things will work smoothly from there on.
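
To illustrate the idea of launching docx rather than doc when appropriate, here is a minimal sketch in Python. It is not the actual VBS script or the word2html executable: the function names and the set of accepted extensions are assumptions made up for this example, and the "forced doc" behaviour is only a guess at what the executable does internally, based on the symptom described above.

```python
from pathlib import PureWindowsPath

# Assumption for this sketch: the extensions we want Word to open directly.
WORD_EXTENSIONS = {".doc", ".docx"}


def forced_doc_path(source: str) -> str:
    """Mimic the suspected current behaviour: always look for a .doc file."""
    return str(PureWindowsPath(source).with_suffix(".doc"))


def word_input_path(source: str) -> str:
    """Proposed behaviour: keep the original extension when it is doc or docx."""
    path = PureWindowsPath(source)
    if path.suffix.lower() not in WORD_EXTENSIONS:
        raise ValueError(f"not a Word document: {source}")
    return str(path)


if __name__ == "__main__":
    example = r"C:\collect\import\report.docx"  # hypothetical path
    print("suspected current:", forced_doc_path(example))
    print("proposed:         ", word_input_path(example))
```

Whether Word's scripting interface then converts the docx correctly is a separate question that this sketch does not answer.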

Change History

Changed 6 years ago by ak19

  • status changed from new to closed
  • resolution set to fixed

Couldn't locate this ticket when I wanted to close it. Reopened another ticket http://trac.greenstone.org/ticket/761

It's now done.
