Windows Scripting should possibly be able to handle docx
|Reported by:||ak19||Owned by:||nobody|
|Component:||Collection Building: Plugins||Severity:||major|
Windows Scripting should possibly be able to handle docx. There are word2html tools out there that generate html from docx files and that require you to be on a Windows computer, so the assumption is that they use native Windows scripting to accomplish the conversion.
There's a plugin jar file released by maven that seems to do something useful in this respect. If it's open source, we could perhaps modify it to not require maven itself. http://maven-plugins.sourceforge.net/maven-word2html-plugin/index.html
The following is from a message sent when investigating the above:
Sadly, trying to get the docx processed with Windows Scripting (when one doesn't have Open Office) doesn't work. MS Word 2007 launches okay, but looks for a file with a doc extension. The path to the document in the command that the perl code runs is correct, in that it still refers to the extension as docx. This behaviour appears to be internal to the word2html executable that gets launched by perl, which then uses Windows Scripting to do the conversion.
We might be able to change this to not try to open a "doc" file but a "docx" in MS Word. The question remains whether native windows scripting can handle docx from there onward.
If that fails, there appears to be more third-party software for converting docx files to html. Even if it isn't much better at conversion than the Open Office plugin, the advantage would still be that users won't be required to install the entire OO suite to process files.
If someone doesn't have the Open Office suite installed, but is on a Windows machine that has Word 2007, they can convert docx files to html in Word and then use the resulting document during importing.
Finally, it may be possible to modify the vbs script that does the word2html conversion (executable and script are in bin/windows) and tell it to launch docx and not doc when appropriate. Maybe things will work smoothly from there on.
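As a rough illustration of the change suggested above, the following Python sketch shows the path handling only: keep the document's real extension when building the path handed to Word, instead of assuming "doc". It is not the actual vbs script or word2html code, which hasn't been inspected here. The names WORD_EXTENSIONS, word_input_path and html_output_path are hypothetical, and so is the assumption that doc and docx are the only extensions of interest.

```python
import os

# Hypothetical: the extensions we would let Word open directly.
WORD_EXTENSIONS = {".doc", ".docx"}


def word_input_path(source_path):
    """Return the path to hand to Word, keeping the file's real extension."""
    root, ext = os.path.splitext(source_path)
    if ext.lower() not in WORD_EXTENSIONS:
        raise ValueError("not a Word document: " + source_path)
    # Keep the original extension rather than rewriting it to ".doc".
    return root + ext


def html_output_path(source_path):
    """Return the html path that the conversion would write to."""
    root, _ext = os.path.splitext(source_path)
    return root + ".html"
```

With this, a file named report.docx would be opened as report.docx and converted to report.html. Whether Windows Scripting handles the docx correctly from there on would still need testing.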