Opened 12 years ago
Last modified 8 years ago
#767 new enhancement
AbiWord and perl libraries for converting doc(x) to html
Reported by: | ak19 | Owned by: | nobody |
---|---|---|---|
Priority: | moderate | Milestone: | Possible 2.88 Release |
Component: | Collection Building | Severity: | enhancement |
Keywords: | Cc: |
Description
Max suggested looking over CPAN for docx conversion. We then found the following.
http://search.cpan.org/~amiri/MSWord-ToHTML-0.003/lib/MSWord/ToHTML.pm
(Uses AbiWord.)
http://search.cpan.org/search?query=msword+html&mode=all
http://www.abisource.com/wiki/PluginMatrix
http://www.abisource.com/release-notes/2.8.0.phtml
OpenOffice is a large download. AbiWord, which now handles docx (it converted a complex docx to HTML really well when I tried it just now), is only 8MB. If the user doesn't already have OpenOffice or Office/Word 2007+ installed, we could fall back to Perl code that looks for AbiWord and uses it for the conversion, provided the user has it installed. This would also work on Mac and Linux, since AbiWord is available for those platforms.
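The fallback described above could look roughly like the sketch below. It is written in Python purely for illustration (the real change would live in Greenstone's Perl code), and the function names `build_abiword_command` and `convert_with_abiword` are hypothetical. It assumes the executable is named `abiword` and is on the PATH, and that AbiWord's `--to` and `--to-name` command-line options are available in the installed version.

```python
import shutil
import subprocess
from pathlib import Path


def build_abiword_command(abiword, source, target):
    """Build the argument list for a headless AbiWord conversion to HTML."""
    # --to selects the output format, --to-name the output file (assumed
    # to be supported by the installed AbiWord version).
    return [abiword, "--to=html", f"--to-name={target}", str(source)]


def convert_with_abiword(source, which=shutil.which, run=subprocess.run):
    """Convert a doc/docx file to HTML if AbiWord is installed.

    Returns the path of the HTML file, or None when AbiWord is not found,
    so the caller can move on to another converter.
    """
    abiword = which("abiword")  # hypothetical lookup: PATH only
    if abiword is None:
        return None
    target = Path(source).with_suffix(".html")
    # check=True raises CalledProcessError if AbiWord exits with an error.
    run(build_abiword_command(abiword, source, target), check=True)
    return target
```

Returning `None` rather than raising when AbiWord is missing keeps it an optional last resort after OpenOffice and Word. On Windows a PATH lookup may not be enough, since AbiWord is not necessarily added to the PATH there.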
The problem is Windows/IIS permissions. The solution is that the Perl and Java directories and the cmd.exe file (in winsys/system32) all need to be executable by whatever the IIS Application Pool Identity is (in my case "NETWORK SERVICE") and possibly readable by whatever the Anonymous Identity for your GSDL is (again, I had something like "IUSR_DC-10DIGITHEX").