Opened 13 years ago

Last modified 9 years ago

#767 new enhancement

AbiWord and perl libraries for converting doc(x) to html

Reported by: ak19
Owned by: nobody
Priority: moderate
Milestone: Possible 2.88 Release
Component: Collection Building
Severity: enhancement
Keywords:
Cc:

Description

Max suggested looking over CPAN for docx conversion. We then found the following.

http://search.cpan.org/~amiri/MSWord-ToHTML-0.003/lib/MSWord/ToHTML.pm

(Uses AbiWord.)

http://search.cpan.org/search?query=msword+html&mode=all

http://www.abisource.com/wiki/PluginMatrix

http://www.abisource.com/release-notes/2.8.0.phtml

OpenOffice is a large download. AbiWord, which now handles docx (it converted a complex docx to HTML really well when I tried it just now), is only 8MB. If the user doesn't already have OpenOffice or Office/Word 2007+ installed, we could fall back to Perl code that looks for an installed AbiWord and uses it for the conversion. This would also work on Mac and Linux, since AbiWord is available for those platforms.
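To make the fallback idea concrete, here is a minimal sketch of "look for AbiWord, then use it to convert". It is written in Python purely for illustration; the real implementation would be Perl inside the import plugins. The function names (find_abiword, build_command, convert_to_html) are hypothetical, and the sketch assumes the AbiWord executable is on the PATH, which may not hold on Windows. The --to and --to-name options are AbiWord's command-line conversion switches.

```python
import shutil
import subprocess
from pathlib import Path


def find_abiword():
    """Return the path of an AbiWord executable found on PATH, or None.

    Assumption: the binary is called "abiword" (or "AbiWord") and is on PATH.
    """
    for name in ("abiword", "AbiWord"):
        path = shutil.which(name)
        if path:
            return path
    return None


def build_command(abiword, source, target):
    """Build the argument list for a headless doc(x) to HTML conversion."""
    return [abiword, "--to=html", f"--to-name={target}", str(source)]


def convert_to_html(source, abiword=None):
    """Convert a .doc/.docx file to HTML next to the original.

    Returns the path of the HTML file, or None if AbiWord is not installed,
    so the caller can try another converter or skip the document.
    """
    abiword = abiword or find_abiword()
    if abiword is None:
        return None
    source = Path(source)
    target = source.with_suffix(".html")
    # check=True raises CalledProcessError if AbiWord exits with an error.
    subprocess.run(build_command(abiword, source, target), check=True)
    return target
```

A caller would try OpenOffice or Word first and only call convert_to_html("report.docx") when neither is available, treating a None result as "no converter found".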

Change History (2)

comment:1 by ak19, 13 years ago

Milestone: 2.86 Release

comment:2 by domtheo, 9 years ago

The problem is Windows/IIS permissions. The solution is that both the Perl and Java directories and the cmd.exe file (in winsys/system32) need to be executable by whatever the IIS Application Pool Identity is (in my case "NETWORK SERVER") and possibly readable by whatever the Anonymous Identity for your GSDL is (again, I had something like "IUSR_DC-10DIGITHEX").
