Ticket #767 (new enhancement)

Opened 6 years ago

Last modified 22 months ago

AbiWord and perl libraries for converting doc(x) to html

Reported by: ak19
Owned by: nobody
Priority: moderate
Milestone: 2.87 Release
Component: Collection Building
Severity: enhancement
Keywords:
Cc:

Description

Max suggested looking over CPAN for docx conversion. We then found the following.

 http://search.cpan.org/~amiri/MSWord-ToHTML-0.003/lib/MSWord/ToHTML.pm

(Uses AbiWord.)

 http://search.cpan.org/search?query=msword+html&mode=all

 http://www.abisource.com/wiki/PluginMatrix

 http://www.abisource.com/release-notes/2.8.0.phtml

OpenOffice is a large download. AbiWord, which now handles docx (it converted a complex docx to HTML really well when I tried it just now), is only 8MB. If the user doesn't already have OpenOffice or Office/Word 2007+ installed, we could fall back to Perl code that looks for AbiWord and uses it for the conversion, provided the user has it installed. This would also work on Mac and Linux, since AbiWord is available for those platforms.
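
The fallback described above could look roughly like the sketch below. It is written in Python for brevity rather than the Perl the ticket proposes, and it is only an illustration: the function names (find_abiword, build_command, convert_to_html) are hypothetical, and it assumes AbiWord's command-line conversion options --to and --to-name behave as in the 2.8 series on the user's platform.

```python
import shutil
import subprocess
from pathlib import Path


def find_abiword(candidates=("abiword", "AbiWord")):
    """Return the path of the first candidate executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_command(abiword, source, target):
    """Build the argument list for a command-line docx-to-HTML conversion.

    Assumption: AbiWord accepts --to=<format> and --to-name=<output file>.
    """
    return [abiword, "--to=html", f"--to-name={target}", str(source)]


def convert_to_html(source, abiword=None):
    """Convert source to HTML next to the original file.

    Returns the output path, or None when AbiWord is not installed,
    so the caller can try another converter instead.
    """
    source = Path(source)
    abiword = abiword or find_abiword()
    if abiword is None:
        return None
    target = source.with_suffix(".html")
    # check=True raises CalledProcessError if AbiWord exits with an error.
    subprocess.run(build_command(abiword, source, target), check=True)
    return target
```

Returning None rather than raising when AbiWord is missing keeps it an optional fallback: the caller can move on to the next converter in the chain without special error handling.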

Change History

Changed 6 years ago by ak19

  • milestone set to 2.86 Release

Changed 22 months ago by domtheo

The problem is Windows/IIS permissions, and the solution is that both the Perl and Java directories and the cmd.exe file (in winsys/system32) need to be executable by whatever the IIS Application Pool Identity is (in my case "NETWORK SERVER") and possibly readable by whatever the Anonymous Identity for your GSDL is (again, I had something like "IUSR_DC-10DIGITHEX").
