source: trunk/gsdl/bin/script/gsConvert.pl

Change Rev Age Author Log Message
(edit) @2977   22 years jrm21 added infrastructure for calling an external powerpoint to html converter.
(edit) @2755   23 years jrm21 import.pl now takes an option for saving file conversion failures to a …
(edit) @2656   23 years jrm21 replaced some of the -e (exist) tests with -s (size>0) tests when …
(edit) @2600   23 years jrm21 when extracting text from postscript, we now wrap the lines, so that a …
(edit) @2574   23 years jrm21 gsConvert now knows about rtftohtml package. wv config file is now in …
(edit) @2512   23 years sjboddie Moved wv's XML config file from packages/wv to etc/wv so that it is …
(edit) @2447   23 years jrm21 Safety check: postscript files start with %! - don't just rely on .ps …
(edit) @2248   23 years sjboddie Fixed another couple of bugs in the pdf and word conversion stuff …
(edit) @2241   23 years sjboddie Tidied up the ConvertToPlug stuff to get it working on Windows 95/98
(edit) @2117   23 years jrm21 Tidied up a bit, error in args passed to pdf_to_text, etc.
(edit) @2060   23 years jrm21 (system($cmd)>0) : system can return -1 if the cmd couldn't be …
(edit) @2032   23 years jrm21 added .ps to the usage, fixed typo in comment.
(edit) @2031   23 years jrm21 Improved postscript to text handling a little bit better. Also, …
(edit) @2023   23 years sjboddie Changed pdftohtml.pl and gsConvert.pl to reflect the fact that their …
(edit) @2012   23 years jrm21 re-added the crappy PS text-stripper, and made the error handling a …
(edit) @1997   23 years dg5 Modified gsConvert.pl and pdftohtml.pl to reflect moving of pdftohtml …
(edit) @1970   23 years sjboddie Added more usage information to all perl programs and removed a few …
(edit) @1960   23 years dg5 Modified pdftohtml.pl to reflect the change in location of …
(edit) @1928   23 years sjboddie Added: pdftohtml.pl - Perl script that handles conversion of PDF …
(edit) @1734   23 years jrm21 For postscript, fall back to some simple text extraction if ps2ascii …
(edit) @1705   23 years say1 fixed to handle filenames with multiple dots.
(edit) @1692   24 years jrm21 pdftohtml can't handle encrypted pdfs, but doesn't return an error …
(edit) @1687   24 years jrm21 modified ulimit timeouts to 40sec from 20sec :)
(edit) @1684   24 years paynter Supports rtf file as an input type, looks in packages dir for pdftohtml
(edit) @1654   24 years paynter Check .doc files to see if they are RTF files, Word 6/7/8 files that …
(edit) @1578   24 years paynter Uses wv version 0.6.0-gs
(edit) @1567   24 years sjboddie force gsConvert.pl to use utf-8 encoding when converting word docs to html
(add) @1445   24 years paynter Replaced gs2html and gs2text with gsConvert.pl