This is pdftohtml, which was originally based at:
http://www.ra.informatik.uni-stuttgart.de/~gosho/pdftohtml/
It has recently been picked up again, and is currently based at:
http://pdftohtml.sourceforge.net/
This version is based on pdftohtml 0.22, with some code included from
version 0.31. It has been modified for use with Greenstone, chiefly in
the file xpdf/HtmlOutputDev.cc, in an attempt to place text and images
in roughly the right positions without using JavaScript or multiple pages.
Known problems:
  tables with text.
  multi-column pages.
  some image types are not extracted.
John McPherson.
02 May 2001.