This is pdftohtml, which was originally hosted at:
http://www.ra.informatik.uni-stuttgart.de/~gosho/pdftohtml/

It has recently been picked up again and is currently hosted at:
http://pdftohtml.sourceforge.net/

This version is based on version 0.22, with some code included from
version 0.31. It has been modified for Greenstone use, particularly the
file xpdf/HtmlOutputDev.cc, in an attempt to place text and images in
roughly the right positions without using JavaScript or multiple pages.

Known problems:
  - tables containing text
  - multi-column pages
  - some image types are not extracted

John McPherson. 02 May 2001.