root/gs2-extensions/xpdf-tools/trunk/src/packages/GS-README.txt @ 32251

Revision 32251, 43.4 KB (checked in by ak19, 21 months ago)

1. Getting LIBJPEG to compile on Mac requires not passing --enable-static. On Mac too, compiling LIBJPEG also only produces .a (and no .so/.dylib) no matter what, but that happens to be what we want anyway. 2. Turning off building shared libraries (--disable-shared) for all libraries we build that are used by xpdftools, so that on Mac it hopefully chooses to link these in (which won't happen if there's a preferred dylib elsewhere, particularly on the system, which will then get linked in in preference on Macs). Still need to test these changes on Linux again.

1__________________________________________________________
2CONTENTS
3__________________________________________________________
4
5Xpdf-Tools related
6A. XPDF
7B. Mojo::DOM perl package for parsing HTML
8C. Compiling Xpdf-Tools: statically or dynamically linked
9D. How we got Xpdf-Tools to compile using CASCADE-MAKE
10E. Getting more output when running CMake (verbosity)
11F. APPENDIX - Useful links
12
13LIBJPEG related
14G. LIBJPEG and LIBTIFF
15- Issues building LIBJPEG on 64 bit machines and the patch
16
17H. PDF2DOM
18    unused, replaced by Xpdf-Tools' more suited pdftohtml capabilities
19
20__________________________________________________________
21A. XPDF
22__________________________________________________________
23
Xpdf was last modified in 2017 and includes its own pdftohtml utility, whereas the old "pdftohtml" tool that GS used was last updated in 2013 (and itself made use of Xpdf, possibly an older version).
25
26The tool takes a PDF and produces an HTML file for each page of the PDF, consisting of selectable HTML text overlaid on top of "screenshot" image of the page. (A page's text is not part of the screenshot.)
27
281. https://www.xpdfreader.com/download.html
29
As per the Readme file found in the Linux binary of Xpdf Tools, the Xpdf Viewer requires the Qt toolkit, but the Xpdf Tools do not. I have not read the Install file to confirm whether the same is the case when compiling the command line tools. (But in that case, can't we just include the tools binaries available for all 3 OSes, instead of compiling on each platform?)
31
32    - Using Xpdf's pdftohtml tool:
33    greenstone@bedrock:~/Downloads/xpdf-tools-linux-4.00/bin64$./pdftohtml -z 1.5 ~/Downloads/ApacheLicence.pdf licence
34
35        where licence is a folder.
36
37    - Using Xpdf's pdftotext tool:
38    greenstone@bedrock:~/Downloads/xpdf-tools-linux-4.00/bin64$./pdftotext -nopgbrk ~/Downloads/ApacheLicence.pdf ~/Downloads/ApacheLicence.txt
39
40        where the output text file must be specified with a full path name.
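Since pdftotext is noted above as needing a full path for its output file, a small helper can turn a relative name into an absolute one before the call. This is only a sketch: abs_path is a made-up helper name, not part of Xpdf-Tools or Greenstone.

```shell
# Hypothetical helper (not part of Xpdf-Tools): make a path absolute
# without requiring the file to exist yet.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;            # already absolute
        *)  printf '%s/%s\n' "$PWD" "$1" ;;  # prefix the current directory
    esac
}

# Example use (assumes pdftotext sits in the current directory):
#   ./pdftotext -nopgbrk ~/Downloads/ApacheLicence.pdf "$(abs_path ApacheLicence.txt)"
```

The helper only prefixes the current directory; it does not resolve ".." or symbolic links.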
41
42
432. Documentation on Xpdf-Tools:
44- https://www.xpdfreader.com/support.html
45    for example, the pdftohtml man page: https://www.xpdfreader.com/pdftohtml-man.html
46- https://linux.die.net/man/5/xpdfrc
47(Configuration flags you can put into ~/.xpdfrc to use as defaults when running xpdf tool commands)
48
493. We're using Xpdf Tools version: xpdf-tools-linux-4.00
50
514. We started by working with the ready-made Xpdf-tools binaries available for download from the xpdf site for Win, Linux and Mac.
52
535. We're now moving to compiling up Xpdf-tools ourselves using CASCADE-MAKE, which we have so far got to successfully compile statically on Linux (LSB environment inclusive) to build working binaries.
54
55On Mac, I've been unable to get it to produce statically linked libraries: at this stage they're dynamically linked.
56
57
58__________________________________________________________
59B. Mojo::DOM perl package for parsing HTML
60__________________________________________________________
61
62XPDF's pdftohtml conversion of a single PDF document produces multiple HTML files: one for each page in the source PDF.
63We want the output to be "paged_html": a single HTML file that is sectionalised, each section representing a page of the
64original PDF.
65
66We need to be able to parse the many HTML pages produced by XPDF's pdftohtml conversion of a doc, in order to massage the output
67into the single sectionalised HTML file. For this we needed a HTML parser package for Perl.
68
691. Before Dr Bainbridge found Mojo::DOM, he looked at
70* https://en.wikipedia.org/wiki/Comparison_of_HTML_parsers
71* http://radar.oreilly.com/2014/02/parsing-html-with-perl-2.html
72
732. Main links for Mojo::DOM
74* https://mojolicious.org/perldoc/Mojo/DOM
75* https://metacpan.org/pod/Mojo::DOM
76    Dependencies: http://deps.cpantesters.org/?module=Mojo%3A%3ADOM;perl=latest
77
3. Once you've downloaded the Mojo::DOM source, follow Dr Bainbridge's sequence of commands below for building the Mojo::DOM CPAN module.
We'll be using this module to parse the HTML output by the Xpdf tool pdftohtml.
80
81
    mkdir cpan
    tar xvzf Mojolicious-7.84.tar.gz
    cd Mojolicious-7.84/
    perl ./Makefile.PL PREFIX=`pwd`/installed
    make
    make install
    cp -r installed/share/perl/5.18.2 ../cpan
    cd ..
    export PERL5LIB=`pwd`/cpan

    emacs -nw test.pl

    #!/usr/bin/perl -w
    add in 'use v5.10;'

    chmod a+x test.pl
    ./test.pl
99
100
101__________________________________________________________
102C. Compiling Xpdf-Tools: statically or dynamically linked
103__________________________________________________________
104
As explained in detail in section D below, we have a customised gs-CMakeLists.txt file which replaces the one in the xpdf-4.00.tar.gz package's xpdf subfolder after this is untarred. This customised CMake configure/make file now allows us to compile xpdf-tools either statically (as we've now set it up for by default) or dynamically (as its CMake makefiles were originally set up for).
106
1071. To compile Xpdf-Tools statically, packages/CASCADE-MAKE/XPDFTOOLS.sh should contain:
108
109    cmake -DCMAKE_BUILD_TYPE=Release \
110        -DCMAKE_INSTALL_PREFIX=$prefix \
111        -DZLIB_LIBRARY=$prefix/lib/libz.a \         # <========= THIS
112        -DPNG_LIBRARY=$prefix/lib/libpng15.a \      # <========= THIS
113        -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.a \  # <========= THIS
114        -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
115        -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
116        -DCMAKE_C_FLAGS="$CFLAGS" \
117        -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
118        -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
119        -DGSDLFLAG_STATIC="$static_flag" \          # <========= THIS
120        $GEXT_XPDFTOOLS/packages/$package$version
121
In place of FREETYPE_LIBRARY above, you could also try the following,
        -DFREETYPE_DIR=$prefix \
but then check the built binaries by running "ldd" and "file" over them, to make sure they're not referencing any .so dynamic link libraries.
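The "file" check can be scripted along the following lines. This is a sketch, not something in the extension: linkage_of and check_binaries are made-up names, and the matched phrases are what "file" normally prints for ELF binaries on GNU/Linux (an assumption about your version of "file").

```shell
# Classify one line of `file` output. The phrases matched here are an
# assumption about what `file` prints for ELF binaries on GNU/Linux.
linkage_of() {
    case "$1" in
        *"statically linked"*)  echo static ;;
        *"dynamically linked"*) echo dynamic ;;
        *)                      echo unknown ;;
    esac
}

# Report the linkage of every binary passed as an argument.
check_binaries() {
    for bin in "$@"; do
        printf '%s: %s\n' "$bin" "$(linkage_of "$(file "$bin")")"
    done
}

# Example use (paths are illustrative):
#   check_binaries bin/pdftohtml bin/pdftotext bin/pdftoppm
```

For a second opinion, "ldd" run over a fully static binary on Linux reports that it is not a dynamic executable instead of listing .so files.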
125
126
1272. To compile Xpdf-Tools dynamically and make it find *our* dynamically linked libraries for its helper packages zlib, libpng and freetype, edit packages/CASCADE-MAKE/XPDFTOOLS.sh to contain:
128
129    cmake -DCMAKE_BUILD_TYPE=Release \
130        -DCMAKE_INSTALL_PREFIX=$prefix \
131        -DZLIB_LIBRARY=$prefix/lib/libz.so.1.2.7 \          # <========= THIS
132        -DPNG_LIBRARY=$prefix/lib/libpng15.so.15.30.0 \     # <========= THIS
133        -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.so.6.3.20 \  # <========= THIS
134        -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
135        -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
136        -DCMAKE_C_FLAGS="$CFLAGS" \
137        -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
138        -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
139        $GEXT_XPDFTOOLS/packages/$package$version       # <=== -DGSDLFLAG_STATIC removed
140
141
142
    (1) In the above, you could also set
        -DFREETYPE_DIR=$prefix
    in place of
        -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.so.6.3.20

    In that case, xpdf-tools compilation finds the "libfreetype.so" (no versioning at end) in our gs2-extension.
    After successfully building, make sure to have sourced the gs2-extension's setup.bash before running "ldd" over the
    generated xpdf-tools binaries, in order to let it use the $LD_LIBRARY_PATH we set to find our .so files.

    (2) Note that there is no equivalent for ZLIB and LIBPNG: doing -DZLIB_DIR=$prefix or -DPNG_DIR=$prefix will be
    ineffective, as neither is recognised by xpdf-tools' CMake setup.
154
155__________________________________________________________
156D. How we got Xpdf-Tools to compile using CASCADE-MAKE
157__________________________________________________________
158
159The process:
160
1611. We set up a CASCADE-MAKE GS2-extension "xpdf-tools" at trac.greenstone.org/browser/gs2-extensions/xpdf-tools/trunk/src
162Be aware that its lowercased "cascade-make" subfolder is an svn external, the original is at http://trac.greenstone.org/browser/other-projects/cascade-make/trunk/
163
164So far, this CASCADE-MAKE project includes the Xpdf-Tools source tarball, its helper packages zlib, libpng and freetype, as well as CMake to compile the Xpdf-Tools source code.
165The next step is to include JPEG and TIFF libraries too.
166
1672a. We downloaded the Xpdf-Tools source tarball, xpdf-4.00.tar.gz, from the xpdf site at https://www.xpdfreader.com/download.html under section "Download the Xpdf source code".
168
The xpdf-tools source code tarball consists of the source for Xpdf-tools and Xpdf (Xpdf-Reader). The Xpdf-Reader additionally requires Qt to build and run, but we don't want the Xpdf-Reader, just Xpdf-Tools.

b. Compiling Xpdf-Tools from source and running them requires the following packages and libraries, as per the xpdf-tools source code INSTALL file:
172
173To build xpdf-tools:
174- CMake 2.8.8 or newer
175
176Libraries to link against and used by xpdf-tools:
177- FreeType 2.0.5 or newer
178- libpng (for pdftoppm and pdftohtml)
179- zlib (for pdftoppm and pdftohtml)
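The minimum CMake version listed above can be checked before starting a build. The following is a sketch only: version_ge is a made-up helper, and it handles purely numeric dotted versions such as 3.9.6 or 2.8.8.

```shell
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Missing components count as 0, so "2.8" is treated as "2.8.0".
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            p = (i <= n) ? x[i] + 0 : 0
            q = (i <= m) ? y[i] + 0 : 0
            if (p > q) exit 0
            if (p < q) exit 1
        }
        exit 0
    }'
}

# Example use: "cmake --version" prints "cmake version X.Y.Z" on its first line.
#   have=$(cmake --version | awk 'NR == 1 { print $3 }')
#   version_ge "$have" 2.8.8 || echo "CMake $have is too old for xpdf-tools"
```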
180
181
3. Compilation of xpdf-tools worked with CMake 3.11.4 on the linux resnet machine. However, CMake 3.11.4 itself failed to compile in the LSB environment and on the Mac Mountain Lion machine because of a version incompatibility between the older g++ installed there and this newer version of CMake.
183
184CMake version 3.9.6 however is supposed to be compatible with older versions of g++, as per https://stackoverflow.com/questions/47886400/cmake-configure-error-in-3-10-1-but-not-in-3-9-6
185To avoid installing newer versions of g++ and clang in the LSB virtual machine and the Mac, I've shifted the CMake version back to version 3.9.6, still
186
187
1884a. On building xpdf-tools to work with dynamically linked libs found anywhere.
189
If compiling xpdf-tools against dynamically linked libraries for these packages, then the basic CMake command in packages/CASCADE-MAKE/XPDFTOOLS.sh can look like:
191    cmake -DCMAKE_BUILD_TYPE=Release \
192        -DCMAKE_INSTALL_PREFIX=$prefix \
193        -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
194        -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
195        -DCMAKE_C_FLAGS="$CFLAGS" \
196        -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
197        -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
198        $GEXT_XPDFTOOLS/packages/$package$version   # Note: no -DGSDLFLAG_STATIC=...
199
200With the above, the xpdf-tools source code and its make files work out of the box.
201
4b. On building xpdf-tools to work with the dynamically linked libs for freetype, libpng and zlib that we produce when cascade-making the xpdf-tools gs2-extension.

Since we're compiling up freetype, libpng and zlib packages as part of the Xpdf-Tools GS2-extension with CASCADE-MAKE, the next step was to compile xpdf-tools by dynamically linking against our .so files for these 3 libraries. To do so, XPDFTOOLS.sh should have the following changes:
205
206    (1) set up CFLAGS, CXXFLAGS, CPPFLAGS and LDFLAGS to help linkage of xpdf-tools find our .so versions of the necessary libs:
207
208    export CFLAGS="$CFLAGS -I$GEXTXPDFTOOLS_INSTALLED/include -I$GEXTXPDFTOOLS_INSTALLED/include/libpng15"
209    export CPPFLAGS="$CPPFLAGS -I$GEXTXPDFTOOLS_INSTALLED/include -I$GEXTXPDFTOOLS_INSTALLED/include/libpng15"
210    export CXXFLAGS="$CXXFLAGS -I$GEXTXPDFTOOLS_INSTALLED/include -I$GEXTXPDFTOOLS_INSTALLED/include/libpng15"
211    export LDFLAGS="$LDFLAGS -L$GEXTXPDFTOOLS_INSTALLED/lib"
212
213    (2) The CMAKE command we run must pass the full paths to the actual .so library files (the ones with specific
214    versions in their files names) rather than the symbolically linked generally-named .so files (the latter won't
215    be found when building xpdf-tools and CMake will try to look for the .so library files elsewhere on the system):
216
217    cmake -DCMAKE_BUILD_TYPE=Release \
218        -DCMAKE_INSTALL_PREFIX=$prefix \
219        -DZLIB_LIBRARY=$prefix/lib/libz.so.1.2.7 \              # <========= NEW
220        -DPNG_LIBRARY=$prefix/lib/libpng15.so.15.30.0 \         # <========= NEW
221        -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.so.6.3.20 \      # <========= NEW
222        -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
223        -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
224        -DCMAKE_C_FLAGS="$CFLAGS" \
225        -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
226        -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
227        $GEXT_XPDFTOOLS/packages/$package$version   # Again: no -DGSDLFLAG_STATIC=...
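Rather than hard-coding the version suffixes in the paths above, the versioned file behind each generally-named .so symlink could be looked up at build time. This is a sketch, not what XPDFTOOLS.sh does: real_lib is a made-up helper, and it relies on "readlink -f" from GNU coreutils, which is assumed to be present on the Linux build host.

```shell
# Hypothetical helper: print the real file behind a library symlink,
# e.g. libz.so -> libz.so.1 -> libz.so.1.2.7.
# Relies on `readlink -f` (GNU coreutils; an assumption about the host).
real_lib() {
    readlink -f "$1"
}

# Example use in the cmake call (assuming $prefix is already set):
#   -DZLIB_LIBRARY="$(real_lib "$prefix/lib/libz.so")" \
```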
228
229Further, the "xpdf/CMakeLists.txt" file within the xpdf-4.00.tar.gz source code tarball needs to be modified to refer to ZLIB_LIBRARIES when linking pdftops and pdftoppm. The linking commands for *both* the "pdftops" and "pdftoppm" executable targets in xpdf/CMakeLists.txt should look like the following,
230
231        target_link_libraries(pdftoppm goo fofi splash
232                        ${FREETYPE_LIBRARY} ${FREETYPE_OTHER_LIBS}
233                        ${DTYPE_LIBRARY}
234                        ${LCMS_LIBRARY}
235            ${ZLIB_LIBRARIES})              # <========= NEW
236
237
238    (3) Since CMakeLists.txt has been modified, we initially renamed the xpdf src tarball to gs-xpdf-4.00.tar.gz.
239    However, the current version works with the regular downloaded xpdf-4.00.tar.gz tarball. But after extraction,
240    XPDFTOOLS.sh copies across the custom packages/gs-CMakeLists.txt into the extracted tarball's xpdf subdirectory,
241    renaming the file as CMakeLists.txt (so the path to it becomes "xpdf-4.00/xpdf/CMakeLists.txt"). In XPDFTOOLS.sh:   
242
243    # patch the original tarball with our custom makefile
244    if [[ -d "$package$version/xpdf" && -f "gs-CMakeLists.txt" ]]; then
245        echo "*******************************************************************"
246        echo "Using our custom gs-CMakeLists.txt instead of the one included in $package$version"
247        echo "Renaming gs-CMakeLists.txt to $package$version/xpdf/CMakeLists.txt"
248        echo "*******************************************************************"
249
250        cp "gs-CMakeLists.txt" "$package$version/xpdf/CMakeLists.txt"
251    fi
252
253
4c. On building static xpdf-tools binaries using the static *.a freetype, libpng and zlib libraries that we produce when cascade-making the xpdf-tools gs2-extension.
255
256In order to compile up xpdf-tools *statically*, so that it builds against the static *.a libraries of freetype, libpng and zlib that we produce during the gs2-extension's CASCADE-MAKE process, we have to make further modifications.
257
258    (1) First, the XPDFTOOLS.sh cascade-make file should pass the full paths to the actual (non-symbolic link) .a file for each library.
    A custom GS flag, GSDLFLAG_STATIC, is also introduced in gs-CMakeLists.txt and assigned "-static" for Linux
    and "-Bstatic" for Mac, to pass in during the linking stage of building xpdf-tools.

    For Mac OSX, passing in -static for linking, as on Linux, produced the error
    "ld: library not found for -lcrt0.o" during the build of the xpdf-tools package. For information, see
    https://stackoverflow.com/questions/3801011/ld-library-not-found-for-lcrt0-o-on-osx-10-6-with-gcc-clang-static-flag
    The page https://stackoverflow.com/questions/844819/how-to-static-link-on-os-x mentions compiling
    with -Bstatic on Mac OSX instead. To do so, XPDFTOOLS.sh passes in the GSDLFLAG_STATIC set to either
    "-static" (for Linux) or "-Bstatic" (for darwin).
268    However the last mentioned stackoverflow page also says that -Bstatic is a no-op, and this appears to be
269    the case when "otool -L" is run over the generated xpdf-tools binaries: the binaries are all dynamically
270    linked. Although they're finding our .so files of freetype, libpng and zlib, they're not finding the .a
271    versions, even though XPDFTOOLS.sh tries to point gs-CMakeLists.txt to the correct .a files.
272
273    The new modifications to XPDFTOOLS.sh:
274
275    if [ "x$GSDLOS" == "xdarwin" ] ; then
276        static_flag=-Bstatic
277    else
278        static_flag=-static
279    fi
280
281    ...
282    cmake -DCMAKE_BUILD_TYPE=Release \
283        -DCMAKE_INSTALL_PREFIX=$prefix \
284        -DZLIB_LIBRARY=$prefix/lib/libz.a \                 # <========= MODIFIED TO .a
285        -DPNG_LIBRARY=$prefix/lib/libpng15.a \              # <========= MODIFIED TO .a
286        -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.a \          # <========= MODIFIED TO .a
287        -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
288        -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
289        -DCMAKE_C_FLAGS="$CFLAGS" \
290        -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
291        -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
292        -DGSDLFLAG_STATIC="$static_flag" \                  # <========= NEW
293        $GEXT_XPDFTOOLS/packages/$package$version
294
295    (2) Our customised gs-CMakeLists.txt file now checks for this flag GSDLFLAG_STATIC being set and, if it is,
296    uses it during the linking stage. As in (1) above, it will be set to "-static" for Linux and "-Bstatic" for Mac.
297   
    - When the flag is set, the linking flags passed into each occurrence of target_link_libraries() in
    gs-CMakeLists.txt are manually written in the form "-static -l<libs>" rather than using
    the default linking commands inherited from the original CMakeLists.txt.
301    - If GSDLFLAG_STATIC isn't set, then we don't build statically, and the linking flags passed to each
302    target_link_libraries() are mostly the original ones.
303
304    For example,
305
306        if(GSDLFLAG_STATIC)
307            target_link_libraries(pdftoppm goo fofi splash
308              ${GSDLFLAG_STATIC} -lfreetype ${DTYPE_LIBRARY} ${LCMS_LIBRARY} -lz -lm -lc -lpthread)
309        else ()
310            target_link_libraries(pdftoppm goo fofi splash
311                            ${FREETYPE_LIBRARY} ${FREETYPE_OTHER_LIBS}
312                            ${DTYPE_LIBRARY}
313                           ${LCMS_LIBRARY}
314                    ${ZLIB_LIBRARIES})
315        endif ()
316
317    DETAILED EXPLANATION:
318    We found that when building *statically*, gs-CMakeLists.txt needed to NOT use the PNG_LIBRARIES, ZLIB_LIBRARIES
319    and FREETYPE_LIBRARY in its linker commands, target_link_libraries(), as doing so produced partially dynamic
320    xpdf-tools executables which were moreover BROKEN. They wouldn't run, and in fact attempting to run an xpdf-tool,
321    like "./pdftohtml", would produce a file not found error. Something like "bash: no such file or directory".
322
    Online discussions mentioned that this generally happens when attempting to run 32 bit executables on 64 bit
    linux when 32 bit loaders are not installed. (In such cases, the solution was to apt-get install some 32 bit package.)
    However, our broken binaries were all 64 bit, as indicated when running the "file" command on them. Nor did their
    being partially dynamically linked executables imply that they would be broken, as we were eventually
    able to produce partially dynamic executables that did work, before solving static linking altogether.
328
329    The real issue was that including references to  ${FREETYPE_LIBRARY} ${FREETYPE_OTHER_LIBS}, ${PNG_LIBRARIES} and
330    ${ZLIB_LIBRARIES} in any target_link_libraries() resulted in the wrong linking command producing broken binaries.
331
332    Doing the regular target_link_libraries() in static mode results in building with
333    "-Wl,-Bstatic -lfreetype -lpng15 -lz -Wl,-Bdynamic -lpthread" at end of link line
334    and produces broken binaries for pdftohtml/pdftoppm/pdftops/pdftopng.
335
336    Note that PNG_LIBRARIES includes zlib/lz: "-lpng -lz", and along with freetype,
337    these are linked statically. However, Threads/lpthread is included as a dynamically
338    linked library instead of including a .a (regardless of whether it's appended
339    as -lpthread or Threads::Threads in the target_link_libraries()), contributing to
    the pdftohtml binary produced being a partially static, partially dynamic one,
341    so a dynamic executable overall.
342
343    The order of dynamic .so files listed by ldd in the broken static binary of pdftohtml differs from
344    a manually statically linked working version of pdftohtml, and seems to be the only difference
345    between the two in ldd's output. Not using "-Wl,-Bstatic" and using -static (-Bstatic on Mac)
346    in its place creates a partially static dynamic executable that isn't broken, whereas
347    additionally removing "-Wl,-Bdynamic -lpthread" and replacing it with -lpthread
348    moreover produces a working pdftohtml that is a fully static linked executable.
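    The ldd comparison described above can be repeated with a small script. This is a sketch: ldd_names and
    compare_ldd are made-up helper names, and the two binaries to compare are whatever builds you have at hand.

```shell
# Read `ldd` output on stdin and print the first field of each line
# (the library name or loader path), keeping ldd's original order.
ldd_names() {
    awk 'NF { print $1 }'
}

# Show where the dependency lists of two binaries differ, order included.
compare_ldd() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    ldd "$1" | ldd_names > "$tmp_a"
    ldd "$2" | ldd_names > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return $status
}

# Example use (paths are illustrative):
#   compare_ldd broken/pdftohtml working/pdftohtml
```

    Because the names are not sorted before diffing, a difference in order alone shows up in the output.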
349
    The inclusion of the math lib and c lib (lm and lc) in the final link command
    is to completely bypass the remaining .so dependencies that were present in
    the executable and produce the fully static executable. The lm and lc libs were referenced
    by all xpdf-tool binaries (as indicated when generating dynamic ones and running ldd over them)
    but Dr Bainbridge said that -lm and -lc were libs passed in by the compiler by default,
    which would explain why explicitly setting them for some xpdftools and not others may not have
    mattered.
357
358NOTES:
359Initial attempts at modifying gs-CMakeLists.txt for static compiling that proved to be unnecessary:
360
361    (1) Setting -static globally doesn't have a useful effect.
362
363    # We want to build static xpdf-tools binaries. See
364    # https://stackoverflow.com/questions/24648357/compiling-a-static-executable-with-cmake
365    # Want to make the min number of changes for building statically, so using the way
366    # below. Beware, must *append* "-static" to existing CMAKE_EXE_LINKER_FLAGS=LD_FLAGS
367    ##SET(CMAKE_FIND_LIBRARY_SUFFIXES ".a")
368    ##SET(BUILD_SHARED_LIBS OFF)
369    ##SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -static")
370
371    The above 3 lines just add a -static before the "-O2 -Wall -fPIC -rdynamic ..." during linking, such as below.
372    But they have no further effect on whether static building actually succeeds or not. The only effective static
373    linking command (for Linux so far) was to pass -static in the target_link_libraries() command followed by the
374    "-l<libname>" for each library in the correct order.
375
376----
377/usr/bin/c++  -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include  -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include/libpng15 -O3 -Wall -fPIC  -L/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib  -L/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib -static ***** <- HERE ****** -O2 -Wall -fPIC -rdynamic CMakeFiles/pdftohtml.dir/HTMLGen.cc.o CMakeFiles/pdftohtml.dir/SplashOutputDev.cc.o CMakeFiles/pdftohtml.dir/TextOutputDev.cc.o CMakeFiles/pdftohtml.dir/pdftohtml.cc.o CMakeFiles/xpdf_objs.dir/AcroForm.cc.o CMakeFiles/xpdf_objs.dir/Annot.cc.o CMakeFiles/xpdf_objs.dir/Array.cc.o CMakeFiles/xpdf_objs.dir/BuiltinFont.cc.o CMakeFiles/xpdf_objs.dir/BuiltinFontTables.cc.o CMakeFiles/xpdf_objs.dir/Catalog.cc.o CMakeFiles/xpdf_objs.dir/CharCodeToUnicode.cc.o CMakeFiles/xpdf_objs.dir/CMap.cc.o CMakeFiles/xpdf_objs.dir/Decrypt.cc.o CMakeFiles/xpdf_objs.dir/Dict.cc.o CMakeFiles/xpdf_objs.dir/Error.cc.o CMakeFiles/xpdf_objs.dir/FontEncodingTables.cc.o CMakeFiles/xpdf_objs.dir/Form.cc.o CMakeFiles/xpdf_objs.dir/Function.cc.o CMakeFiles/xpdf_objs.dir/Gfx.cc.o CMakeFiles/xpdf_objs.dir/GfxFont.cc.o CMakeFiles/xpdf_objs.dir/GfxState.cc.o CMakeFiles/xpdf_objs.dir/GlobalParams.cc.o CMakeFiles/xpdf_objs.dir/JArithmeticDecoder.cc.o CMakeFiles/xpdf_objs.dir/JBIG2Stream.cc.o CMakeFiles/xpdf_objs.dir/JPXStream.cc.o CMakeFiles/xpdf_objs.dir/Lexer.cc.o CMakeFiles/xpdf_objs.dir/Link.cc.o CMakeFiles/xpdf_objs.dir/NameToCharCode.cc.o CMakeFiles/xpdf_objs.dir/Object.cc.o CMakeFiles/xpdf_objs.dir/OptionalContent.cc.o CMakeFiles/xpdf_objs.dir/Outline.cc.o CMakeFiles/xpdf_objs.dir/OutputDev.cc.o CMakeFiles/xpdf_objs.dir/Page.cc.o CMakeFiles/xpdf_objs.dir/Parser.cc.o CMakeFiles/xpdf_objs.dir/PDFDoc.cc.o CMakeFiles/xpdf_objs.dir/PDFDocEncoding.cc.o CMakeFiles/xpdf_objs.dir/PSTokenizer.cc.o CMakeFiles/xpdf_objs.dir/SecurityHandler.cc.o 
CMakeFiles/xpdf_objs.dir/Stream.cc.o CMakeFiles/xpdf_objs.dir/TextString.cc.o CMakeFiles/xpdf_objs.dir/UnicodeMap.cc.o CMakeFiles/xpdf_objs.dir/UnicodeTypeTable.cc.o CMakeFiles/xpdf_objs.dir/UTF8.cc.o CMakeFiles/xpdf_objs.dir/XFAForm.cc.o CMakeFiles/xpdf_objs.dir/XRef.cc.o CMakeFiles/xpdf_objs.dir/Zoox.cc.o  -o pdftohtml ../goo/libgoo.a ../fofi/libfofi.a ../splash/libsplash.a -static -lfreetype -lpng -lz -lm -lc -lpthread
378----
379
380    (2) Threads::Threads instead of -lpthread results in a partially dynamic executable.
381
382    # The original, unmodified CMakeLists.txt was not set up sufficiently
383    # for static compilation of xpdf-tools. As a result, compile would first fail
384    # with errors about undefined refs to mutex / lpthread.
385    # When building xpdf-tools statically, need to add the following 2 lines as well
386    # as append "Threads::Threads" to the end of each "target_link_libraries(<list>)"
387    # See https://stackoverflow.com/questions/1620918/cmake-and-libpthread
388    # found googling cmake and "-lpthread" (pthread) after ERRORS to do with this, like:
389    #   undefined reference to `pthread_mutex_unlock'
390    ##set(THREADS_PREFER_PTHREAD_FLAG ON)
391    ##find_package(Threads REQUIRED)
392
393    In instances when compilation was successful, including the above 2 lines in combination with "Threads::Threads"
394    as the final argument to every target_link_libraries(...) occurrence in gs-CMakeLists.txt would only manage to
395    produce partially dynamically linked xpdftools binaries. (Depending on what the linking command was when building
396    Xpdf-Tools, the partially dynamically linked executables may work or may be broken. See explanation further above.)
397    We wanted fully statically linked binaries, for which we needed to pass in "-lpthread" as the trailing argument
398    to each target_link_libraries(...). So without either, compilation will fail. However, with "Threads::Threads"
399    the binaries weren't fully static, whereas with -lpthread the xpdftools executables were fully static as CMake no
400    longer tried to link against a dynamic Threads library.
401
402
5. To view the unmodified CMakeLists.txt included in the xpdf-4.00 source code tarball, untar it and look for its "xpdf/CMakeLists.txt" (not the toplevel file of the same name).
404Run a 'diff' against gs-CMakeLists.txt to see further differences, such as debug statements and comments. Most comments have been removed and placed into this readme file instead.
405
406
6. When CASCADE-MAKE is run on the xpdf-tools GS2-extension, it first compiles up CMake, needed to compile up xpdf-tools.
408Unlike the library packages like freetype, libpng and zlib that we also build for xpdf-tools as part of this gs2-extension, CMake's build products don't need to be included in the distribution tarball of our built xpdf-tools executables.
409
410There's a "move-cmake.sh" script in the xpdf-tools gs2-extension that can be run with the "away" and "back" options to move the CMake stuff out of the way (into a "devel" folder) after successfully building xpdf binaries and that can also be run to move them back if wanting to recompile.
411
412The script can be run manually, but it's also run by the extension:
413- packages/CASCADE-MAKE/XPDFTOOLS.sh runs "move-cmake.sh away" after xpdf-tools has been built, so that the extension's install location is ready for tarring up for distribution.
- When recompiling the xpdf-tools extension, the CASCADE-MAKE process will run the packages/CASCADE-MAKE/CMAKE.sh file, which in turn runs "move-cmake.sh back" if there's a prebuilt CMake which had earlier been moved out of the way.
415
416
417__________________________________________________________
418E. Getting more output when running CMake (verbosity)
419__________________________________________________________
420See https://www.linuxquestions.org/questions/programming-9/cmake-or-make-debug-output-show-command-624800/
421To turn on debugging:
422    export VERBOSE=1
423    ./CASCADE-MAKE.sh
424
425To turn off debugging, need to actually make VERBOSE undefined again (don't set it to 0):
426    export VERBOSE=
427    ./CASCADE-MAKE.sh
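The reason 0 doesn't work is, as far as I can tell, that Makefiles generated by CMake typically contain the rule "$(VERBOSE).SILENT:", so any non-empty value of VERBOSE switches the command echoing on. The sketch below mirrors that rule; verbose_state is a made-up name used only to illustrate it.

```shell
# Mirrors the "$(VERBOSE).SILENT:" rule typically found in CMake-generated
# Makefiles (an assumption about the generated files): any non-empty
# value, including 0, turns verbose output on.
verbose_state() {
    if [ -n "${VERBOSE:-}" ]; then echo on; else echo off; fi
}

# Enable verbosity for a single run only, leaving the shell environment alone:
#   VERBOSE=1 ./CASCADE-MAKE.sh
# Or remove a previously exported value altogether:
#   unset VERBOSE
```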
428
429
430__________________________________________________________
431F. APPENDIX - Useful links
432__________________________________________________________
433A. Helping CMake along. (Not all of this was necessary for compiling xpdftools statically, but they're generally useful links)
434
435https://github.com/SynoCommunity/spksrc/issues/1779
https://stackoverflow.com/questions/1620918/cmake-and-libpthread
https://cmake.org/cmake/help/v3.0/prop_tgt/LINK_FLAGS.html
https://cmake.org/cmake/help/v3.11/command/target_link_libraries.html?highlight=target_link_libraries
https://stackoverflow.com/questions/24648357/compiling-a-static-executable-with-cmake
https://stackoverflow.com/questions/42815420/cmake-cant-find-my-static-libs
https://cmake.org/cmake/help/v3.0/command/message.html
https://stackoverflow.com/questions/30980383/cmake-compile-options-for-libpng
    https://stackoverflow.com/questions/36220123/undefined-reference-to-png-set-longjmp-fn-when-compiling-pcl-source-file


B. About the error "bash: no such file or directory" when run on a statically generated binary:

https://askubuntu.com/questions/351827/unable-to-run-a-32-bit-program-on-64-bit-vm/353497#353497
https://unix.stackexchange.com/questions/13391/getting-not-found-message-when-running-a-32-bit-binary-on-a-64-bit-system/13409#13409
https://arstechnica.com/civis/viewtopic.php?f=16&t=1173118
https://superuser.com/questions/344533/no-such-file-or-directory-error-in-bash-but-the-file-exists
https://unix.stackexchange.com/questions/45277/executing-binary-file-file-not-found

C. Other links

https://unix.stackexchange.com/questions/279397/ldd-dont-find-path-how-to-add


D. On why you can't build static binaries on Mac, but can build static libraries and link against them

https://developer.apple.com/library/archive/qa/qa1118/_index.html (official page on how Mac doesn't support static binaries)
https://stackoverflow.com/questions/3801011/ld-library-not-found-for-lcrt0-o-on-osx-10-6-with-gcc-clang-static-flag
https://stackoverflow.com/questions/844819/how-to-static-link-on-os-x (mention of -Bstatic)
https://www.allegro.cc/forums/thread/610923
https://dropline.net/2015/10/static-linking-on-mac-os-x/ (explains that on Mac, .dylibs must be hidden for .a versions of libraries to be selected when linking)
    This means that, where possible, we want to pass "--enable-static --disable-shared", or the equivalent, when generating the freetype, libz, libpng, libjpeg and libtiff library files,
    so that Xpdf-Tools links against the .a files we generated rather than against additional .dylib files.
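
A quick way to check whether a lib folder still contains a .dylib that would be picked over our .a is sketched below. This is only an illustration: the function name find_shadowed_static_libs is made up, and it looks at a single folder only, so it will not notice a preferred .dylib elsewhere (e.g. on the system).

```shell
#!/bin/sh
# Sketch: list libraries in a lib directory that have BOTH a static archive
# (.a) and a .dylib next to it. For these, the Mac linker would be expected
# to choose the .dylib. The function name and argument are hypothetical.
find_shadowed_static_libs() {
    libdir=$1
    for archive in "$libdir"/lib*.a; do
        [ -e "$archive" ] || continue      # the glob matched nothing
        base=${archive%.a}                 # e.g. /path/to/lib/libjpeg
        if [ -e "$base.dylib" ]; then
            basename "$base"               # prints e.g. "libjpeg"
        fi
    done
}
```

Run it as: find_shadowed_static_libs "$prefix/lib" (assuming $prefix is the install prefix used by the CASCADE-MAKE scripts). No output means no .a in that folder is shadowed by a .dylib in the same folder.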

http://www.simplesystems.org/libtiff/build.html
Configuration options for building libtiff. We want to stop the libtiff build from also producing the tiff binaries, but there appears to be no such option.


__________________________________________________________
G. LIBJPEG and LIBTIFF
__________________________________________________________

1. Issues building LIBJPEG on 64 bit machines and the patch

I copied the LIBJPEG package from http://trac.greenstone.org/browser/other-projects/realistic-books/trunk/packages (also at http://trac.greenstone.org/browser/gs2-extensions/ocr/trunk/packages/cmdline).

    * Configuring out of the box produced the following error:
       checking host system type... Invalid configuration `x86_64-unknown-linux-gnu': machine `x86_64-unknown' not recognized

    * As a consequence, running make on the libjpeg package then failed with the error:
       ./libtool --mode=compile gcc -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include -fPIC  -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include  -I. -c ./jcapimin.c
       make: ./libtool: Command not found
       make: *** [jcapimin.lo] Error 127
        Error encountered running *make * stage of ./CASCADE-MAKE/LIBJPEG.sh

The same was true when I grabbed the libjpeg from sourceforge (https://sourceforge.net/projects/libjpeg/files/), which was also still version jpeg-6b, from 2008.

I found the following webpages discussing the above error messages:
- https://unix.stackexchange.com/questions/80479/how-to-work-with-libtool
- https://github.com/rwestlund/freesweep/issues/1
- https://ubuntuforums.org/showthread.php?t=1232714
- https://stackoverflow.com/questions/12828687/configure-fails-to-detect-proper-ld-on-a-64-bit-system-with-32-bit-userland
- SOLUTION: https://sourceforge.net/p/libjpeg/bugs/12/

However, the error only strikes when configure is run with --enable-static.

Note also that, contrary to the above pages, running configure with the additional options
    --host=x86_64-linux-gnu --build=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-shared --enable-static
did not help. Nor did adding these flags stop configure from attempting to work with host=x86_64-unknown(-unknown)-linux-gnu.

The SOLUTION, found by searching for the error message together with "enable-static" (as it's the combination that is relevant), is described
at https://sourceforge.net/p/libjpeg/bugs/12/

It is to patch the config.sub file included in the jpeg-6b tarball so that it also covers x86_64-* machines:
        tahoe | i860 | x86_64-* | m32r | m68k | m68000 | m88k | ns32k | arc | arm \

The above change is necessary because this libjpeg is outdated and has been superseded by other JPEG libraries, also discussed at https://sourceforge.net/p/libjpeg/bugs/12/
I'm not sure whether those libraries are compatible with XpdfTools, however, so I'm sticking with libjpeg as long as I can get it to build and be recognised by XpdfTools.

The solution is once more to have a patch file: after the jpeg-6b package is untarred, CASCADE-MAKE/LIBJPEG.sh replaces the config.sub within it with packages/gs-libjpeg-config.sub, which contains the patch.
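
The replacement step described above could look roughly like the following. This is a sketch and not the actual contents of CASCADE-MAKE/LIBJPEG.sh: the function name replace_config_sub, its arguments and the config.sub.orig backup are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the config.sub replacement; NOT the real LIBJPEG.sh.
# replace_config_sub, its arguments and the .orig backup are hypothetical.
replace_config_sub() {
    src_dir=$1      # the untarred jpeg-6b folder
    patched=$2      # e.g. packages/gs-libjpeg-config.sub
    [ -f "$patched" ] || { echo "missing patch file: $patched" >&2; return 1; }
    [ -d "$src_dir" ] || { echo "missing source dir: $src_dir" >&2; return 1; }
    # Keep the shipped file so the change can be inspected or undone later.
    if [ -f "$src_dir/config.sub" ] && [ ! -f "$src_dir/config.sub.orig" ]; then
        cp "$src_dir/config.sub" "$src_dir/config.sub.orig"
    fi
    cp "$patched" "$src_dir/config.sub"
}
```

It would be called after untarring and before running configure, e.g. replace_config_sub jpeg-6b packages/gs-libjpeg-config.sub. Replacing the whole file rather than editing the one line in place keeps the patch visible as a single file under packages/.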


2. I followed the instructions at http://www.linuxfromscratch.org/blfs/view/6.3/general/libjpeg.html
to try to build libjpeg with --enable-static and --enable-shared to produce both libjpeg.a and libjpeg.so.

However, nothing I tried got it to generate a libjpeg.so. It seems to always produce only a libjpeg.a in xpdf-tools/linux/lib,
regardless of whether CASCADE-MAKE/LIBJPEG.sh passes the --enable-static flag to the configure command or not, and regardless of whether --enable-shared is passed in as well or on its own.

As a consequence, there's no libjpeg.so file to point the -DJPEG_LIBRARY flag in XPDFTOOLS.sh at when building xpdf-tools against dynamically linked libraries.

I tried the various combinations with the libjpeg (jpeg-6b) source tarballs from
- sourceforge, https://sourceforge.net/projects/libjpeg/files/, the latest tarball of this was from 2008
- http://www.linuxfromscratch.org/blfs/view/6.3/general/libjpeg.html, which was last updated in 2007
- http://trac.greenstone.org/browser/other-projects/realistic-books/trunk/packages/jpeg-6b.tar.gz, which was added to trac in 2009 but is probably the 2008 or 2007 version too.
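
One possible way to cope with the missing libjpeg.so is to choose the -DJPEG_LIBRARY value by checking what actually got built. The sketch below is an assumption about how that could be done, not what XPDFTOOLS.sh does: pick_jpeg_library is a made-up name, and falling back to libjpeg.a for a dynamic build is this sketch's own choice.

```shell
#!/bin/sh
# Sketch: print a value for -DJPEG_LIBRARY based on which files exist.
# Falling back to libjpeg.a in "dynamic" mode is an assumption of this
# sketch; pick_jpeg_library is a hypothetical helper name.
pick_jpeg_library() {
    libdir=$1     # e.g. $prefix/lib
    mode=$2       # "static" or "dynamic"
    if [ "$mode" = dynamic ] && [ -f "$libdir/libjpeg.so" ]; then
        echo "$libdir/libjpeg.so"
    elif [ -f "$libdir/libjpeg.a" ]; then
        echo "$libdir/libjpeg.a"
    else
        echo "no libjpeg found in $libdir" >&2
        return 1
    fi
}
```

For example, -DJPEG_LIBRARY="$(pick_jpeg_library "$prefix/lib" dynamic)" would use libjpeg.so if one ever gets produced, and libjpeg.a otherwise.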


3. Modifications for using TIFF and JPEG libraries when building Xpdf-Tools:

* CASCADE-MAKE.sh, replaced
    PACKAGES="CMAKE LIBZ LIBPNG FREETYPE XPDFTOOLS"
with
    PACKAGES="CMAKE LIBZ LIBTIFF LIBPNG LIBJPEG FREETYPE XPDFTOOLS"


* XPDFTOOLS.sh
If compiling statically, make sure the CMake command contains the following changes:
        -DTIFF_INCLUDE_DIR=$prefix/include \        # <========== new
        -DJPEG_INCLUDE_DIR=$prefix/include \        # <========== new
        -DZLIB_LIBRARY=$prefix/lib/libz.a \
        -DTIFF_LIBRARY=$prefix/lib/libtiff.a \      # <========== new
        -DPNG_LIBRARY=$prefix/lib/libpng15.a \
        -DJPEG_LIBRARY=$prefix/lib/libjpeg.a \      # <========== new
        -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.a \
        -DGSDLFLAG_STATIC="$static_flag" \

(The "# <========== new" markers are annotations for this README only: in the actual script nothing may follow the trailing backslash, or the line continuation breaks.)


The above flag names were discovered by deleting the untarred xpdf-4.00 folder,
then, in a fresh terminal, sourcing devel.bash from xpdf-tools and re-running CASCADE-MAKE.sh without the above modifications. CMake then reports the variables it is missing:

    -- Found FreeType (new-style includes): /home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib/libfreetype.a
    -- Found ZLIB: /home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib/libz.a (found version "1.2.8")
    -- Found PNG: /home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib/libpng15.a (found version "1.2.50")
    -- Could NOT find JPEG (missing: JPEG_LIBRARY JPEG_INCLUDE_DIR)
    -- Could NOT find TIFF (missing: TIFF_LIBRARY TIFF_INCLUDE_DIR)
    -- lcms2 not found
    -- No Qt library found


* packages/gs-CMakeLists.txt was modified again,

    - this time to also pass:
        -ltiff and -ljpeg to all target_link_libraries() commands that run when GSDLFLAG_STATIC is set
    and
        ${TIFF_LIBRARY} and ${JPEG_LIBRARY} to all target_link_libraries() commands that run when GSDLFLAG_STATIC is not set

    - And to add in the include directories and definitions if JPEG/TIFF libraries were provided:
        if (JPEG_FOUND)
          include_directories("${JPEG_INCLUDE_DIR}")
          add_definitions("${JPEG_DEFINITIONS}")
          message(STATUS "@@@@@@@@@@@@@@@ JPEG_FOUND (include_dir ; include_dirs): ${JPEG_INCLUDE_DIR} ; ${JPEG_INCLUDE_DIRS}")
        else ()
          message(STATUS "@@@@@@@@@@@@@@@ NO JPEG_FOUND")
        endif ()
        if (TIFF_FOUND)
          include_directories("${TIFF_INCLUDE_DIRS}")
          add_definitions("${TIFF_DEFINITIONS}")
          message(STATUS "@@@@@@@@@@@@@@@ TIFF_FOUND ${TIFF_INCLUDE_DIRS}")
        else ()
          message(STATUS "@@@@@@@@@@@@@@@ NO TIFF_FOUND")
        endif ()

    Note however that although gs-CMakeLists.txt now knows what the pluralised TIFF_INCLUDE_DIRS is (and TIFF_INCLUDE_DIR),
    as for PNG and ZLIB, gs-CMakeLists.txt does not have a value for the pluralised JPEG_INCLUDE_DIRS, only the
    singular JPEG_INCLUDE_DIR set above. And the CMake flags in XPDFTOOLS.sh for the tiff and jpeg libs both seem to have been set up
    in the same way now. Not sure where these automatically assigned variables come from in order to check up on them.


__________________________________________________________
H. PDF2DOM: tried it out, but wasn't what we wanted
__________________________________________________________
Using PDFBox to convert a PDF to full HTML, with both images and text placed correctly with respect to each other, is tricky, see https://stackoverflow.com/questions/9671239/pdfbox-convert-a-pdf-to-text-or-html-including-images-from-the-pdf
(Google: pdfbox to convert pdf to html with images)

PDF2DOM tool (based on PDFBox) to convert PDF to HTML with images
* http://cssbox.sourceforge.net/pdf2dom/documentation.php
* Got the command line jar tool, PDFToHTML.jar version 1.7, from https://sourceforge.net/projects/cssbox/files/Pdf2DOM/
* Further information and source code at https://github.com/radkovo/Pdf2Dom
* API: http://cssbox.sourceforge.net/pdf2dom/api/index.html


1. Running

java -jar PDFToHTML.jar <infile> [<outfile>]

    greenstone@machine-name:~/Downloads$ java -jar PDFToHTML.jar SampleDoc1.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2


It will output the page, but you'll see the following output indicating that the logger is not displaying anything:
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

See https://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder

To see error output, download the SLF4J simple jar and run as follows:

    greenstone@machine-name:~/Downloads$ java -classpath slf4j-simple-1.7.25.jar:PDFToHTML.jar org.fit.pdfdom.PDFToHTML ApacheLicencePDFA.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2

The above is an MS Word produced PDF (archive format) and works fine: a font folder is generated containing the extracted fonts.

The following is a PDF produced from the same doc file by the latest libreoffice installed on Windows:
    ApacheLicencePDFA_FromODT.pdf
But running the same command on it produces the following font errors:

greenstone@machine-name:~/Downloads$ java -classpath slf4j-simple-1.7.25.jar:PDFToHTML.jar org.fit.pdfdom.PDFToHTML ApacheLicencePDFA_FromODT.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2
[main] INFO org.reflections.Reflections - Reflections took 163 ms to scan 1 urls, producing 36 keys and 222 values
[main] WARN org.fit.pdfdom.FontTable - Error loading font 'BAAAAA+Georgia' Message: FontVerter could not detect the input font's type. class java.io.IOException
[main] WARN org.fit.pdfdom.FontTable - Error loading font 'CAAAAA+Georgia-Bold' Message: FontVerter could not detect the input font's type. class java.io.IOException
[main] WARN org.fit.pdfdom.FontTable - Error loading font 'BAAAAA+Georgia' Message: FontVerter could not detect the input font's type. class java.io.IOException
[main] WARN org.fit.pdfdom.FontTable - Error loading font 'CAAAAA+Georgia-Bold' Message: FontVerter could not detect the input font's type. class java.io.IOException

Fonts get extracted if the source PDF was generated by MS Word's doc to PDF conversion. Fonts didn't get extracted from the PDF upon conversion to HTML when libreoffice was used to convert a .doc to the source PDF.

2. Check version of PDF
https://www.codeproject.com/Questions/167550/How-to-check-different-versions-of-PDF
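
For a quick check from the command line, the version can be read from the "%PDF-x.y" header at the very start of the file. The helper below is a sketch: pdf_version is a made-up name, and it reads the header only, so it will miss a newer version declared through a /Version entry in the document catalog.

```shell
#!/bin/sh
# Sketch: print the version from a PDF's header ("%PDF-1.4" -> "1.4").
# pdf_version is a hypothetical helper. Only the first 8 bytes are read;
# a /Version entry in the catalog, which can override the header, is
# not checked.
pdf_version() {
    head -c 8 "$1" | sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}
```

Running pdf_version on the MS Word and the libreoffice produced PDFs would show whether the two were written as different PDF versions. Nothing is printed for a file that does not start with a PDF header.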


3. pdf to html command line conversion open source
https://stackoverflow.com/questions/8370014/how-to-convert-pdf-to-html

"Download

    pdfbox-2.0.3.jar
    fontbox-2.0.3.jar
    preflight-2.0.3.jar
    xmpbox-2.0.3.jar
    pdfbox-tools-2.0.3.jar
    pdfbox-debugger-2.0.3.jar

from http://pdfbox.apache.org/
...

PLEASE NOTE: Images do not get pushed to the HTML output."


4. Need a way to check if PDF contains images, then use pdf2dom, else basic pdfbox conversion to html (less div tags with inline style markup)?
https://stackoverflow.com/questions/46215879/count-images-in-pdf-using-pdfbox


UNUSED
Googled for: java tool convert pdf version
* https://stackoverflow.com/questions/11137912/all-inclusive-tool-to-convert-different-types-of-documents-to-pdf
* https://www.qoppa.com/pdfprocess/
jPDFProcess – Java PDF Library to Create, Manipulate PDF
(appears to be payware)
* https://www.gnostice.com/nl_article.asp?id=95&t=How_to_Change_the_PDF_Version_of_a_Document
How to Convert a PDF Document to an Older or Newer Version
uses .NET
* http://www.baeldung.com/pdf-conversions-java
PDF Conversions in Java
e.g. PDF to html and html to PDF


__________________________________________________________

greenstone@machine-name:~/Downloads$ java -classpath slf4j-simple-1.7.25.jar:PDFToHTML.jar org.fit.pdfdom.PDFToHTML SampleDoc1.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2
[main] INFO org.reflections.Reflections - Reflections took 153 ms to scan 1 urls, producing 36 keys and 222 values
[main] WARN org.fit.pdfdom.FontTable - Error loading font 'BAAAAA+Georgia' Message: FontVerter could not detect the input font's type. class java.io.IOException
[main] WARN org.fit.pdfdom.FontTable - Error loading font 'CAAAAA+Georgia-Bold' Message: FontVerter could not detect the input font's type. class java.io.IOException
[main] WARN org.fit.pdfdom.FontTable - Error loading font 'BAAAAA+Georgia' Message: FontVerter could not detect the input font's type. class java.io.IOException
[main] WARN org.fit.pdfdom.FontTable - Error loading font 'CAAAAA+Georgia-Bold' Message: FontVerter could not detect the input font's type. class java.io.IOException



greenstone@machine-name:~/Downloads$ java -classpath Pdf2Dom/target/pdf2dom-1.8-SNAPSHOT.jar:pdfbox-app.jar:slf4j-jdk14-1.6.6.jar:log4j-over-slf4j-1.6.6.jar:slf4j-api-1.6.6.jar  org.fit.pdfdom.PDFToHTML SampleDoc1.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2
Exception in thread "main" java.lang.NoClassDefFoundError: org/mabb/fontverter/FontVerter
    at org.fit.pdfdom.FontTable$Entry.loadTrueTypeFont(FontTable.java:178)
    at org.fit.pdfdom.FontTable$Entry.getData(FontTable.java:147)
    at org.fit.pdfdom.FontTable$Entry.isEntryValid(FontTable.java:161)
    at org.fit.pdfdom.FontTable.addEntry(FontTable.java:48)
    at org.fit.pdfdom.PDFBoxTree.processFontResources(PDFBoxTree.java:378)
    at org.fit.pdfdom.PDFBoxTree.updateFontTable(PDFBoxTree.java:361)
    at org.fit.pdfdom.PDFDomTree.updateFontTable(PDFDomTree.java:544)
    at org.fit.pdfdom.PDFBoxTree.processPage(PDFBoxTree.java:206)
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
    at org.fit.pdfdom.PDFDomTree.createDOM(PDFDomTree.java:218)
    at org.fit.pdfdom.PDFDomTree.writeText(PDFDomTree.java:194)
    at org.fit.pdfdom.PDFToHTML.main(PDFToHTML.java:77)
Caused by: java.lang.ClassNotFoundException: org.mabb.fontverter.FontVerter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more
greenstone@machine-name:~/Downloads$