source: gs2-extensions/xpdf-tools/trunk/src/GS-README.txt@ 32259

Last change on this file since 32259 was 32259, checked in by sjm84, 6 years ago

Once more, redoing the folder structure of the xpdftools tarball once extracted

__________________________________________________________
CONTENTS
__________________________________________________________

Xpdf-Tools related
A. XPDF
B. Mojo::DOM perl package for parsing HTML
C. Compiling Xpdf-Tools: statically or dynamically linked
D. How we got Xpdf-Tools to compile using CASCADE-MAKE
E. Getting more output when running CMake (verbosity)
F. APPENDIX - Useful links

LIBJPEG related
G. LIBJPEG and LIBTIFF
    - Moving from 2008's libjpeg version 6b to the newer 2018 version 9c
    - Issues building LIBJPEG version 6b on 64 bit machines and the patch

H. Licensing information and making the distributable tarball

I. PDF2DOM
    unused, replaced by Xpdf-Tools' more suited pdftohtml capabilities

__________________________________________________________
A. XPDF
__________________________________________________________

Xpdf's last mod date is in 2017 and it includes its own pdftohtml utility tool, whereas the old "pdftohtml" tool that GS used was last updated in 2013 (and itself made use of Xpdf, possibly older versions).

The tool takes a PDF and produces an HTML file for each page of the PDF, consisting of selectable HTML text overlaid on top of a "screenshot" image of the page. (A page's text is not part of the screenshot.)

1. https://www.xpdfreader.com/download.html

As per the Readme file found in the linux binary of Xpdf Tools, the Xpdf Viewer requires the Qt toolkit, but the Xpdf Tools do not. Have not read the Install file to confirm whether the same is the case when compiling the command line tools. (But in that case, can't we just include the tools binaries available for all 3 OS, instead of compiling on each platform?)

    - Using Xpdf's pdftohtml tool:
    greenstone@bedrock:~/Downloads/xpdf-tools-linux-4.00/bin64$ ./pdftohtml -z 1.5 ~/Downloads/ApacheLicence.pdf licence

    where licence is the folder that the HTML output is written to.

    - Using Xpdf's pdftotext tool:
    greenstone@bedrock:~/Downloads/xpdf-tools-linux-4.00/bin64$ ./pdftotext -nopgbrk ~/Downloads/ApacheLicence.pdf ~/Downloads/ApacheLicence.txt

    where the output text file must be specified with a full path name.


2. Documentation on Xpdf-Tools:
- https://www.xpdfreader.com/support.html
  for example, the pdftohtml man page: https://www.xpdfreader.com/pdftohtml-man.html
- https://linux.die.net/man/5/xpdfrc
(Configuration flags you can put into ~/.xpdfrc to use as defaults when running xpdf tool commands)

3. We're using Xpdf Tools version: xpdf-tools-linux-4.00

4. We started by working with the ready-made Xpdf-tools binaries available for download from the xpdf site for Win, Linux and Mac.

5. We're now moving to compiling up Xpdf-tools ourselves using CASCADE-MAKE, which we have so far got to successfully compile statically on Linux (LSB environment inclusive) to build working binaries.

On Mac, I've been unable to get it to produce statically linked libraries: at this stage they're dynamically linked.


__________________________________________________________
B. Mojo::DOM perl package for parsing HTML
__________________________________________________________

XPDF's pdftohtml conversion of a single PDF document produces multiple HTML files: one for each page in the source PDF.
We want the output to be "paged_html": a single HTML file that is sectionalised, each section representing a page of the
original PDF.

We need to be able to parse the many HTML pages produced by XPDF's pdftohtml conversion of a doc, in order to massage the output
into the single sectionalised HTML file. For this we needed an HTML parser package for Perl.
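
To make the "paged_html" target shape concrete, here is a minimal shell sketch. It is NOT the Mojo::DOM-based approach this section goes on to describe; it only illustrates wrapping each per-page file in its own section. The function name merge_pages, the "pdf-page" class, the page<N>.html file naming and the assumption that each page's <body ...> and </body> tags sit on lines of their own are all assumptions made for illustration.

```shell
# Hypothetical helper (not part of the extension): print one sectionalised
# HTML document built from page1.html, page2.html, ... found in directory $1.
merge_pages() {
    dir=$1
    n=1
    echo '<html><body>'
    while [ -f "$dir/page$n.html" ]; do
        echo "<div class=\"pdf-page\" id=\"page$n\">"
        # Keep only the lines strictly between the <body ...> and </body> lines.
        sed -n '/<body/,/<\/body>/p' "$dir/page$n.html" | sed '1d;$d'
        echo '</div>'
        n=$((n + 1))
    done
    echo '</body></html>'
}
```

Something like "merge_pages licence > licence/paged.html" would write the merged file next to the per-page files, so that relative references from the page markup should still resolve. A real HTML parser remains the more robust route, since line-based extraction breaks as soon as the markup layout changes.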

1. Before Dr Bainbridge found Mojo::DOM, he looked at
* https://en.wikipedia.org/wiki/Comparison_of_HTML_parsers
* http://radar.oreilly.com/2014/02/parsing-html-with-perl-2.html

2. Main links for Mojo::DOM
* https://mojolicious.org/perldoc/Mojo/DOM
* https://metacpan.org/pod/Mojo::DOM
  Dependencies: http://deps.cpantesters.org/?module=Mojo%3A%3ADOM;perl=latest

3. Once you've downloaded Mojo::DOM's src, follow Dr Bainbridge's sequence of commands below for building the Mojo::DOM CPAN module of perl.
We'll be using this module to parse the HTML output by the XPDF tool pdftohtml.


    mkdir cpan
    tar xvzf Mojolicious-7.84.tar.gz
    cd Mojolicious-7.84/
    perl ./Makefile.PL PREFIX=`pwd`/installed
    make
    make install
    cp -r installed/share/perl/5.18.2 ../cpan
    cd ..
    export PERL5LIB=`pwd`/cpan

    emacs -nw test.pl

        #!/usr/bin/perl -w
        add in 'use v5.10;'

    chmod a+x test.pl
    ./test.pl


__________________________________________________________
C. Compiling Xpdf-Tools: statically or dynamically linked
__________________________________________________________

As explained in detail in section D below, we have a customised gs-CMakeLists.txt file which replaces the one in the xpdf-4.00.tar.gz package's xpdf subfolder after this is untarred. This customised CMake configure/make file now allows us to compile xpdf-tools either statically (as we've now set it up for by default) or dynamically (as its CMake makefiles were originally set up for).

1. To compile Xpdf-Tools statically, packages/CASCADE-MAKE/XPDFTOOLS.sh should contain the following. The lines to note are the ZLIB_LIBRARY, PNG_LIBRARY, FREETYPE_LIBRARY and GSDLFLAG_STATIC ones:

    cmake -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=$prefix \
        -DZLIB_LIBRARY=$prefix/lib/libz.a \
        -DPNG_LIBRARY=$prefix/lib/libpng15.a \
        -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.a \
        -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
        -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
        -DCMAKE_C_FLAGS="$CFLAGS" \
        -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
        -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
        -DGSDLFLAG_STATIC="$static_flag" \
        $GEXT_XPDFTOOLS/packages/$package$version

In place of FREETYPE_LIBRARY above, could also try the following,
    -DFREETYPE_DIR=$prefix \
but then check the built binaries by running "ldd" and "file" over them, to make sure they're not referencing any .so dynamic link libraries.
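
A minimal sketch of how that "ldd"/"file" check could be scripted. classify_linkage is a hypothetical helper name, not something in the extension. It relies on the usual wording printed by glibc's ldd ("not a dynamic executable") and by the file utility ("statically linked" / "dynamically linked"); other platforms may word these differently.

```shell
# Hypothetical helper: classify captured ldd or file output (passed as $1)
# as "static", "dynamic" or "unknown".
classify_linkage() {
    case $1 in
        *"not a dynamic executable"*|*"statically linked"*) echo static ;;
        *"dynamically linked"*|*.so*) echo dynamic ;;
        *) echo unknown ;;
    esac
}
```

For instance, classify_linkage "$(ldd ./pdftohtml 2>&1)" could be run over each generated binary in turn. Taking the captured text as an argument, rather than running ldd inside the function, keeps the helper usable with "file" output as well.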


2. To compile Xpdf-Tools dynamically and make it find *our* dynamically linked libraries for its helper packages zlib, libpng, libjpeg and freetype, edit packages/CASCADE-MAKE/XPDFTOOLS.sh to contain the following. The lines to note are the ZLIB_LIBRARY, PNG_LIBRARY, JPEG_LIBRARY and FREETYPE_LIBRARY ones; for JPEG_LIBRARY, enter the actual .so version number in place of PUT_THE_NUMBER_HERE. Note that -DGSDLFLAG_STATIC has been removed:

    cmake -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=$prefix \
        -DZLIB_LIBRARY=$prefix/lib/libz.so.1.2.7 \
        -DPNG_LIBRARY=$prefix/lib/libpng15.so.15.30.0 \
        -DJPEG_LIBRARY=$prefix/lib/libjpeg.so.PUT_THE_NUMBER_HERE \
        -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.so.6.3.20 \
        -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
        -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
        -DCMAKE_C_FLAGS="$CFLAGS" \
        -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
        -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
        $GEXT_XPDFTOOLS/packages/$package$version



    (1) In the above, you could also set
            -DFREETYPE_DIR=$prefix
        in place of
            -DGSDLFLAG_STATIC="$static_flag"

        In that case, it makes the xpdf-tools compilation find the "libfreetype.so" (no versioning at end) in our gs2-extension.
        After successfully building, make sure to have sourced the gs2-extension's setup.bash before running "ldd" over the
        generated xpdf-tools binaries, in order to let it use the $LD_LIBRARY_PATH we set to find our .so files.

    (2) Note that there is no equivalent for ZLIB and LIBPNG: doing -DZLIB_DIR=$prefix or -DPNG_DIR=$prefix will be
        ineffective, as neither is recognised by xpdf-tools' CMake set up.

__________________________________________________________
D. How we got Xpdf-Tools to compile using CASCADE-MAKE
__________________________________________________________

The process:

1. We set up a CASCADE-MAKE GS2-extension "xpdf-tools" at trac.greenstone.org/browser/gs2-extensions/xpdf-tools/trunk/src
Be aware that its lowercased "cascade-make" subfolder is an svn external; the original is at http://trac.greenstone.org/browser/other-projects/cascade-make/trunk/

So far, this CASCADE-MAKE project includes the Xpdf-Tools source tarball, its helper packages zlib, libpng and freetype, as well as CMake to compile the Xpdf-Tools source code.
The next step is to include JPEG and TIFF libraries too.

2a. We downloaded the Xpdf-Tools source tarball, xpdf-4.00.tar.gz, from the xpdf site at https://www.xpdfreader.com/download.html under section "Download the Xpdf source code".

The xpdf-tools source code tarball consists of the source for Xpdf-tools and Xpdf (Xpdf-Reader). The Xpdf-Reader additionally requires Qt to build and run, but we don't want the Xpdf-Reader, just Xpdf-Tools.

2b. Compiling Xpdf-Tools from source and running them requires the following packages and libraries, as per the xpdf-tools source code INSTALL file:

To build xpdf-tools:
- CMake 2.8.8 or newer

Libraries to link against and used by xpdf-tools:
- FreeType 2.0.5 or newer
- libpng (for pdftoppm and pdftohtml)
- zlib (for pdftoppm and pdftohtml)
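
The "or newer" minimums above lend themselves to a quick pre-flight check. A minimal sketch follows; version_ge is a hypothetical helper name, not part of CASCADE-MAKE, and it only understands purely numeric dotted versions such as 3.9.6 (a suffix like "-rc1" would trip it up).

```shell
# Hypothetical helper: succeed when dotted numeric version $1 is greater
# than or equal to $2. Missing fields count as 0, so "3" equals "3.0".
version_ge() {
    v1=$1
    v2=$2
    while [ -n "$v1" ] || [ -n "$v2" ]; do
        f1=${v1%%.*}
        f2=${v2%%.*}
        [ "${f1:-0}" -gt "${f2:-0}" ] && return 0
        [ "${f1:-0}" -lt "${f2:-0}" ] && return 1
        # Drop the field just compared, or empty the string if it was the last.
        case $v1 in *.*) v1=${v1#*.} ;; *) v1= ;; esac
        case $v2 in *.*) v2=${v2#*.} ;; *) v2= ;; esac
    done
    return 0
}
```

It could be fed from the first line of "cmake --version", for example: version_ge "$(cmake --version | sed -n '1s/^cmake version //p')" 2.8.8 || echo "CMake is too old". The helper sticks to plain sh constructs so that it does not depend on "sort -V", which older systems may lack.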


3. Compilation of xpdf-tools worked with CMake 3.11.4 on the linux resnet machine. However, CMake 3.11.4 itself failed to compile in the LSB environment and on the Mac Mountain Lion machine because of a version incompatibility between the older g++ installed there and the more recent CMake 3.11.4.

CMake version 3.9.6 however is supposed to be compatible with older versions of g++, as per https://stackoverflow.com/questions/47886400/cmake-configure-error-in-3-10-1-but-not-in-3-9-6
To avoid installing newer versions of g++ and clang in the LSB virtual machine and the Mac, I've shifted the CMake version back to version 3.9.6.


4a. On building xpdf-tools to work with dynamically linked libs found anywhere.

If compiling xpdf-tools against dynamically linked libraries for these packages, then the basic CMake command in packages/CASCADE-MAKE/XPDFTOOLS.sh can look like:
    cmake -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=$prefix \
        -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
        -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
        -DCMAKE_C_FLAGS="$CFLAGS" \
        -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
        -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
        $GEXT_XPDFTOOLS/packages/$package$version    # Note: no -DGSDLFLAG_STATIC=...

With the above, the xpdf-tools source code and its make files work out of the box.

4b. On building xpdf-tools to work with the dynamically linked libs for freetype, libpng and zlib that we produce when cascade-making the xpdf-tools gs2-extension.

Since we're compiling up freetype, libpng and zlib packages as part of the Xpdf-Tools GS2-extension with CASCADE-MAKE, the next step was to compile xpdf-tools by dynamically linking against our .so files for these 3 libraries. To do so, XPDFTOOLS.sh should have the following changes:

    (1) For linux, we need to build on the LSB environment.
        We're moreover hoping that 32 bit binaries generated this way will work on both 32 and 64 bit machines.

        However, on the 32 bit LSB environment, we additionally need to pass in "-march=i486|i586|i686" to gcc.
        Without it, things end up with the error
            undefined reference to `__sync_add_and_fetch_4'
        See https://stackoverflow.com/questions/130740/link-error-when-compiling-gcc-atomic-operation-in-32-bit-mode
        which further explains that
            "-march=" means "generate code for a particular CPU (and don't run on older CPUs)".
        So, although uname -m returns i686 on the 32 bit linux VM that generates the nightly bins, we
        still want to support i586 and i486 systems, so we pass in i486 as the architecture.
        Don't do this for 64 bit systems.
        And it seems it only needs to be set on CXXFLAGS in this case.

            arch=`uname -m`
            if [[ $arch = *"64"* ]]; then
                arch=
            else
                echo "@@@ 32 bit machine, need to pass in -march=i486 to avoid certain linking errors"
                arch="-march=i486"
            fi
            ...
            export CXXFLAGS="$CXXFLAGS -I$GEXTXPDFTOOLS_INSTALLED/include -I$GEXTXPDFTOOLS_INSTALLED/include/libpng15 $arch"

    (2) Set up CFLAGS, CXXFLAGS, CPPFLAGS and LDFLAGS to help the linking of xpdf-tools find our .so versions of the necessary libs:

            export CFLAGS="$CFLAGS -I$GEXTXPDFTOOLS_INSTALLED/include -I$GEXTXPDFTOOLS_INSTALLED/include/libpng15"
            export CPPFLAGS="$CPPFLAGS -I$GEXTXPDFTOOLS_INSTALLED/include -I$GEXTXPDFTOOLS_INSTALLED/include/libpng15"
            export CXXFLAGS="$CXXFLAGS -I$GEXTXPDFTOOLS_INSTALLED/include -I$GEXTXPDFTOOLS_INSTALLED/include/libpng15 $arch"
            export LDFLAGS="$LDFLAGS -L$GEXTXPDFTOOLS_INSTALLED/lib"

    (3) The CMake command we run must pass the full paths to the actual .so library files (the ones with specific
        versions in their file names) rather than the symbolically linked generally-named .so files (the latter won't
        be found when building xpdf-tools and CMake will try to look for the .so library files elsewhere on the system).
        The ZLIB_LIBRARY, PNG_LIBRARY and FREETYPE_LIBRARY lines are the new ones:

            cmake -DCMAKE_BUILD_TYPE=Release \
                -DCMAKE_INSTALL_PREFIX=$prefix \
                -DZLIB_LIBRARY=$prefix/lib/libz.so.1.2.7 \
                -DPNG_LIBRARY=$prefix/lib/libpng15.so.15.30.0 \
                -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.so.6.3.20 \
                -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
                -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
                -DCMAKE_C_FLAGS="$CFLAGS" \
                -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
                -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
                $GEXT_XPDFTOOLS/packages/$package$version    # Again: no -DGSDLFLAG_STATIC=...
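
Rather than typing the versioned file names by hand, the real file behind each generally-named symlink could be looked up at build time. A minimal sketch, assuming a system where "readlink -f" is available (GNU coreutils provides it; older Mac systems may not). resolve_versioned_lib is a hypothetical helper name, and whether XPDFTOOLS.sh would benefit from it is untested.

```shell
# Hypothetical helper: print the real, versioned file that a library
# symlink such as lib/libz.so ultimately points to.
resolve_versioned_lib() {
    if [ ! -e "$1" ]; then
        echo "resolve_versioned_lib: no such library: $1" >&2
        return 1
    fi
    # -f follows every link in the chain and prints the final target's path
    readlink -f "$1"
}
```

It might then be used as -DZLIB_LIBRARY="$(resolve_versioned_lib "$prefix/lib/libz.so")", so that a library upgrade does not require editing version numbers in the cmake command.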

Further, the "xpdf/CMakeLists.txt" file within the xpdf-4.00.tar.gz source code tarball needs to be modified to refer to ZLIB_LIBRARIES when linking pdftops and pdftoppm. The linking commands for *both* the "pdftops" and "pdftoppm" executable targets in xpdf/CMakeLists.txt should look like the following,

    target_link_libraries(pdftoppm goo fofi splash
                          ${FREETYPE_LIBRARY} ${FREETYPE_OTHER_LIBS}
                          ${DTYPE_LIBRARY}
                          ${LCMS_LIBRARY}
                          ${ZLIB_LIBRARIES})    # <========= NEW


    (4) Since CMakeLists.txt has been modified, we initially renamed the xpdf src tarball to gs-xpdf-4.00.tar.gz.
        However, the current version works with the regular downloaded xpdf-4.00.tar.gz tarball. But after extraction,
        XPDFTOOLS.sh copies across the custom packages/gs-CMakeLists.txt into the extracted tarball's xpdf subdirectory,
        renaming the file as CMakeLists.txt (so the path to it becomes "xpdf-4.00/xpdf/CMakeLists.txt"). In XPDFTOOLS.sh:

            # patch the original tarball with our custom makefile
            if [[ -d "$package$version/xpdf" && -f "gs-CMakeLists.txt" ]]; then
                echo "*******************************************************************"
                echo "Using our custom gs-CMakeLists.txt instead of the one included in $package$version"
                echo "Renaming gs-CMakeLists.txt to $package$version/xpdf/CMakeLists.txt"
                echo "*******************************************************************"

                cp "gs-CMakeLists.txt" "$package$version/xpdf/CMakeLists.txt"
            fi


4c. On building static xpdf-tools binaries using the static *.a freetype, libpng and zlib libraries that we produce when cascade-making the xpdf-tools gs2-extension.

In order to compile up xpdf-tools *statically*, so that it builds against the static *.a libraries of freetype, libpng and zlib that we produce during the gs2-extension's CASCADE-MAKE process, we have to make further modifications.

    (1) First, the XPDFTOOLS.sh cascade-make file should pass the full paths to the actual (non-symbolic link) .a file for each library.
        A custom GS flag, GSDLFLAG_STATIC, is also introduced in gs-CMakeLists.txt and assigned "-static" for Linux
        and "-Bstatic" for Mac, to pass in during the linking stage of building xpdf-tools.

        For Mac OSX, when -static is passed in for linking as on linux, this produced the error
        "ld: library not found for -lcrt0.o" during the build of the xpdf-tools package. For information, see
        https://stackoverflow.com/questions/3801011/ld-library-not-found-for-lcrt0-o-on-osx-10-6-with-gcc-clang-static-flag
        The page https://stackoverflow.com/questions/844819/how-to-static-link-on-os-x mentions compiling
        with -Bstatic on Mac OSX instead. To do so, XPDFTOOLS.sh passes in the GSDLFLAG_STATIC set to either
        "-static" (for linux) or "-Bstatic" (for darwin).
        However the last mentioned stackoverflow page also says that -Bstatic is a no-op, and this appears to be
        the case when "otool -L" is run over the generated xpdf-tools binaries: the binaries are all dynamically
        linked. Although they're finding our .so files of freetype, libpng and zlib, they're not finding the .a
        versions, even though XPDFTOOLS.sh tries to point gs-CMakeLists.txt to the correct .a files.
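
The "otool -L" inspection mentioned above can be narrowed down to the libraries that come from outside the operating system, which are the ones a static link was meant to absorb. A minimal sketch; non_system_dylibs is a hypothetical helper name, and treating /usr/lib and /System as the only system locations is an assumption.

```shell
# Hypothetical helper: read "otool -L" output on stdin and print the
# non-system libraries it lists (anything outside /usr/lib and /System).
non_system_dylibs() {
    # Line 1 of otool -L output names the binary itself, so skip it; the
    # remaining lines start with the library path followed by version details.
    sed '1d' | awk '{ print $1 }' | grep -v -e '^/usr/lib/' -e '^/System/' || true
}
```

For example: otool -L ./pdftohtml | non_system_dylibs. An empty result would suggest that only system libraries are still being loaded dynamically.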

        The new modifications to XPDFTOOLS.sh, where the three *_LIBRARY options now point at .a files and -DGSDLFLAG_STATIC is new:

            if [ "x$GSDLOS" = "xdarwin" ] ; then
                static_flag=-Bstatic
            else
                static_flag=-static
            fi

            ...
            cmake -DCMAKE_BUILD_TYPE=Release \
                -DCMAKE_INSTALL_PREFIX=$prefix \
                -DZLIB_LIBRARY=$prefix/lib/libz.a \
                -DPNG_LIBRARY=$prefix/lib/libpng15.a \
                -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.a \
                -DCMAKE_DISABLE_FIND_PACKAGE_Qt4=1 \
                -DCMAKE_DISABLE_FIND_PACKAGE_Qt5Widgets=1 \
                -DCMAKE_C_FLAGS="$CFLAGS" \
                -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
                -DCMAKE_EXE_LINKER_FLAGS="$LDFLAGS" \
                -DGSDLFLAG_STATIC="$static_flag" \
                $GEXT_XPDFTOOLS/packages/$package$version

    (2) Our customised gs-CMakeLists.txt file now checks for this flag GSDLFLAG_STATIC being set and, if it is,
        uses it during the linking stage. As in (1) above, it will be set to "-static" for Linux and "-Bstatic" for Mac.

        - When the flag is set, the linking flags passed into each occurrence of target_link_libraries() in
          gs-CMakeLists.txt are moreover manually written in the form of "-static -l<libs>" rather than using
          the default linking commands inherited from the original CMakeLists.txt.
        - If GSDLFLAG_STATIC isn't set, then we don't build statically, and the linking flags passed to each
          target_link_libraries() are mostly the original ones.

        For example,

            if(GSDLFLAG_STATIC)
                target_link_libraries(pdftoppm goo fofi splash
                    ${GSDLFLAG_STATIC} -lfreetype ${DTYPE_LIBRARY} ${LCMS_LIBRARY} -lz -lm -lc -lpthread)
            else ()
                target_link_libraries(pdftoppm goo fofi splash
                    ${FREETYPE_LIBRARY} ${FREETYPE_OTHER_LIBS}
                    ${DTYPE_LIBRARY}
                    ${LCMS_LIBRARY}
                    ${ZLIB_LIBRARIES})
            endif ()

        DETAILED EXPLANATION:
        We found that when building *statically*, gs-CMakeLists.txt needed to NOT use the PNG_LIBRARIES, ZLIB_LIBRARIES
        and FREETYPE_LIBRARY in its linker commands, target_link_libraries(), as doing so produced partially dynamic
        xpdf-tools executables which were moreover BROKEN. They wouldn't run, and in fact attempting to run an xpdf-tool,
        like "./pdftohtml", would produce a file not found error. Something like "bash: no such file or directory".

        Online discussions mentioned that this generally happened when attempting to run 32 bit executables on 64 bit
        linux when 32 bit loaders are not installed. (In such cases, the solution was to apt-get install some 32 bit package.)
        However, our broken binaries were all 64 bit, as indicated when running the "file" command on them. Nor did their
        being partially dynamically linked executables imply that they would be broken, as we were eventually
        able to produce partially dynamic executables that did work, before solving static linking altogether.
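
The loader explanation mentioned above can be checked directly: an ELF executable names the program interpreter (loader) it wants, and if that path does not exist the shell reports "No such file or directory" even though the binary itself is present. The sketch below is not something these notes record having been run; it pulls the interpreter path out of "readelf -l" output. elf_interpreter is a hypothetical helper name, and the output format of binutils' readelf is assumed.

```shell
# Hypothetical helper: read "readelf -l <binary>" output on stdin and print
# the requested program interpreter (loader) path, if one is listed.
elf_interpreter() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)\].*/\1/p'
}
```

For example: interp=$(readelf -l ./pdftohtml | elf_interpreter), followed by ls -l "$interp" when the result is non-empty. A fully static executable would typically be expected to list no interpreter at all.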

        The real issue was that including references to ${FREETYPE_LIBRARY} ${FREETYPE_OTHER_LIBS}, ${PNG_LIBRARIES} and
        ${ZLIB_LIBRARIES} in any target_link_libraries() resulted in the wrong linking command producing broken binaries.

        Doing the regular target_link_libraries() in static mode results in building with
            "-Wl,-Bstatic -lfreetype -lpng15 -lz -Wl,-Bdynamic -lpthread" at end of link line
        and produces broken binaries for pdftohtml/pdftoppm/pdftops/pdftopng.

        Note that PNG_LIBRARIES includes zlib/lz: "-lpng -lz", and along with freetype,
        these are linked statically. However, Threads/lpthread is included as a dynamically
        linked library instead of including a .a (regardless of whether it's appended
        as -lpthread or Threads::Threads in the target_link_libraries()), contributing to
        the pdftohtml binary produced being a partially static, partially dynamic one,
        so a dynamic executable overall.

        The order of dynamic .so files listed by ldd in the broken static binary of pdftohtml differs from
        a manually statically linked working version of pdftohtml, and seems to be the only difference
        between the two in ldd's output. Not using "-Wl,-Bstatic" and using -static (-Bstatic on Mac)
        in its place creates a partially static dynamic executable that isn't broken, whereas
        additionally removing "-Wl,-Bdynamic -lpthread" and replacing it with -lpthread
        moreover produces a working pdftohtml that is a fully static linked executable.

        The inclusion of the math lib and c lib (lm and lc) in the final link command
        is to completely bypass the remaining .so dependencies that were present in
        the executable and produce the fully static executable. The lm and lc libs were referenced
        by all xpdf-tool binaries (as indicated when generating dynamic ones and running ldd over them)
        but Dr Bainbridge said that -lm and -lc were libs passed in by the compiler by default,
        which would explain why explicitly setting them for some xpdftools and not others may not have
        mattered.

NOTES:
Initial attempts at modifying gs-CMakeLists.txt for static compiling that proved to be unnecessary:

    (i) Setting -static globally doesn't have a useful effect.

        # We want to build static xpdf-tools binaries. See
        # https://stackoverflow.com/questions/24648357/compiling-a-static-executable-with-cmake
        # Want to make the min number of changes for building statically, so using the way
        # below. Beware, must *append* "-static" to existing CMAKE_EXE_LINKER_FLAGS=LD_FLAGS
        ##SET(CMAKE_FIND_LIBRARY_SUFFIXES ".a")
        ##SET(BUILD_SHARED_LIBS OFF)
        ##SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -static")

        The above 3 lines just add a -static before the "-O2 -Wall -fPIC -rdynamic ..." during linking, such as below.
        But they have no further effect on whether static building actually succeeds or not. The only effective static
        linking command (for Linux so far) was to pass -static in the target_link_libraries() command followed by the
        "-l<libname>" for each library in the correct order.

----
/usr/bin/c++ -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include/libpng15 -O3 -Wall -fPIC -L/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib -L/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib -static ***** <- HERE ****** -O2 -Wall -fPIC -rdynamic CMakeFiles/pdftohtml.dir/HTMLGen.cc.o CMakeFiles/pdftohtml.dir/SplashOutputDev.cc.o CMakeFiles/pdftohtml.dir/TextOutputDev.cc.o CMakeFiles/pdftohtml.dir/pdftohtml.cc.o CMakeFiles/xpdf_objs.dir/AcroForm.cc.o CMakeFiles/xpdf_objs.dir/Annot.cc.o CMakeFiles/xpdf_objs.dir/Array.cc.o CMakeFiles/xpdf_objs.dir/BuiltinFont.cc.o CMakeFiles/xpdf_objs.dir/BuiltinFontTables.cc.o CMakeFiles/xpdf_objs.dir/Catalog.cc.o CMakeFiles/xpdf_objs.dir/CharCodeToUnicode.cc.o CMakeFiles/xpdf_objs.dir/CMap.cc.o CMakeFiles/xpdf_objs.dir/Decrypt.cc.o CMakeFiles/xpdf_objs.dir/Dict.cc.o CMakeFiles/xpdf_objs.dir/Error.cc.o CMakeFiles/xpdf_objs.dir/FontEncodingTables.cc.o CMakeFiles/xpdf_objs.dir/Form.cc.o CMakeFiles/xpdf_objs.dir/Function.cc.o CMakeFiles/xpdf_objs.dir/Gfx.cc.o CMakeFiles/xpdf_objs.dir/GfxFont.cc.o CMakeFiles/xpdf_objs.dir/GfxState.cc.o CMakeFiles/xpdf_objs.dir/GlobalParams.cc.o CMakeFiles/xpdf_objs.dir/JArithmeticDecoder.cc.o CMakeFiles/xpdf_objs.dir/JBIG2Stream.cc.o CMakeFiles/xpdf_objs.dir/JPXStream.cc.o CMakeFiles/xpdf_objs.dir/Lexer.cc.o CMakeFiles/xpdf_objs.dir/Link.cc.o CMakeFiles/xpdf_objs.dir/NameToCharCode.cc.o CMakeFiles/xpdf_objs.dir/Object.cc.o CMakeFiles/xpdf_objs.dir/OptionalContent.cc.o CMakeFiles/xpdf_objs.dir/Outline.cc.o CMakeFiles/xpdf_objs.dir/OutputDev.cc.o CMakeFiles/xpdf_objs.dir/Page.cc.o CMakeFiles/xpdf_objs.dir/Parser.cc.o CMakeFiles/xpdf_objs.dir/PDFDoc.cc.o CMakeFiles/xpdf_objs.dir/PDFDocEncoding.cc.o CMakeFiles/xpdf_objs.dir/PSTokenizer.cc.o CMakeFiles/xpdf_objs.dir/SecurityHandler.cc.o CMakeFiles/xpdf_objs.dir/Stream.cc.o CMakeFiles/xpdf_objs.dir/TextString.cc.o CMakeFiles/xpdf_objs.dir/UnicodeMap.cc.o CMakeFiles/xpdf_objs.dir/UnicodeTypeTable.cc.o CMakeFiles/xpdf_objs.dir/UTF8.cc.o CMakeFiles/xpdf_objs.dir/XFAForm.cc.o CMakeFiles/xpdf_objs.dir/XRef.cc.o CMakeFiles/xpdf_objs.dir/Zoox.cc.o -o pdftohtml ../goo/libgoo.a ../fofi/libfofi.a ../splash/libsplash.a -static -lfreetype -lpng -lz -lm -lc -lpthread
----

    (ii) Threads::Threads instead of -lpthread results in a partially dynamic executable.

        # The original, unmodified CMakeLists.txt was not set up sufficiently
        # for static compilation of xpdf-tools. As a result, compile would first fail
        # with errors about undefined refs to mutex / lpthread.
        # When building xpdf-tools statically, need to add the following 2 lines as well
        # as append "Threads::Threads" to the end of each "target_link_libraries(<list>)"
        # See https://stackoverflow.com/questions/1620918/cmake-and-libpthread
        # found googling cmake and "-lpthread" (pthread) after ERRORS to do with this, like:
        # undefined reference to `pthread_mutex_unlock'
        ##set(THREADS_PREFER_PTHREAD_FLAG ON)
        ##find_package(Threads REQUIRED)

        In instances when compilation was successful, including the above 2 lines in combination with "Threads::Threads"
        as the final argument to every target_link_libraries(...) occurrence in gs-CMakeLists.txt would only manage to
        produce partially dynamically linked xpdftools binaries. (Depending on what the linking command was when building
        Xpdf-Tools, the partially dynamically linked executables may work or may be broken. See explanation further above.)
        We wanted fully statically linked binaries, for which we needed to pass in "-lpthread" as the trailing argument
        to each target_link_libraries(...). So without either, compilation will fail. However, with "Threads::Threads"
        the binaries weren't fully static, whereas with -lpthread the xpdftools executables were fully static as CMake no
        longer tried to link against a dynamic Threads library.


5. To view the unmodified CMakeLists.txt included in the xpdf-4.00 source code tarball, untar it and look for its "xpdf/CMakeLists.txt" (not the toplevel file of the same name).
Run a 'diff' against gs-CMakeLists.txt to see further differences, such as debug statements and comments. Most comments have been removed and placed into this readme file instead.


6. When CASCADE-MAKE is run on the xpdf-tools GS2-extension, it first compiles up CMake, needed to compile up xpdf-tools.
Unlike the library packages like freetype, libpng and zlib that we also build for xpdf-tools as part of this gs2-extension, CMake's build products don't need to be included in the distribution tarball of our built xpdf-tools executables.

There's a "move-cmake.sh" script in the xpdf-tools gs2-extension that can be run with the "away" and "back" options: "away" moves the CMake stuff out of the way (into a "devel" folder) after successfully building xpdf binaries, and "back" moves it back again if wanting to recompile.

The script can be run manually, but it's also run by the extension:
- packages/CASCADE-MAKE/XPDFTOOLS.sh runs "move-cmake.sh away" after xpdf-tools has been built, so that the extension's install location is ready for tarring up for distribution.
- When recompiling the xpdf-tools extension, the CASCADE-MAKE process will run the packages/CASCADE-MAKE/CMAKE.sh file, which in turn runs "move-cmake.sh back" if there's a prebuilt CMake which had earlier been moved out of the way.


__________________________________________________________
E. Getting more output when running CMake (verbosity)
__________________________________________________________
See https://www.linuxquestions.org/questions/programming-9/cmake-or-make-debug-output-show-command-624800/
To turn on debugging:
    export VERBOSE=1
    ./CASCADE-MAKE.sh

To turn off debugging, need to actually clear VERBOSE again (don't set it to 0):
    export VERBOSE=
    ./CASCADE-MAKE.sh
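
Because the variable has to be cleared again afterwards, one alternative is to set it for a single command only, so that it never lingers in the shell session. A minimal sketch using the standard "env" utility; run_verbose is a hypothetical wrapper name, not part of CASCADE-MAKE.

```shell
# Hypothetical wrapper: run the given command with VERBOSE=1 in its
# environment, leaving the calling shell's own VERBOSE untouched.
run_verbose() {
    env VERBOSE=1 "$@"
}
```

With this, "run_verbose ./CASCADE-MAKE.sh" gives one verbose build, and the next plain "./CASCADE-MAKE.sh" is quiet again without any clean-up step.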


__________________________________________________________
F. APPENDIX - Useful links
__________________________________________________________
A. Helping CMake along. (Not all of this was necessary for compiling xpdftools statically, but they're generally useful links)

https://github.com/SynoCommunity/spksrc/issues/1779
https://stackoverflow.com/questions/1620918/cmake-and-libpthread
https://cmake.org/cmake/help/v3.0/prop_tgt/LINK_FLAGS.html
https://cmake.org/cmake/help/v3.11/command/target_link_libraries.html?highlight=target_link_libraries
https://stackoverflow.com/questions/24648357/compiling-a-static-executable-with-cmake
https://stackoverflow.com/questions/42815420/cmake-cant-find-my-static-libs
https://cmake.org/cmake/help/v3.0/command/message.html
https://stackoverflow.com/questions/30980383/cmake-compile-options-for-libpng
    https://stackoverflow.com/questions/36220123/undefined-reference-to-png-set-longjmp-fn-when-compiling-pcl-source-file


B. About the error "bash: no such file or directory" when run on a statically generated binary:

https://askubuntu.com/questions/351827/unable-to-run-a-32-bit-program-on-64-bit-vm/353497#353497
https://unix.stackexchange.com/questions/13391/getting-not-found-message-when-running-a-32-bit-binary-on-a-64-bit-system/13409#13409
https://arstechnica.com/civis/viewtopic.php?f=16&t=1173118
https://superuser.com/questions/344533/no-such-file-or-directory-error-in-bash-but-the-file-exists
https://unix.stackexchange.com/questions/45277/executing-binary-file-file-not-found

C. Other links

https://unix.stackexchange.com/questions/279397/ldd-dont-find-path-how-to-add


[32251]487D. On why you can't build static binaries on Mac, but can build static libraries and link against them
488
489https://developer.apple.com/library/archive/qa/qa1118/_index.html (official page on how Mac doesn't support static binaries)
490https://stackoverflow.com/questions/3801011/ld-library-not-found-for-lcrt0-o-on-osx-10-6-with-gcc-clang-static-flag
491https://stackoverflow.com/questions/844819/how-to-static-link-on-os-x (mention of -Bstatic)
492https://www.allegro.cc/forums/thread/610923
[32252]493https://stackoverflow.com/questions/5259249/creating-static-mac-os-x-c-build (has some other suggestions)
494 http://www.network-theory.co.uk/docs/gccintro/gccintro_79.html
495Dead end: https://nelsonslog.wordpress.com/2013/04/24/macos-doesnt-support-static-binaries/
496https://dropline.net/2015/10/static-linking-on-mac-os-x/
   explains that on Mac, .dylibs must be hidden for the .a versions of libraries to be selected when linking.
   This must be true for non-system dylibs too.
   This means that, where possible, we want to pass "--enable-static --disable-shared", or the equivalent,
   when building the freetype, libz, libpng, libjpeg and libtiff libraries, so that Xpdf-Tools links against the
   .a files we generated rather than against additional .dylib files.
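
To check whether any shared library is still sitting next to its static counterpart, something like the following sketch could be run over the lib folder. The function name is hypothetical and it only looks for unversioned libNAME.dylib / libNAME.so files, on the assumption that these are the names the linker would pick up for -lNAME.

```shell
# List static libraries in a directory that have a shared library of the
# same base name next to them (which the linker may prefer over the .a).
# list_shadowed_static_libs is a made-up name for this illustration.
list_shadowed_static_libs() {
    dir=$1
    for a in "$dir"/lib*.a; do
        # Skip the unexpanded glob when the directory has no .a files.
        [ -e "$a" ] || continue
        base=${a%.a}
        for ext in dylib so; do
            if [ -e "$base.$ext" ]; then
                echo "$(basename "$a") is shadowed by $(basename "$base.$ext")"
            fi
        done
    done
}
```

An empty result for the lib folder that Xpdf-Tools links against would suggest that only the .a files can be selected.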
[32251]502
503http://www.simplesystems.org/libtiff/build.html
Configuration options for building libtiff. We want to stop the libtiff build from also producing the tiff binaries, but there appears to be no such option.
505
506
[32249]507__________________________________________________________
508G. LIBJPEG and LIBTIFF
509__________________________________________________________
510
1. The first version of LIBJPEG that we got to work was version 6b, which required patching before it could be built; see point 2 below.
Besides needing a patch, version 6b was also old, dating from 2008. I have since found a version of libjpeg from Jan 2018, "jpegsrc.v9c.tar.gz",
downloadable from www.ijg.org at http://www.ijg.org/files/jpegsrc.v9c.tar.gz. Version 9c can build both static and dynamically linked libraries of
libjpeg, though we only want the former. (The older version 6b could only generate the static libjpeg.a library file, contrary to online instructions.)
[32249]515
As with the older 6b version, this tarball was renamed to jpeg-9c.tar.gz so that its name matches that of the folder it extracts to.

There was an incompatibility between the existing CASCADE-MAKE/LIBJPEG.sh and the Makefile that configure generates from the Makefile.in/.am in the jpeg-9c tarball.
LIBJPEG.sh would run "make install-lib" at the end, to install libjpeg.a in the lib folder and to install 4 header files. This is as per the install.txt
instructions in both the older and the current version of the jpeg src tarball. However, the header files never got installed when doing so, whether in version 6b or the
current 9c. Moreover, install-lib is not a recognised target in 9c's Makefile, where the target is install-libLTLIBRARIES. So LIBJPEG.sh has been modified to use this
target name and also to copy over the header files (even though they weren't necessary when compiling xpdftools against the libjpeg 6b library previously, and
possibly aren't now with 9c either).

Since we only want to generate libjpeg.a and not the .so/.dylib dynamically linked versions, the latter are turned off during configure by passing --disable-shared.

A final change to LIBJPEG.sh was to stop it copying the patch file "gs-libjpeg-config.sub" into the extracted jpeg tarball, since the patch was only
necessary for libjpeg version 6b and not for 9c. These steps are now commented out in LIBJPEG.sh.
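
The header-copying step described above might look roughly like the sketch below. This is not the actual code in LIBJPEG.sh. The function name is made up, and the four header names are an assumption based on the headers that libjpeg's install.txt says get installed.

```shell
# Copy the libjpeg public headers from the extracted source folder into
# the include folder. install_jpeg_headers is a hypothetical name; the
# list of headers is assumed from libjpeg's install.txt.
install_jpeg_headers() {
    src=$1     # e.g. the extracted jpeg-9c folder
    dest=$2    # e.g. $prefix/include
    mkdir -p "$dest" || return 1
    for h in jconfig.h jpeglib.h jmorecfg.h jerror.h; do
        if [ ! -f "$src/$h" ]; then
            echo "missing header: $src/$h" >&2
            return 1
        fi
        cp "$src/$h" "$dest/" || return 1
    done
}
```

Note that jconfig.h is generated by configure, so a step like this would only work after configure has run.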
529
530
5312. Issues building LIBJPEG VERSION 6b on 64 bit machines and the patch
532
533LIBJPEG version 6b is from 2008.
534
[32249]535I copied the LIBJPEG package from http://trac.greenstone.org/browser/other-projects/realistic-books/trunk/packages (also at http://trac.greenstone.org/browser/gs2-extensions/ocr/trunk/packages/cmdline).
536
 * Configuring out of the box produced the following error:
   checking host system type... Invalid configuration `x86_64-unknown-linux-gnu': machine `x86_64-unknown' not recognized

 * As a consequence, running make on the libjpeg package failed with the error:
   ./libtool --mode=compile gcc -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include -fPIC -I/home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/include -I. -c ./jcapimin.c
   make: ./libtool: Command not found
   make: *** [jcapimin.lo] Error 127
   Error encountered running *make * stage of ./CASCADE-MAKE/LIBJPEG.sh
545
546The same was true when I grabbed the libjpeg from sourceforge (https://sourceforge.net/projects/libjpeg/files/), which was also still version jpeg 6b from 2008.
547
548I found the following webpages discussing the above error messages:
549- https://unix.stackexchange.com/questions/80479/how-to-work-with-libtool
550- https://github.com/rwestlund/freesweep/issues/1
551- https://ubuntuforums.org/showthread.php?t=1232714
552- https://stackoverflow.com/questions/12828687/configure-fails-to-detect-proper-ld-on-a-64-bit-system-with-32-bit-userland
553- SOLUTION: https://sourceforge.net/p/libjpeg/bugs/12/
554
555However, the error only strikes when configure is run with --enable-static.
556
557Note also that contrary to the above pages, running configure with the additional options
558 --host=x86_64-linux-gnu --build=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-shared --enable-static
559did not help. Nor did adding the above flags get rid of configure attempting to work with host=x86_64-unknown(-unknown)-linux-gnu
560
The SOLUTION, found by searching for the error message together with "enable-static", since it is the combination that is relevant, is described
at https://sourceforge.net/p/libjpeg/bugs/12/

It was to patch the config.sub file included in the jpeg-6b tarball so that it also covers x86_64-* machines:
 tahoe | i860 | x86_64-* | m32r | m68k | m68000 | m88k | ns32k | arc | arm \

The above change is necessary because this libjpeg is outdated and has been superseded by other JPEG libraries, also discussed at https://sourceforge.net/p/libjpeg/bugs/12/
I'm not sure whether those libraries are compatible with XpdfTools, however, so I'm sticking with libjpeg as long as I can get it to build and be recognised by XpdfTools.

The solution is once more to have a patch file: after the jpeg-6b package is untarred, CASCADE-MAKE/LIBJPEG.sh replaces the config.sub in it with packages/gs-libjpeg-config.sub, which contains the patch.
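
As an illustration only (this is not the code in LIBJPEG.sh), the replacement step could be guarded so that config.sub is only swapped out when it does not already mention x86_64. The function name is hypothetical.

```shell
# Replace a config.sub that does not know about x86_64 with a patched copy.
# ensure_x86_64_config_sub is a made-up name for this sketch.
ensure_x86_64_config_sub() {
    config_sub=$1    # e.g. jpeg-6b/config.sub
    patched=$2       # e.g. packages/gs-libjpeg-config.sub
    if grep -q 'x86_64' "$config_sub"; then
        echo "unchanged"
    else
        cp "$patched" "$config_sub" && echo "replaced"
    fi
}
```

With a guard like this the same script could be pointed at a newer tarball whose config.sub already handles x86_64 without overwriting it.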
571
572
3. I followed the instructions at http://www.linuxfromscratch.org/blfs/view/6.3/general/libjpeg.html
to try to build libjpeg with --enable-static and --enable-shared to produce both libjpeg.a and libjpeg.so.

However, nothing I tried got it to generate a libjpeg.so. It always seems to produce a libjpeg.a in xpdf-tools/linux/lib,
regardless of whether CASCADE-MAKE/LIBJPEG.sh passes the --enable-static flag to the configure command or not, and regardless of whether --enable-shared is passed in as well or on its own.

As a consequence, there is no libjpeg.so file to point the -DJPEG_LIBRARY flag in XPDFTOOLS.sh at when building xpdf-tools against dynamically linked libraries.

I tried the various combinations with the jpeg-6b source tarballs from
- sourceforge, https://sourceforge.net/projects/libjpeg/files/, where the latest tarball was from 2008
- http://www.linuxfromscratch.org/blfs/view/6.3/general/libjpeg.html, which was last updated in 2007
- http://trac.greenstone.org/browser/other-projects/realistic-books/trunk/packages/jpeg-6b.tar.gz, which was added to trac in 2009 but is probably the 2008 or 2007 version too.


4. Modifications for using TIFF and JPEG libraries when building Xpdf-Tools:
588
589* CASCADE-MAKE.sh, replaced
590 PACKAGES="CMAKE LIBZ LIBPNG FREETYPE XPDFTOOLS"
591with
592 PACKAGES="CMAKE LIBZ LIBTIFF LIBPNG LIBJPEG FREETYPE XPDFTOOLS"
593
594
595* XPDFTOOLS.sh
596If compiling statically make sure the CMake command contains the following changes:
597 -DTIFF_INCLUDE_DIR=$prefix/include \ # <========== new
598 -DJPEG_INCLUDE_DIR=$prefix/include \ # <========== new
599 -DZLIB_LIBRARY=$prefix/lib/libz.a \
600 -DTIFF_LIBRARY=$prefix/lib/libtiff.a \ # <========== new
601 -DPNG_LIBRARY=$prefix/lib/libpng15.a \
602 -DJPEG_LIBRARY=$prefix/lib/libjpeg.a \ # <========== new
603 -DFREETYPE_LIBRARY=$prefix/lib/libfreetype.a \
604 -DGSDLFLAG_STATIC="$static_flag" \
605
606
607
The above flag names were discovered by deleting the untarred xpdf-4.00 folder and then, in a fresh terminal, sourcing devel.bash from xpdf-tools
and re-running CASCADE-MAKE.sh without the above modifications. CMake then reported which variables were missing:
610
611 -- Found FreeType (new-style includes): /home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib/libfreetype.a
612 -- Found ZLIB: /home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib/libz.a (found version "1.2.8")
613 -- Found PNG: /home/greenstone/gs3-svn-26Mar2018/gs2build/ext/xpdf-tools/linux/lib/libpng15.a (found version "1.2.50")
614 -- Could NOT find JPEG (missing: JPEG_LIBRARY JPEG_INCLUDE_DIR)
615 -- Could NOT find TIFF (missing: TIFF_LIBRARY TIFF_INCLUDE_DIR)
616 -- lcms2 not found
617 -- No Qt library found
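
Since a wrong path in a *_LIBRARY flag only shows up as a "Could NOT find" message deep in the CMake output, it may help to check that each static library exists before CMake is invoked. The sketch below is not part of XPDFTOOLS.sh and the function name is made up. The library file names are the ones used in the static flags listed earlier.

```shell
# Print the -D<NAME>_LIBRARY flags for the static libraries, failing
# early if one of the .a files is absent under $prefix/lib.
# static_cmake_flags is a hypothetical helper name.
static_cmake_flags() {
    prefix=$1
    for pair in ZLIB:libz.a TIFF:libtiff.a PNG:libpng15.a JPEG:libjpeg.a FREETYPE:libfreetype.a; do
        name=${pair%%:*}    # part before the colon, e.g. JPEG
        file=${pair#*:}     # part after the colon, e.g. libjpeg.a
        if [ ! -f "$prefix/lib/$file" ]; then
            echo "missing $prefix/lib/$file" >&2
            return 1
        fi
        printf '%s\n' "-D${name}_LIBRARY=$prefix/lib/$file"
    done
}
```

Flags for libraries found before a missing one are still printed, so a caller should check the exit status rather than the output alone.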
618
619
620* packages/gs-CMakeLists.txt was modified again,
621
622 - this time to also pass:
623 -ltiff and -ljpeg to all target_link_libraries() commands that run when GSDLFLAG_STATIC is set
624 and
625 ${TIFF_LIBRARY} and ${JPEG_LIBRARY} to all target_link_libraries() commands that run when GSDLFLAG_STATIC is not set
626
  - And to add in the include directories and definitions if JPEG/TIFF libraries were provided:
628 if (JPEG_FOUND)
629 include_directories("${JPEG_INCLUDE_DIR}")
630 add_definitions("${JPEG_DEFINITIONS}")
631 message(STATUS "@@@@@@@@@@@@@@@ JPEG_FOUND (include_dir ; include_dirs): ${JPEG_INCLUDE_DIR} ; ${JPEG_INCLUDE_DIRS}")
632 else ()
633 message(STATUS "@@@@@@@@@@@@@@@ NO JPEG_FOUND")
634 endif ()
635 if (TIFF_FOUND)
636 include_directories("${TIFF_INCLUDE_DIRS}")
637 add_definitions("${TIFF_DEFINITIONS}")
638 message(STATUS "@@@@@@@@@@@@@@@ TIFF_FOUND ${TIFF_INCLUDE_DIRS}")
639 else ()
640 message(STATUS "@@@@@@@@@@@@@@@ NO TIFF_FOUND")
641 endif ()
642
  Note however that although gs-CMakeLists.txt now has a value for the pluralised TIFF_INCLUDE_DIRS (as well as TIFF_INCLUDE_DIR),
  as it does for PNG and ZLIB, it does not have a value for the pluralised JPEG_INCLUDE_DIRS, only for the
  JPEG_INCLUDE_DIR set above. Yet the CMake flags in XPDFTOOLS.sh for the tiff and jpeg libs appear to have been set up
  in the same way. I'm not sure where these automatically assigned variables come from, so I haven't been able to check up on them.
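
One way to see which of these variables CMake has actually stored is to look in the CMakeCache.txt that it writes to the build folder, where entries have the form NAME:TYPE=VALUE. The sketch below uses a made-up function name. As far as I know, values passed with -D end up in this cache, whereas the pluralised names are normally ordinary variables set by CMake's bundled find modules and would not be listed.

```shell
# Show the cache entries whose names start with the given prefixes.
# show_cache_vars is a hypothetical helper name.
show_cache_vars() {
    cache=$1    # path to CMakeCache.txt in the build folder
    shift
    for prefix in "$@"; do
        # Cache entries look like NAME:TYPE=VALUE; no match is not an error.
        grep -E "^${prefix}_[A-Za-z0-9_]*:" "$cache" || true
    done
}
```

For example, running show_cache_vars on the build folder's CMakeCache.txt with the prefixes JPEG and TIFF would list the JPEG_* and TIFF_* entries side by side for comparison.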
647
[32253]648__________________________________________________________
649H. Licensing information and making the distributable tarball
650__________________________________________________________
[32249]651
XpdfTools' README lists which files need to be included as per its license when redistributing xpdf-tools binaries.

Running "./CASCADE-MAKE.sh makedist" assembles a custom whitelist of files to include in the distribution tarball of the xpdf-tools we compile up.

The files and folders that go into the distribution tarball xpdf-tools-GSDLOS.tar.gz are:
- the GSDLOS/bin/pdf* statically linked binaries (or dynamic executables linked against mostly static libraries in the case of Macs),
- the GSDLOS/man folder, as well as the further compulsory files README, COPYING and COPYING3 as required by xpdf-tools' license.

Beware that the cascade-make makedist function always preserves the directory structure of the folders and files included in the whitelist.
So when untarred, the folder xpdf-tools is produced with subfolders like linux/bin (containing the pdf* binaries) and linux/man,
as well as the files README, COPYING and COPYING3.
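
The effect of such a whitelist can be sketched with plain tar. This is not the actual makedist code. The function name is made up, and it assumes that README, COPYING and COPYING3 sit directly inside the xpdf-tools folder.

```shell
# Create a tarball holding only the whitelisted paths, with their
# directory structure relative to the parent of xpdf-tools preserved.
# make_dist_tarball is a hypothetical name for this sketch.
make_dist_tarball() {
    parent=$1    # directory containing the xpdf-tools folder
    os=$2        # the GSDLOS value, e.g. linux
    out=$3       # absolute path of the tarball to create
    (
        cd "$parent" || exit 1
        tar -czf "$out" \
            xpdf-tools/"$os"/bin/pdf* \
            xpdf-tools/"$os"/man \
            xpdf-tools/README xpdf-tools/COPYING xpdf-tools/COPYING3
    )
}
```

Because the paths are given relative to the parent folder, untarring recreates xpdf-tools/linux/bin and so on, rather than dumping the binaries at the top level.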
[32258]663
664
[32250]665__________________________________________________________
[32253]666I. PDF2DOM: tried it out, but wasn't what we wanted
[32250]667__________________________________________________________
Using PDFBox to convert a PDF to full HTML, with both images and text placed correctly with respect to each other, is tricky; see https://stackoverflow.com/questions/9671239/pdfbox-convert-a-pdf-to-text-or-html-including-images-from-the-pdf
669(Google: pdfbox to convert pdf to html with images)
670
671PDF2DOM tool (based on PDFBox) to convert PDF to HTML with images
672* http://cssbox.sourceforge.net/pdf2dom/documentation.php
673* Got the command line jar tool, PDFToHTML.jar version 1.7, from https://sourceforge.net/projects/cssbox/files/Pdf2DOM/
674* Further information and source code at https://github.com/radkovo/Pdf2Dom
675* API: http://cssbox.sourceforge.net/pdf2dom/api/index.html
676
677
6781. Running
679
680java -jar PDFToHTML.jar <infile> [<outfile>]
681
682 greenstone@machine-name:~/Downloads$ java -jar PDFToHTML.jar SampleDoc1.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2
683
684
685It will output the page, but you'll see the following output indicating that the logger is not displaying anything:
686 SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
687 SLF4J: Defaulting to no-operation (NOP) logger implementation
688 SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
689
690See https://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder
691
To see error output, download the SLF4J simple jar and run as follows:
693
694 greenstone@machine-name:~/Downloads$ java -classpath slf4j-simple-1.7.25.jar:PDFToHTML.jar org.fit.pdfdom.PDFToHTML ApacheLicencePDFA.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2
695
696The above is a MS Word produced PDF (archive format) and works fine: font folder generated containing the extracted fonts
697
698The following is a PDF produced from the same doc file by the latest libreoffice installed on Windows:
699 ApacheLicencePDFA_FromODT.pdf
700But running the same command on it produces the following font errors:
701
702greenstone@machine-name:~/Downloads$ java -classpath slf4j-simple-1.7.25.jar:PDFToHTML.jar org.fit.pdfdom.PDFToHTML ApacheLicencePDFA_FromODT.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2
703[main] INFO org.reflections.Reflections - Reflections took 163 ms to scan 1 urls, producing 36 keys and 222 values
704[main] WARN org.fit.pdfdom.FontTable - Error loading font 'BAAAAA+Georgia' Message: FontVerter could not detect the input font's type. class java.io.IOException
705[main] WARN org.fit.pdfdom.FontTable - Error loading font 'CAAAAA+Georgia-Bold' Message: FontVerter could not detect the input font's type. class java.io.IOException
706[main] WARN org.fit.pdfdom.FontTable - Error loading font 'BAAAAA+Georgia' Message: FontVerter could not detect the input font's type. class java.io.IOException
707[main] WARN org.fit.pdfdom.FontTable - Error loading font 'CAAAAA+Georgia-Bold' Message: FontVerter could not detect the input font's type. class java.io.IOException
708
709Fonts get extracted if the source PDF was generated by MS Word's doc to PDF conversion. Fonts didn't get extracted from PDF upon conversion to HTML when libreoffice was used to convert a .doc to the source PDF.
710
7112. Check version of PDF
712https://www.codeproject.com/Questions/167550/How-to-check-different-versions-of-PDF
713
714
7153. pdf to html command line conversion open source
716https://stackoverflow.com/questions/8370014/how-to-convert-pdf-to-html
717
718"Download
719
720 pdfbox-2.0.3.jar
721 fontbox-2.0.3.jar
722 preflight-2.0.3.jar
723 xmpbox-2.0.3.jar
724 pdfbox-tools-2.0.3.jar
725 pdfbox-debugger-2.0.3.jar
726
727from http://pdfbox.apache.org/
728...
729
730PLEASE NOTE: Images do not get pushed to the HTML output."
731
732
7334. Need a way to check if PDF contains images, then use pdf2dom, else basic pdfbox conversion to html (less div tags with inline style markup)?
734https://stackoverflow.com/questions/46215879/count-images-in-pdf-using-pdfbox
735
736
737UNUSED
738Googled for: java tool convert pdf version
739* https://stackoverflow.com/questions/11137912/all-inclusive-tool-to-convert-different-types-of-documents-to-pdf
740* https://www.qoppa.com/pdfprocess/
741jPDFProcess – Java PDF Library to Create, Manipulate PDF
742(appears to be payware)
743* https://www.gnostice.com/nl_article.asp?id=95&t=How_to_Change_the_PDF_Version_of_a_Document
744How to Convert a PDF Document to an Older or Newer Version
745uses .NET
746* http://www.baeldung.com/pdf-conversions-java
747PDF Conversions in Java
748e.g. PDF to html and html to PDF
749
750
751__________________________________________________________
752
753greenstone@machine-name:~/Downloads$ java -classpath slf4j-simple-1.7.25.jar:PDFToHTML.jar org.fit.pdfdom.PDFToHTML SampleDoc1.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2
754[main] INFO org.reflections.Reflections - Reflections took 153 ms to scan 1 urls, producing 36 keys and 222 values
755[main] WARN org.fit.pdfdom.FontTable - Error loading font 'BAAAAA+Georgia' Message: FontVerter could not detect the input font's type. class java.io.IOException
756[main] WARN org.fit.pdfdom.FontTable - Error loading font 'CAAAAA+Georgia-Bold' Message: FontVerter could not detect the input font's type. class java.io.IOException
757[main] WARN org.fit.pdfdom.FontTable - Error loading font 'BAAAAA+Georgia' Message: FontVerter could not detect the input font's type. class java.io.IOException
758[main] WARN org.fit.pdfdom.FontTable - Error loading font 'CAAAAA+Georgia-Bold' Message: FontVerter could not detect the input font's type. class java.io.IOException
759
760
761
762greenstone@machine-name:~/Downloads$ java -classpath Pdf2Dom/target/pdf2dom-1.8-SNAPSHOT.jar:pdfbox-app.jar:slf4j-jdk14-1.6.6.jar:log4j-over-slf4j-1.6.6.jar:slf4j-api-1.6.6.jar org.fit.pdfdom.PDFToHTML SampleDoc1.pdf -im=SAVE_TO_DIR -idir=/home/greenstone/Downloads/tmp1 -fm=SAVE_TO_DIR -fdir=/home/greenstone/Downloads/tmp2
763Exception in thread "main" java.lang.NoClassDefFoundError: org/mabb/fontverter/FontVerter
764 at org.fit.pdfdom.FontTable$Entry.loadTrueTypeFont(FontTable.java:178)
765 at org.fit.pdfdom.FontTable$Entry.getData(FontTable.java:147)
766 at org.fit.pdfdom.FontTable$Entry.isEntryValid(FontTable.java:161)
767 at org.fit.pdfdom.FontTable.addEntry(FontTable.java:48)
768 at org.fit.pdfdom.PDFBoxTree.processFontResources(PDFBoxTree.java:378)
769 at org.fit.pdfdom.PDFBoxTree.updateFontTable(PDFBoxTree.java:361)
770 at org.fit.pdfdom.PDFDomTree.updateFontTable(PDFDomTree.java:544)
771 at org.fit.pdfdom.PDFBoxTree.processPage(PDFBoxTree.java:206)
772 at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
773 at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
774 at org.fit.pdfdom.PDFDomTree.createDOM(PDFDomTree.java:218)
775 at org.fit.pdfdom.PDFDomTree.writeText(PDFDomTree.java:194)
776 at org.fit.pdfdom.PDFToHTML.main(PDFToHTML.java:77)
777Caused by: java.lang.ClassNotFoundException: org.mabb.fontverter.FontVerter
778 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
779 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
780 at java.security.AccessController.doPrivileged(Native Method)
781 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
782 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
783 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
784 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
785 ... 13 more
786greenstone@machine-name:~/Downloads$