pdftohtml.1@ 32205

Last change on this file since 32205 was 32205, checked in by ak19, 6 years ago

First set of commits implementing the new 'paged_html' output option of PDFPlugin, which uses xpdftools' new pdftohtml. So far tested only on Linux (64 bit), but things work there, so I'm optimistically committing the changes.

2. Committing the pre-built Linux binaries of XPDFtools for both 32 and 64 bit, built by the XPDF group.

3. To use the correct bitness variant of xpdftools, setup.bash now exports the BITNESS env var, which gsConvert.pl consults.

4. All the perl code changes for using xpdftools' pdftohtml to generate paged_html and feed it in the desired form into GS(3): gsConvert.pl, PDFPlugin.pm and its parent ConvertBinaryPFile.pm have been modified to make it all work.

xpdftools' pdftohtml generates a folder containing an html file and a screenshot for each page in a PDF (as well as an index.html linking to each page's html). However, we want a single html file that contains each individual 'page' html's content in a div, and we need some further HTML style, attribute and structure modifications to massage the xpdftools output into what we want for GS. To parse and manipulate the HTML 'DOM' for this, we're using the Mojo::DOM package that Dr Bainbridge found and compiled. Mojo::DOM is therefore also committed in this revision. Some further changes and some display fixes are required, but I need to check with the others about those.

File size: 3.4 KB

.\" Copyright 1997-2017 Glyph & Cog, LLC
.TH pdftohtml 1 "10 Aug 2017"
.SH NAME
pdftohtml \- Portable Document Format (PDF) to HTML converter
(version 4.00)
.SH SYNOPSIS
.B pdftohtml
[options]
.I PDF-file
.I HTML-dir
.SH DESCRIPTION
.B Pdftohtml
converts Portable Document Format (PDF) files to HTML.
.PP
Pdftohtml reads the PDF file,
.IR PDF-file ,
and places an HTML file for each page, along with auxiliary images
in the directory,
.IR HTML-dir .
The HTML directory will be created; if it already exists, pdftohtml
will report an error.
.SH CONFIGURATION FILE
Pdftohtml reads a configuration file at startup. It first tries to
find the user's private config file, ~/.xpdfrc. If that doesn't
exist, it looks for a system-wide config file, typically
/usr/local/etc/xpdfrc (but this location can be changed when pdftohtml
is built). See the
.BR xpdfrc (5)
man page for details.
.SH OPTIONS
Many of the following options can be set with configuration file
commands. These are listed in square brackets with the description of
the corresponding command line option.
.TP
.BI \-f " number"
Specifies the first page to convert.
.TP
.BI \-l " number"
Specifies the last page to convert.
.TP
.BI \-z " number"
Specifies the initial zoom level. The default is 1.0, which means
72dpi, i.e., 1 point in the PDF file will be 1 pixel in the HTML.
Using \'-z 1.5', for example, will make the initial view 50% larger.
.TP
.BI \-r " number"
Specifies the resolution, in DPI, for background images. This
controls the pixel size of the background image files. The initial
zoom level is controlled by the \'-z' option. Specifying a larger
\'-r' value will allow the viewer to zoom in farther without upscaling
artifacts in the background.
.TP
.B \-skipinvisible
Don't draw invisible text. By default, invisible text (commonly used
in OCR'ed PDF files) is drawn as transparent (alpha=0) HTML text.
This option tells pdftohtml to discard invisible text entirely.
.TP
.B \-allinvisible
Treat all text as invisible. By default, regular (non-invisible) text
is not drawn in the background image, and is instead drawn with HTML
on top of the image. This option tells pdftohtml to include the
regular text in the background image, and then draw it as transparent
(alpha=0) HTML text.
.TP
.BI \-opw " password"
Specify the owner password for the PDF file. Providing this will
bypass all security restrictions.
.TP
.BI \-upw " password"
Specify the user password for the PDF file.
.TP
.B \-q
Don't print any messages or errors.
.RB "[config file: " errQuiet ]
.TP
.BI \-cfg " config-file"
Read
.I config-file
in place of ~/.xpdfrc or the system-wide config file.
.TP
.B \-v
Print copyright and version information.
.TP
.B \-h
Print usage information.
.RB ( \-help
and
.B \-\-help
are equivalent.)
.SH BUGS
Some PDF files contain fonts whose encodings have been mangled beyond
recognition. There is no way (short of OCR) to extract text from
these files.
.SH EXIT CODES
The Xpdf tools use the following exit codes:
.TP
0
No error.
.TP
1
Error opening a PDF file.
.TP
2
Error opening an output file.
.TP
3
Error related to PDF permissions.
.TP
99
Other error.
.SH AUTHOR
The pdftohtml software and documentation are copyright 1996-2017 Glyph
& Cog, LLC.
.SH "SEE ALSO"
.BR xpdf (1),
.BR pdftops (1),
.BR pdftotext (1),
.BR pdfinfo (1),
.BR pdffonts (1),
.BR pdfdetach (1),
.BR pdftoppm (1),
.BR pdftopng (1),
.BR pdfimages (1),
.BR xpdfrc (5)
.br
.B http://www.xpdfreader.com/
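
The options and exit codes documented in the man page above could be driven from a script roughly as follows. This is a minimal sketch, not part of the committed code: the helper names `build_pdftohtml_command` and `describe_exit_code`, their parameters, and the default executable name are illustrative assumptions. Only the flags (`-f`, `-l`, `-z`, `-r`, `-skipinvisible`, `-q`), the argument order, and the exit code meanings are taken from the man page.

```python
from typing import List, Optional

# Exit codes as listed in the EXIT CODES section of the man page.
EXIT_CODES = {
    0: "No error.",
    1: "Error opening a PDF file.",
    2: "Error opening an output file.",
    3: "Error related to PDF permissions.",
    99: "Other error.",
}


def build_pdftohtml_command(
    pdf_file: str,
    html_dir: str,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    zoom: Optional[float] = None,
    resolution: Optional[int] = None,
    skip_invisible: bool = False,
    quiet: bool = False,
    executable: str = "pdftohtml",  # assumption: the binary is on PATH
) -> List[str]:
    """Build an argument list following the SYNOPSIS:
    pdftohtml [options] PDF-file HTML-dir
    (hypothetical helper; it only assembles arguments and runs nothing).
    """
    cmd = [executable]
    if first_page is not None:
        cmd += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]    # last page to convert
    if zoom is not None:
        cmd += ["-z", str(zoom)]         # initial zoom level (1.0 = 72dpi)
    if resolution is not None:
        cmd += ["-r", str(resolution)]   # DPI of the background images
    if skip_invisible:
        cmd.append("-skipinvisible")     # discard invisible text entirely
    if quiet:
        cmd.append("-q")                 # suppress messages and errors
    cmd += [pdf_file, html_dir]
    return cmd


def describe_exit_code(code: int) -> str:
    """Map an exit status to the description given in the man page."""
    return EXIT_CODES.get(code, f"Unrecognized exit code {code}")
```

The returned list can be passed to `subprocess.run(cmd)` and the resulting `returncode` given to `describe_exit_code`. Since pdftohtml reports an error when HTML-dir already exists, a caller would need to pick a fresh directory name for each run.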
