pdftotext(1)                General Commands Manual               pdftotext(1)

NAME
       pdftotext  -  Portable Document Format (PDF) to text converter (version
       4.00)

SYNOPSIS
       pdftotext [options] [PDF-file [text-file]]

DESCRIPTION
       Pdftotext converts Portable Document Format (PDF) files to plain text.

       Pdftotext reads the PDF file, PDF-file, and writes a text  file,  text-
       file.   If  text-file  is not specified, pdftotext converts file.pdf to
       file.txt.  If text-file is '-', the text is sent to stdout.
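
       The output naming rule above can be sketched in Python.  This is only
       an illustration, not code taken from pdftotext.  The function name
       resolve_text_target and the "<stdout>" marker are hypothetical, and
       the handling of input names that do not end in ".pdf" is an assumption
       (this page does not describe that case).

```python
from typing import Optional


def resolve_text_target(pdf_file: str, text_file: Optional[str] = None) -> str:
    """Return the output destination implied by the command-line arguments.

    Hypothetical helper: returns a file name, or "<stdout>" when the
    text-file argument is '-'.
    """
    if text_file == "-":
        # '-' means the text is sent to stdout.
        return "<stdout>"
    if text_file is not None:
        # An explicit text-file argument is used as given.
        return text_file
    if pdf_file.lower().endswith(".pdf"):
        # file.pdf is converted to file.txt.
        return pdf_file[:-4] + ".txt"
    # Assumption: without a .pdf suffix, simply append .txt.
    return pdf_file + ".txt"
```

       For example, resolve_text_target("file.pdf") yields "file.txt", while
       resolve_text_target("file.pdf", "-") yields the stdout marker.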

CONFIGURATION FILE
       Pdftotext reads a configuration file at startup.   It  first  tries  to
       find the user's private config file, ~/.xpdfrc.  If that doesn't exist,
       it looks for a system-wide config file, typically /usr/local/etc/xpdfrc
       (but  this  location  can be changed when pdftotext is built).  See the
       xpdfrc(5) man page for details.
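
       The lookup order can be summarized with a short Python sketch.  It is
       a model of the behavior described here, not pdftotext's own code.  The
       function name pick_config and its parameters are hypothetical, the
       system-wide path is only the typical default, and the cfg_file
       parameter stands for the -cfg option described under OPTIONS.

```python
import os
import posixpath
from typing import Callable, Optional


def pick_config(home: str,
                cfg_file: Optional[str] = None,
                system_config: str = "/usr/local/etc/xpdfrc",
                exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    """Return the config file that would be read, or None if none is found.

    Hypothetical helper; 'exists' can be replaced to try the logic
    without touching the file system.
    """
    if cfg_file is not None:
        # -cfg replaces both the private and the system-wide file.
        return cfg_file
    user_config = posixpath.join(home, ".xpdfrc")
    if exists(user_config):
        # The user's private config file is tried first.
        return user_config
    if exists(system_config):
        # Otherwise fall back to the system-wide file.
        return system_config
    return None
```

       Passing a custom exists callable makes it easy to check each branch of
       the lookup in isolation.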

OPTIONS
       Many of the following options can be set with configuration  file  com-
       mands.  These are listed in square brackets with the description of the
       corresponding command line option.

       -f number
              Specifies the first page to convert.

       -l number
              Specifies the last page to convert.

       -layout
              Maintain (as best as possible) the original physical  layout  of
              the  text.   The  default is to 'undo' physical layout (columns,
              hyphenation, etc.) and output the text in reading order.  If the
              -fixed  option is given, character spacing within each line will
              be determined by the specified character pitch.

       -simple
              Similar to -layout, but optimized for simple  one-column  pages.
              This  mode  will do a better job of maintaining horizontal spac-
              ing, but it will only work properly  with  a  single  column  of
              text.

       -table Table mode is similar to physical layout mode, but optimized for
              tabular data, with the goal of keeping rows and columns  aligned
              (at  the  expense of inserting extra whitespace).  If the -fixed
              option is given, character spacing  within  each  line  will  be
              determined by the specified character pitch.

       -lineprinter
              Line  printer  mode  uses  a  strict  fixed-character-pitch  and
              -height layout.  That is, the page is broken into  a  grid,  and
              characters  are  placed  into that grid.  If the grid spacing is
              too small for the actual characters, the result is extra  white-
              space.   If the grid spacing is too large, the result is missing
              whitespace.  The grid spacing can be specified using the  -fixed
              and  -linespacing  options.  If one or both are not given on the
              command line, pdftotext  will  attempt  to  compute  appropriate
              value(s).

       -raw   Keep the text in content stream order.  Depending on how the PDF
              file was generated, this may or may not be useful.

       -fixed number
              Specify the character pitch (character width),  in  points,  for
              physical  layout,  table, or line printer mode.  This is ignored
              in all other modes.

       -linespacing number
              Specify the line spacing, in  points,  for  line  printer  mode.
              This is ignored in all other modes.

       -clip  Text which is hidden because of clipping is removed before doing
              layout, and then added back in.  This can be helpful for  tables
              where clipped (invisible) text would overlap the next column.

       -nodiag
              Diagonal text, i.e., text that is not close to one of the 0, 90,
              180, or 270 degree axes, is discarded.  This is useful  to  skip
              watermarks drawn on top of body text, etc.

       -enc encoding-name
              Sets  the  encoding  to  use for text output.  The encoding-name
              must be defined with the  unicodeMap  command  (see  xpdfrc(5)).
              The  encoding name is case-sensitive.  This defaults to "Latin1"
              (which is a built-in encoding).  [config file: textEncoding]

       -eol unix | dos | mac
              Sets the end-of-line convention to use for text output.  [config
              file: textEOL]

       -nopgbrk
              Don't  insert  page breaks (form feed characters) between pages.
              [config file: textPageBreaks]

       -bom   Insert a Unicode byte order marker (BOM) at  the  start  of  the
              text output.

       -opw password
              Specify  the  owner  password  for the PDF file.  Providing this
              will bypass all security restrictions.

       -upw password
              Specify the user password for the PDF file.

       -q     Don't print any messages or errors.  [config file: errQuiet]

       -cfg config-file
              Read config-file in place of ~/.xpdfrc or the system-wide config
              file.

       -v     Print copyright and version information.

       -h     Print usage information.  (-help and --help are equivalent.)
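
       As an illustration of how these options combine, the following Python
       sketch assembles an argument list for pdftotext.  The function name
       build_pdftotext_args and its parameters are hypothetical.  Treating
       the layout modes as mutually exclusive is an assumption of the sketch,
       and so is the choice to drop -fixed and -linespacing in modes where
       this page says they are ignored.

```python
from typing import List, Optional

# Layout-related mode options described above.
MODES = {"layout", "simple", "table", "lineprinter", "raw"}
# -fixed applies to physical layout, table, and line printer mode only.
FIXED_MODES = {"layout", "table", "lineprinter"}
EOL_STYLES = {"unix", "dos", "mac"}


def build_pdftotext_args(pdf_file: str,
                         text_file: Optional[str] = None,
                         mode: Optional[str] = None,
                         first: Optional[int] = None,
                         last: Optional[int] = None,
                         fixed: Optional[float] = None,
                         linespacing: Optional[float] = None,
                         eol: Optional[str] = None) -> List[str]:
    """Build a pdftotext command line (hypothetical helper)."""
    args = ["pdftotext"]
    if first is not None:
        args += ["-f", str(first)]
    if last is not None:
        args += ["-l", str(last)]
    if mode is not None:
        if mode not in MODES:
            raise ValueError("unknown mode: " + mode)
        # Assumption: at most one layout mode is passed at a time.
        args.append("-" + mode)
    if fixed is not None and mode in FIXED_MODES:
        # Dropped in other modes, where it would be ignored anyway.
        args += ["-fixed", str(fixed)]
    if linespacing is not None and mode == "lineprinter":
        # Line spacing is only meaningful in line printer mode.
        args += ["-linespacing", str(linespacing)]
    if eol is not None:
        if eol not in EOL_STYLES:
            raise ValueError("unknown end-of-line style: " + eol)
        args += ["-eol", eol]
    args.append(pdf_file)
    if text_file is not None:
        args.append(text_file)
    return args
```

       The resulting list can be handed to a process launcher such as
       Python's subprocess.run, provided pdftotext is installed and on the
       search path.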

BUGS
       Some  PDF  files contain fonts whose encodings have been mangled beyond
       recognition.  There is no way (short of OCR) to extract text from these
       files.
127EXIT CODES
128       The Xpdf tools use the following exit codes:
129
130       0      No error.
131
132       1      Error opening a PDF file.
133
134       2      Error opening an output file.
135
136       3      Error related to PDF permissions.
137
138       99     Other error.
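
       A calling script can translate these codes into messages.  The Python
       sketch below mirrors the table above; the names EXIT_CODES and
       describe_exit_code are hypothetical, and the wording returned for an
       unlisted code is an assumption.

```python
# Exit codes as listed in this man page.
EXIT_CODES = {
    0: "No error.",
    1: "Error opening a PDF file.",
    2: "Error opening an output file.",
    3: "Error related to PDF permissions.",
    99: "Other error.",
}


def describe_exit_code(code: int) -> str:
    """Return the documented meaning of an Xpdf tool exit code."""
    # Assumption: codes not listed above are reported as undocumented.
    return EXIT_CODES.get(code, "Undocumented exit code %d." % code)
```

       The value to look up would typically be the return code reported by
       whatever mechanism launched pdftotext.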

AUTHOR
       The  pdftotext software and documentation are copyright 1996-2017 Glyph
       & Cog, LLC.

SEE ALSO
       xpdf(1),  pdftops(1),  pdftohtml(1),  pdfinfo(1),  pdffonts(1),  pdfde-
       tach(1), pdftoppm(1), pdftopng(1), pdfimages(1), xpdfrc(5)
       http://www.xpdfreader.com/

                                  10 Aug 2017                     pdftotext(1)