source: trunk/gsdl/macros/prescrpt.dm@ 1095

Last change on this file since 1095 was 1095, checked in by gwp, 24 years ago

The DL meeting on April 13, 2000 approved (or rather, did not object to) a broad set of changes to the NZDL.org web pages. These changes include a page for projects, updates on the people involved, rearranging the home page, and the removal of the technology pages.

  • Property svn:executable set to *
  • Property svn:keywords set to Author Date Id Revision
File size: 7.6 KB
1package prescript
2
3
4#######################################################################
5# java images/scripts
6#######################################################################
7
8# the _javalinks_ macros are the flashy image links at the top right of
9# the page.
10
11_javalinks_ {_imagehome_}
12_javalinks_ [v=1] {
13_imagehome_<br>
14}
15
16
17#######################################################################
18# icons
19#######################################################################
20
21_iconhpscrpt_ {<img src="_httpiconhpscrpt_" width=_widthhpscrpt_ height=_heighthpscrpt_>}
22
23#######################################################################
24# http macros
25#
26# These contain the url without any quotes
27#######################################################################
28
29_httpiconhpscrpt_ {_httpimg_/h\_pscrpt.gif}
30_widthhpscrpt_ {200}
31_heighthpscrpt_ {57}
32
33
34#######################################################################
35# page content
36#######################################################################
37
38_pagetitle_ {NZDL: PreScript}
39
40_imagethispage_ {_iconhpscrpt_}
41
42_content_ {
43_iconblankbar_
44<p>
45<i>PreScript</i> is a utility for extracting text from PostScript files.
46PreScript offers:
47
48<dl>
49<dt><b>PostScript conversion to plain ASCII or HTML.</b></dt>
50
51<dd><i>PreScript</i> is really a PostScript to plain text converter,
52but rudimentary HTML can also be produced. Tags are inserted to mark
53paragraphs (&lt;p&gt;), short lines (&lt;br&gt;), page breaks
54(&lt;hr&gt;), and headers and footers (italicized with
55&lt;i&gt;...&lt;/i&gt;).
56
57<dt><b>Paragraph boundary detection.</b></dt>
58<dd><i>PreScript</i> determines the line spacing of a document and
59uses this (and also indentations) to determine paragraph boundaries.
60
61<dt><b>Hyphenation removal.</b></dt>
62<dd>Hyphenated words are de-hyphenated.
63
64<dt><b>Ligature translation.</b></dt>
65<dd>Most ligatures used by TeX documents are detected.
66<i>PreScript</i> doesn't track font changes, which makes it impossible to
67reliably detect all ligatures.
68
69</dl>
70<br>
71<h3>Installing PreScript</h3>
72<i>PreScript</i> is written in PostScript and Python. You will
73need <a href="http://www.cs.wisc.edu/~ghost">Ghostscript</a> (at least
74version 4.01) and the
75<a href="http://www.python.org">Python</a> interpreter (at least
76version 1.4).
77
78<h4>The PreScript 0.1 distribution</h4>
79
80This distribution is the most stable; it is what you should use to
81do real work.
82
83<ul>
84
85<li>Download the <a href="http://www.nzdl.org/download/prescript/prescript-0.1.tar.gz">PreScript
860.1</a> distribution.
87
88<li>Set the environment variable <tt>PRESCRIPT_DIR</tt> to the
89directory where <i>PreScript</i> is installed (or wherever you put
90<tt>prescript.ps</tt>).
91
92<li>Move <tt>prescript.py</tt> to a directory listed in your
93<tt>PATH</tt> environment variable. You may want to remove the
94<tt>.py</tt> suffix (<tt>prescript.py</tt> can be either a standalone
95program, or an imported library of another Python program).
96
97<li>Change <tt>\#! /usr/local/bin/python</tt> in
98<tt>prescript.py</tt> to the location of your Python interpreter.
99
100</ul>
101
102<h4>The PreScript 2 distribution</h4>
103
104This is a beta release of our latest version. This version is a lot cleaner and
105faster; it is also extensible (users can write their own renderers),
106better documented, and better at predicting line, paragraph,
107and page breaks. If you notice any bugs, want to request new
108features, or want to become a beta tester, please email the <a
109href="mailto:[email protected]">New Zealand Digital Library
110administrator</a>.
111
112<ul>
113
114<li>Download a <i>PreScript 2</i> distribution (the later versions are more stable).
115
116<blockquote>
117<a href="http://www.nzdl.org/download/prescript/prescript-2.0.tar.gz">PreScript 2.0</a><br>
118<a href="http://www.nzdl.org/download/prescript/prescript-2.1.tar.gz">PreScript 2.1</a><br>
119<a href="http://www.nzdl.org/download/prescript/prescript-2.2.tar.gz">PreScript 2.2</a> -- same as PreScript 2.1, but with compatibility issues
120with Python 1.5 fixed
121</blockquote>
122
123<li>On Unix systems, 'make install' will install prescript to /usr/local/bin. It will
124also install the accompanying manual page (to install somewhere else, simply edit
125the Makefile).
126
127<li>If not installing with the make utility:<br>
128It is easiest if all of the program scripts are kept in the same directory,
129which ideally should be listed in the <tt>PATH</tt> environment variable. If
130this is inconvenient, be sure that <tt>PRESCRIPT_DIR</tt> points to where
131<tt>prescript.ps</tt> is installed, and that <tt>PYTHONPATH</tt> points to
132where the <tt>*.py</tt> files are installed.
133
134<li>Change <tt>\#!/usr/local/bin/python</tt> in <tt>prescript</tt> to the
135location of your Python interpreter ('make install' does NOT do this for you).
136
137</ul>
138<br>
139
140<h3>Running PreScript</h3>
141<b>Usage:</b>
142<blockquote>
143prescript <i>format</i> <i>input</i> [<i>output</i>]
144</blockquote>
145
146<ul>
147<li><i>format</i> is either <tt>plain</tt> or <tt>html</tt>.
148<li><i>input</i> is the input filename, a PostScript file.
149<li><i>output</i> is the output filename. By default, the output
150filename is the same as the input filename with the path removed and the suffix
151replaced by either <tt>.txt</tt> or <tt>.html</tt>.
152</ul>
153
154<br>
155<h3>Bugs</h3>
156Please report bugs to the <a
157href="mailto:[email protected]">New Zealand Digital Library
158administrator</a>.
159<p><br>
160
161<h3>Notes</h3>
162<i>PreScript</i> is a port of a Perl program used by the New Zealand Digital
163Library project to convert computer science technical reports to HTML. The
164Perl version is deemed unfit for a public release because the code is quite
165messy (a consequence of Perl's cumbersome syntax for defining objects). The
166Python version is considerably easier to understand, maintain, and extend.
167The technical paper <a
168href="http://www.nzdl.org/download/prescript/prescript.ps.gz">prescript.ps.gz</a> documents the
169algorithms and heuristics used in <i>PreScript 0.1</i> - there is an
170update to this for <i>PreScript 2</i> inside its distribution
171archive.<p><br>
172
173<h3>Other PostScript Converters</h3>
174Here is a summary of other PostScript to text converters we found.
175
176<dl>
177<dt><a
178href="http://www.research.digital.com/SRC/virtualpaper/pstotext.html"><b>pstotext</b></a></dt>
179
180
181<dd>From the DEC Virtual Paper research project. PostScript program
182and C program. Probably the best PostScript to text converter (after
183<i>PreScript</i>, of course).
184
185<dt><a
186href="http://stasi.bradley.edu/ftp/pub/ps2html/ps2html-v2.html"><b>ps2html,
187The Sequel</b></a></dt>
188
189<dd>Developed at Johns Hopkins University to convert JHU journal
190articles to HTML. This converter attempts to preserve the formatting
191of the original PostScript document, but is tied to PostScript
192files generated with a specific package (QuarkXPress?). A table
193describing a number of parameters is used to aid conversion and can be
194modified for new formats. Uses a variation of Ghostscript's
195<tt>ps2ascii.ps</tt>.
196
197<dt><b>ps2ascii.ps</b></dt>
198
199<dd>Part of the Ghostscript distribution. <tt>ps2ascii.ps</tt> is
200considerably less robust than <i>PreScript</i>.
201
202<dt><a
203href="ftp://ftp.mpce.mq.edu.au/pub/comp/src/ps2a.sh"><b>ps2a.sh</b></a></dt>
204
205<dd>A PostScript program similar to Ghostscript's <tt>ps2ascii.ps</tt>.
206
207<dt><a href="ftp://apocalypse.engr.ucf.edu:/usr/ssd/ps2ascii.shar"><b>ps2ascii.shar</b></a></dt>
208
209<dd>A PostScript program and Perl script.
210
211<dt><a
212href="ftp://wilma.cs.brown.edu/pub/postscript/ps2ascii.pl"><b>ps2ascii.pl</b></a></dt>
213
214<dd>A Perl script that extracts parenthesized text from a PostScript
215file.
216
217<dt><a
218href="ftp://ftp.funet.fi/pub/archive/alt.sources/volume92/Feb/920223.01.gz"><b>ps2txt</b></a></dt>
219
220<dd>A standalone C program that extracts parenthesized text. Includes some
221special code to deal with <tt>dvips</tt>-generated files.
222
223</dl>
224
225}
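
The page content above lists hyphenation removal as a feature but does not say how it is done; the actual heuristics are documented in the technical paper mentioned in the Notes. As a rough illustration only, the idea can be sketched in Python as follows. The function name `dehyphenate` and the rule used here (join a line ending in a hyphen with the first word of the next line when that word starts with a lowercase letter) are assumptions made for this example, not PreScript's own code.

```python
def dehyphenate(lines):
    """Join words that were split across lines with a trailing hyphen.

    Illustrative assumption: a line ending in "-" followed by a line that
    starts with a lowercase letter is treated as one hyphenated word.
    This is not PreScript's actual algorithm.
    """
    out = []
    for line in lines:
        line = line.strip()
        if out and out[-1].endswith("-") and line[:1].islower():
            # Move the continuation fragment up to complete the split word.
            fragment, _, rest = line.partition(" ")
            out[-1] = out[-1][:-1] + fragment
            line = rest
            if not line:
                # The whole line was consumed by the continuation.
                continue
        out.append(line)
    return out


print(dehyphenate(["PostScript con-", "version to plain text"]))
```

A rule this simple also joins genuine compounds that happen to break at a line end (for example "line-" followed by "spacing"), which is one reason a real converter needs more careful heuristics.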
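
The default output filename rule described under "Running PreScript" (input filename with the path removed and the suffix replaced by `.txt` or `.html`) can be sketched in Python. The helper name `default_output_name` and the mapping table are hypothetical, introduced for illustration; this is not taken from the PreScript sources.

```python
import os

# Hypothetical mapping from the documented formats to output suffixes.
SUFFIX_BY_FORMAT = {"plain": ".txt", "html": ".html"}


def default_output_name(input_path, fmt):
    """Return the input filename without its directory, with the suffix replaced."""
    if fmt not in SUFFIX_BY_FORMAT:
        raise ValueError("format must be 'plain' or 'html'")
    base = os.path.basename(input_path)   # drop the path
    stem, _ext = os.path.splitext(base)   # drop the last suffix, e.g. ".ps"
    return stem + SUFFIX_BY_FORMAT[fmt]


print(default_output_name("/tmp/papers/report.ps", "plain"))
print(default_output_name("/tmp/papers/report.ps", "html"))
```

Under these assumptions, running `prescript html /tmp/papers/report.ps` with no output argument would write `report.html` in the current directory. How PreScript itself treats compound suffixes such as `.ps.gz` is not stated on the page, so the sketch does not try to handle them specially.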