source: trunk/gsdl/macros/prescrpt.dm@ 1049

Last change on this file since 1049 was 1049, checked in by nzdl, 24 years ago

package prescript


#######################################################################
# java images/scripts
#######################################################################

# the _javalinks_ macros are the flashy image links at the top right of
# the page.

_javalinks_ {_imagehome_}
_javalinks_ [v=1] {
_imagehome_<br>
}

#######################################################################
# icons
#######################################################################

_iconhpscrpt_ {<img src="_httpiconhpscrpt_" width=_widthhpscrpt_ height=_heighthpscrpt_>}

#######################################################################
# http macros
#
# These contain the url without any quotes
#######################################################################

_httpiconhpscrpt_ {_httpimg_/h\_pscrpt.gif}
_widthhpscrpt_ {200}
_heighthpscrpt_ {57}

#######################################################################
# page content
#######################################################################

_pagetitle_ {NZDL: PreScript}

_imagethispage_ {_iconhpscrpt_}

_content_ {
_iconblankbar_
<p><i>PreScript</i> offers:

<dl>
<dt><b>PostScript conversion to plain ASCII or HTML.</b></dt>

<dd><i>PreScript</i> is primarily a PostScript to plain text converter,
but it can also produce rudimentary HTML. Tags are inserted to mark
paragraphs (&lt;p&gt;), short lines (&lt;br&gt;), page breaks
(&lt;hr&gt;), and headers and footers (italicized with
&lt;i&gt;...&lt;/i&gt;).

<dt><b>Paragraph boundary detection.</b></dt>
<dd><i>PreScript</i> determines the line spacing of a document and
uses this, together with indentation, to determine paragraph boundaries.

<dt><b>Hyphenation removal.</b></dt>
<dd>Hyphenated words are de-hyphenated.

<dt><b>Ligature translation.</b></dt>
<dd>Most ligatures used in TeX documents are detected.
<i>PreScript</i> does not track font changes, so it cannot
reliably detect all ligatures.

</dl>
<br>
<h3>Installing PreScript</h3>
<i>PreScript</i> is written in PostScript and Python. You will
need <a href="http://www.cs.wisc.edu/~ghost">Ghostscript</a> (at least
version 4.01) and the
<a href="http://www.python.org">Python</a> interpreter (at least
version 1.4).

<h4>The PreScript 0.1 distribution</h4>

This distribution is the most stable; it is what you should use for
real work.

<ul>

<li>Download the <a href="http://www.nzdl.org/download/prescript/prescript-0.1.tar.gz">PreScript
0.1</a> distribution.

<li>Set the environment variable <tt>PRESCRIPT_DIR</tt> to the
directory where <i>PreScript</i> is installed (or wherever you put
<tt>prescript.ps</tt>).

<li>Move <tt>prescript.py</tt> to a directory listed in your
<tt>PATH</tt> environment variable. You may want to remove the
<tt>.py</tt> suffix (<tt>prescript.py</tt> can be run as a standalone
program or imported as a library by another Python program).

<li>Change <tt>\#! /usr/local/bin/python</tt> in
<tt>prescript.py</tt> to the location of your Python interpreter.

</ul>

<h4>The PreScript 2 distribution</h4>

This is a beta release of our latest version. This version is a lot cleaner and
faster; it is also extensible (users can write their own renderers),
better documented, and better at predicting line, paragraph,
and page breaks. If you notice any bugs, want to request new
features, or want to become a beta tester, please email the <a
href="mailto:[email protected]">New Zealand Digital Library
administrator</a>.

<ul>

<li>Download a <i>PreScript 2</i> distribution (the later versions are more stable).

<blockquote>
<a href="http://www.nzdl.org/download/prescript/prescript-2.0.tar.gz">PreScript 2.0</a><br>
<a href="http://www.nzdl.org/download/prescript/prescript-2.1.tar.gz">PreScript 2.1</a><br>
<a href="http://www.nzdl.org/download/prescript/prescript-2.2.tar.gz">PreScript 2.2</a>: the same as PreScript 2.1, but with
compatibility issues with Python 1.5 fixed
</blockquote>

<li>On Unix systems, 'make install' installs prescript to /usr/local/bin, along
with the accompanying manual page (to install somewhere else, simply edit
the Makefile).

<li>If not installing with the make utility:<br>
It is easiest if all of the program scripts are kept in the same directory,
which ideally should be listed in the <tt>PATH</tt> environment variable. If
this is inconvenient, be sure that <tt>PRESCRIPT_DIR</tt> points to where
<tt>prescript.ps</tt> is installed, and that <tt>PYTHONPATH</tt> points to
where the <tt>*.py</tt> files are installed.

<li>Change <tt>\#!/usr/local/bin/python</tt> in <tt>prescript</tt> to the
location of your Python interpreter ('make install' does NOT do this for you).

</ul>
<br>

<h3>Running PreScript</h3>
<b>Usage:</b>
<blockquote>
prescript <i>format</i> <i>input</i> [<i>output</i>]
</blockquote>

<ul>
<li><i>format</i> is either <tt>plain</tt> or <tt>html</tt>.
<li><i>input</i> is the input filename, a PostScript file.
<li><i>output</i> is the output filename. By default, the output
filename is the input filename with the path removed and the suffix
replaced by either <tt>.txt</tt> or <tt>.html</tt>.
</ul>
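
<p>As an illustration of the usage line above, the following two invocations
are a minimal sketch. The input path <tt>/tmp/reports/paper.ps</tt> and the
output name <tt>notes.txt</tt> are hypothetical and chosen only for this
example.</p>
<blockquote>
<tt>prescript html /tmp/reports/paper.ps</tt><br>
<tt>prescript plain /tmp/reports/paper.ps notes.txt</tt>
</blockquote>
<p>Going by the default naming rule described above, the first command should
write its output to a file named <tt>paper.html</tt> (the input filename with
the path removed and the suffix replaced), while the second names its output
file explicitly.</p>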

<br>
<h3>Bugs</h3>
Please report bugs to the <a
href="mailto:[email protected]">New Zealand Digital Library
administrator</a>.
<p><br>

<h3>Notes</h3>
<i>PreScript</i> is a port of a Perl program used by the New Zealand Digital
Library project to convert computer science technical reports to HTML. The
Perl version is deemed unfit for public release because the code is quite
messy (a consequence of Perl's cumbersome syntax for defining objects). The
Python version is considerably easier to understand, maintain, and extend.
The technical paper <a
href="http://www.nzdl.org/download/prescript/prescript.ps.gz">prescript.ps.gz</a> documents the
algorithms and heuristics used in <i>PreScript 0.1</i>; an updated
version for <i>PreScript 2</i> is included in its distribution
archive.<p><br>

<h3>Other PostScript Converters</h3>
Here is a summary of the other PostScript to text converters we found.

<dl>
<dt><a
href="http://www.research.digital.com/SRC/virtualpaper/pstotext.html"><b>pstotext</b></a></dt>


<dd>From the DEC Virtual Paper research project. A PostScript program
and a C program. Probably the best PostScript to text converter (after
<i>PreScript</i>, of course).

<dt><a
href="http://stasi.bradley.edu/ftp/pub/ps2html/ps2html-v2.html"><b>ps2html,
The Sequel</b></a></dt>

<dd>Developed at Johns Hopkins University to convert JHU journal
articles to HTML. This converter attempts to preserve the formatting
of the original PostScript document, but is tied to PostScript
files generated with a specific package (QuarkXPress?). A table
describing a number of parameters is used to aid conversion and can be
modified for new formats. Uses a variation of Ghostscript's
<tt>ps2ascii.ps</tt>.

<dt><b>ps2ascii.ps</b></dt>

<dd>Part of the Ghostscript distribution. <tt>ps2ascii.ps</tt> is
considerably less robust than <i>PreScript</i>.

<dt><a
href="ftp://ftp.mpce.mq.edu.au/pub/comp/src/ps2a.sh"><b>ps2a.sh</b></a></dt>

<dd>A PostScript program similar to Ghostscript's <tt>ps2ascii.ps</tt>.

<dt><a href="ftp://apocalypse.engr.ucf.edu:/usr/ssd/ps2ascii.shar"><b>ps2ascii.shar</b></a></dt>

<dd>A PostScript program and a Perl script.

<dt><a
href="ftp://wilma.cs.brown.edu/pub/postscript/ps2ascii.pl"><b>ps2ascii.pl</b></a></dt>

<dd>A Perl script that extracts parenthesized text from a PostScript
file.

<dt><a
href="ftp://ftp.funet.fi/pub/archive/alt.sources/volume92/Feb/920223.01.gz"><b>ps2txt</b></a></dt>

<dd>A standalone C program that extracts parenthesized text. It includes
some special code to deal with <tt>dvips</tt>-generated files.

</dl>

}