1 | <!doctype html public "-//W3C//DTD HTML 4.0//EN">
|
---|
2 | <html>
|
---|
3 | <head>
|
---|
4 | <title>W3MIR HOWTO</title>
|
---|
5 | <style type="text/css">
|
---|
6 | <!--
|
---|
7 | body { background-color: white }
|
---|
8 | h1, h2, h3, b { font-family: sans-serif }
|
---|
9 | .red { color: red }
|
---|
10 | -->
|
---|
11 | </style>
|
---|
12 | <body>
|
---|
13 | <h1>W3MIR HOWTO</h1>
|
---|
14 |
|
---|
15 | <p><b>Corresponding to w3mir version 1.0.2 and above</b>
|
---|
16 |
|
---|
<p>W3mir is an all-purpose WWW copying and mirroring program. Its
main focus is copying complete directory structures while keeping your copy
browseable through a web server, or directly off a disk or CDROM if
you want. W3mir will fix URLs that are redirected and everything else
that needs to be fixed to make your copy browseable. It also does
odd jobs: retrieving single documents, batch getting several documents
and more. You may tell w3mir not to change anything in the retrieved
documents. W3mir has been in development for quite a long time, so you
will find options for most things needed when copying things off the
web.
|
---|
27 |
|
---|
<p>With w3mir you may copy the entire contents of a web server. Or just
|
---|
29 | a directory hierarchy, or several related hierarchies off as many
|
---|
30 | servers as you like. They don't even have to be related.
|
---|
31 |
|
---|
32 | <p>W3mir supports HTML4, and has partial support for CSS, Java,
|
---|
33 | ActiveX and Adobe Acrobat (PDF) files. And it works on Win32
|
---|
34 | machines.
|
---|
35 |
|
---|
36 | <p><b>Warning:</b> W3mir enables you to copy a lot of things off the
|
---|
37 | Web, but remember, the things you retrieve might be copyrighted and
|
---|
38 | the copy you make with w3mir might in fact be illegal to make and
|
---|
possess.
|
---|
40 |
|
---|
41 | <hr>
|
---|
42 |
|
---|
43 | <h2><a name="contents">Contents</a></h2>
|
---|
44 |
|
---|
45 | <p><a href="#intro">README</a> (You want to read this! <b
|
---|
46 | class="red">Really!</b>)
|
---|
47 |
|
---|
48 | <p><b>How do I...</b>
|
---|
49 | <ol>
|
---|
50 | <li><p><a href="#copy">copy a file?</a>
|
---|
51 | <li><p><a href="#recurse">copy a directory hierarchy?</a>
|
---|
52 | <li><p><a href="#resources">copy the needed resource files from another
|
---|
53 | directory hierarchy?</a>
|
---|
54 | <li><p><a href="#ignore">avoid copying files I don't want or copy only
|
---|
55 | files I want?</a>
|
---|
56 | <li><p><a href="#rm">remove the files that are no longer on the
|
---|
57 | original site from the mirror?</a>
|
---|
58 | <li><p><a href="#depth">limit how deep w3mir will recurse?</a>
|
---|
<li><p><a href="#memory">limit w3mir's memory usage?</a>
|
---|
60 | <li><p><a href="#multi">copy files from multiple sites?</a>
|
---|
61 | <li><p><a href="#alias">copy files from one server with several names?</a>
|
---|
62 | <li><p><a href="#aborted">restart a mirror process after stopping it
|
---|
63 | prematurely?</a>
|
---|
64 | <li><p><a href="#enlarge">enlarge or prune an established mirror?</a>
|
---|
65 | <li><p><a href="#cat">'cat' a file?</a>
|
---|
66 | <li><p><a href="#list">list URLs in a document?</a>
|
---|
67 | <li><p><a href="#robots">disable robots.txt obedience?</a>
|
---|
68 | <li><p><a href="#corrupt">stop w3mir from corrupting binary files?</a>
|
---|
69 | <li><p><a href="#auth">copy a site that wants user-name and password?</a>
|
---|
70 | <li><p><a href="#mauth">access a site that wants several different
|
---|
71 | user-names and passwords?</a>
|
---|
72 | <li><p><a href="#proxy">use a proxy server?</a>
|
---|
73 | <li><p><a href="#pauth">authenticate myself to a proxy server?</a>
|
---|
74 | <li><p><a href="#proxytweak">ensure that the proxy server ...?</a>
|
---|
75 | <li><p><a href="#batchget">batch get files with w3mir?</a>
|
---|
76 | <li><p><a href="#cgi">handle CGI?</a>
|
---|
77 | <li><p><a href="#imap">handle server side image-maps?</a>
|
---|
78 | <li><p><a href="#java">handle Java and ActiveX?</a>
|
---|
79 | <li><p><a href="#script">handle java-script and other script languages?</a>
|
---|
80 | <li><p><a href="#css">handle the other things with 'partial support'?</a>
|
---|
81 | <li><p><a href="#anon">keep my identity secret?</a>
|
---|
82 | <li><p><a href="#ns">pretend that I'm using Netscape, Internet
|
---|
83 | Explorer or Lynx?</a>
|
---|
84 | <li><p><a href="#other">do other things?</a>
|
---|
85 | </ol>
|
---|
86 |
|
---|
87 | <hr>
|
---|
88 |
|
---|
89 | <h2><a name="intro">README</a></h2>
|
---|
90 |
|
---|
<p>W3mir may be used in two main ways:
|
---|
92 |
|
---|
93 | <ul>
|
---|
94 | <li><p>To copy something random once.
|
---|
95 | <li><p>To keep a local mirror of some remote site
|
---|
96 | </ul>
|
---|
97 |
|
---|
<p>If you want to copy something random once, there is a good chance you can
just start w3mir with some simple options and it will do the job you
want it to, provided that the remote site is not too complex and
your expectations of the copy aren't high :-) This is what wget, the
GNU w3 mirroring program, does and is good at.
|
---|
103 |
|
---|
104 | <h3>Configuration file</h3>
|
---|
105 |
|
---|
<p>Once you want to keep a copy of a remote site up-to-date over time,
or mirror something with server side image-maps, redirects or
authentication, you have to write a configuration file for w3mir. This
is what w3mir is good at, compared to wget. Writing the file is not
hard, and there are two example files in the w3mir distribution. It
will also be explained here. The configuration file is typically
called <tt>.w3mirc</tt> (<tt>w3mir.ini</tt> on Win32 machines), and
|
---|
113 | can be written with a simple text editor. It is kept in the top
|
---|
114 | directory of the mirror, where w3mir will find it when it starts.
|
---|
115 | Please refer to the <a href="#contents">contents</a> for how to handle
|
---|
116 | a specific problem with a configuration file.
|
---|
117 |
|
---|
118 | <hr>
|
---|
119 |
|
---|
120 | <h2>The answers:</h2>
|
---|
121 |
|
---|
122 | <hr>
|
---|
123 |
|
---|
124 | <h3><a name="copy">How do I copy a file?</a></h3>
|
---|
125 |
|
---|
126 | <p>To copy the top page off www.starwars.com:
|
---|
127 |
|
---|
128 | <p><tt>w3mir http://www.starwars.com/</tt>
|
---|
129 |
|
---|
130 | <p><b>Note:</b> it is <em>important</em> that you give the trailing
|
---|
131 | slash for server names and directories.
|
---|
132 |
|
---|
133 | <hr>
|
---|
134 |
|
---|
<h3><a name="recurse">How do I copy a directory hierarchy?</a></h3>
|
---|
136 |
|
---|
<p>To copy everything about Episode I from www.starwars.com,
which is stored in <tt>http://www.starwars.com/episode-i/</tt> (I don't
recommend this, it's quite a lot of data):
|
---|
140 |
|
---|
141 | <p><tt>w3mir -r http://www.starwars.com/episode-i/</tt>
|
---|
142 |
|
---|
143 | <p>The corresponding configuration file is simple:
|
---|
144 |
|
---|
145 | <pre>
|
---|
146 | Options: recurse
|
---|
147 | URL: http://www.starwars.com/episode-i/
|
---|
148 | Fixup: run
|
---|
149 | </pre>
|
---|
150 |
|
---|
<p>The <tt>-r</tt> option makes w3mir recurse down from the starting
point. It will copy only those documents under
http://www.starwars.com/episode-i/ that it sees referenced from those
same documents. W3mir will <em>not</em> retrieve documents from
http://www.starwars.com/ because it is considered to be 'above' the
starting point.
|
---|
157 |
|
---|
<p>The command-line will get you a copy that is definitely browseable
via a web server, and possibly browseable directly from a CDROM or
hard-disk. To ensure that it is browseable from CDROM and disk you
need to use a configuration file with the <tt>Fixup: run</tt> line in it.
It causes w3mir to edit anything that needs editing after the mirror
has completed, including fixing URLs that caused redirects. The dirty
work is done by w3mir's helper program w3mfix. The directive will
cause w3mfix to be run each time w3mir completes the mirror.
|
---|
166 |
|
---|
167 | <p><b>Note:</b> it is <em>important</em> that you give the trailing
|
---|
168 | slash after the directory name. Specifying
|
---|
169 | <tt>http://www.starwars.com/episode-i</tt> and
|
---|
170 | <tt>http://www.starwars.com/episode-i/</tt> is quite different in
|
---|
w3mir's eyes. In the former case episode-i is considered to be a
document within the / (top) directory of www.starwars.com and w3mir
will recurse from /, which is a lot more than you wanted. In the
latter case w3mir understands that episode-i is a directory and will
consider that directory to be the starting point, which is what you
wanted.
|
---|
177 |
|
---|
178 | <hr>
|
---|
179 |
|
---|
180 | <h3><a name="resources">How do I copy the needed resource files from
|
---|
181 | another directory hierarchy?</a></h3>
|
---|
182 |
|
---|
<p>Some sites store their documents in one place, and put their
banners, icons and such in a separate directory called
<tt>/images</tt>, <tt>/banners</tt>, <tt>/icons</tt>,
<tt>/resources</tt> or some such. Unless you retrieve these as well as
the documents, things will probably not be too colorful. So, imagine
|
---|
188 | that the starwars site stored all the images in one holding directory
|
---|
189 | called <tt>/imagery</tt> and you want to copy all the stuff in it that
|
---|
190 | the episode-i pages need. Then you do this:
|
---|
191 |
|
---|
192 | <pre>
|
---|
193 | Options: recurse
|
---|
194 | URL: http://www.starwars.com/episode-i/ episode-i
|
---|
195 | Also: http://www.starwars.com/imagery/ imagery
|
---|
196 | Fixup: run
|
---|
197 | </pre>
|
---|
198 |
|
---|
<p>There are two changes here compared to the simpler file we started
with. First, there is an extra argument at the end of the URL directive. It
tells w3mir to store everything gotten from
<tt>http://www.starwars.com/episode-i/</tt> in the subdirectory
<tt>episode-i</tt>. The directory can be omitted, but I think it's
neater this way. Second, there is the new directive 'Also:'. It tells w3mir that
you also want whatever the documents under
<tt>http://www.starwars.com/episode-i/</tt> reference under
<tt>http://www.starwars.com/imagery/</tt>.
|
---|
208 |
|
---|
<p><b>Note:</b> this will only get stuff that was used by the
documents under <tt>http://www.starwars.com/episode-i/</tt>. Anything
stored under <tt>http://www.starwars.com/imagery/</tt> which is not
used will not be retrieved. If you want everything under
<tt>imagery</tt> to be retrieved, use the <tt>Also-quene:</tt>
directive.
|
---|
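<p>As a sketch of that variant, the configuration above would change only
in its third line. The local directory argument follows the
<tt>Also-quene:</tt> form used in the answer about multiple sites:

<pre>
  Options: recurse
  URL: http://www.starwars.com/episode-i/ episode-i
  Also-quene: http://www.starwars.com/imagery/ imagery
  Fixup: run
</pre>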
215 |
|
---|
216 | <hr>
|
---|
217 |
|
---|
218 | <h3><a name="ignore">How do I avoid copying files I don't want or copy
|
---|
219 | only files I want?</a></h3>
|
---|
220 |
|
---|
<p>To control what files w3mir copies you can use the
<tt>Ignore:</tt>, <tt>Fetch:</tt>, <tt>Ignore-RE:</tt> and
<tt>Fetch-RE:</tt> directives in the configuration file. The embedded
references to any file you choose to ignore, i.e., not copy, will point
at the original site, <em>not</em> to the mirror. This means that the
mirror user may still get hold of the file from the original source
by simply clicking if she so desires.
|
---|
228 |
|
---|
229 | <p>If a site contains huge .wav audio files that you are not
|
---|
230 | interested in you put
|
---|
231 |
|
---|
232 | <pre>
|
---|
233 | Ignore: *.wav
|
---|
234 | </pre>
|
---|
235 |
|
---|
<p>in the configuration file. You may ignore as many different
filename patterns as you want. If you want only a few specific
kinds of files from a site, say all HTML files (named
<em>something</em><tt>.html</tt>) and all Mpeg video files (named
<em>something</em><tt>.mpg</tt>), you can write this:
|
---|
241 |
|
---|
242 | <pre>
|
---|
243 | Fetch: *.html
|
---|
244 | Fetch: *.mpg
|
---|
245 | Ignore: *
|
---|
246 | </pre>
|
---|
247 |
|
---|
248 | <p>W3mir will test each filename against each Fetch/Ignore rule in
|
---|
sequence. An HTML file will match the first line and be fetched. Any
|
---|
250 | mpg file will match the second line and be fetched. All other files
|
---|
251 | will match the third line, and be ignored. This last line is needed
|
---|
252 | because the default is to get any files which are not ignored. By
|
---|
253 | arranging fetch and ignore rules carefully you may retrieve exactly
|
---|
254 | the filename patterns you want and not retrieve anything else.
|
---|
255 |
|
---|
<p>If you decide, after the mirror has been established, that you also
want all Mpeg Layer 3 audio files
(<em>something.</em><tt>mp3</tt>) from the site, then you add this:
|
---|
259 |
|
---|
260 | <pre>
|
---|
261 | Fetch: *.mp3
|
---|
262 | </pre>
|
---|
263 |
|
---|
<p>as the third line, making the <tt>Ignore: *</tt> line the fourth and
|
---|
265 | last. Then you must fix all references to .mp3 files within the
|
---|
266 | mirror by running w3mfix thus:
|
---|
267 |
|
---|
268 | <pre>
|
---|
269 | w3mfix -editref .mp3
|
---|
270 | </pre>
|
---|
271 |
|
---|
<p>which will edit all references to .mp3 files, pointing them to the
right place on your disk. The same applies when you remove a fetch rule, or add
or remove an ignore rule. See the answer about <a
href="#enlarge">enlarging and pruning</a> mirrors for more examples of
using <tt>w3mfix -editref ...</tt>
|
---|
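<p>For reference, with the mp3 rule added the complete rule list from
this example would read as follows. The order matters, since the rules
are tested in sequence:

<pre>
  Fetch: *.html
  Fetch: *.mpg
  Fetch: *.mp3
  Ignore: *
</pre>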
277 |
|
---|
278 | <p><b>Note:</b> when retrieving only a very limited set of files, as
|
---|
279 | in the example above, you <em>must</em> retrieve the html files,
|
---|
280 | because how else will w3mir find URLs of files to retrieve? Only html
|
---|
281 | files contain links to other files.
|
---|
282 |
|
---|
<p>Similarly, you may choose not to mirror whole branches of the
original site. If you, for example, mirror my home-pages, and you decide
not to mirror the comics pages, you can put
|
---|
286 |
|
---|
287 | <pre>
|
---|
288 | Ignore: /ts/
|
---|
289 | </pre>
|
---|
290 |
|
---|
291 | <p>or more precisely
|
---|
292 |
|
---|
293 | <pre>
|
---|
294 | Ignore: http://www.ifi.uio.no/~janl/ts/
|
---|
295 | </pre>
|
---|
296 |
|
---|
297 | <p>in the configuration file. If you do this after having established
|
---|
298 | the mirror you use w3mfix to fix the references:
|
---|
299 |
|
---|
300 | <pre>
|
---|
301 | w3mfix -editref /ts/
|
---|
302 | </pre>
|
---|
303 |
|
---|
304 | <p><tt>Fetch:</tt> and <tt>Ignore:</tt> rules can only use a very
|
---|
305 | limited subset of the Unix wild-cards. w3mir understands only '?',
|
---|
306 | '*', and '[a-z]' ranges.
|
---|
307 |
|
---|
<p><tt>Ignore-RE:</tt> and <tt>Fetch-RE:</tt> are the same as
<tt>Ignore:</tt> and <tt>Fetch:</tt> except that they give you access
to the full power of regular expressions to make rules for what to get
or not to get. They support Perl's superset of the normal Unix regular
expression syntax. They must be completely specified, including the
prefixed m, a delimiter of your choice (except the paired delimiters:
parentheses, brackets and braces), and any of the RE modifiers. For example,
|
---|
315 |
|
---|
316 | <pre>
|
---|
317 | Ignore-RE: m/.gif$/i
|
---|
318 | </pre>
|
---|
319 |
|
---|
320 | <p>or
|
---|
321 |
|
---|
322 | <pre>
|
---|
323 | Ignore-RE: m~/.*/.*/.*/~
|
---|
324 | </pre>
|
---|
325 |
|
---|
326 | <p>and so on. "#" cannot be used as delimiter as it is the comment
|
---|
327 | character in the configuration file.
|
---|
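<p>Remember that inside a regular expression an unescaped dot matches any
character, so <tt>m/.gif$/i</tt> also matches a name ending in, say,
<tt>xgif</tt>. A stricter sketch escapes the dot. The pattern below only
illustrates Perl syntax and is not taken from the w3mir documentation:

<pre>
  Ignore-RE: m/\.gif$/i
</pre>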
328 |
|
---|
329 | <p>There are some traps when using <tt>Ignore-RE</tt> and
|
---|
330 | <tt>Fetch-RE</tt>, please see their documentation in <tt>mandoc
|
---|
331 | w3mir</tt> for a more complete explanation.
|
---|
332 |
|
---|
333 | <hr>
|
---|
334 |
|
---|
335 | <h3><a name="depth">How do I limit how deep w3mir will recurse?</a></h3>
|
---|
336 |
|
---|
337 | <p>W3mir has no explicit mechanism to limit the depth of recursion,
|
---|
338 | but the same result can be achieved with a simple <tt>Ignore</tt> rule:
|
---|
339 |
|
---|
340 | <pre>
|
---|
341 | Ignore: /*/*/*/*/*/*/
|
---|
342 | </pre>
|
---|
343 |
|
---|
344 | <p>This will ignore any URLs that contain at least 7 slashes ("/").
|
---|
Note that a URL contains three slashes that do not have anything to
do with depth:
|
---|
347 |
|
---|
348 | <pre>
|
---|
349 | http://www.ifi.uio.no/
|
---|
350 | </pre>
|
---|
351 |
|
---|
352 | <p>so only the surplus slashes are used for depth in this match. In the
|
---|
353 | example above the limit is 4 levels from the top. The
|
---|
354 | <tt>Ignore:</tt> rule that is used to limit recursion depth must be
|
---|
355 | listed before any <tt>Fetch:</tt> rules to be effective.
|
---|
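<p>The same arithmetic gives other limits: the pattern needs the three
slashes of the <tt>http://host/</tt> part plus one per permitted level.
The following is a sketch for a limit of 2 levels, placed before the
<tt>Fetch:</tt> rule as required. The <tt>Fetch:</tt> and final
<tt>Ignore:</tt> lines are borrowed from the answer about ignoring files
and are only there to illustrate the ordering:

<pre>
  # 3 slashes from http://host/ + 2 levels = 5 slashes
  Ignore: /*/*/*/*/
  Fetch: *.html
  Ignore: *
</pre>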
356 |
|
---|
357 | <hr>
|
---|
358 |
|
---|
359 | <h3><a name="memory">How do I limit w3mirs memory usage?</a></h3>
|
---|
360 |
|
---|
<p>In a mirror consisting of <em>many</em> files, such as an archive of
an active mailing list, w3mir will build a very large referer table, in
part for w3mir to use in the <tt>Referer:</tt> header and in part for
w3mfix to use in fixing references.
|
---|
365 |
|
---|
<p>If you both disable the <tt>Referer:</tt> header and don't use
w3mfix, w3mir will not build a referer table. You do this in the
configuration file:
|
---|
369 |
|
---|
370 | <pre>
|
---|
371 | Disable-headers: referer
|
---|
372 | Fixup: off
|
---|
373 | </pre>
|
---|
374 |
|
---|
<p>Please note the potential problems of turning off fixup described
earlier in this howto. There are normally no problems associated with
simple sites, but if there are redirects, fixup <em>is</em> needed for
a consistent mirror.
|
---|
379 |
|
---|
380 | <hr>
|
---|
381 |
|
---|
<h3><a name="rm">How do I remove the files that are no longer on the
|
---|
383 | original site from the mirror?</a></h3>
|
---|
384 |
|
---|
385 | <p>Over time the site you mirror will add files, and quite possibly
|
---|
remove files. Or you might introduce new <tt>Ignore:</tt> rules after
establishing the mirror that reduce the set of files wanted in the mirror.
|
---|
388 |
|
---|
<p>By default w3mir will not delete such old files, since some people might
want to keep the files even if they are removed from the original
site. To remove the old/unwanted files you add 'remove' to the
<tt>Options:</tt> line.
|
---|
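<p>As a sketch, a configuration that recurses and also removes stale files
might look like the one below. The comma between the two options is an
assumption made here; check the w3mir manual for how several options are
separated on the <tt>Options:</tt> line:

<pre>
  Options: recurse, remove
  URL: http://www.starwars.com/episode-i/
  Fixup: run
</pre>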
393 |
|
---|
394 | <hr>
|
---|
395 |
|
---|
396 | <h3><a name="multi">How do I copy files from multiple sites?</a></h3>
|
---|
397 |
|
---|
<p>In the answer about <a href="#resources">copying resource files</a>
we saw how to mirror several related directory hierarchies. The same
approach works across several sites. For example, say you want to mirror
all my home-pages into one mirror:
|
---|
401 |
|
---|
<pre>
  Options: recurse
  URL: http://www.math.uio.no/~janl/ math/janl
  Also: http://www.math.uio.no/drift/personer/ math/drift
  Also: http://www.ifi.uio.no/~janl/ ifi/janl
  Also: http://www.mi.uib.no/~nicolai/ math-uib/nicolai
</pre>
|
---|
409 |
|
---|
410 | <p>As in the previous example this will only get documents that are
|
---|
referenced. Any documents that are stored at these locations but to
|
---|
412 | which w3mir finds no references will not be retrieved. So this will
|
---|
413 | fail if the sites are not in any way related, or if you wanted
|
---|
414 | <em>everything</em> stored at each site.
|
---|
415 |
|
---|
<p>To mirror unrelated sites, or to get it all, you may specify that the
given URL should be considered a starting-point as well:
|
---|
418 |
|
---|
419 | <pre>
|
---|
420 | Also-quene: http://www.math.uio.no/drift/personer/ math/drift
|
---|
421 | </pre>
|
---|
422 |
|
---|
<p>and, if you want to add an additional starting-point within an already
|
---|
424 | named site:
|
---|
425 |
|
---|
426 | <pre>
|
---|
427 | Quene: http://www.math.uio.no/drift/personer/foo.html
|
---|
428 | </pre>
|
---|
429 |
|
---|
430 | <p>Armed with that you should be able to get pretty much anything you
|
---|
431 | like.
|
---|
432 |
|
---|
433 | <hr>
|
---|
434 |
|
---|
435 | <h3><a name="alias">How do I copy files from one server with several
|
---|
436 | names?</a></h3>
|
---|
437 |
|
---|
438 | <p>Simple, the same way you mirror several servers with different
|
---|
439 | names. The math department at University of Oslo has a web server
|
---|
440 | known under two names: math-www.uio.no and www.math.uio.no, and both
|
---|
441 | names are used in documents stored on it. To copy the whole server,
|
---|
442 | one time only, give these URL and Also lines:
|
---|
443 |
|
---|
444 | <pre>
|
---|
445 | URL: http://www.math.uio.no/ .
|
---|
446 | Also: http://math-www.uio.no/ .
|
---|
447 | </pre>
|
---|
448 |
|
---|
<p>Note the period/dot (.) at the end of each line. It means that
w3mir will store the files in the current directory, i.e. documents
from both servers will be stored in the same place. But since w3mir
asks to only get documents that are newer than the ones it already has,
any document gotten from the server under the www.math.uio.no name
will not be transferred again under the math-www.uio.no name. W3mir
will still ask for the document, but the server will tell w3mir that its
copy is current and there will be no additional transfer of the
document.
|
---|
458 |
|
---|
459 | <hr>
|
---|
460 |
|
---|
461 | <h3><a name="enlarge">How do I enlarge or prune an established
|
---|
462 | mirror?</a></h3>
|
---|
463 |
|
---|
464 | <p>This only works if you use a configuration file.
|
---|
465 |
|
---|
466 | <p>If you want to add a site or directory to a mirror you simply add
|
---|
467 | the needed <tt>Also:</tt> or <tt>Also-Quene:</tt> to the configuration
|
---|
file and then you run w3mfix manually, with the -editref option. If,
for example, you have established a mirror of my home-pages, but want to
add my wife's home-page, you add this
|
---|
471 |
|
---|
472 | <pre>
|
---|
473 | Also: http://www.ifi.uio.no/~annen/ ifi/annen
|
---|
474 | </pre>
|
---|
475 |
|
---|
<p>to the configuration shown earlier. Then you run w3mfix to
fix all URLs referencing her home-page. The distinguishing
characteristic is the name 'annen':
|
---|
479 |
|
---|
480 | <pre>
|
---|
481 | w3mfix -editref annen
|
---|
482 | </pre>
|
---|
483 |
|
---|
<p>although

<pre>
  w3mfix -editref http://www.ifi.uio.no/~annen/
</pre>

<p>would work too; it's just a lot more to type. This fixes all the
|
---|
491 | references to her home-page so that they point to the mirror instead of
|
---|
492 | the original pages.
|
---|
493 |
|
---|
<p>To prune (cut something out of) a mirror you do the same. Make the
change in the configuration file and run 'w3mfix -editref ...' to fix
the references to that which you removed.
|
---|
497 |
|
---|
498 | <hr>
|
---|
499 |
|
---|
500 | <h3><a name="cat">How do I 'cat' a file?</a></h3>
|
---|
501 |
|
---|
502 | <p>W3mir will output the fetched document to its standard output
|
---|
503 | (normally your screen/window) if you specify the '-s' command line
|
---|
504 | option. The corresponding configuration file directive is
|
---|
505 |
|
---|
506 | <pre>
|
---|
507 | File-Disposition: stdout
|
---|
508 | </pre>
|
---|
509 |
|
---|
510 | <hr>
|
---|
511 |
|
---|
512 | <h3><a name="list">How do I list URLs in a document?</a></h3>
|
---|
513 |
|
---|
514 | <p>To list the URLs in http://www.math.uio.no/:
|
---|
515 |
|
---|
516 | <pre>
|
---|
517 | w3mir -q -f -l http://www.math.uio.no/
|
---|
518 | </pre>
|
---|
519 |
|
---|
520 | <p>The <tt>-q</tt> switch causes w3mir to produce no other output
|
---|
521 | which would disturb the URL listing. The <tt>-f</tt> switch tells
|
---|
522 | w3mir to forget the document once it has been analyzed, i.e., not save
|
---|
523 | it on disk. And finally, the <tt>-l</tt> switch makes w3mir list the
|
---|
524 | URLs in the document. You may combine <tt>-l</tt> with <tt>-r</tt>
|
---|
525 | and you need not use it with <tt>-f</tt>.
|
---|
526 |
|
---|
527 | <p>In the configuration file you put <tt>list</tt> on the
|
---|
528 | <tt>Options:</tt> line.
|
---|
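<p>Put together, a configuration-file counterpart of the command line
above might look like the sketch below. Pairing <tt>-f</tt> with
<tt>File-Disposition: forget</tt> is an inference from the description of
that directive in the answer about proxy servers, and no counterpart of
<tt>-q</tt> is shown because this howto does not name one:

<pre>
  Options: list
  URL: http://www.math.uio.no/
  File-Disposition: forget
</pre>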
529 |
|
---|
530 | <hr>
|
---|
531 |
|
---|
<h3><a name="aborted">How do I restart a mirror process after stopping
|
---|
533 | it prematurely?</a></h3>
|
---|
534 |
|
---|
535 | <p>You may just rerun the same command once more. But that makes
|
---|
536 | w3mir request all the documents you have already once more to see if a
|
---|
537 | more recent version is available on the server. You can save time by
|
---|
538 | using the <tt>-fs</tt> (Fetch Some) option. This makes w3mir only
|
---|
539 | request documents it does not find on your disk. E.g.:
|
---|
540 |
|
---|
541 | <p><tt>w3mir -fs -r http://www.starwars.com/</tt>
|
---|
542 |
|
---|
543 | <p>This is not something you would normally put in the configuration
|
---|
544 | file, but you can, by adding 'only-nonexistent' on the 'Options:' line.
|
---|
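<p>If you do put it in the configuration file, the result might look like
this sketch. As elsewhere, the comma separating the two options is an
assumption; the w3mir manual has the exact syntax of the
<tt>Options:</tt> line:

<pre>
  Options: recurse, only-nonexistent
  URL: http://www.starwars.com/
</pre>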
545 |
|
---|
546 | <hr>
|
---|
547 |
|
---|
548 | <h3><a name="robots">How do I disable robots.txt obedience?</a></h3>
|
---|
549 |
|
---|
<p>Normally w3mir will read and obey each site's robots.txt file,
|
---|
551 | because w3mir wants to be a nice tool. However robots.txt was designed
|
---|
552 | with something slightly different than the normal use of w3mir in
|
---|
553 | mind, so if you want w3mir to disregard the robot rules you can use
|
---|
554 | <tt>-drr</tt> (Disable Robot Rules) on the command-line, or the line
|
---|
555 |
|
---|
556 | <pre>
|
---|
557 | Robot-Rules: off
|
---|
558 | </pre>
|
---|
559 |
|
---|
560 | <p>in the configuration file. The robot exclusion standard is
|
---|
561 | described in <a
|
---|
href="http://info.webcrawler.com/mak/projects/robots/norobots.html">http://info.webcrawler.com/mak/projects/robots/norobots.html</a>.
|
---|
563 |
|
---|
564 | <hr>
|
---|
565 |
|
---|
566 | <h3><a name="corrupt">How do I stop w3mir from corrupting binary
|
---|
567 | files?</a></h3>
|
---|
568 |
|
---|
<p>During the normal course of events w3mir converts the newline
format of fetched HTML documents to your system's native newline
format. On Unix a newline consists of a single ASCII LF character, on
Macintoshes it's a single ASCII CR character and on DOS/Windows it's an
ASCII CR/LF pair. W3mir understands all these and all HTML files are
saved in the format your operating system prefers.
|
---|
575 |
|
---|
<p>If, and this is very unlikely, a web server identifies a binary
file as HTML, w3mir will very likely corrupt the file. If you discover
a file which is obviously ruined in the mirror, but is not ruined when
you view it on the original site, do this:
|
---|
580 |
|
---|
581 | <ol>
|
---|
582 |
|
---|
583 | <li>Notify the webmaster on the original site that the file has the
|
---|
584 | wrong MIME type
|
---|
585 |
|
---|
586 | <li>Use the <tt>-nnc</tt> (No Newline Conversion) option on the
|
---|
587 | command line, or
|
---|
588 |
|
---|
589 | <pre>
|
---|
590 | Options: no-newline-conv
|
---|
591 | </pre>
|
---|
592 |
|
---|
593 | in the configuration file.
|
---|
594 |
|
---|
595 | <li>Remove the corrupt file(s).
|
---|
596 |
|
---|
597 | <li>Run "<tt>w3mir -fs</tt>...", to fetch only the deleted file(s)
|
---|
598 | again.
|
---|
599 |
|
---|
600 | </ol>
|
---|
601 |
|
---|
602 | <hr>
|
---|
603 |
|
---|
604 | <h3><a name="auth">How do I copy a site that wants user-name and
|
---|
605 | password?</a></h3>
|
---|
606 |
|
---|
607 | <p>This can only be done with a configuration file. Being able to
|
---|
608 | give this on the command-line would give the user-name and password away
|
---|
609 | to other users of the system, so the ability to give authentication
|
---|
610 | information that way has not been put in w3mir.
|
---|
611 |
|
---|
612 | <p>In the configuration file you put:
|
---|
613 |
|
---|
614 | <pre>
|
---|
615 | Auth-domain: */*
|
---|
616 | Auth-user: me
|
---|
617 | Auth-passwd: my-password
|
---|
618 | </pre>
|
---|
619 |
|
---|
620 | <p>This will cause w3mir to give the user-name and password each time
|
---|
621 | the server asks. There is no way to make w3mir give the user-name and
|
---|
622 | password each time no matter if the server asks or not.
|
---|
623 |
|
---|
624 | <hr>
|
---|
625 |
|
---|
626 | <h3><a name="mauth">How do I access a site that wants several
|
---|
627 | different user-names and passwords?</a></h3>
|
---|
628 |
|
---|
<p>If you have several user-names and passwords across
the server(s) that are copied, you need a slightly more advanced
version of this that associates each user-name/password with an
authentication "domain". "Domain" is an HTTP concept. It is simply a
grouping of files and documents within a "realm". One file or a whole
directory hierarchy can belong to a realm. One server may have many
realms. A user may have separate passwords for each realm, or the
same password for all the realms the user has access to. A
combination of a server name, server port and a realm is called a
domain.
|
---|
639 |
|
---|
640 | <pre>
|
---|
641 | Auth-domain: theserver:theport/therealm
|
---|
642 | Auth-user: me
|
---|
643 | Auth-passwd: my-password
|
---|
644 |
|
---|
645 | Auth-domain: theserver:theport/otherrealm
|
---|
646 | Auth-user: other-me
|
---|
  Auth-passwd: other-password
|
---|
648 | </pre>
|
---|
649 |
|
---|
<p>W3mir will tell you what the name of the realm is if it is unable to
|
---|
651 | authenticate itself with the server. You may also use '*' as the realm
|
---|
652 | name if you only copy documents from one realm on that server.
|
---|
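<p>A sketch of the wildcard form, assuming a server named
<tt>www.example.com</tt> on port 80 (both are placeholders, not taken
from a real setup):

<pre>
  Auth-domain: www.example.com:80/*
  Auth-user: me
  Auth-passwd: my-password
</pre>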
653 |
|
---|
654 | <hr>
|
---|
655 |
|
---|
656 | <h3><a name="proxy">How do I use a proxy server?</a></h3>
|
---|
657 |
|
---|
658 | <p>On some secured sites you have to access the Internet through proxy
|
---|
659 | servers to get out of the internal network.
|
---|
660 |
|
---|
661 | <p>A proxy server has a host name, and a port you must use. On the
|
---|
662 | command line you simply specify <tt>-P proxy-host-name:proxy-port</tt>. In
|
---|
663 | the configuration file you put this:
|
---|
664 |
|
---|
665 | <pre>
|
---|
666 | HTTP-Proxy: proxy-host-name:proxyport
|
---|
667 | </pre>
|
---|
668 |
|
---|
<p>The main advantage of working through proxy servers, other than
security, is that you benefit from any caching the proxy server
does, which can speed up retrievals enormously.
|
---|
672 |
|
---|
<p>Another use of the proxy option is to "prime" the proxy server's
cache. I.e. you can use w3mir to fetch the documents through the proxy
|
---|
675 | server to ensure that the documents are cached there later when you
|
---|
676 | want to read them with your browser. If you also specify
|
---|
677 |
|
---|
678 | <pre>
|
---|
679 | File-Disposition: forget
|
---|
680 | </pre>
|
---|
681 |
|
---|
682 | <p>it won't even use any space on your disk; w3mir will just process
|
---|
683 | the documents looking for URLs and then <em>not</em> save them.
|
---|
684 |
|
---|
685 | <hr>
|
---|
686 |
|
---|
687 | <h3><a name="pauth">How do I authenticate myself to a proxy
|
---|
688 | server?</a></h3>
|
---|
689 |
|
---|
690 | <p>Some proxy servers demand a user name and password to let you use
|
---|
691 | them. W3mir does not support the domain concept in connection with
|
---|
692 | proxy authentication because the author cannot imagine that it will be
|
---|
693 | needed. You need to put this in your configuration file:
|
---|
694 |
|
---|
695 | <pre>
|
---|
696 | HTTP-Proxy-user: proxy-username
|
---|
697 | HTTP-Proxy-passwd: proxy-password
|
---|
698 | </pre>
|
---|
699 |
|
---|
700 | <hr>
|
---|
701 |
|
---|
702 | <h3><a name="proxytweak">How do I ensure that the proxy server
|
---|
703 | ...?</a></h3>
|
---|
704 |
|
---|
705 | <p>An HTTP/1.0 proxy server may be told not to use its current copy of
|
---|
706 | a document if you specify the <tt>-pflush</tt> command-line option, or
|
---|
707 |
|
---|
708 | <pre>
|
---|
709 | Proxy-Options: refresh
|
---|
710 | </pre>
|
---|
711 |
|
---|
712 | <p>in the configuration file. This is useful if the proxy has an old
|
---|
713 | copy of some document and does not realize that a newer version exists
|
---|
714 | on the origin site. W3mir uses the HTTP/1.0 version of this command
|
---|
715 | by default. You can force w3mir to use the HTTP/1.1 version by adding
|
---|
716 | <tt>no-pragma</tt> to the line. If you do this, it will not work as
|
---|
717 | intended unless the proxy server understands the HTTP/1.1 protocol.
|
---|
718 |
|
---|
719 |
|
---|
720 | <p>HTTP/1.1 proxy servers can be manipulated in a few more ways. The
|
---|
721 | configuration file <tt>Proxy-Options:</tt> directive also takes
|
---|
722 | <tt>revalidate</tt> and <tt>no-store</tt> options. The former tells
|
---|
723 | the proxy server to check if there is any newer version available.
|
---|
724 | This is, in principle, more network friendly than the <tt>refresh</tt>
|
---|
725 | option since it will only cause a copy if there is a newer file
|
---|
726 | available. The <tt>no-store</tt> option tells the proxy server not to
|
---|
727 | store the documents you transfer. This might be useful if the
|
---|
728 | documents are sensitive, but a proxy server that does not understand
|
---|
729 | HTTP/1.1, or that has not implemented this functionality, may store
|
---|
730 | the documents anyway, so you should not count on this option to
|
---|
731 | work.
|
---|
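<p>As a rough guide to what the options described above mean on the wire, the following Python sketch maps the option names to the cache-related request headers defined by HTTP/1.0 and HTTP/1.1. The function name is hypothetical and the mapping is an assumption based on the HTTP specifications; it has not been checked against w3mir's source.

```python
def proxy_request_headers(options):
    """Return cache-related request headers for a set of option names.

    Assumed mapping for illustration; not taken from w3mir itself.
    """
    options = set(options)
    headers = {}
    cache_control = []
    if "refresh" in options:
        if "no-pragma" in options:
            cache_control.append("no-cache")  # HTTP/1.1 form
        else:
            headers["Pragma"] = "no-cache"    # HTTP/1.0 form
    if "revalidate" in options:
        cache_control.append("max-age=0")     # ask the proxy to revalidate
    if "no-store" in options:
        cache_control.append("no-store")      # ask the proxy not to keep a copy
    if cache_control:
        headers["Cache-Control"] = ", ".join(cache_control)
    return headers


print(proxy_request_headers(["refresh"]))
print(proxy_request_headers(["refresh", "no-pragma"]))
print(proxy_request_headers(["revalidate", "no-store"]))
```

<p>An HTTP/1.0-only proxy ignores <tt>Cache-Control</tt> entirely, which is why the HTTP/1.1 options cannot be relied on there.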
732 |
|
---|
733 | <hr>
|
---|
734 |
|
---|
735 | <h3><a name="batchget">How do I batch get files with w3mir?</a></h3>
|
---|
736 |
|
---|
737 | <p>Normally when fetching files w3mir will process each HTML (and PDF)
|
---|
738 | file to find URLs in it for further retrievals. This is
|
---|
739 | time-consuming, and not always wanted. Sometimes you simply want to
|
---|
740 | get one or more files and save them untouched:
|
---|
741 |
|
---|
742 | <pre>
|
---|
743 | w3mir -B http://www.starwars.com/ http://www.ifi.uio.no/~janl/
|
---|
744 | </pre>
|
---|
745 |
|
---|
746 | <p>There is a companion switch for <tt>-B</tt>, namely <tt>-I</tt>. It
|
---|
747 | makes w3mir read URLs from its standard input, one per line. Thus you
|
---|
748 | can use w3mir in a pipe to batch get several files whose URLs you find
|
---|
749 | in some way. This is a stupid example:
|
---|
750 |
|
---|
751 | <pre>
|
---|
752 | w3mir -q -l -f http://www.ifi.uio.no/ | w3mir -I -B
|
---|
753 | </pre>
|
---|
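<p>The same pipe can be driven from a script. The sketch below is one way to do it in Python, assuming <tt>w3mir</tt> is installed and on your <tt>PATH</tt>; the function name <tt>batch_get</tt> is made up for this example, and only the <tt>-I</tt> and <tt>-B</tt> switches described above are used.

```python
import subprocess


def batch_get(urls, run=subprocess.run):
    """Feed URLs to `w3mir -I -B` on standard input, one per line.

    Hypothetical wrapper; assumes a w3mir executable is on the PATH.
    The `run` parameter exists only so the call can be replaced in tests.
    """
    stdin_text = "".join(url + "\n" for url in urls)
    return run(["w3mir", "-I", "-B"], input=stdin_text, text=True, check=True)


# Example call (commented out so that nothing is fetched by accident):
# batch_get(["http://www.ifi.uio.no/~janl/"])
```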
754 |
|
---|
755 | <p><tt>-B</tt> may also be used with <tt>-r</tt>, but the only effect
|
---|
756 | it will have then is to save the html files unchanged on disk, because
|
---|
757 | to recurse w3mir <em>has</em> to examine all the HTML documents
|
---|
758 | for URLs.
|
---|
759 |
|
---|
760 | <p><b>Please note</b> that using <tt>-B</tt> combined with <tt>-r</tt>
|
---|
761 | for mirroring will probably lead to an unstable mirror, because w3mir
|
---|
762 | does not get a chance to manipulate the URLs in the documents as it
|
---|
763 | needs to in order to maintain the mirror later, and most important of
|
---|
764 | all, w3mir needs all HTML files to contain an &lt;HTML&gt; tag to be
|
---|
765 | able to recognize an HTML file as an HTML file. When running with the
|
---|
766 | <tt>-B</tt> switch w3mir will not ensure the presence of this and thus
|
---|
767 | we must rely on the original document's author to be nice. This is a
|
---|
768 | bad bet. In other words, <b>don't use <tt>-B</tt> for recursive
|
---|
769 | mirroring</b>, only for batch copying/mirroring of single documents.
|
---|
770 |
|
---|
771 | <hr>
|
---|
772 |
|
---|
773 | <h3><a name="cgi">How do I handle CGI?</a></h3>
|
---|
774 |
|
---|
775 | <p>There is no way w3mir can duplicate the process that happens on the
|
---|
776 | Web server when it comes to CGI. For some CGI programs w3mir can
|
---|
777 | simply copy the output and store it on disk. For other CGI programs this
|
---|
778 | is not possible, and the only way out is to make w3mir not get the
|
---|
779 | involved files using Ignore rules in the configuration file. These
|
---|
780 | will avoid a lot of CGI programs:
|
---|
781 |
|
---|
782 | <pre>
|
---|
783 | Ignore: *.cgi
|
---|
784 | Ignore: *-cgi
|
---|
785 | </pre>
|
---|
786 |
|
---|
787 | <p>You might have to add other/more rules for some sites if they have
|
---|
788 | other naming conventions or if it's simply impossible to tell from the
|
---|
789 | file-name if it's a CGI or not.
|
---|
790 |
|
---|
791 | <p>Adding Ignore rules has two effects:
|
---|
792 |
|
---|
793 | <ol>
|
---|
794 | <li><p>W3mir will not retrieve documents matching the rules.
|
---|
795 | <li><p>W3mir will make all references to matching documents point to
|
---|
796 | the site you mirrored from instead of pointing to a non-existent
|
---|
797 | file in the mirror.
|
---|
798 | </ol>
|
---|
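<p>The effect of the two rules above can be approximated with shell-style wildcard matching. This Python sketch assumes the patterns are matched against the whole URL, which may not be exactly what w3mir does; the host name and the helper <tt>is_ignored</tt> are invented for the example.

```python
from fnmatch import fnmatchcase

IGNORE_PATTERNS = ["*.cgi", "*-cgi"]  # the two rules shown above


def is_ignored(url, patterns=IGNORE_PATTERNS):
    """True if the URL matches any of the wildcard patterns."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


for url in ["http://www.example.com/search.cgi",
            "http://www.example.com/index.html"]:
    # Ignored documents are skipped and links to them keep pointing
    # at the original site; everything else is retrieved.
    action = "skip, link stays on origin" if is_ignored(url) else "retrieve"
    print(url, "->", action)
```

<p>Note that under this assumption a URL such as <tt>search.cgi?q=x</tt> would not match <tt>*.cgi</tt>, which is one reason you might need additional rules.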
799 |
|
---|
800 | <hr>
|
---|
801 |
|
---|
802 | <h3><a name="imap">How do I handle server side image-maps?</a></h3>
|
---|
803 |
|
---|
804 | <p>Server side image-maps are yet another thing w3mir cannot handle,
|
---|
805 | because the map lookup is done by the server. Put ignore
|
---|
806 | rules in the configuration file:
|
---|
807 |
|
---|
808 | <pre>
|
---|
809 | Ignore: *.map
|
---|
810 | </pre>
|
---|
811 |
|
---|
812 | <p>W3mir has full support for client side image-maps though.
|
---|
813 |
|
---|
814 | <hr>
|
---|
815 |
|
---|
816 | <h3><a name="java">How do I handle Java and ActiveX?</a></h3>
|
---|
817 |
|
---|
818 | <p>Java and ActiveX objects are included in HTML pages with an
|
---|
819 | <tt>&lt;OBJECT&gt;</tt> or <tt>&lt;APPLET&gt;</tt> tag. W3mir can
|
---|
820 | handle these on one condition: The CODEBASE attribute names the
|
---|
821 | directory where the program stores its resources (such as
|
---|
822 | subprograms, graphic files, sound, text, and so on) and w3mir must
|
---|
823 | have read access to this directory. Otherwise w3mir is without hope:
|
---|
824 | it's impossible to extract the name of the resources the program needs
|
---|
825 | in any reliable way.
|
---|
826 |
|
---|
827 | <p>HTML4 supports an attribute that enumerates the resources the
|
---|
828 | program needs, but w3mir is not able to use this yet.
|
---|
829 |
|
---|
830 | <hr>
|
---|
831 |
|
---|
832 | <h3><a name="script">How do I handle JavaScript and other script
|
---|
833 | languages?</a></h3>
|
---|
834 |
|
---|
835 | <p>W3mir does its best to pass scripts (JavaScript, PerlScript,
|
---|
836 | etc.) embedded in the HTML undamaged. It cannot, however, extract
|
---|
837 | any URLs that the script generates at run time and that the browser
|
---|
838 | would then refer to or embed in the page.
|
---|
839 |
|
---|
840 | <p>It will however work if the script generates relative references
|
---|
841 | and there is some other way for w3mir to access the referenced files.
|
---|
842 | Or, if the script generates absolute references and
|
---|
843 | the person browsing the mirror has access to the site named, then the
|
---|
844 | user will be able to browse the referenced documents via that other
|
---|
845 | server.
|
---|
846 |
|
---|
847 | <hr>
|
---|
848 |
|
---|
849 | <h3><a name="css">How do I handle the other things with 'partial
|
---|
850 | support'?</a></h3>
|
---|
851 |
|
---|
852 | <p>W3mir has partial support for CSS. This means that
|
---|
853 | <tt>&lt;style&gt;</tt> tags and the enclosed style data are passed
|
---|
854 | undamaged by w3mir. W3mir will also retrieve the external style sheets
|
---|
855 | named in HTML documents. But w3mir will <em>not</em> (yet) analyze the
|
---|
856 | style sheet data to find URLs of other resources (such as fonts) named
|
---|
857 | in it.
|
---|
858 |
|
---|
859 | <p>W3mir also has partial support for Adobe Acrobat (PDF) files. This
|
---|
860 | means that w3mir can extract URLs from PDF files, and get the named
|
---|
861 | documents if you want them. But w3mir cannot edit those URLs so that
|
---|
862 | the PDF files point to the mirror instead of wherever on the original
|
---|
863 | site they were pointing. If the PDF files contain absolute URLs they
|
---|
864 | will continue pointing to where they were pointing before. However,
|
---|
865 | if the PDF files contain relative references things will work out.
|
---|
866 |
|
---|
867 | <p>The reason that URLs in PDF files cannot be edited is that they are
|
---|
868 | binary and contain byte pointers. If a URL's length is changed the
|
---|
869 | byte pointers will point to the wrong place in the document. Writing
|
---|
870 | code to correct these pointers would be quite complex. But if you
|
---|
871 | write it I will use it.
|
---|
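<p>The byte pointer problem can be shown with a toy example. The Python sketch below is <em>not</em> a PDF parser: it uses a made-up byte string in which an offset to a marker is recorded, then shortens a URL and shows that the recorded offset no longer points at the marker. The URLs and the marker are invented for the illustration.

```python
# A toy stand-in for a binary file: some data containing a URL, followed
# by a marker whose byte offset is recorded, much like a byte pointer.
old_url = b"http://www.example.com/docs/manual.html"
new_url = b"manual.html"

body = b"(link " + old_url + b") MARKER"
recorded_offset = body.index(b"MARKER")

# Rewriting the URL changes the length of everything before the marker.
edited = body.replace(old_url, new_url)

print(body[recorded_offset:recorded_offset + 6])    # the marker
print(edited[recorded_offset:recorded_offset + 6])  # no longer the marker
```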
872 |
|
---|
873 | <hr>
|
---|
874 |
|
---|
875 | <h3><a name="anon">How do I keep my identity secret?</a></h3>
|
---|
876 |
|
---|
877 | <p>The HTTP protocol has a header, <tt>User:</tt>, which robots such
|
---|
878 | as w3mir are recommended to use. Another way to track you is looking
|
---|
879 | at the 'Referer:' header w3mir gives in HTTP requests. Both can be
|
---|
880 | disabled:
|
---|
881 |
|
---|
882 | <pre>
|
---|
883 | Disable-headers: referer, user
|
---|
884 | </pre>
|
---|
885 |
|
---|
886 | <p>If, in addition, you use a proxy server that many other users use,
|
---|
887 | it is unlikely that you can be tracked (easily) by the server
|
---|
888 | you are copying things from. You are however much easier to track
|
---|
889 | from the logs in the proxy server. And a court order is quite likely
|
---|
890 | to get you tracked in spite of any precautions you take.
|
---|
891 |
|
---|
892 | <p>W3mir does not support cookies and thus you cannot be tracked with
|
---|
893 | the help of that mechanism.
|
---|
894 |
|
---|
895 | <hr>
|
---|
896 |
|
---|
897 | <h3><a name="ns">How do I pretend that I'm using Netscape, Internet
|
---|
898 | Explorer or Lynx?</a></h3>
|
---|
899 |
|
---|
900 | <p>Some web sites give you different documents when you ask for a
|
---|
901 | specific URL based on what browser you use, or even what OS you appear
|
---|
902 | to be using. W3mir identifies itself with a string that looks like
|
---|
903 | this:
|
---|
904 |
|
---|
905 | <p><tt>w3mir/<em>version</em>-<em>release-date</em></tt>
|
---|
906 |
|
---|
907 | <p>Netscape identifies itself with strings that look something like
|
---|
908 | this:
|
---|
909 |
|
---|
910 | <p><tt>Mozilla/3.01 (X11; I; Linux 2.0.30 i586)</tt>
|
---|
911 |
|
---|
912 | <p>and Internet Explorer says it's something like this:
|
---|
913 |
|
---|
914 | <p><tt>Mozilla/2.0 (compatible; MSIE 3.02; Windows NT)</tt>
|
---|
915 |
|
---|
916 | <p>and Lynx says something like this:
|
---|
917 |
|
---|
918 | <p><tt>Lynx/2.6 libwww-FM/2.14</tt>
|
---|
919 |
|
---|
920 | <p>You can change w3mir's identification with <tt>-agent 'string'</tt>
|
---|
921 | on the command line. In the configuration file you put
|
---|
922 |
|
---|
923 | <pre>
|
---|
924 | Agent: Mozilla/3.01 (X11; I; Linux 2.0.30 i586)
|
---|
925 | </pre>
|
---|
926 |
|
---|
927 | <p>to pretend w3mir is Netscape 3.01.
|
---|
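<p>Identification strings like these travel in the HTTP <tt>User-Agent</tt> request header. As an illustration of what changing the agent string amounts to, this Python sketch builds (but does not send) a request carrying the Netscape string from above. It uses Python's standard <tt>urllib</tt>, not w3mir, and the host name is a placeholder.

```python
from urllib.request import Request

AGENT = "Mozilla/3.01 (X11; I; Linux 2.0.30 i586)"

# Build a request object with the chosen identification; nothing is sent.
request = Request("http://www.example.com/", headers={"User-Agent": AGENT})

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```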
928 |
|
---|
929 | <hr>
|
---|
930 |
|
---|
931 | <h3><a name="other">How do I do other things?</a></h3>
|
---|
932 |
|
---|
933 | <p>This document is by no means a complete list of the things you can
|
---|
934 | do with w3mir. The w3mir man page (<tt>man w3mir</tt> or <tt>perldoc
|
---|
935 | w3mir</tt>) lists more things, and goes into more detail of how things
|
---|
936 | work so you can use the knowledge to do neat things. There are
|
---|
937 | several things mentioned only in the man page that help you with
|
---|
938 | tricky multi-server mirroring, and give you better control of what to
|
---|
939 | get and not to get and under what name to save it on disk. And a
|
---|
940 | couple of other things...
|
---|
941 |
|
---|
942 | <hr>
|
---|
943 | <address>Nicolai Langfeldt 9/7/1998</address>
|
---|