=head1 NAME

lwpcook - libwww-perl cookbook

=head1 DESCRIPTION

This document contains some examples that show typical usage of the
libwww-perl library. You should consult the documentation for the
individual modules for more detail.

All examples should be runnable programs. You can, in most cases, test
the code sections by piping the program text directly to perl.


=head1 GET

It is very easy to use this library to just fetch documents from the
net. The LWP::Simple module provides the get() function that returns
the document specified by its URL argument (or undef if the request
fails):

 use LWP::Simple;
 $doc = get 'http://www.sn.no/libwww-perl/';

or, as a perl one-liner using the getprint() function:

 perl -MLWP::Simple -e 'getprint "http://www.sn.no/libwww-perl/"'

or, how about fetching the latest perl by running this command:

 perl -MLWP::Simple -e '
   getstore "ftp://ftp.sunet.se/pub/lang/perl/CPAN/src/latest.tar.gz",
            "perl.tar.gz"'

You will probably first want to find a CPAN site closer to you by
running something like the following command:

 perl -MLWP::Simple -e 'getprint "http://www.perl.com/perl/CPAN/CPAN.html"'

Enough of this simple stuff! The LWP object oriented interface gives
you more control over the request sent to the server. Using this
interface you have full control over the headers sent and over how you
handle the response returned.

 use LWP::UserAgent;
 $ua = new LWP::UserAgent;
 $ua->agent("$0/0.1 " . $ua->agent);
 # $ua->agent("Mozilla/8.0") # pretend we are a very capable browser

 $req = new HTTP::Request 'GET' => 'http://www.sn.no/libwww-perl';
 $req->header('Accept' => 'text/html');

 # send request
 $res = $ua->request($req);

 # check the outcome
 if ($res->is_success) {
     print $res->content;
 } else {
     print "Error: " . $res->status_line . "\n";
 }

The lwp-request program (alias GET) that is distributed with the
library can also be used to fetch documents from WWW servers.
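
The object oriented interface is not limited to http: URLs. The request/response cycle above also works for a local file: URL, so it can be tried without network access. This is a minimal sketch under a few assumptions. It expects a current libwww-perl with the File::Temp and URI::file modules available. The temporary file and its contents are made up for the example. It uses the C<< LWP::UserAgent->new >> constructor syntax rather than C<new LWP::UserAgent>.

```perl

 use strict;
 use warnings;
 use File::Temp qw(tempdir);
 use HTTP::Request;
 use LWP::UserAgent;
 use URI::file;

 # Write a small local file to fetch (a throwaway example document)
 my $dir  = tempdir(CLEANUP => 1);
 my $path = "$dir/hello.txt";
 open(my $fh, '>', $path) or die "Can't write $path: $!";
 print $fh "Hello, libwww-perl\n";
 close($fh);

 my $ua  = LWP::UserAgent->new;
 my $req = HTTP::Request->new(GET => URI::file->new_abs($path));
 my $res = $ua->request($req);

 # Same success/error handling as for an http: URL
 if ($res->is_success) {
     print $res->content;
 } else {
     print "Error: " . $res->status_line . "\n";
 }

```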


=head1 HEAD

If you just want to check if a document is present (i.e. the URL is
valid) try to run code that looks like this:

 use LWP::Simple;

 if (head($url)) {
     # ok document exists
 }

The head() function really returns a list of meta-information about
the document. The first three values of the list returned are the
document type, the size of the document, and the time the document was
last modified.

More control over the request, or access to all the header values
returned, requires that you use the object oriented interface described
for GET above. Just s/GET/HEAD/g.
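
The example below shows that list by calling head() on a local file: URL, so it runs offline. It is a sketch that assumes the behaviour of current LWP::Simple versions. There, the first three values are taken from the Content-Type, Content-Length and Last-Modified headers, and the last one is converted to seconds since the epoch. The file name and contents are invented.

```perl

 use strict;
 use warnings;
 use File::Temp qw(tempdir);
 use LWP::Simple qw(head);
 use URI::file;

 # A throwaway local file stands in for a remote document
 my $dir  = tempdir(CLEANUP => 1);
 my $path = "$dir/notes.txt";
 open(my $fh, '>', $path) or die "Can't write $path: $!";
 print $fh "some text\n";
 close($fh);

 my $url = URI::file->new_abs($path)->as_string;

 # In list context head() returns meta-information instead of a true value
 my ($type, $length, $modified) = head($url)
     or die "$url is not reachable\n";
 print "type: $type\nlength: $length\nmodified: ", scalar(localtime $modified), "\n";

```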


=head1 POST

There is no simple procedural interface for posting data to a WWW server. You
must use the object oriented interface for this. The most common POST
operation is to access a WWW form application:

 use LWP::UserAgent;
 $ua = new LWP::UserAgent;

 my $req = new HTTP::Request 'POST','http://www.perl.com/cgi-bin/BugGlimpse';
 $req->content_type('application/x-www-form-urlencoded');
 $req->content('match=www&errors=0');

 my $res = $ua->request($req);
 print $res->as_string;

Lazy people use the HTTP::Request::Common module to set up a suitable
POST request message. It handles all the escaping issues and provides a
suitable default for the content type:

 use HTTP::Request::Common qw(POST);
 use LWP::UserAgent;
 $ua = new LWP::UserAgent;

 my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
                [ search => 'www', errors => 0 ];

 print $ua->request($req)->as_string;

The lwp-request program (alias POST) that is distributed with the
library can also be used for posting data.
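
You can inspect the request built by POST without sending it, which shows what handling the escaping issues means in practice. In this sketch the search value containing a space and an ampersand is a made-up example. Depending on the version of the URI module, the space may be encoded as C<+> or as C<%20>.

```perl

 use strict;
 use warnings;
 use HTTP::Request::Common qw(POST);

 # Build the request only; nothing is sent over the network here
 my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
                [ search => 'www & perl', errors => 0 ];

 print $req->method, " ", $req->uri, "\n";
 print "Content-Type: ", $req->content_type, "\n";   # default form encoding
 print $req->content, "\n";                           # escaped form body

```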



=head1 PROXIES

Some sites use proxies to go through firewall machines, or just as a
cache in order to improve performance. Proxies can also be used for
accessing resources through protocols not supported directly (or
supported badly :-) by the libwww-perl library.

You should initialize your proxy settings before you start sending
requests:

 use LWP::UserAgent;
 $ua = new LWP::UserAgent;
 $ua->env_proxy; # initialize from environment variables
 # or
 $ua->proxy(ftp  => 'http://proxy.myorg.com');
 $ua->proxy(wais => 'http://proxy.myorg.com');
 $ua->no_proxy(qw(no se fi));

 my $req = new HTTP::Request GET => 'wais://xxx.com/';
 print $ua->request($req)->as_string;

The LWP::Simple interface will call env_proxy() for you automatically.
Applications that use the $ua->env_proxy() method will normally not
use the $ua->proxy() and $ua->no_proxy() methods.
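
The sketch below shows what env_proxy() picks up. The method reads variables such as C<http_proxy> and C<no_proxy> from the environment. Normally the user's shell would export them. Here they are set inside the script only so that the example is self-contained, and the proxy host and port are invented. Calling proxy() with just a scheme returns the current setting for that scheme.

```perl

 use strict;
 use warnings;
 use LWP::UserAgent;

 # Remove upper-case copies first (on some systems %ENV keys ignore case)
 delete @ENV{qw(HTTP_PROXY NO_PROXY)};
 delete $ENV{REQUEST_METHOD};   # env_proxy() behaves differently under CGI

 # Invented settings, as a user's shell profile might export them
 $ENV{http_proxy} = 'http://proxy.myorg.com:3128/';
 $ENV{no_proxy}   = 'localhost,.myorg.com';

 my $ua = LWP::UserAgent->new;
 $ua->env_proxy;

 # proxy() with only a scheme argument returns the current setting
 print "http requests go via ", $ua->proxy('http'), "\n";

```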

Some proxies also require that you send them a username/password in
order to let requests through. You should be able to add the
required header with something like this:

 use LWP::UserAgent;

 $ua = new LWP::UserAgent;
 $ua->proxy(['http', 'ftp'] => 'http://proxy.myorg.com');

 $req = new HTTP::Request 'GET',"http://www.perl.com";
 $req->proxy_authorization_basic("proxy_user", "proxy_password");

 $res = $ua->request($req);
 print $res->content if $res->is_success;

Replace C<proxy.myorg.com>, C<proxy_user> and
C<proxy_password> with something suitable for your site.
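
You can inspect the header that proxy_authorization_basic() adds without contacting any proxy. This sketch rebuilds the expected value with MIME::Base64 to show its format. The credentials are the same placeholders as above. Keep in mind that Basic credentials are only base64-encoded, not encrypted.

```perl

 use strict;
 use warnings;
 use HTTP::Request;
 use MIME::Base64 qw(encode_base64);

 my $req = HTTP::Request->new(GET => 'http://www.perl.com');
 $req->proxy_authorization_basic('proxy_user', 'proxy_password');

 # The header value is "Basic " followed by base64("user:password")
 my $header   = $req->header('Proxy-Authorization');
 my $expected = 'Basic ' . encode_base64('proxy_user:proxy_password', '');
 print "Proxy-Authorization: $header\n";
 print $header eq $expected ? "matches\n" : "differs\n";

```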


=head1 ACCESS TO PROTECTED DOCUMENTS

Documents protected by basic authorization can easily be accessed
like this:

 use LWP::UserAgent;
 $ua = new LWP::UserAgent;
 $req = new HTTP::Request GET => 'http://www.sn.no/secret/';
 $req->authorization_basic('aas', 'mypassword');
 print $ua->request($req)->as_string;

The other alternative is to provide a subclass of I<LWP::UserAgent> that
overrides the get_basic_credentials() method. Study the I<lwp-request>
program for an example of this.
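
Below is a minimal sketch of the subclass approach. The class name C<MyAgent> and the rule that only C<www.sn.no> gets credentials are made up for illustration. The method's arguments follow the get_basic_credentials() documentation of current LWP::UserAgent versions: the realm, the URI, and a flag that is true when a proxy is asking. The final call exercises the override directly instead of contacting a server.

```perl

 use strict;
 use warnings;

 package MyAgent;                  # hypothetical subclass name
 use parent 'LWP::UserAgent';

 # Consulted by LWP when a server (or proxy) asks for Basic credentials.
 # Returning an empty list means "no credentials available".
 sub get_basic_credentials {
     my ($self, $realm, $uri, $isproxy) = @_;
     return ('aas', 'mypassword') if $uri->host eq 'www.sn.no';
     return;
 }

 package main;
 use URI;

 my $ua    = MyAgent->new;
 my @creds = $ua->get_basic_credentials('Secret area',
                                        URI->new('http://www.sn.no/secret/'));
 print "would log in as $creds[0]\n";

```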


=head1 MIRRORING

If you want to mirror documents from a WWW server, then try to run
code similar to this at regular intervals:

 use LWP::Simple;

 %mirrors = (
    'http://www.sn.no/'             => 'sn.html',
    'http://www.perl.com/'          => 'perl.html',
    'http://www.sn.no/libwww-perl/' => 'lwp.html',
    'gopher://gopher.sn.no/'        => 'gopher.html',
 );

 while (($url, $localfile) = each(%mirrors)) {
    mirror($url, $localfile);
 }

Or, as a perl one-liner:

 perl -MLWP::Simple -e 'mirror("http://www.perl.com/", "perl.html")'

mirror() sends an If-Modified-Since header based on the modification
time of the local file, so the document will not be transferred unless
it has been updated.
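
The sketch below demonstrates that behaviour without a network connection. A local file: URL stands in for a WWW server, and the file names are invented. mirror() returns the HTTP status code. The first call has no local copy, so it transfers the document and gets 200. The second call finds the copy up to date and gets 304 Not Modified. This assumes a libwww-perl recent enough to honour If-Modified-Since for file: URLs.

```perl

 use strict;
 use warnings;
 use File::Temp qw(tempdir);
 use LWP::Simple qw(mirror);
 use URI::file;

 # A local "remote" document and a local copy, both in a scratch directory
 my $dir    = tempdir(CLEANUP => 1);
 my $source = "$dir/page.html";
 my $copy   = "$dir/copy.html";
 open(my $fh, '>', $source) or die "Can't write $source: $!";
 print $fh "<p>hello</p>\n";
 close($fh);
 my $url = URI::file->new_abs($source)->as_string;

 my $first  = mirror($url, $copy);   # no local copy yet: full transfer
 my $second = mirror($url, $copy);   # copy is up to date: not transferred
 print "first: $first, second: $second\n";

```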



=head1 LARGE DOCUMENTS

If the document you want to fetch is too large to be kept in memory,
then you have two alternatives. You can instruct the library to write
the document content to a file (the second $ua->request() argument is a
file name):

 use LWP::UserAgent;
 $ua = new LWP::UserAgent;

 my $req = new HTTP::Request 'GET',
               'http://www.sn.no/~aas/perl/www/libwww-perl-5.00.tar.gz';
 $res = $ua->request($req, "libwww-perl.tar.gz");
 if ($res->is_success) {
     print "ok\n";
 }

Or you can process the document as it arrives (the second $ua->request()
argument is a code reference):

 use LWP::UserAgent;
 $ua = new LWP::UserAgent;
 $URL = 'ftp://ftp.unit.no/pub/rfc/rfc-index.txt';

 my $expected_length;
 my $bytes_received = 0;
 $ua->request(HTTP::Request->new('GET', $URL),
              sub {
                  my($chunk, $res) = @_;
                  $bytes_received += length($chunk);
                  unless (defined $expected_length) {
                      $expected_length = $res->content_length || 0;
                  }
                  if ($expected_length) {
                      printf STDERR "%d%% - ",
                                    100 * $bytes_received / $expected_length;
                  }
                  print STDERR "$bytes_received bytes received\n";

                  # XXX Should really do something with the chunk itself
                  # print $chunk;
              });
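
The callback above leaves the chunk unused. The sketch below actually consumes the data by feeding each chunk into an incremental MD5 digest, so the document never has to be held in memory as a whole. It reads a generated local file through a file: URL so it runs offline. It also passes request()'s optional third argument, a suggested chunk size in bytes; 4096 is an arbitrary choice.

```perl

 use strict;
 use warnings;
 use Digest::MD5 qw(md5_hex);
 use File::Temp qw(tempdir);
 use HTTP::Request;
 use LWP::UserAgent;
 use URI::file;

 # A ~100 KB generated local file stands in for a large remote document
 my $dir  = tempdir(CLEANUP => 1);
 my $path = "$dir/big.txt";
 my $data = "0123456789abcdef\n" x 6000;
 open(my $fh, '>', $path) or die "Can't write $path: $!";
 binmode($fh);
 print $fh $data;
 close($fh);

 my $ua     = LWP::UserAgent->new;
 my $md5    = Digest::MD5->new;
 my $chunks = 0;

 my $res = $ua->request(
     HTTP::Request->new(GET => URI::file->new_abs($path)),
     sub {
         my ($chunk) = @_;    # the response object is the 2nd argument
         $md5->add($chunk);   # use the chunk, then let it go
         $chunks++;
     },
     4096,                    # suggested chunk size in bytes (a hint)
 );

 die "Error: ", $res->status_line, "\n" unless $res->is_success;
 my $digest = $md5->hexdigest;
 print "$chunks chunks, MD5 $digest\n";
 print "matches the original data\n" if $digest eq md5_hex($data);

```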



=head1 HTML FORMATTING

It is easy to convert HTML code to "readable" text.

 use LWP::Simple;
 use HTML::Parse;
 print parse_html(get 'http://www.sn.no/libwww-perl/')->format;



=head1 PARSE URLS

To access individual elements of a URL, try this:

 use URI::URL;
 $host = url("http://www.sn.no/")->host;

or

 use URI::URL;
 $u = url("ftp://ftp.sn.no/test/aas;type=i");
 print "Protocol scheme is ", $u->scheme, "\n";
 print "Host is ", $u->host, " at port ", $u->port, "\n";

or even

 use URI::URL;
 my($host,$port) = (url("ftp://ftp.sn.no/test/aas;type=i")->crack)[3,4];
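
In later releases, URL handling moved into the separate URI module, which still ships URI::URL for backwards compatibility. The sketch below performs the same lookups with a plain URI object, assuming the URI module is installed. Note that port() falls back to the scheme's default port when the URL does not name one.

```perl

 use strict;
 use warnings;
 use URI;

 my $u = URI->new("ftp://ftp.sn.no/test/aas;type=i");

 print "Protocol scheme is ", $u->scheme, "\n";
 # No port in the URL, so port() reports the ftp default
 print "Host is ", $u->host, " at port ", $u->port, "\n";
 print "Path is ", $u->path, "\n";

```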


=head1 EXPAND RELATIVE URLS

This code reads URLs and prints their expanded versions:

 use URI::URL;
 $BASE = "http://www.sn.no/some/place?query";
 while (<>) {
     chomp;   # drop the newline before treating the line as a URL
     print url($_, $BASE)->abs->as_string, "\n";
 }
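
The URI module offers the same resolution as URI->new_abs(). This sketch assumes the URI module is installed. It runs a few relative references against the $BASE used above to show how paths, queries and fragments are merged. The relative strings are arbitrary examples.

```perl

 use strict;
 use warnings;
 use URI;

 my $BASE = "http://www.sn.no/some/place?query";

 # A few relative references and how they resolve against $BASE
 my %abs;
 for my $rel ('other', '../x', '/top', '?q2', '#frag') {
     $abs{$rel} = URI->new_abs($rel, $BASE)->as_string;
     printf "%-6s => %s\n", $rel, $abs{$rel};
 }

```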

We can expand URLs in an HTML document by using the parser to build a
tree that we then traverse:

 %link_elements =
 (
  'a'    => 'href',
  'img'  => 'src',
  'form' => 'action',
  'link' => 'href',
 );

 use HTML::Parse;
 use URI::URL;

 $BASE = "http://somewhere/root/";
 $h = parse_htmlfile("xxx.html");
 $h->traverse(\&expand_urls, 1);

 print $h->as_HTML;

 sub expand_urls
 {
    my($e, $start) = @_;
    return 1 unless $start;
    my $attr = $link_elements{$e->tag};
    return 1 unless defined $attr;
    my $url = $e->attr($attr);
    return 1 unless defined $url;
    $e->attr($attr, url($url, $BASE)->abs->as_string);
    return 1;   # true: go on to visit this element's children
 }
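
When the links only need to be collected rather than rewritten in place, HTML::LinkExtor can absolutize them while parsing. That module belongs to the HTML-Parser distribution, not to HTML::Parse. The sketch below assumes HTML::LinkExtor is installed, and the HTML snippet is made up.

```perl

 use strict;
 use warnings;
 use HTML::LinkExtor;

 my $BASE = "http://somewhere/root/";
 my $html = '<a href="doc.html">doc</a> <img src="../img/logo.gif">';

 # With a base URL and no callback, the parser absolutizes the links
 # it finds and keeps them until links() is called
 my $p = HTML::LinkExtor->new(undef, $BASE);
 $p->parse($html);
 $p->eof;

 my @links;
 for my $link ($p->links) {
     my ($tag, %attr) = @$link;          # e.g. ('img', src => URI object)
     push @links, map { "$tag: $_" } values %attr;
 }
 print "$_\n" for @links;

```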


=head1 BASE URL

If you want to resolve relative links in a page you will have to
determine which base URL to use. HTTP::Response objects now have a
base() method for this:

 $BASE = $res->base;
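
The sketch below shows where base() looks, using hand-made objects so it runs offline. With no other information, base() falls back to the URI of the request. An absolute Content-Base header takes precedence over that. When LWP itself fetches an HTML page, current versions copy a C<< <BASE HREF=...> >> from the document head into that header. The URLs here are invented.

```perl

 use strict;
 use warnings;
 use HTTP::Request;
 use HTTP::Response;

 my $req = HTTP::Request->new(GET => 'http://www.sn.no/some/dir/page.html');

 # Without any base information the request URI is used
 my $res = HTTP::Response->new(200, 'OK');
 $res->request($req);
 my $plain = $res->base->as_string;

 # An absolute Content-Base header (hypothetical value) takes precedence
 $res->header('Content-Base' => 'http://mirror.example/sn/');
 my $with_header = $res->base->as_string;

 print "$plain\n$with_header\n";

```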


=head1 COPYRIGHT

Copyright 1996-1997, Gisle Aas

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
