lwpcook.pod@ 16571

Last change on this file since 16571 was 720, checked in by davidb, 25 years ago
added w3mir package
Property svn:keywords set to `Author Date Id Revision`
File size: 9.1 KB

=head1 NAME

lwpcook - libwww-perl cookbook

=head1 DESCRIPTION

This document contains some examples that show typical usage of the
libwww-perl library. You should consult the documentation for the
individual modules for more detail.

All examples should be runnable programs. You can, in most cases, test
the code sections by piping the program text directly to perl.

=head1 GET

It is very easy to use this library to just fetch documents from the
net. The LWP::Simple module provides the get() function, which returns
the document specified by its URL argument:

  use LWP::Simple;
  $doc = get 'http://www.sn.no/libwww-perl/';

or, as a perl one-liner using the getprint() function:

  perl -MLWP::Simple -e 'getprint "http://www.sn.no/libwww-perl/"'

or, how about fetching the latest perl by running this command:

  perl -MLWP::Simple -e '
    getstore "ftp://ftp.sunet.se/pub/lang/perl/CPAN/src/latest.tar.gz",
             "perl.tar.gz"'

You will probably first want to find a CPAN site closer to you by
running something like the following command:

  perl -MLWP::Simple -e 'getprint "http://www.perl.com/perl/CPAN/CPAN.html"'

Enough of this simple stuff! The LWP object oriented interface gives
you more control over the request sent to the server. Using this
interface you have full control over the headers sent and over how the
response is handled.

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->agent("$0/0.1 " . $ua->agent);
  # $ua->agent("Mozilla/8.0") # pretend we are a very capable browser

  $req = HTTP::Request->new(GET => 'http://www.sn.no/libwww-perl');
  $req->header('Accept' => 'text/html');

  # send request
  $res = $ua->request($req);

  # check the outcome
  if ($res->is_success) {
      print $res->content;
  } else {
      print "Error: " . $res->status_line . "\n";
  }

The lwp-request program (alias GET) that is distributed with the
library can also be used to fetch documents from WWW servers.

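Because the request is an ordinary object, it can be inspected before
it is sent. The following is only a sketch of that idea: it builds the
same request as above and prints it without contacting any server. The
C<Accept-Language> header is an extra value chosen purely for
illustration.

```perl
use strict;
use warnings;
use HTTP::Request;

# Build the same kind of request as above, but only inspect it.
my $req = HTTP::Request->new(GET => 'http://www.sn.no/libwww-perl');
$req->header('Accept'          => 'text/html');
$req->header('Accept-Language' => 'en');    # illustrative extra header

# as_string() renders the request line followed by the headers.
print $req->as_string;

# Individual parts can be read back through accessors.
printf "method=%s host=%s\n", $req->method, $req->uri->host;
```

Printing the request this way can help when a server responds
differently from what the headers were meant to achieve.
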
=head1 HEAD

If you just want to check if a document is present (i.e. the URL is
valid), try to run code that looks like this:

  use LWP::Simple;

  $url = 'http://www.sn.no/libwww-perl/';   # any URL will do
  if (head($url)) {
      # ok document exists
  }

In list context the head() function returns a list of meta-information
about the document. The first three values of the list returned are the
document type, the size of the document, and the last modification time
of the document.

More control over the request, or access to all header values returned,
requires that you use the object oriented interface described for GET
above. Just s/GET/HEAD/g.

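A minimal sketch of reading the list returned by head() follows. To
stay runnable without network access it assumes a temporary local file
addressed through a C<file:> URL; the file name and its content are
made up for the example, and an C<http:> URL would be used the same
way.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LWP::Simple qw(head);
use URI::file;

# Create a small local document so the sketch needs no network access.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/doc.txt";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} "hello world\n";
close $fh or die "Cannot close $path: $!";

my $url = URI::file->new($path)->as_string;

# In list context head() returns the individual header values.
my ($type, $length, $mtime, $expires, $server) = head($url);

if (defined $length) {
    print "type:     ", (defined $type ? $type : 'unknown'), "\n";
    print "length:   $length bytes\n";
    print "modified: ", scalar localtime($mtime), "\n" if defined $mtime;
} else {
    print "Document not found: $url\n";
}
```
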
=head1 POST

There is no simple procedural interface for posting data to a WWW server. You
must use the object oriented interface for this. The most common POST
operation is to access a WWW form application:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(POST => 'http://www.perl.com/cgi-bin/BugGlimpse');
  $req->content_type('application/x-www-form-urlencoded');
  $req->content('match=www&errors=0');

  my $res = $ua->request($req);
  print $res->as_string;

Lazy people use the HTTP::Request::Common module to set up a suitable
POST request message. It handles all the escaping issues and has a
suitable default for the content_type:

  use HTTP::Request::Common qw(POST);
  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
                [ search => 'www', errors => 0 ];

  print $ua->request($req)->as_string;

The lwp-request program (alias POST) that is distributed with the
library can also be used for posting data.

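To see what "handles all the escaping issues" means in practice, the
request body can be printed instead of sent. This is only a sketch: the
field value C<www & ftp> is an invented example containing characters
that need escaping, and nothing is transmitted to the server.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Field values containing spaces and '&' must be escaped in the body.
my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
               [ search => 'www & ftp', errors => 0 ];

# The content type is filled in by POST(); the body is already encoded.
print "Content-Type: ", $req->content_type, "\n";
print "Body:         ", $req->content, "\n";
```
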
=head1 PROXIES

Some sites use proxies to go through firewall machines, or just as a
cache in order to improve performance. Proxies can also be used for
accessing resources through protocols not supported directly (or
supported badly :-) by the libwww-perl library.

You should initialize your proxy setting before you start sending
requests:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->env_proxy; # initialize from environment variables
  # or
  $ua->proxy(ftp  => 'http://proxy.myorg.com');
  $ua->proxy(wais => 'http://proxy.myorg.com');
  $ua->no_proxy(qw(no se fi));

  my $req = HTTP::Request->new(GET => 'wais://xxx.com/');
  print $ua->request($req)->as_string;

The LWP::Simple interface will call env_proxy() for you automatically.
Applications that use the $ua->env_proxy() method will normally not
use the $ua->proxy() and $ua->no_proxy() methods.

Some proxies also require that you send them a username and password in
order to let requests through. You should be able to add the
required header with something like this:

  use LWP::UserAgent;

  $ua = LWP::UserAgent->new;
  $ua->proxy(['http', 'ftp'] => 'http://proxy.myorg.com');

  $req = HTTP::Request->new(GET => 'http://www.perl.com');
  $req->proxy_authorization_basic("proxy_user", "proxy_password");

  $res = $ua->request($req);
  print $res->content if $res->is_success;

Replace C<proxy.myorg.com>, C<proxy_user> and
C<proxy_password> with something suitable for your site.

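The effect of env_proxy() can be checked by reading the settings back.
The sketch below assumes the conventional C<http_proxy> and C<no_proxy>
environment variable names and temporarily replaces the whole
environment so the result does not depend on the machine it runs on.
The proxy host is the same placeholder as above.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my ($http_proxy, $ftp_proxy);
{
    # Temporarily replace the environment so the result is predictable.
    local %ENV = (
        http_proxy => 'http://proxy.myorg.com',
        no_proxy   => 'localhost,myorg.com',
    );

    my $ua = LWP::UserAgent->new;
    $ua->env_proxy;

    # With a single argument, proxy() returns the current setting.
    $http_proxy = $ua->proxy('http');
    $ftp_proxy  = $ua->proxy('ftp');
}

print "http proxy: ", (defined $http_proxy ? $http_proxy : '(none)'), "\n";
print "ftp proxy:  ", (defined $ftp_proxy  ? $ftp_proxy  : '(none)'), "\n";
```
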
=head1 ACCESS TO PROTECTED DOCUMENTS

Documents protected by basic authorization can easily be accessed
like this:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $req = HTTP::Request->new(GET => 'http://www.sn.no/secret/');
  $req->authorization_basic('aas', 'mypassword');
  print $ua->request($req)->as_string;

The other alternative is to provide a subclass of I<LWP::UserAgent> that
overrides the get_basic_credentials() method. Study the I<lwp-request>
program for an example of this.

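A rough sketch of the subclassing alternative is shown here. The
package name C<MyAgent>, the realm name C<Secret Area> and the lookup
table are all hypothetical; lwp-request does something different
(it can prompt the user). The method is called directly at the end only
to show what it returns.

```perl
use strict;
use warnings;

package MyAgent;    # hypothetical subclass name
use parent 'LWP::UserAgent';

# Placeholder credentials keyed by realm; a real program might prompt.
my %credentials = (
    'Secret Area' => [ 'aas', 'mypassword' ],
);

# LWP calls this when a server (or proxy) asks for authentication.
sub get_basic_credentials {
    my ($self, $realm, $uri, $isproxy) = @_;
    my $entry = $credentials{$realm} or return;    # empty list: give up
    return @$entry;
}

package main;

my $ua = MyAgent->new;
my ($user, $pass) =
    $ua->get_basic_credentials('Secret Area', 'http://www.sn.no/secret/', 0);
print "Would log in as $user\n";
```

With such a subclass, C<< $ua->request($req) >> can retry a request
that was refused with an authentication challenge, without the
credentials being attached to every request up front.
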
=head1 MIRRORING

If you want to mirror documents from a WWW server, then try to run
code similar to this at regular intervals:

  use LWP::Simple;

  %mirrors = (
     'http://www.sn.no/'             => 'sn.html',
     'http://www.perl.com/'          => 'perl.html',
     'http://www.sn.no/libwww-perl/' => 'lwp.html',
     'gopher://gopher.sn.no/'        => 'gopher.html',
  );

  while (($url, $localfile) = each(%mirrors)) {
      mirror($url, $localfile);
  }

Or, as a perl one-liner:

  perl -MLWP::Simple -e 'mirror("http://www.perl.com/", "perl.html")'

The document will not be transferred unless it has been updated.

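The value returned by mirror() is an HTTP status code, so a script can
tell a fresh download from an unchanged document. The following sketch
assumes a temporary local file served through a C<file:> URL so that it
runs offline; the file names are invented, and the same checks apply to
the C<http:> URLs above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LWP::Simple;    # also exports is_success() and the RC_* constants
use URI::file;

my $dir    = tempdir(CLEANUP => 1);
my $source = "$dir/source.html";
my $copy   = "$dir/copy.html";

open my $fh, '>', $source or die "Cannot write $source: $!";
print {$fh} "<title>demo</title>\n";
close $fh or die "Cannot close $source: $!";

my $url = URI::file->new($source)->as_string;

# Mirror twice: the second attempt should find nothing new to fetch.
my @codes;
for my $attempt (1, 2) {
    my $rc = mirror($url, $copy);
    push @codes, $rc;
    if ($rc == RC_NOT_MODIFIED) {
        print "attempt $attempt: local copy is up to date\n";
    } elsif (is_success($rc)) {
        print "attempt $attempt: fetched a fresh copy\n";
    } else {
        print "attempt $attempt: failed with status $rc\n";
    }
}
```
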
=head1 LARGE DOCUMENTS

If the document you want to fetch is too large to be kept in memory,
then you have two alternatives. You can instruct the library to write
the document content to a file (the second $ua->request() argument is
a file name):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(GET =>
     'http://www.sn.no/~aas/perl/www/libwww-perl-5.00.tar.gz');
  $res = $ua->request($req, "libwww-perl.tar.gz");
  if ($res->is_success) {
      print "ok\n";
  }

Or you can process the document as it arrives (the second
$ua->request() argument is a code reference):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $URL = 'ftp://ftp.unit.no/pub/rfc/rfc-index.txt';

  my $expected_length;
  my $bytes_received = 0;
  $ua->request(HTTP::Request->new('GET', $URL),
               sub {
                   my($chunk, $res) = @_;
                   $bytes_received += length($chunk);
                   unless (defined $expected_length) {
                       $expected_length = $res->content_length || 0;
                   }
                   if ($expected_length) {
                       printf STDERR "%d%% - ",
                           100 * $bytes_received / $expected_length;
                   }
                   print STDERR "$bytes_received bytes received\n";

                   # XXX Should really do something with the chunk itself
                   # print $chunk;
               });

=head1 HTML FORMATTING

It is easy to convert HTML code to "readable" text.

  use LWP::Simple;
  use HTML::Parse;
  print parse_html(get 'http://www.sn.no/libwww-perl/')->format;

=head1 PARSE URLS

To access individual elements of a URL, try this:

  use URI::URL;
  $host = url("http://www.sn.no/")->host;

or

  use URI::URL;
  $u = url("ftp://ftp.sn.no/test/aas;type=i");
  print "Protocol scheme is ", $u->scheme, "\n";
  print "Host is ", $u->host, " at port ", $u->port, "\n";

or even

  use URI::URL;
  my($host,$port) = (url("ftp://ftp.sn.no/test/aas;type=i")->crack)[3,4];

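The same elements can also be read with the URI module, on which later
versions of URI::URL are built. This is a sketch using the URL from the
example above; it assumes the URI distribution is installed and is not
part of the original cookbook.

```perl
use strict;
use warnings;
use URI;

my $u = URI->new('ftp://ftp.sn.no/test/aas;type=i');

print "scheme: ", $u->scheme, "\n";
print "host:   ", $u->host,   "\n";
print "port:   ", $u->port,   "\n";    # scheme default when none is given
print "path:   ", $u->path,   "\n";
```
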
=head1 EXPAND RELATIVE URLS

This code reads URLs, one per line, and prints the expanded version of
each.

  use URI::URL;
  $BASE = "http://www.sn.no/some/place?query";
  while (<>) {
      chomp;   # drop the newline so it does not end up in the URL
      print url($_, $BASE)->abs->as_string, "\n";
  }

We can expand URLs in an HTML document by using the parser to build a
tree that we then traverse:

  %link_elements =
  (
   'a'    => 'href',
   'img'  => 'src',
   'form' => 'action',
   'link' => 'href',
  );

  use HTML::Parse;
  use URI::URL;

  $BASE = "http://somewhere/root/";
  $h = parse_htmlfile("xxx.html");
  $h->traverse(\&expand_urls, 1);

  print $h->as_HTML;

  sub expand_urls
  {
     my($e, $start) = @_;
     return 1 unless $start;
     my $attr = $link_elements{$e->tag};
     return 1 unless defined $attr;
     my $url = $e->attr($attr);
     return 1 unless defined $url;
     $e->attr($attr, url($url, $BASE)->abs->as_string);
     return 1;   # keep traversing below this element
  }

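The resolution step used in both examples above can be tried in
isolation. The sketch below uses C<< URI->new_abs >> from the URI
module rather than URI::URL, with the base URL from the first example;
the relative references are made-up inputs.

```perl
use strict;
use warnings;
use URI;

my $base = 'http://www.sn.no/some/place?query';

# A few kinds of reference: sibling, parent directory, site-absolute,
# and an already absolute URL (which is returned unchanged).
for my $rel ('other.html', '../x', '/abs', 'http://www.perl.com/') {
    my $abs = URI->new_abs($rel, $base);
    print "$rel => $abs\n";
}
```

Note that the query part of the base URL plays no role when a relative
path is resolved; only the path up to its last slash is kept.
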
=head1 BASE URL

If you want to resolve relative links in a page you will have to
determine which base URL to use. HTTP::Response objects have a
base() method for this.

  $BASE = $res->base;

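As a small offline sketch of how base() can be combined with link
resolution, the response below is constructed by hand instead of being
fetched. The request URL and the relative link C<lwpcook.html> are
assumptions made for the example. When the response carries no base
header, base() is expected to fall back to the URL of the request.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use URI;

# Build a response by hand so the sketch runs without a network.
my $res = HTTP::Response->new(200, 'OK');
$res->request(
    HTTP::Request->new(GET => 'http://www.sn.no/libwww-perl/index.html'));

# With no base header present, base() uses the request URL.
my $base = $res->base;
print "base: $base\n";

# Resolve a relative link found in the page against that base.
my $link = URI->new_abs('lwpcook.html', $base);
print "link: $link\n";
```
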
=head1 COPYRIGHT

Copyright 1996-1997, Gisle Aas

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
