source: main/trunk/greenstone2/perllib/cpan/LWP.pm@ 27183

Last change on this file since 27183 was 27183, checked in by davidb, 11 years ago

Changing to using installed version of LWP that comes from libwww-perl, which is more self-contained than v6.x

package LWP;

$VERSION = "5.837";
sub Version { $VERSION; }

require 5.005;
require LWP::UserAgent; # this should load everything you need

1;

__END__

=head1 NAME

LWP - The World-Wide Web library for Perl

=head1 SYNOPSIS

 use LWP;
 print "This is libwww-perl-$LWP::VERSION\n";


=head1 DESCRIPTION

The libwww-perl collection is a set of Perl modules which provides a
simple and consistent application programming interface (API) to the
World-Wide Web. The main focus of the library is to provide classes
and functions that allow you to write WWW clients. The library also
contains modules that are of more general use and even classes that
help you implement simple HTTP servers.

Most modules in this library provide an object oriented API. The user
agent, requests sent and responses received from the WWW server are
all represented by objects. This makes a simple and powerful
interface to these services. The interface is easy to extend
and customize for your own needs.

The main features of the library are:

=over 3

=item *

Contains various reusable components (modules) that can be
used separately or together.

=item *

Provides an object oriented model of HTTP-style communication. Within
this framework we currently support access to http, https, gopher, ftp, news,
file, and mailto resources.

=item *

Provides a full object oriented interface or
a very simple procedural interface.

=item *

Supports the basic and digest authorization schemes.

=item *

Supports transparent redirect handling.

=item *

Supports access through proxy servers.

=item *

Provides a parser for F<robots.txt> files and a framework for constructing robots.

=item *

Supports parsing of HTML forms.

=item *

Implements an HTTP content negotiation algorithm that can
be used both in protocol modules and in server scripts (like CGI
scripts).

=item *

Supports HTTP cookies.

=item *

Some simple command line clients, for instance C<lwp-request> and C<lwp-download>.

=back


=head1 HTTP STYLE COMMUNICATION


The libwww-perl library is based on HTTP style communication. This
section tries to describe what that means.

Let us start with this quote from the HTTP specification document
<URL:http://www.w3.org/pub/WWW/Protocols/>:

=over 3

=item

The HTTP protocol is based on a request/response paradigm. A client
establishes a connection with a server and sends a request to the
server in the form of a request method, URI, and protocol version,
followed by a MIME-like message containing request modifiers, client
information, and possible body content. The server responds with a
status line, including the message's protocol version and a success or
error code, followed by a MIME-like message containing server
information, entity meta-information, and possible body content.

=back

What this means to libwww-perl is that communication always takes place
through these steps: First a I<request> object is created and
configured. This object is then passed to a server and we get a
I<response> object in return that we can examine. A request is always
independent of any previous requests, i.e. the service is stateless.
The same simple model is used for any kind of service we want to
access.

For example, if we want to fetch a document from a remote file server,
then we send it a request that contains a name for that document and
the response will contain the document itself. If we access a search
engine, then the content of the request will contain the query
parameters and the response will contain the query result. If we want
to send a mail message to somebody then we send a request object which
contains our message to the mail server and the response object will
contain an acknowledgment that tells us that the message has been
accepted and will be forwarded to the recipient(s).

It is as simple as that!


=head2 The Request Object

The libwww-perl request object has the class name C<HTTP::Request>.
The fact that the class name uses C<HTTP::> as a
prefix only implies that we use the HTTP model of communication. It
does not limit the kind of services we can try to pass this I<request>
to. For instance, we will send C<HTTP::Request>s both to ftp and
gopher servers, as well as to the local file system.

The main attributes of the request objects are:

=over 3

=item *

The B<method> is a short string that tells what kind of
request this is. The most common methods are B<GET>, B<PUT>,
B<POST> and B<HEAD>.

=item *

The B<uri> is a string denoting the protocol, server and
the name of the "document" we want to access. The B<uri> might
also encode various other parameters.

=item *

The B<headers> contain additional information about the
request and can also be used to describe the content. The headers
are a set of keyword/value pairs.

=item *

The B<content> is an arbitrary amount of data.

=back

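The four attributes listed above can be sketched in a few lines. This is only an illustration: the URI, the C<X-Example> header and the message body are made up, and the snippet builds the request without sending it anywhere.

```perl
use strict;
use warnings;
use HTTP::Request;

# Build a request from its four attributes: method, uri, headers, content.
# The URI and the body text are hypothetical values for illustration.
my $req = HTTP::Request->new(
    PUT => 'http://www.example.org/notes/today.txt',
    [ 'Content-Type' => 'text/plain' ],
    "Remember to read the LWP documentation.\n",
);

# Headers can also be added after construction (made-up header name).
$req->header('X-Example' => 'demo');

# Each attribute can be read back through its accessor.
printf "method:  %s\n", $req->method;
printf "uri:     %s\n", $req->uri;
printf "headers:\n%s",  $req->headers->as_string;
printf "content: %s",   $req->content;
```

Nothing goes over the network here; the object only describes what would be sent once it is handed to a user agent.
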
=head2 The Response Object

The libwww-perl response object has the class name C<HTTP::Response>.
The main attributes of objects of this class are:

=over 3

=item *

The B<code> is a numerical value that indicates the overall
outcome of the request.

=item *

The B<message> is a short, human readable string that
corresponds to the I<code>.

=item *

The B<headers> contain additional information about the
response and describe the content.

=item *

The B<content> is an arbitrary amount of data.

=back

Since we don't want to handle all possible I<code> values directly in
our programs, a libwww-perl response object has methods that can be
used to query what kind of response this is. The most commonly used
response classification methods are:

=over 3

=item is_success()

The request was successfully received, understood or accepted.

=item is_error()

The request failed. The server or the resource might not be
available, access to the resource might be denied or other things might
have failed for some reason.

=back

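As a rough sketch of how the classification methods might be used, the snippet below builds a few responses by hand (a stand-in for what a user agent would return) and sorts them into categories. The helper name C<classify> is hypothetical.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built responses stand in for what a user agent would return.
my @responses = (
    HTTP::Response->new(200, 'OK'),
    HTTP::Response->new(404, 'Not Found'),
    HTTP::Response->new(500, 'Internal Server Error'),
);

# Hypothetical helper: map a response to a coarse category.
sub classify {
    my ($res) = @_;
    return 'success' if $res->is_success;   # 2xx codes
    return 'error'   if $res->is_error;     # 4xx and 5xx codes
    return 'other';                         # e.g. 1xx or 3xx codes
}

for my $res (@responses) {
    printf "%-28s => %s\n", $res->status_line, classify($res);
}
```

Working with the classification methods rather than with raw codes keeps application logic short, while C<< $res->code >> remains available when a specific status matters.
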
=head2 The User Agent

Let us assume that we have created a I<request> object. What do we
actually do with it in order to receive a I<response>?

The answer is that you pass it to a I<user agent> object and this
object takes care of all the things that need to be done
(like low-level communication and error handling) and returns
a I<response> object. The user agent represents your
application on the network and provides you with an interface that
can accept I<requests> and return I<responses>.

The user agent is an interface layer between
your application code and the network. Through this interface you are
able to access the various servers on the network.

The class name for the user agent is C<LWP::UserAgent>. Every
libwww-perl application that wants to communicate should create at
least one object of this class. The main method provided by this
object is request(). This method takes an C<HTTP::Request> object as
argument and (eventually) returns a C<HTTP::Response> object.

The user agent has many other attributes that let you
configure how it will interact with the network and with your
application.

=over 3

=item *

The B<timeout> specifies how much time we give remote servers to
respond before the library disconnects and creates an
internal I<timeout> response.

=item *

The B<agent> specifies the name that your application should use when it
presents itself on the network.

=item *

The B<from> attribute can be set to the e-mail address of the person
responsible for running the application. If this is set, then the
address will be sent to the servers with every request.

=item *

The B<parse_head> specifies whether we should initialize response
headers from the E<lt>head> section of HTML documents.

=item *

The B<proxy> and B<no_proxy> attributes specify if and when to go through
a proxy server. <URL:http://www.w3.org/pub/WWW/Proxies/>

=item *

The B<credentials> provide a way to set up user names and
passwords needed to access certain services.

=back

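A minimal sketch of setting the attributes listed above might look like this. All host names, the e-mail address, the realm and the user name/password pair are placeholders chosen for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

$ua->timeout(10);                     # seconds to wait for a server
$ua->agent('MyApp/0.1 ');             # trailing space: library name is appended
$ua->from('webmaster@example.org');   # placeholder address
$ua->parse_head(0);                   # do not derive headers from <head>

# Placeholder proxy and exclusion list.
$ua->proxy(['http', 'ftp'], 'http://proxy.example.org:8080/');
$ua->no_proxy('localhost', 'intranet.example.org');

# Placeholder host:port, realm, user name and password.
$ua->credentials('www.example.org:80', 'Members Only', 'alice', 's3cret');

printf "timeout: %d s\n", $ua->timeout;
printf "agent:   %s\n",   $ua->agent;
printf "from:    %s\n",   $ua->from;
```

Each of these methods also works as a getter when called without arguments, which is how the final lines read the values back.
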
Many applications want even more control over how they interact
with the network and they get this by sub-classing
C<LWP::UserAgent>. The library includes a
sub-class, C<LWP::RobotUA>, for robot applications.

=head2 An Example

This example shows how the user agent, a request and a response are
represented in actual perl code:

 # Create a user agent object
 use LWP::UserAgent;
 my $ua = LWP::UserAgent->new;
 $ua->agent("MyApp/0.1 ");

 # Create a request
 my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
 $req->content_type('application/x-www-form-urlencoded');
 $req->content('query=libwww-perl&mode=dist');

 # Pass request to the user agent and get a response back
 my $res = $ua->request($req);

 # Check the outcome of the response
 if ($res->is_success) {
     print $res->content;
 }
 else {
     print $res->status_line, "\n";
 }

The $ua is created once when the application starts up. New request
objects should normally be created for each request sent.


=head1 NETWORK SUPPORT

This section discusses the various protocol schemes and
the HTTP style methods and headers that may be used for each.

For all requests, a "User-Agent" header is added and initialized from
the $ua->agent attribute before the request is handed to the network
layer. In the same way, a "From" header is initialized from the
$ua->from attribute.

For all responses, the library adds a header called "Client-Date".
This header holds the time when the response was received by
your application. The format and semantics of the header are the
same as the server created "Date" header. You may also encounter other
"Client-XXX" headers. They are all generated by the library
internally and are not received from the servers.

=head2 HTTP Requests

HTTP requests are just handed off to an HTTP server and it
decides what happens. Few servers implement methods besides the usual
"GET", "HEAD", "POST" and "PUT", but CGI-scripts may implement
any method they like.

If the server is not available then the library will generate an
internal error response.

The library automatically adds a "Host" and a "Content-Length" header
to the HTTP request before it is sent over the network.

For a GET request you might want to add a "If-Modified-Since" or
"If-None-Match" header to make the request conditional.

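A conditional GET could be set up roughly as follows. The URL, the one-day cut-off and the entity tag value are assumptions for illustration; the request is only built and printed, not sent.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical document URL.
my $req = HTTP::Request->new(GET => 'http://www.example.org/index.html');

# Only interested in the document if it changed during the last day.
# if_modified_since() takes an epoch time and formats the HTTP date.
my $last_fetch = time - 24 * 60 * 60;
$req->if_modified_since($last_fetch);

# Alternatively (or additionally) send a previously seen entity tag.
# The tag value here is made up.
$req->header('If-None-Match' => '"abc123"');

print $req->as_string;
```

When such a request is sent and the document is unchanged, a server that honours these headers answers with status code 304, which can be checked with C<< $res->code == 304 >>.
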
For a POST request you should add the "Content-Type" header. When you
try to emulate HTML E<lt>FORM> handling you should usually let the value
of the "Content-Type" header be "application/x-www-form-urlencoded".
See L<lwpcook> for examples of this.

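One way to get a correctly encoded form submission is the C<POST> helper from C<HTTP::Request::Common>, which ships with libwww-perl. The sketch below assumes a hypothetical search URL and form fields, and only builds the request.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical form target and fields. Passing an array reference of
# key/value pairs makes the helper URL-encode the content and set the
# "Content-Type" header to "application/x-www-form-urlencoded".
my $req = POST 'http://www.example.org/search',
    [ query => 'libwww-perl', mode => 'dist' ];

print $req->as_string;
```

The resulting object is an ordinary C<HTTP::Request>, so it can be passed to C<< $ua->request >> like any hand-built request.
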
The libwww-perl HTTP implementation currently supports the HTTP/1.1
and HTTP/1.0 protocols.

The library allows you to access proxy servers through HTTP. This
means that you can set up the library to forward all types of requests
through the HTTP protocol module. See L<LWP::UserAgent> for
documentation of this.


=head2 HTTPS Requests

HTTPS requests are HTTP requests over an encrypted network connection
using the SSL protocol developed by Netscape. Everything about HTTP
requests above also applies to HTTPS requests. In addition the library
will add the headers "Client-SSL-Cipher", "Client-SSL-Cert-Subject" and
"Client-SSL-Cert-Issuer" to the response. These headers denote the
encryption method used and the name of the server owner.

The request can contain the header "If-SSL-Cert-Subject" in order to
make the request conditional on the content of the server certificate.
If the certificate subject does not match, no request is sent to the
server and an internally generated error response is returned. The
value of the "If-SSL-Cert-Subject" header is interpreted as a Perl
regular expression.


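The following sketch shows how the headers described above might be handled. The host name and the subject pattern are assumptions, the helper C<ssl_headers> is hypothetical, and the response is built by hand with made-up values so that the snippet runs without a network connection.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Request that should only be sent if the certificate subject matches.
# The header value is a Perl regular expression (placeholder host name).
my $req = HTTP::Request->new(GET => 'https://www.example.org/');
$req->header('If-SSL-Cert-Subject' => 'CN=www\.example\.org');

# Hypothetical helper: collect the Client-SSL-* headers that are present.
sub ssl_headers {
    my ($res) = @_;
    return map  { "$_: " . $res->header($_) }
           grep { defined $res->header($_) }
           qw(Client-SSL-Cipher Client-SSL-Cert-Subject Client-SSL-Cert-Issuer);
}

# Stand-in response with made-up header values; a real one would come
# from $ua->request($req).
my $res = HTTP::Response->new(200, 'OK', [
    'Client-SSL-Cipher'       => 'EXAMPLE-CIPHER',
    'Client-SSL-Cert-Subject' => '/CN=www.example.org',
]);

print "$_\n" for ssl_headers($res);
```
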
=head2 FTP Requests

The library currently supports GET, HEAD and PUT requests. GET
retrieves a file or a directory listing from an FTP server. PUT
stores a file on an FTP server.

You can specify an FTP account for servers that want this in addition
to user name and password. This is specified by including an "Account"
header in the request.

User name/password can be specified using basic authorization or be
encoded in the URL. Failed logins return an UNAUTHORIZED response with
"WWW-Authenticate: Basic" and can be treated like basic authorization
for HTTP.

The library supports ftp ASCII transfer mode by specifying the "type=a"
parameter in the URL. It also supports transfer of ranges for FTP transfers
using the "Range" header.

Directory listings are by default returned unprocessed (as returned
from the ftp server) with the content media type reported to be
"text/ftp-dir-listing". The C<File::Listing> module provides methods
for parsing these directory listings.

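As a small sketch of the parsing step, the snippet below feeds a Unix-style listing to C<parse_dir> from C<File::Listing>. The listing text is invented sample data standing in for the content of a "text/ftp-dir-listing" response.

```perl
use strict;
use warnings;
use File::Listing qw(parse_dir);

# Invented sample data in "ls -l" format, standing in for the content
# of a directory listing response from an FTP server.
my $listing = <<'EOT';
total 8
-rw-r--r--   1 ftp  ftp      1234 Jan 15  2009 README
drwxr-xr-x   2 ftp  ftp      4096 Mar  3  2008 pub
EOT

# Each entry is an array reference: name, type, size, mtime, mode.
for my $entry (parse_dir($listing)) {
    my ($name, $type, $size, $mtime, $mode) = @$entry;
    printf "%-8s type=%s size=%s\n",
        $name, $type, defined $size ? $size : '-';
}
```

The type field is C<f> for plain files and C<d> for directories, which makes it straightforward to filter a listing before issuing further requests.
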
The ftp module is also able to convert directory listings to HTML and
this can be requested via the standard HTTP content negotiation
mechanisms (add an "Accept: text/html" header in the request if you
want this).

For normal file retrievals, the "Content-Type" is guessed based on the
file name suffix. See L<LWP::MediaTypes>.

The "If-Modified-Since" request header works for servers that implement
the MDTM command. It will probably not work for directory listings though.

Example:

 $req = HTTP::Request->new(GET => 'ftp://me:[email protected]/');
 $req->header(Accept => "text/html, */*;q=0.1");

=head2 News Requests

Access to the USENET News system is implemented through the NNTP
protocol. The name of the news server is obtained from the
NNTP_SERVER environment variable and defaults to "news". It is not
possible to specify the hostname of the NNTP server in news: URLs.

The library supports GET and HEAD to retrieve news articles through the
NNTP protocol. You can also post articles to newsgroups by using
(surprise!) the POST method.

GET on newsgroups is not implemented yet.

Examples:

 $req = HTTP::Request->new(GET => 'news:[email protected]');

 $req = HTTP::Request->new(POST => 'news:comp.lang.perl.test');
 $req->header(Subject => 'This is a test',
              From    => '[email protected]');
 $req->content(<<EOT);
 This is the content of the message that we are sending to
 the world.
 EOT


=head2 Gopher Request

The library supports the GET and HEAD methods for gopher requests. All
request header values are ignored. HEAD cheats and returns a
response without even talking to the server.

Gopher menus are always converted to HTML.

The response "Content-Type" is generated from the document type
encoded (as the first letter) in the request URL path itself.

Example:

 $req = HTTP::Request->new(GET => 'gopher://gopher.sn.no/');



=head2 File Request

The library supports GET and HEAD methods for file requests. The
"If-Modified-Since" header is supported. All other headers are
ignored. The I<host> component of the file URL must be empty or set
to "localhost". Any other I<host> value will be treated as an error.

Directories are always converted to an HTML document. For normal
files, the "Content-Type" and "Content-Encoding" in the response are
guessed based on the file suffix.

Example:

 $req = HTTP::Request->new(GET => 'file:/etc/passwd');


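Because file requests need no network, they make a convenient end-to-end sketch of the request/response cycle. The snippet below writes a temporary file (its name and content are arbitrary), fetches it through a C<file:> URL and inspects the response.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use LWP::UserAgent;
use URI::file;

# Create a scratch file to fetch; the suffix lets the library guess
# a media type for the response.
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "hello from the local file system\n";
close $fh or die "close failed: $!";

# Turn the local path into a file: URL and request it.
my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => URI::file->new_abs($path));
my $res = $ua->request($req);

if ($res->is_success) {
    printf "Content-Type: %s\n", scalar $res->content_type;
    print $res->content;
}
else {
    print $res->status_line, "\n";
}
```

Using C<URI::file> to build the URL avoids hand-assembling C<file:> paths, which differ between operating systems.
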
=head2 Mailto Request

You can send (aka "POST") mail messages using the library. All
headers specified for the request are passed on to the mail system.
The "To" header is initialized from the mail address in the URL.

Example:

 $req = HTTP::Request->new(POST => 'mailto:[email protected]');
 $req->header(Subject => "subscribe");
 $req->content("Please subscribe me to the libwww-perl mailing list!\n");

=head2 CPAN Requests

URLs with scheme C<cpan:> are redirected to a suitable CPAN
mirror. If you have your own local mirror of CPAN you might tell LWP
to use it for C<cpan:> URLs by an assignment like this:

 $LWP::Protocol::cpan::CPAN = "file:/local/CPAN/";

Suitable CPAN mirrors are also picked up from the configuration for
the CPAN.pm, so if you have used that module a suitable mirror should
be picked automatically. If neither of these applies, then a redirect
to the generic CPAN http location is issued.

Example request to download the newest perl:

 $req = HTTP::Request->new(GET => "cpan:src/latest.tar.gz");


=head1 OVERVIEW OF CLASSES AND PACKAGES

This table should give you a quick overview of the classes provided by the
library. Indentation shows class inheritance.

 LWP::MemberMixin   -- Access to member variables of Perl5 classes
   LWP::UserAgent   -- WWW user agent class
     LWP::RobotUA   -- For developing robot applications
   LWP::Protocol          -- Interface to various protocol schemes
     LWP::Protocol::http  -- http:// access
     LWP::Protocol::file  -- file:// access
     LWP::Protocol::ftp   -- ftp:// access
     ...

 LWP::Authen::Basic -- Handle 401 and 407 responses
 LWP::Authen::Digest

 HTTP::Headers      -- MIME/RFC822 style header (used by HTTP::Message)
 HTTP::Message      -- HTTP style message
   HTTP::Request    -- HTTP request
   HTTP::Response   -- HTTP response
 HTTP::Daemon       -- An HTTP server class

 WWW::RobotRules    -- Parse robots.txt files
   WWW::RobotRules::AnyDBM_File -- Persistent RobotRules

 Net::HTTP          -- Low level HTTP client

The following modules provide various functions and definitions.

 LWP                -- This file. Library version number and documentation.
 LWP::MediaTypes    -- MIME types configuration (text/html etc.)
 LWP::Simple        -- Simplified procedural interface for common functions
 HTTP::Status       -- HTTP status code (200 OK etc)
 HTTP::Date         -- Date parsing module for HTTP date formats
 HTTP::Negotiate    -- HTTP content negotiation calculation
 File::Listing      -- Parse directory listings
 HTML::Form         -- Processing for <form>s in HTML documents


=head1 MORE DOCUMENTATION

All modules contain detailed information on the interfaces they
provide. The L<lwpcook> manpage is the libwww-perl cookbook that contains
examples of typical usage of the library. You might want to take a
look at how the scripts L<lwp-request>, L<lwp-rget> and L<lwp-mirror>
are implemented.

=head1 ENVIRONMENT

The following environment variables are used by LWP:

=over

=item HOME

The C<LWP::MediaTypes> functions will look for the F<.media.types> and
F<.mime.types> files relative to your home directory.

=item http_proxy

=item ftp_proxy

=item xxx_proxy

=item no_proxy

These environment variables can be set to enable communication through
a proxy server. See the description of the C<env_proxy> method in
L<LWP::UserAgent>.

=item PERL_LWP_USE_HTTP_10

Enable the old HTTP/1.0 protocol driver instead of the new HTTP/1.1
driver. You might want to set this to a TRUE value if you discover
that your old LWP applications fail after you installed LWP-5.60 or
better.

=item PERL_HTTP_URI_CLASS

Used to decide what URI objects to instantiate. The default is C<URI>.
You might want to set it to C<URI::URL> for compatibility with old times.

=back

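The proxy variables listed above only take effect when the user agent is asked to read them. A minimal sketch, assuming placeholder proxy and host names, and replacing the environment locally so that the result does not depend on the caller's settings:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Replace the environment for the rest of this scope. The proxy URL and
# the host names are placeholders chosen for illustration.
local %ENV = (
    http_proxy => 'http://proxy.example.org:8080/',
    no_proxy   => 'localhost,intranet.example.org',
);

my $ua = LWP::UserAgent->new;

# Load the proxy settings from the *_proxy environment variables.
$ua->env_proxy;

printf "http proxy: %s\n", $ua->proxy('http');
```

In a real application the variables would normally be set outside the program, and the call to C<env_proxy> alone would be enough.
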
=head1 AUTHORS

LWP was made possible by contributions from Adam Newby, Albert
Dvornik, Alexandre Duret-Lutz, Andreas Gustafsson, Andreas König,
Andrew Pimlott, Andy Lester, Ben Coleman, Benjamin Low, Ben Low, Ben
Tilly, Blair Zajac, Bob Dalgleish, BooK, Brad Hughes, Brian
J. Murrell, Brian McCauley, Charles C. Fu, Charles Lane, Chris Nandor,
Christian Gilmore, Chris W. Unger, Craig Macdonald, Dale Couch, Dan
Kubb, Dave Dunkin, Dave W. Smith, David Coppit, David Dick, David
D. Kilzer, Doug MacEachern, Edward Avis, erik, Gary Shea, Gisle Aas,
Graham Barr, Gurusamy Sarathy, Hans de Graaff, Harald Joerg, Harry
Bochner, Hugo, Ilya Zakharevich, INOUE Yoshinari, Ivan Panchenko, Jack
Shirazi, James Tillman, Jan Dubois, Jared Rhine, Jim Stern, Joao
Lopes, John Klar, Johnny Lee, Josh Kronengold, Josh Rai, Joshua
Chamas, Joshua Hoblitt, Kartik Subbarao, Keiichiro Nagano, Ken
Williams, KONISHI Katsuhiro, Lee T Lindley, Liam Quinn, Marc Hedlund,
Marc Langheinrich, Mark D. Anderson, Marko Asplund, Mark Stosberg,
Markus B Krüger, Markus Laker, Martijn Koster, Martin Thurn, Matthew
Eldridge, Matthew.van.Eerde, Matt Sergeant, Michael A. Chase, Michael
Quaranta, Michael Thompson, Mike Schilli, Moshe Kaminsky, Nathan
Torkington, Nicolai Langfeldt, Norton Allen, Olly Betts, Paul
J. Schinder, peterm, Philip Guenther, Daniel Buenzli, Pon Hwa Lin,
Radoslaw Zielinski, Radu Greab, Randal L. Schwartz, Richard Chen,
Robin Barker, Roy Fielding, Sander van Zoest, Sean M. Burke,
shildreth, Slaven Rezic, Steve A Fink, Steve Hay, Steven Butler,
Steve_Kilbane, Takanori Ugai, Thomas Lotterer, Tim Bunce, Tom Hughes,
Tony Finch, Ville Skyttä, Ward Vandewege, William York, Yale Huang,
and Yitzchak Scott-Thoennes.

LWP owes a lot in motivation, design, and code, to the libwww-perl
library for Perl4 by Roy Fielding, which included work from Alberto
Accomazzi, James Casey, Brooks Cutter, Martijn Koster, Oscar
Nierstrasz, Mel Melchner, Gertjan van Oosten, Jared Rhine, Jack
Shirazi, Gene Spafford, Marc VanHeyningen, Steven E. Brenner, Marion
Hakanson, Waldemar Kebsch, Tony Sanders, and Larry Wall; see the
libwww-perl-0.40 library for details.

=head1 COPYRIGHT

 Copyright 1995-2009, Gisle Aas
 Copyright 1995, Martijn Koster

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=head1 AVAILABILITY

The latest version of this library is likely to be available from CPAN
as well as:

 http://github.com/gisle/libwww-perl

The best place to discuss this code is on the <[email protected]>
mailing list.

=cut