source: main/trunk/greenstone2/perllib/cpan/LWP.pm@ 27174

Last change on this file since 27174 was 27174, checked in by davidb, 11 years ago

Perl modules from CPAN that are used in supporting activate.pl, but not part of the Perl core. Only PMs included.

File size: 21.1 KB
package LWP;

$VERSION = "6.05";
sub Version { $VERSION; }

require 5.008;
require LWP::UserAgent;  # this should load everything you need

1;

__END__

=encoding utf-8

=head1 NAME

LWP - The World-Wide Web library for Perl

=head1 SYNOPSIS

 use LWP;
 print "This is libwww-perl-$LWP::VERSION\n";

=head1 DESCRIPTION

The libwww-perl collection is a set of Perl modules which provides a
simple and consistent application programming interface (API) to the
World-Wide Web. The main focus of the library is to provide classes
and functions that allow you to write WWW clients. The library also
contains modules that are of more general use and even classes that
help you implement simple HTTP servers.

Most modules in this library provide an object oriented API. The user
agent, requests sent and responses received from the WWW server are
all represented by objects. This makes a simple and powerful
interface to these services. The interface is easy to extend
and customize for your own needs.

The main features of the library are:

=over 3

=item *

Contains various reusable components (modules) that can be
used separately or together.

=item *

Provides an object oriented model of HTTP-style communication. Within
this framework we currently support access to http, https, gopher, ftp, news,
file, and mailto resources.

=item *

Provides a full object oriented interface or
a very simple procedural interface.

=item *

Supports the basic and digest authorization schemes.

=item *

Supports transparent redirect handling.

=item *

Supports access through proxy servers.

=item *

Provides a parser for F<robots.txt> files and a framework for constructing robots.

=item *

Supports parsing of HTML forms.

=item *

Implements an HTTP content negotiation algorithm that can
be used both in protocol modules and in server scripts (like CGI
scripts).

=item *

Supports HTTP cookies.

=item *

Some simple command line clients, for instance C<lwp-request> and C<lwp-download>.

=back

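The "very simple procedural interface" in the feature list above is
C<LWP::Simple>. The following is a minimal sketch of it, not part of the
original documentation. To stay runnable without network access it
fetches a temporary local file through a C<file:> URL; the file name and
its content are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use LWP::Simple qw(get);
use URI::file;

# Create a throw-away local document to fetch (illustrative content).
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "fetched through the procedural interface\n";
close $fh;

# Build an absolute file: URL for the temporary file.
my $url = URI::file->new_abs($path)->as_string;

# get() returns the document content, or undef on failure.
my $content = get($url);
print defined $content ? $content : "fetch failed\n";
```

The same call works for C<http:> URLs. Use the object oriented
interface when you need access to headers or status codes.
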
=head1 HTTP STYLE COMMUNICATION


The libwww-perl library is based on HTTP style communication. This
section tries to describe what that means.

Let us start with this quote from the HTTP specification document
<URL:http://www.w3.org/Protocols/>:

=over 3

=item

The HTTP protocol is based on a request/response paradigm. A client
establishes a connection with a server and sends a request to the
server in the form of a request method, URI, and protocol version,
followed by a MIME-like message containing request modifiers, client
information, and possible body content. The server responds with a
status line, including the message's protocol version and a success or
error code, followed by a MIME-like message containing server
information, entity meta-information, and possible body content.

=back

What this means to libwww-perl is that communication always takes place
through these steps: First a I<request> object is created and
configured. This object is then passed to a server and we get a
I<response> object in return that we can examine. A request is always
independent of any previous requests, i.e. the service is stateless.
The same simple model is used for any kind of service we want to
access.

For example, if we want to fetch a document from a remote file server,
then we send it a request that contains a name for that document and
the response will contain the document itself. If we access a search
engine, then the content of the request will contain the query
parameters and the response will contain the query result. If we want
to send a mail message to somebody then we send a request object which
contains our message to the mail server and the response object will
contain an acknowledgment that tells us that the message has been
accepted and will be forwarded to the recipient(s).

It is as simple as that!

=head2 The Request Object

The libwww-perl request object has the class name C<HTTP::Request>.
The fact that the class name uses C<HTTP::> as a
prefix only implies that we use the HTTP model of communication. It
does not limit the kind of services we can try to pass this I<request>
to. For instance, we will send C<HTTP::Request>s both to ftp and
gopher servers, as well as to the local file system.

The main attributes of the request objects are:

=over 3

=item *

B<method> is a short string that tells what kind of
request this is. The most common methods are B<GET>, B<PUT>,
B<POST> and B<HEAD>.

=item *

B<uri> is a string denoting the protocol, server and
the name of the "document" we want to access. The B<uri> might
also encode various other parameters.

=item *

B<headers> contains additional information about the
request and can also be used to describe the content. The headers
are a set of keyword/value pairs.

=item *

B<content> is an arbitrary amount of data.

=back

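The four attributes listed above map directly onto accessor methods of
C<HTTP::Request>. A minimal sketch, assuming a placeholder URL and form
body that are not taken from the original text. Nothing is sent over the
network here.

```perl
use strict;
use warnings;
use HTTP::Request;

# Build a request: method and uri are given to the constructor.
my $req = HTTP::Request->new(POST => 'http://www.example.com/search');

# Headers are keyword/value pairs; content is arbitrary data.
$req->header('Content-Type' => 'application/x-www-form-urlencoded');
$req->content('query=libwww-perl');

# Read the attributes back through the same accessors.
print "method:  ", $req->method, "\n";
print "uri:     ", $req->uri, "\n";
print "type:    ", $req->header('Content-Type'), "\n";
print "content: ", $req->content, "\n";
```
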
=head2 The Response Object

The libwww-perl response object has the class name C<HTTP::Response>.
The main attributes of objects of this class are:

=over 3

=item *

B<code> is a numerical value that indicates the overall
outcome of the request.

=item *

B<message> is a short, human readable string that
corresponds to the I<code>.

=item *

B<headers> contains additional information about the
response and describes the content.

=item *

B<content> is an arbitrary amount of data.

=back

Since we don't want to handle all possible I<code> values directly in
our programs, a libwww-perl response object has methods that can be
used to query what kind of response this is. The most commonly used
response classification methods are:

=over 3

=item is_success()

The request was successfully received, understood or accepted.

=item is_error()

The request failed. The server or the resource might not be
available, access to the resource might be denied or other things might
have failed for some reason.

=back

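The classification methods can be tried without a server by building
C<HTTP::Response> objects by hand. This is only a sketch for
illustration; in a real program the responses come back from the user
agent.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-made responses: the constructor takes a code and a message.
my @responses = (
    HTTP::Response->new(200, 'OK'),
    HTTP::Response->new(404, 'Not Found'),
);

for my $res (@responses) {
    my $kind = $res->is_success ? 'success'
             : $res->is_error   ? 'error'
             :                    'other';
    # status_line() combines the code and the message.
    printf "%-15s => %s\n", $res->status_line, $kind;
}
```
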
=head2 The User Agent

Let us assume that we have created a I<request> object. What do we
actually do with it in order to receive a I<response>?

The answer is that you pass it to a I<user agent> object and this
object takes care of all the things that need to be done
(like low-level communication and error handling) and returns
a I<response> object. The user agent represents your
application on the network and provides you with an interface that
can accept I<requests> and return I<responses>.

The user agent is an interface layer between
your application code and the network. Through this interface you are
able to access the various servers on the network.

The class name for the user agent is C<LWP::UserAgent>. Every
libwww-perl application that wants to communicate should create at
least one object of this class. The main method provided by this
object is request(). This method takes an C<HTTP::Request> object as
argument and (eventually) returns an C<HTTP::Response> object.

The user agent has many other attributes that let you
configure how it will interact with the network and with your
application.

=over 3

=item *

B<timeout> specifies how much time we give remote servers to
respond before the library disconnects and creates an
internal I<timeout> response.

=item *

B<agent> specifies the name that your application uses when it
presents itself on the network.

=item *

B<from> can be set to the e-mail address of the person
responsible for running the application. If this is set, then the
address will be sent to the servers with every request.

=item *

B<parse_head> specifies whether we should initialize response
headers from the E<lt>head> section of HTML documents.

=item *

B<proxy> and B<no_proxy> specify if and when to go through
a proxy server. <URL:http://www.w3.org/History/1994/WWW/Proxies/>

=item *

B<credentials> provides a way to set up user names and
passwords needed to access certain services.

=back

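Each attribute above has a method of the same name on
C<LWP::UserAgent>. The sketch below sets them on one object; every
host name, address, realm and password in it is a placeholder assumed
for illustration, and no request is sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

$ua->timeout(10);                      # seconds to wait for a server
$ua->agent('MyApp/0.1');               # name presented on the network
$ua->from('webmaster@example.com');    # contact address sent with requests
$ua->parse_head(0);                    # do not read headers from <head>

# Send http and ftp requests through a (hypothetical) proxy,
# except for the listed domains.
$ua->proxy(['http', 'ftp'], 'http://proxy.example.com:8080/');
$ua->no_proxy('localhost', 'example.org');

# User name and password for one host:port and realm.
$ua->credentials('www.example.com:80', 'Members Only', 'user', 'secret');

printf "timeout=%s agent=%s from=%s\n", $ua->timeout, $ua->agent, $ua->from;
```
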
Many applications want even more control over how they interact
with the network and they get this by sub-classing
C<LWP::UserAgent>. The library includes a
sub-class, C<LWP::RobotUA>, for robot applications.

=head2 An Example

This example shows how the user agent, a request and a response are
represented in actual Perl code:

 # Create a user agent object
 use LWP::UserAgent;
 my $ua = LWP::UserAgent->new;
 $ua->agent("MyApp/0.1 ");

 # Create a request
 my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
 $req->content_type('application/x-www-form-urlencoded');
 $req->content('query=libwww-perl&mode=dist');

 # Pass request to the user agent and get a response back
 my $res = $ua->request($req);

 # Check the outcome of the response
 if ($res->is_success) {
     print $res->content;
 }
 else {
     print $res->status_line, "\n";
 }

The $ua is created once when the application starts up. New request
objects should normally be created for each request sent.

=head1 NETWORK SUPPORT

This section discusses the various protocol schemes and
the HTTP style methods and headers that may be used for each.

For all requests, a "User-Agent" header is added and initialized from
the $ua->agent attribute before the request is handed to the network
layer. In the same way, a "From" header is initialized from the
$ua->from attribute.

For all responses, the library adds a header called "Client-Date".
This header holds the time when the response was received by
your application. The format and semantics of the header are the
same as the server created "Date" header. You may also encounter other
"Client-XXX" headers. They are all generated by the library
internally and are not received from the servers.

=head2 HTTP Requests

HTTP requests are just handed off to an HTTP server and it
decides what happens. Few servers implement methods besides the usual
"GET", "HEAD", "POST" and "PUT", but CGI-scripts may implement
any method they like.

If the server is not available then the library will generate an
internal error response.

The library automatically adds a "Host" and a "Content-Length" header
to the HTTP request before it is sent over the network.

For a GET request you might want to add a "If-Modified-Since" or
"If-None-Match" header to make the request conditional.

For a POST request you should add the "Content-Type" header. When you
try to emulate HTML E<lt>FORM> handling you should usually let the value
of the "Content-Type" header be "application/x-www-form-urlencoded".
See L<lwpcook> for examples of this.

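A conditional GET as described above can be sketched as follows. The
URL, the timestamp and the entity tag are placeholders assumed for
illustration. The request is only built and printed, not sent.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical epoch time at which we last fetched the document.
my $last_fetch = 86_400;

my $req = HTTP::Request->new(GET => 'http://www.example.com/index.html');

# if_modified_since() takes an epoch time and formats it as an HTTP date.
$req->if_modified_since($last_fetch);

# Entity tag remembered from an earlier response (made up here).
$req->header('If-None-Match' => '"abc123"');

print $req->as_string;
```

The printed request has no "Host" or "Content-Length" header, since
the library adds those only when the request is sent.
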
The libwww-perl HTTP implementation currently supports the HTTP/1.1
and HTTP/1.0 protocols.

The library allows you to access a proxy server through HTTP. This
means that you can set up the library to forward all types of requests
through the HTTP protocol module. See L<LWP::UserAgent> for
documentation of this.


=head2 HTTPS Requests

HTTPS requests are HTTP requests over an encrypted network connection
using the SSL protocol developed by Netscape. Everything about HTTP
requests above also applies to HTTPS requests. In addition, the library
will add the headers "Client-SSL-Cipher", "Client-SSL-Cert-Subject" and
"Client-SSL-Cert-Issuer" to the response. These headers denote the
encryption method used and the name of the server owner.

The request can contain the header "If-SSL-Cert-Subject" in order to
make the request conditional on the content of the server certificate.
If the certificate subject does not match, no request is sent to the
server and an internally generated error response is returned. The
value of the "If-SSL-Cert-Subject" header is interpreted as a Perl
regular expression.

=head2 FTP Requests

The library currently supports GET, HEAD and PUT requests. GET
retrieves a file or a directory listing from an FTP server. PUT
stores a file on an FTP server.

You can specify an FTP account for servers that want this in addition
to user name and password. This is specified by including an "Account"
header in the request.

User name/password can be specified using basic authorization or be
encoded in the URL. Failed logins return an UNAUTHORIZED response with
"WWW-Authenticate: Basic" and can be treated like basic authorization
for HTTP.

The library supports ftp ASCII transfer mode by specifying the "type=a"
parameter in the URL. It also supports transfer of ranges for FTP transfers
using the "Range" header.

Directory listings are by default returned unprocessed (as returned
from the ftp server) with the content media type reported to be
"text/ftp-dir-listing". The C<File::Listing> module provides methods
for parsing these directory listings.

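Parsing such a listing with C<File::Listing> might look like the sketch
below. The two listing lines are made-up sample data in Unix C<ls -l>
style, standing in for the content of a real "text/ftp-dir-listing"
response.

```perl
use strict;
use warnings;
use File::Listing qw(parse_dir);

# Sample listing text (invented for illustration).
my $listing = <<'EOT';
drwxr-xr-x   2 ftp  ftp   4096 Jan  1  2020 pub
-rw-r--r--   1 ftp  ftp   1234 Jan  1  2020 README
EOT

# parse_dir() returns one array reference per entry:
# [ name, type, size, mtime, mode ]
my @entries = parse_dir($listing);

for my $entry (@entries) {
    my ($name, $type, $size, $mtime, $mode) = @$entry;
    # The size may be undefined, for instance for directories.
    printf "%-8s type=%s size=%s\n",
        $name, $type, defined $size ? $size : '-';
}
```
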
The ftp module is also able to convert directory listings to HTML and
this can be requested via the standard HTTP content negotiation
mechanisms (add an "Accept: text/html" header in the request if you
want this).

For normal file retrievals, the "Content-Type" is guessed based on the
file name suffix. See L<LWP::MediaTypes>.

The "If-Modified-Since" request header works for servers that implement
the MDTM command. It will probably not work for directory listings though.

Example:

 $req = HTTP::Request->new(GET => 'ftp://me:[email protected]/');
 $req->header(Accept => "text/html, */*;q=0.1");

=head2 News Requests

Access to the USENET News system is implemented through the NNTP
protocol. The name of the news server is obtained from the
NNTP_SERVER environment variable and defaults to "news". It is not
possible to specify the hostname of the NNTP server in news: URLs.

The library supports GET and HEAD to retrieve news articles through the
NNTP protocol. You can also post articles to newsgroups by using
(surprise!) the POST method.

GET on newsgroups is not implemented yet.

Examples:

 $req = HTTP::Request->new(GET => 'news:[email protected]');

 $req = HTTP::Request->new(POST => 'news:comp.lang.perl.test');
 $req->header(Subject => 'This is a test',
              From    => '[email protected]');
 $req->content(<<EOT);
 This is the content of the message that we are sending to
 the world.
 EOT

=head2 Gopher Request

The library supports the GET and HEAD methods for gopher requests. All
request header values are ignored. HEAD cheats and returns a
response without even talking to the server.

Gopher menus are always converted to HTML.

The response "Content-Type" is generated from the document type
encoded (as the first letter) in the request URL path itself.

Example:

 $req = HTTP::Request->new(GET => 'gopher://gopher.sn.no/');



=head2 File Request

The library supports GET and HEAD methods for file requests. The
"If-Modified-Since" header is supported. All other headers are
ignored. The I<host> component of the file URL must be empty or set
to "localhost". Any other I<host> value will be treated as an error.

Directories are always converted to an HTML document. For normal
files, the "Content-Type" and "Content-Encoding" in the response are
guessed based on the file suffix.

Example:

 $req = HTTP::Request->new(GET => 'file:/etc/passwd');

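Because file requests need no network, they allow a complete
request/response round trip to be tried locally. The sketch below is an
illustration only. It fetches a temporary file whose name and content
are assumptions, then prints the guessed "Content-Type" and the
"Client-Date" header that the library adds to every response.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request;
use LWP::UserAgent;
use URI::file;

# Create a small local file to act as the "document".
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "hello from a local file\n";
close $fh;

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => URI::file->new_abs($path));
my $res = $ua->request($req);

if ($res->is_success) {
    # Content-Type is guessed from the file suffix.
    print "Content-Type: ", $res->content_type, "\n";
    # Client-Date is generated by the library, not by a server.
    print "Client-Date:  ", $res->header('Client-Date'), "\n";
    print $res->content;
}
else {
    print $res->status_line, "\n";
}
```
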
=head2 Mailto Request

You can send (aka "POST") mail messages using the library. All
headers specified for the request are passed on to the mail system.
The "To" header is initialized from the mail address in the URL.

Example:

 $req = HTTP::Request->new(POST => 'mailto:[email protected]');
 $req->header(Subject => "subscribe");
 $req->content("Please subscribe me to the libwww-perl mailing list!\n");

=head2 CPAN Requests

URLs with scheme C<cpan:> are redirected to a suitable CPAN
mirror. If you have your own local mirror of CPAN you might tell LWP
to use it for C<cpan:> URLs by an assignment like this:

 $LWP::Protocol::cpan::CPAN = "file:/local/CPAN/";

Suitable CPAN mirrors are also picked up from the configuration for
the CPAN.pm, so if you have used that module a suitable mirror should
be picked automatically. If neither of these applies, then a redirect
to the generic CPAN http location is issued.

Example request to download the newest perl:

 $req = HTTP::Request->new(GET => "cpan:src/latest.tar.gz");

=head1 OVERVIEW OF CLASSES AND PACKAGES

This table should give you a quick overview of the classes provided by the
library. Indentation shows class inheritance.

 LWP::MemberMixin   -- Access to member variables of Perl5 classes
   LWP::UserAgent   -- WWW user agent class
     LWP::RobotUA   -- When developing a robot application
   LWP::Protocol          -- Interface to various protocol schemes
     LWP::Protocol::http  -- http:// access
     LWP::Protocol::file  -- file:// access
     LWP::Protocol::ftp   -- ftp:// access
     ...

 LWP::Authen::Basic -- Handle 401 and 407 responses
 LWP::Authen::Digest

 HTTP::Headers      -- MIME/RFC822 style header (used by HTTP::Message)
 HTTP::Message      -- HTTP style message
   HTTP::Request    -- HTTP request
   HTTP::Response   -- HTTP response
 HTTP::Daemon       -- A HTTP server class

 WWW::RobotRules    -- Parse robots.txt files
   WWW::RobotRules::AnyDBM_File -- Persistent RobotRules

 Net::HTTP          -- Low level HTTP client

The following modules provide various functions and definitions.

 LWP                -- This file. Library version number and documentation.
 LWP::MediaTypes    -- MIME types configuration (text/html etc.)
 LWP::Simple        -- Simplified procedural interface for common functions
 HTTP::Status       -- HTTP status code (200 OK etc)
 HTTP::Date         -- Date parsing module for HTTP date formats
 HTTP::Negotiate    -- HTTP content negotiation calculation
 File::Listing      -- Parse directory listings
 HTML::Form         -- Processing for <form>s in HTML documents

=head1 MORE DOCUMENTATION

All modules contain detailed information on the interfaces they
provide. The L<lwpcook> manpage is the libwww-perl cookbook that contains
examples of typical usage of the library. You might want to take a
look at how the scripts L<lwp-request>, L<lwp-download>, L<lwp-dump>
and L<lwp-mirror> are implemented.

=head1 ENVIRONMENT

The following environment variables are used by LWP:

=over

=item HOME

The C<LWP::MediaTypes> functions will look for the F<.media.types> and
F<.mime.types> files relative to your home directory.

=item http_proxy

=item ftp_proxy

=item xxx_proxy

=item no_proxy

These environment variables can be set to enable communication through
a proxy server. See the description of the C<env_proxy> method in
L<LWP::UserAgent>.

=item PERL_LWP_ENV_PROXY

If set to a TRUE value, then the C<LWP::UserAgent> will by default call
C<env_proxy> during initialization. This makes LWP honor the proxy variables
described above.

=item PERL_LWP_SSL_VERIFY_HOSTNAME

The default C<verify_hostname> setting for C<LWP::UserAgent>. If
not set, the default will be 1. Set it to 0 to disable hostname
verification (the default prior to libwww-perl 5.840).

=item PERL_LWP_SSL_CA_FILE

=item PERL_LWP_SSL_CA_PATH

The file and/or directory
where the trusted Certificate Authority certificates
are located. See L<LWP::UserAgent> for details.

=item PERL_HTTP_URI_CLASS

Used to decide what URI objects to instantiate. The default is C<URI>.
You might want to set it to C<URI::URL> for compatibility with old times.

=back

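The interaction between PERL_LWP_SSL_VERIFY_HOSTNAME and an explicit
constructor option can be sketched as follows. The environment value
is set inside the script purely to simulate the setting; disabling
verification is shown only to make the difference visible and is not a
recommendation.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Simulate the environment setting for this process only.
local $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

# Without an explicit option the environment variable supplies the default.
my $from_env = LWP::UserAgent->new;

# An explicit ssl_opts value takes precedence over the environment.
my $explicit = LWP::UserAgent->new(ssl_opts => { verify_hostname => 1 });

printf "from environment: %s\n", $from_env->ssl_opts('verify_hostname');
printf "explicit option:  %s\n", $explicit->ssl_opts('verify_hostname');
```
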
=head1 AUTHORS

LWP was made possible by contributions from Adam Newby, Albert
Dvornik, Alexandre Duret-Lutz, Andreas Gustafsson, Andreas König,
Andrew Pimlott, Andy Lester, Ben Coleman, Benjamin Low, Ben Low, Ben
Tilly, Blair Zajac, Bob Dalgleish, BooK, Brad Hughes, Brian
J. Murrell, Brian McCauley, Charles C. Fu, Charles Lane, Chris Nandor,
Christian Gilmore, Chris W. Unger, Craig Macdonald, Dale Couch, Dan
Kubb, Dave Dunkin, Dave W. Smith, David Coppit, David Dick, David
D. Kilzer, Doug MacEachern, Edward Avis, erik, Gary Shea, Gisle Aas,
Graham Barr, Gurusamy Sarathy, Hans de Graaff, Harald Joerg, Harry
Bochner, Hugo, Ilya Zakharevich, INOUE Yoshinari, Ivan Panchenko, Jack
Shirazi, James Tillman, Jan Dubois, Jared Rhine, Jim Stern, Joao
Lopes, John Klar, Johnny Lee, Josh Kronengold, Josh Rai, Joshua
Chamas, Joshua Hoblitt, Kartik Subbarao, Keiichiro Nagano, Ken
Williams, KONISHI Katsuhiro, Lee T Lindley, Liam Quinn, Marc Hedlund,
Marc Langheinrich, Mark D. Anderson, Marko Asplund, Mark Stosberg,
Markus B Krüger, Markus Laker, Martijn Koster, Martin Thurn, Matthew
Eldridge, Matthew.van.Eerde, Matt Sergeant, Michael A. Chase, Michael
Quaranta, Michael Thompson, Mike Schilli, Moshe Kaminsky, Nathan
Torkington, Nicolai Langfeldt, Norton Allen, Olly Betts, Paul
J. Schinder, peterm, Philip Guenther, Daniel Buenzli, Pon Hwa Lin,
Radoslaw Zielinski, Radu Greab, Randal L. Schwartz, Richard Chen,
Robin Barker, Roy Fielding, Sander van Zoest, Sean M. Burke,
shildreth, Slaven Rezic, Steve A Fink, Steve Hay, Steven Butler,
Steve_Kilbane, Takanori Ugai, Thomas Lotterer, Tim Bunce, Tom Hughes,
Tony Finch, Ville Skyttä, Ward Vandewege, William York, Yale Huang,
and Yitzchak Scott-Thoennes.

LWP owes a lot in motivation, design, and code, to the libwww-perl
library for Perl4 by Roy Fielding, which included work from Alberto
Accomazzi, James Casey, Brooks Cutter, Martijn Koster, Oscar
Nierstrasz, Mel Melchner, Gertjan van Oosten, Jared Rhine, Jack
Shirazi, Gene Spafford, Marc VanHeyningen, Steven E. Brenner, Marion
Hakanson, Waldemar Kebsch, Tony Sanders, and Larry Wall; see the
libwww-perl-0.40 library for details.

=head1 COPYRIGHT

  Copyright 1995-2009, Gisle Aas
  Copyright 1995, Martijn Koster

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=head1 AVAILABILITY

The latest version of this library is likely to be available from CPAN
as well as:

  http://github.com/libwww-perl/libwww-perl

The best place to discuss this code is on the <[email protected]>
mailing list.

=cut