package PerlIO;

our $VERSION = '1.04';

# Map layer name to package that defines it
our %alias;

# Called for "use PerlIO 'foo'": load the module that implements each
# requested layer.
sub import
{
    my $class = shift;
    while (@_)
    {
        my $layer = shift;
        if (exists $alias{$layer})
        {
            # An aliased layer names its defining package directly
            $layer = $alias{$layer};
        }
        else
        {
            # Otherwise look under the PerlIO:: name space
            $layer = "${class}::$layer";
        }
        eval "require $layer";
        warn $@ if $@;
    }
}

# UTF-8 flag bit in a layer's flags
sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

  open($fh,"<:crlf", "my.txt"); # support platform-native and CRLF text files

  open($fh,"<","his.jpg");      # portably open a binary file for reading
  binmode($fh);

  Shell:
    PERLIO=perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification, the C code performs the equivalent of:

  use PerlIO 'foo';

The perl code in PerlIO.pm then attempts to locate a layer by doing

  require PerlIO::foo;

Otherwise the C<PerlIO> package is a placeholder for additional
PerlIO related functions.

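The name resolution performed by C<import> can be sketched in isolation.
This is only an illustration, not the loader itself: C<layer_package> is a
hypothetical helper, the lexical C<%alias> is a stand-in for
C<%PerlIO::alias>, and the C<gzip> entry is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %PerlIO::alias (layer name => defining package).
my %alias = (gzip => 'My::Layers::Gzip');   # invented entry, illustration only

# Hypothetical helper mirroring the name resolution in PerlIO::import:
# an aliased name maps to its package, anything else goes under $class.
sub layer_package {
    my ($class, $layer) = @_;
    return exists $alias{$layer} ? $alias{$layer} : "${class}::$layer";
}

print layer_package('PerlIO', 'foo'),  "\n";
print layer_package('PerlIO', 'gzip'), "\n";
```

The real C<import> then passes the resulting package name to C<require>
and warns if loading fails.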
The following layers are currently defined:

=over 4

=item :unix

Lowest level layer which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).

=item :stdio

Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc.  Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.

=item :perlio

A from-scratch implementation of buffering for PerlIO. Provides fast
access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt>
and in general attempts to minimize data copying.

C<:perlio> will insert a C<:unix> layer below itself to do low level IO.

=item :crlf

A layer that implements DOS/Windows like CRLF line endings.  On read
converts pairs of CR,LF to a single "\n" newline character.  On write
converts each "\n" to a CR,LF pair.  Note that this layer likes to be
one of its kind: it silently ignores attempts to be pushed into the
layer stack more than once.

It currently does I<not> mimic MS-DOS in treating Control-Z
as an end-of-file marker.

(Gory details follow) To be more exact what happens is this: after
pushing itself to the stack, the C<:crlf> layer checks all the layers
below itself to find the first layer that is capable of being a CRLF
layer but is not yet enabled to be a CRLF layer.  If it finds such a
layer, it enables the CRLFness of that other deeper layer, and then
pops itself off the stack.  If not, fine, use the one we just pushed.

The end result is that a C<:crlf> means "please enable the first CRLF
layer you can find, and if you can't find one, here would be a good
spot to place a new one."

Based on the C<:perlio> layer.

=item :mmap

A layer which implements "reading" of files by using C<mmap()> to
make the (whole) file appear in the process's address space, and then
using that as PerlIO's "buffer". This I<may> be faster in certain
circumstances for large files, and may result in less physical memory
use when multiple processes are reading the same file.

Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write
needs extra house-keeping (to extend the file) which negates any advantage.

The C<:mmap> layer will not exist if the platform does not support C<mmap()>.

=item :utf8

Declares that the stream accepts perl's internal encoding of
characters.  (Which really is UTF-8 on ASCII machines, but is
UTF-EBCDIC on EBCDIC machines.)  This allows any character perl can
represent to be read from or written to the stream. The UTF-X encoding
is chosen to render simple text parts (i.e.  non-accented letters,
digits and common punctuation) human readable in the encoded file.

Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
and then read it back in.

	open(F, ">:utf8", "data.utf");
	print F $out;
	close(F);

	open(F, "<:utf8", "data.utf");
	$in = <F>;
	close(F);

=item :bytes

This is the inverse of the C<:utf8> layer. It turns off the flag
on the layer below so that data read from it is considered to
be "octets" i.e. characters in range 0..255 only. Likewise
on output perl will warn if a "wide" character is written
to such a stream.

=item :raw

The C<:raw> layer is I<defined> as being identical to calling
C<binmode($fh)> - the stream is made suitable for passing binary data
i.e. each byte is passed as-is. The stream will still be
buffered.

In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
referred to as a "discipline") is documented as the inverse of the
C<:crlf> layer. That is no longer the case - other layers which would
alter the binary nature of the stream are also disabled.  If you want UNIX
line endings on a platform that normally does CRLF translation, but still
want UTF-8 or encoding defaults, the appropriate thing to do is to add
C<:perlio> to the PERLIO environment variable.

The implementation of C<:raw> is as a pseudo-layer which when "pushed"
pops itself and then any layers which do not declare themselves as suitable
for binary data. (Undoing :utf8 and :crlf is implemented by clearing
flags rather than popping layers but that is an implementation detail.)

As a consequence of the fact that C<:raw> normally pops layers
it usually only makes sense to have it as the only or first element in
a layer specification.  When used as the first element it provides
a known base on which to build e.g.

    open($fh,":raw:utf8",...)

will construct a "binary" stream, but then enable UTF-8 translation.

=item :pop

A pseudo layer that removes the top-most layer. Gives perl code
a way to manipulate the layer stack. Should be considered
experimental. Note that C<:pop> only works on real layers
and will not undo the effects of pseudo layers like C<:utf8>.
An example of a possible use might be:

    open($fh,...)
    ...
    binmode($fh,":encoding(...)");  # next chunk is encoded
    ...
    binmode($fh,":pop");            # back to un-encoded

A more elegant (and safer) interface is needed.

=item :win32

On Win32 platforms this I<experimental> layer uses native "handle" IO
rather than the unix-like numeric file descriptor layer. Known to be
buggy as of perl 5.8.2.

=back

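The translation done by C<:crlf> can be observed by writing through the
layer and then reading the stored bytes back with C<:raw>. The following is
a small sketch under assumptions: it uses a temporary file from the core
C<File::Temp> module, and all variable names are chosen for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Write two lines through :crlf; each "\n" is stored as a CR,LF pair.
open(my $out, '>:crlf', $path) or die "open for write: $!";
print {$out} "one\ntwo\n";
close $out or die "close: $!";

# Read the same file as raw bytes to see what was actually stored.
open(my $raw, '<:raw', $path) or die "open raw: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read it again through :crlf; the CR,LF pairs collapse back to "\n".
open(my $in, '<:crlf', $path) or die "open for read: $!";
my @lines = <$in>;
close $in;

printf "%d bytes on disk, %d lines read\n", length $bytes, scalar @lines;
```

Opening the read side with C<:raw> first gives a known binary base, so the
stored line endings are reported unchanged.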
=head2 Custom Layers

It is possible to write custom layers in addition to the above builtin
ones, both in C/XS and Perl.  Two such layers (and one example written
in Perl using the latter) come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> either in open() or binmode() to install
a layer that transparently performs character set and encoding
transformations, for example from Shift-JIS to Unicode.  Note that under
C<stdio> an C<:encoding> also enables C<:utf8>.  See L<PerlIO::encoding>
for more information.

=item :via

Use C<:via(MODULE)> either in open() or binmode() to install a layer
that applies an arbitrary transformation (for example compression /
decompression, encryption / decryption) to the filehandle.
See L<PerlIO::via> for more information.

=back

=head2 Alternatives to raw

To get a binary stream an alternate method is to use:

    open($fh,"whatever");
    binmode($fh);

This has the advantage of being backward compatible with how such things have
had to be coded on some platforms for years.

To get an un-buffered stream specify an unbuffered layer (e.g. C<:unix>)
in the open call:

    open($fh,"<:unix",$path)

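As a sketch of the two routes to a binary stream, the example below writes
every byte value through a handle switched with C<binmode> and reads it back
through a handle opened with C<:raw>. The temporary file (from the core
C<File::Temp> module) and the variable names are assumptions made for the
example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# All 256 byte values, including CR, LF and Control-Z.
my $data = join '', map { chr } 0 .. 255;

# Route 1: plain open followed by binmode (the backward compatible form).
open(my $out, '>', $path) or die "open for write: $!";
binmode($out);
print {$out} $data;
close $out or die "close: $!";

# Route 2: ask for :raw directly in the open call.
open(my $in, '<:raw', $path) or die "open for read: $!";
my $got = do { local $/; <$in> };
close $in;

print $got eq $data ? "round trip ok\n" : "round trip altered the data\n";
```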
=head2 Defaults and how to override them

If the platform is MS-DOS like and normally does CRLF to "\n"
translation for text files then the default layers are:

  unix crlf

(The low level "unix" layer may be replaced by a platform specific low
level layer.)

Otherwise if C<Configure> found out how to do "fast" IO using the system's
stdio, then the default layers are:

  unix stdio

Otherwise the default layers are:

  unix perlio

These defaults may change once perlio has been better tested and tuned.

The default can be overridden by setting the environment variable
PERLIO to a space separated list of layers (C<unix> or the platform low
level layer is always pushed first).

This can be used to see the effect of/bugs in the various layers e.g.

  cd .../perl/t
  PERLIO=stdio  ./perl harness
  PERLIO=perlio ./perl harness

For the various values of PERLIO see L<perlrun/PERLIO>.

=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

   my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them.  Note that the "default stack" depends on the operating
system and on the Perl version, and both the compile-time and
runtime configurations of Perl.

The following table summarizes the default layers on UNIX-like and
DOS-like platforms, depending on the setting of C<$ENV{PERLIO}>:

 PERLIO     UNIX-like                DOS-like
 ------     ---------                --------
 unset / "" unix perlio / stdio [1]  unix crlf
 stdio      unix perlio / stdio [1]  stdio
 perlio     unix perlio              unix perlio
 mmap       unix mmap                unix mmap

 # [1] "stdio" if Configure found out how to do "fast stdio" (depends
 # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"

By default the layers from the input side of the filehandle are
returned; to get the output side use the optional C<output> argument:

   my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle but
for example with sockets there may be differences, or if you have
been using the C<open> pragma.)

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that.  This is not
accidental or unintentional.  The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

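A short sketch of querying the stack before and after changing it with
binmode(). The temporary file (from the core C<File::Temp> module) and the
variable names are assumptions made for the example, and the exact names
printed depend on the platform and on C<$ENV{PERLIO}>, as described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Open with :utf8 so that the pseudo layer shows up in the list of names.
open(my $fh, '<:utf8', $path) or die "open: $!";
my @before = PerlIO::get_layers($fh);

# binmode() with no layer argument makes the stream binary again.
binmode($fh);
my @after = PerlIO::get_layers($fh);
close $fh;

print "before binmode: @before\n";
print "after binmode:  @after\n";
```

Comparing the two lists is a read-only way to check what an open() or
binmode() call did to a handle.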
B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<utf8>) are not real
layers but instead flags on real layers: to get all of these returned
separately use the optional C<details> argument:

   my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will have up to three times as many elements as there are layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.

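One way to consume the C<details> list is to take it apart three elements
at a time. This is a sketch under assumptions: the temporary file comes from
the core C<File::Temp> module, the variable names are invented for the
example, and the flag values printed are internal and may differ between
Perl versions. C<PerlIO::F_UTF8> is the constant defined in PerlIO.pm.

```perl
use strict;
use warnings;
use PerlIO ();                 # loads PerlIO.pm so PerlIO::F_UTF8 is defined
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

open(my $fh, '<:utf8', $path) or die "open: $!";
my @details = PerlIO::get_layers($fh, details => 1);
close $fh;

# Walk the flat list as (name, arguments, flags) triples.
my @triples;
while (my ($name, $args, $flags) = splice(@details, 0, 3)) {
    push @triples, [$name, $args, $flags];
    printf "%-8s args=%s flags=0x%08x utf8=%s\n",
        $name  // '(none)',
        $args  // 'undef',
        $flags // 0,
        (($flags // 0) & PerlIO::F_UTF8() ? 'yes' : 'no');
}
```

Testing the flags against C<PerlIO::F_UTF8> shows which real layer carries
the UTF-8 flag instead of relying on a separate C<utf8> name.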
B<You may open your eyes now.>

=head1 AUTHOR

Nick Ing-Simmons

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut