perlfaq5.pod@ 14489

Last change on this file since 14489 was 14489, checked in by oranfry, 17 years ago
upgrading to perl 5.8
File size: 38.5 KB

1	=head1 NAME
2
3	perlfaq5 - Files and Formats ($Revision: 1.42 $, $Date: 2005/12/31 00:54:37 $)
4
5	=head1 DESCRIPTION
6
7	This section deals with I/O and the "f" issues: filehandles, flushing,
8	formats, and footers.
9
10	=head2 How do I flush/unbuffer an output filehandle? Why must I do this?
11	X<flush> X<buffer> X<unbuffer> X<autoflush>
12
13	Perl does not support truly unbuffered output (except
14	insofar as you can C<syswrite(OUT, $char, 1)>), although it
15	does support "command buffering", in which a physical
16	write is performed after every output command.
17
18	The C standard I/O library (stdio) normally buffers
19	characters sent to devices so that there isn't a system call
20	for each byte. In most stdio implementations, the type of
21	output buffering and the size of the buffer varies according
22	to the type of device. Perl's print() and write() functions
23	normally buffer output, while syswrite() bypasses buffering
24	altogether.
25
26	If you want your output to be sent immediately when you
27	execute print() or write() (for instance, for some network
28	protocols), you must set the handle's autoflush flag. This
29	flag is the Perl variable $| and when it is set to a true
30	value, Perl will flush the handle's buffer after each
31	print() or write(). Setting $| affects buffering only for
32	the currently selected default file handle. You choose this
33	handle with the one argument select() call (see
34	L<perlvar/$E<verbar>> and L<perlfunc/select>).
35
36	Use select() to choose the desired handle, then set its
37	per-filehandle variables.
38
39	$old_fh = select(OUTPUT_HANDLE);
40	$| = 1;
41	select($old_fh);
42
43	Some idioms can handle this in a single statement:
44
45	select((select(OUTPUT_HANDLE), $| = 1)[0]);
46
47	$| = 1, select $_ for select OUTPUT_HANDLE;
48
49	Some modules offer object-oriented access to handles and their
50	variables, although they may be overkill if this is the only
51	thing you do with them. You can use IO::Handle:
52
53	use IO::Handle;
54	open(DEV, ">/dev/printer"); # but is this?
55	DEV->autoflush(1);
56
57	or IO::Socket:
58
59	use IO::Socket; # this one is kinda a pipe?
60	my $sock = IO::Socket::INET->new( 'www.example.com:80' );
61
62	$sock->autoflush();
63
64	=head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?
65	X<file, editing>
66
67	Use the Tie::File module, which is included in the standard
68	distribution since Perl 5.8.0.
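
As a rough sketch of what that looks like in practice (the scratch file
and its contents are made up for the example), Tie::File presents the
file as an array with one element per line, so ordinary array operations
edit the file:

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical scratch file so the sketch is self-contained.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "first\nsecond\nthird\n";
close $fh or die "can't close $path: $!";

tie my @lines, 'Tie::File', $path or die "can't tie $path: $!";

$lines[1] = 'SECOND';               # change one line
splice @lines, 2, 1;                # delete a line ("third")
splice @lines, 1, 0, 'inserted';    # insert a line in the middle
unshift @lines, 'header';           # add a line at the beginning

untie @lines;                       # flush changes and release the file
```

Tie::File does not read the whole file into memory, but inserting or
deleting near the beginning of a large file still has to rewrite what
follows.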
69
70	=head2 How do I count the number of lines in a file?
71	X<file, counting lines> X<lines> X<line>
72
73	One fairly efficient way is to count newlines in the file. The
74	following program uses a feature of tr///, as documented in L<perlop>.
75	If your text file doesn't end with a newline, then it's not really a
76	proper text file, so this may report one fewer line than you expect.
77
78	$lines = 0;
79	open(FILE, $filename) or die "Can't open `$filename': $!";
80	while (sysread FILE, $buffer, 4096) {
81	$lines += ($buffer =~ tr/\n//);
82	}
83	close FILE;
84
85	This assumes no funny games with newline translations.
86
87	=head2 How can I use Perl's C<-i> option from within a program?
88	X<-i> X<in-place>
89
90	C<-i> sets the value of Perl's C<$^I> variable, which in turn affects
91	the behavior of C<< <> >>; see L<perlrun> for more details. By
92	modifying the appropriate variables directly, you can get the same
93	behavior within a larger program. For example:
94
95	# ...
96	{
97	local($^I, @ARGV) = ('.orig', glob("*.c"));
98	while (<>) {
99	if ($. == 1) {
100	print "This line should appear at the top of each file\n";
101	}
102	s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
103	print;
104	close ARGV if eof; # Reset $.
105	}
106	}
107	# $^I and @ARGV return to their old values here
108
109	This block modifies all the C<.c> files in the current directory,
110	leaving a backup of the original data from each file in a new
111	C<.c.orig> file.
112
113	=head2 How can I copy a file?
114	X<copy> X<file, copy>
115
116	(contributed by brian d foy)
117
118	Use the File::Copy module. It comes with Perl and can do a
119	true copy across file systems, and it does its magic in
120	a portable fashion.
121
122	use File::Copy;
123
124	copy( $original, $new_copy ) or die "Copy failed: $!";
125
126	If you can't use File::Copy, you'll have to do the work yourself:
127	open the original file, open the destination file, then print
128	to the destination file as you read the original.
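
A minimal sketch of that do-it-yourself approach follows. The name
copy_by_hand() is invented for this example, and the sketch copies only
the data, not permissions or timestamps:

```perl
use strict;
use warnings;

# copy_by_hand() is a hypothetical helper name used only for this sketch.
sub copy_by_hand {
    my ($original, $new_copy) = @_;

    open my $in,  '<', $original or die "can't read $original: $!";
    open my $out, '>', $new_copy or die "can't write $new_copy: $!";
    binmode $in;                    # copy bytes, not translated text
    binmode $out;

    my $buffer;
    while (1) {
        my $got = read($in, $buffer, 65_536);
        die "can't read $original: $!" unless defined $got;
        last if $got == 0;          # end of file
        print {$out} $buffer or die "can't write $new_copy: $!";
    }

    close $in;
    close $out or die "can't close $new_copy: $!";
    return 1;
}
```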
129
130	=head2 How do I make a temporary file name?
131	X<file, temporary>
132
133	If you don't need to know the name of the file, you can use C<open()>
134	with C<undef> in place of the file name. The C<open()> function
135	creates an anonymous temporary file.
136
137	open my $tmp, '+>', undef or die $!;
138
139	Otherwise, you can use the File::Temp module.
140
141	use File::Temp qw/ tempfile tempdir /;
142
143	$dir = tempdir( CLEANUP => 1 );
144	($fh, $filename) = tempfile( DIR => $dir );
145
146	# or if you don't need to know the filename
147
148	$fh = tempfile( DIR => $dir );
149
150	File::Temp has been a standard module since Perl 5.6.1. If you
151	don't have a modern enough Perl installed, use the C<new_tmpfile>
152	class method from the IO::File module to get a filehandle opened for
153	reading and writing. Use it if you don't need to know the file's name:
154
155	use IO::File;
156	$fh = IO::File->new_tmpfile()
157	or die "Unable to make new temporary file: $!";
158
159	If you're committed to creating a temporary file by hand, use the
160	process ID and/or the current time-value. If you need to have many
161	temporary files in one process, use a counter:
162
163	BEGIN {
164	use Fcntl;
165	my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMPDIR} || $ENV{TEMP};
166	my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time());
167	sub temp_file {
168	local *FH;
169	my $count = 0;
170	until (defined(fileno(FH)) || $count++ > 100) {
171	$base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
172	# O_EXCL is required for security reasons.
173	sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
174	}
175	if (defined(fileno(FH))) {
176	return (*FH, $base_name);
177	} else {
178	return ();
179	}
180	}
181	}
182
183	=head2 How can I manipulate fixed-record-length files?
184	X<fixed-length> X<file, fixed-length records>
185
186	The most efficient way is using L<pack()|perlfunc/"pack"> and
187	L<unpack()|perlfunc/"unpack">. This is faster than using
188	L<substr()|perlfunc/"substr"> when taking many, many strings. It is
189	slower for just a few.
190
191	Here is a sample chunk of code to break up and put back together again
192	some fixed-format input lines, in this case from the output of a normal,
193	Berkeley-style ps:
194
195	# sample input line:
196	# 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
197	my $PS_T = 'A6 A4 A7 A5 A*';
198	open my $ps, '-|', 'ps';
199	print scalar <$ps>;
200	my @fields = qw( pid tt stat time command );
201	while (<$ps>) {
202	my %process;
203	@process{@fields} = unpack($PS_T, $_);
204	for my $field ( @fields ) {
205	print "$field: <$process{$field}>\n";
206	}
207	print 'line=', pack($PS_T, @process{@fields} ), "\n";
208	}
209
210	We've used a hash slice in order to easily handle the fields of each row.
211	Storing the keys in an array means it's easy to operate on them as a
212	group or loop over them with for. It also avoids polluting the program
213	with global variables and using symbolic references.
214
215	=head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?
216	X<filehandle, local> X<filehandle, passing> X<filehandle, reference>
217
218	As of perl5.6, open() autovivifies file and directory handles
219	as references if you pass it an uninitialized scalar variable.
220	You can then pass these references just like any other scalar,
221	and use them in the place of named handles.
222
223	open my $fh, $file_name;
224
225	open local $fh, $file_name;
226
227	print $fh "Hello World!\n";
228
229	process_file( $fh );
230
231	Before perl5.6, you had to deal with various typeglob idioms
232	which you may see in older code.
233
234	open FILE, "> $filename";
235	process_typeglob( *FILE );
236	process_reference( \*FILE );
237
238	sub process_typeglob { local *FH = shift; print FH "Typeglob!" }
239	sub process_reference { local $fh = shift; print $fh "Reference!" }
240
241	If you want to create many anonymous handles, you should
242	check out the Symbol or IO::Handle modules.
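
With autovivified handles, an array of filehandles is just an array of
scalars. The following is only a sketch; the scratch directory and the
file names are assumptions made for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory and file names, used only for this sketch.
my $dir = tempdir(CLEANUP => 1);

my @handles;
for my $n (1 .. 3) {
    open my $fh, '>', "$dir/out$n.txt" or die "can't write out$n.txt: $!";
    push @handles, $fh;             # the handle is an ordinary scalar
}

# print needs a block when the handle is not a simple scalar variable
for my $i (0 .. $#handles) {
    print { $handles[$i] } "this is file number ", $i + 1, "\n";
}

for my $fh (@handles) {
    close $fh or die "can't close handle: $!";
}
```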
243
244	=head2 How can I use a filehandle indirectly?
245	X<filehandle, indirect>
246
247	An indirect filehandle is using something other than a symbol
248	in a place that a filehandle is expected. Here are ways
249	to get indirect filehandles:
250
251	$fh = SOME_FH; # bareword is strict-subs hostile
252	$fh = "SOME_FH"; # strict-refs hostile; same package only
253	$fh = *SOME_FH; # typeglob
254	$fh = \*SOME_FH; # ref to typeglob (bless-able)
255	$fh = *SOME_FH{IO}; # blessed IO::Handle from SOME_FH typeglob
256
257	Or, you can use the C<new> method from one of the IO::* modules to
258	create an anonymous filehandle, store that in a scalar variable,
259	and use it as though it were a normal filehandle.
260
261	use IO::Handle; # 5.004 or higher
262	$fh = IO::Handle->new();
263
264	Then use any of those as you would a normal filehandle. Anywhere that
265	Perl is expecting a filehandle, an indirect filehandle may be used
266	instead. An indirect filehandle is just a scalar variable that contains
267	a filehandle. Functions like C<print>, C<open>, C<seek>, or
268	the C<< <FH> >> diamond operator will accept either a named filehandle
269	or a scalar variable containing one:
270
271	($ifh, $ofh, $efh) = (STDIN, STDOUT, *STDERR);
272	print $ofh "Type it: ";
273	$got = <$ifh>;
274	print $efh "What was that: $got";
275
276	If you're passing a filehandle to a function, you can write
277	the function in two ways:
278
279	sub accept_fh {
280	my $fh = shift;
281	print $fh "Sending to indirect filehandle\n";
282	}
283
284	Or it can localize a typeglob and use the filehandle directly:
285
286	sub accept_fh {
287	local *FH = shift;
288	print FH "Sending to localized filehandle\n";
289	}
290
291	Both styles work with either objects or typeglobs of real filehandles.
292	(They might also work with strings under some circumstances, but this
293	is risky.)
294
295	accept_fh(*STDOUT);
296	accept_fh($handle);
297
298	In the examples above, we assigned the filehandle to a scalar variable
299	before using it. That is because only simple scalar variables, not
300	expressions or subscripts of hashes or arrays, can be used with
301	built-ins like C<print>, C<printf>, or the diamond operator. Using
302	something other than a simple scalar variable as a filehandle is
303	illegal and won't even compile:
304
305	@fd = (STDIN, STDOUT, *STDERR);
306	print $fd[1] "Type it: "; # WRONG
307	$got = <$fd[0]>; # WRONG
308	print $fd[2] "What was that: $got"; # WRONG
309
310	With C<print> and C<printf>, you get around this by using a block and
311	an expression where you would place the filehandle:
312
313	print { $fd[1] } "funny stuff\n";
314	printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559;
315	# Pity the poor deadbeef.
316
317	That block is a proper block like any other, so you can put more
318	complicated code there. This sends the message out to one of two places:
319
320	$ok = -x "/bin/cat";
321	print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
322	print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
323
324	This approach of treating C<print> and C<printf> like object method
325	calls doesn't work for the diamond operator. That's because it's a
326	real operator, not just a function with a comma-less argument. Assuming
327	you've been storing typeglobs in your structure as we did above, you
328	can use the built-in function named C<readline> to read a record just
329	as C<< <> >> does. Given the initialization shown above for @fd, this
330	would work, but only because readline() requires a typeglob. It doesn't
331	work with objects or strings, which might be a bug we haven't fixed yet.
332
333	$got = readline($fd[0]);
334
335	Let it be noted that the flakiness of indirect filehandles is not
336	related to whether they're strings, typeglobs, objects, or anything else.
337	It's the syntax of the fundamental operators. Playing the object
338	game doesn't help you at all here.
339
340	=head2 How can I set up a footer format to be used with write()?
341	X<footer>
342
343	There's no builtin way to do this, but L<perlform> has a couple of
344	techniques to make it possible for the intrepid hacker.
345
346	=head2 How can I write() into a string?
347	X<write, into a string>
348
349	See L<perlform/"Accessing Formatting Internals"> for an swrite() function.
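
For convenience, here is a sketch along the lines of the swrite() shown
in L<perlform>. It relies on formline() and the format accumulator
C<$^A>; the picture and arguments below are only sample data:

```perl
use strict;
use warnings;
use Carp;

sub swrite {
    croak "usage: swrite PICTURE ARGS" unless @_;
    my $format = shift;
    $^A = "";                   # clear the format accumulator
    formline($format, @_);      # format the arguments into $^A
    return $^A;
}

my $string = swrite(<<'END', 1, 2, 3);
Check me out
@<<<  @|||  @>>>
END
print $string;
```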
350
351	=head2 How can I output my numbers with commas added?
352	X<number, commify>
353
354	(contributed by brian d foy and Benjamin Goldberg)
355
356	You can use L<Number::Format> to separate places in a number.
357	It handles locale information for those of you who want to insert
358	full stops instead (or anything else that they want to use,
359	really).
360
361	This subroutine will add commas to your number:
362
363	sub commify {
364	local $_ = shift;
365	1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
366	return $_;
367	}
368
369	This regex from Benjamin Goldberg will add commas to numbers:
370
371	s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
372
373	It is easier to see with comments:
374
375	s/(
376	^[-+]? # beginning of number.
377	\d+? # first digits before first comma
378	(?= # followed by, (but not included in the match) :
379	(?>(?:\d{3})+) # some positive multiple of three digits.
380	(?!\d) # an exact multiple, not x * 3 + 1 or whatever.
381	)
382	| # or:
383	\G\d{3} # after the last group, get three digits
384	(?=\d) # but they have to have more digits after them.
385	)/$1,/xg;
386
387	=head2 How can I translate tildes (~) in a filename?
388	X<tilde> X<tilde expansion>
389
390	Use the <> (glob()) operator, documented in L<perlfunc>. Older
391	versions of Perl require that you have a shell installed that groks
392	tildes. Recent perl versions have this feature built in. The
393	File::KGlob module (available from CPAN) gives more portable glob
394	functionality.
395
396	Within Perl, you may use this directly:
397
398	$filename =~ s{
399	^ ~ # find a leading tilde
400	( # save this in $1
401	[^/] # a non-slash character
402	* # repeated 0 or more times (0 means me)
403	)
404	}{
405	$1
406	? (getpwnam($1))[7]
407	: ( $ENV{HOME} || $ENV{LOGDIR} )
408	}ex;
409
410	=head2 How come when I open a file read-write it wipes it out?
411	X<clobber> X<read-write> X<clobbering> X<truncate> X<truncating>
412
413	Because you're using something like this, which truncates the file and
414	I<then> gives you read-write access:
415
416	open(FH, "+> /path/name"); # WRONG (almost always)
417
418	Whoops. You should instead use this, which will fail if the file
419	doesn't exist.
420
421	open(FH, "+< /path/name"); # open for update
422
423	Using ">" always clobbers or creates. Using "<" never does
424	either. The "+" doesn't change this.
425
426	Here are examples of many kinds of file opens. Those using sysopen()
427	all assume
428
429	use Fcntl;
430
431	To open file for reading:
432
433	open(FH, "< $path") || die $!;
434	sysopen(FH, $path, O_RDONLY) || die $!;
435
436	To open file for writing, create new file if needed or else truncate old file:
437
438	open(FH, "> $path") || die $!;
439	sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT) || die $!;
440	sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666) || die $!;
441
442	To open file for writing, create new file, file must not exist:
443
444	sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT) || die $!;
445	sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT, 0666) || die $!;
446
447	To open file for appending, create if necessary:
448
449	open(FH, ">> $path") || die $!;
450	sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT) || die $!;
451	sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT, 0666) || die $!;
452
453	To open file for appending, file must exist:
454
455	sysopen(FH, $path, O_WRONLY|O_APPEND) || die $!;
456
457	To open file for update, file must exist:
458
459	open(FH, "+< $path") || die $!;
460	sysopen(FH, $path, O_RDWR) || die $!;
461
462	To open file for update, create file if necessary:
463
464	sysopen(FH, $path, O_RDWR|O_CREAT) || die $!;
465	sysopen(FH, $path, O_RDWR|O_CREAT, 0666) || die $!;
466
467	To open file for update, file must not exist:
468
469	sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT) || die $!;
470	sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT, 0666) || die $!;
471
472	To open a file without blocking, creating if necessary:
473
474	sysopen(FH, "/foo/somefile", O_WRONLY|O_NDELAY|O_CREAT)
475	or die "can't open /foo/somefile: $!";
476
477	Be warned that neither creation nor deletion of files is guaranteed to
478	be an atomic operation over NFS. That is, two processes might both
479	successfully create or unlink the same file! Therefore O_EXCL
480	isn't as exclusive as you might wish.
481
482	See also the new L<perlopentut> if you have it (new for 5.6).
483
484	=head2 Why do I sometimes get an "Argument list too long" when I use E<lt>*E<gt>?
485	X<argument list too long>
486
487	The C<< <> >> operator performs a globbing operation (see above).
488	In Perl versions earlier than v5.6.0, the internal glob() operator forks
489	csh(1) to do the actual glob expansion, but
490	csh can't handle more than 127 items and so gives the error message
491	C<Argument list too long>. People who installed tcsh as csh won't
492	have this problem, but their users may be surprised by it.
493
494	To get around this, either upgrade to Perl v5.6.0 or later, do the glob
495	yourself with readdir() and patterns, or use a module like File::KGlob,
496	one that doesn't use the shell to do globbing.
497
498	=head2 Is there a leak/bug in glob()?
499	X<glob>
500
501	Due to the current implementation on some operating systems, when you
502	use the glob() function or its angle-bracket alias in a scalar
503	context, you may cause a memory leak and/or unpredictable behavior. It's
504	best therefore to use glob() only in list context.
505
506	=head2 How can I open a file with a leading ">" or trailing blanks?
507	X<filename, special characters>
508
509	(contributed by Brian McCauley)
510
511	The special two argument form of Perl's open() function ignores
512	trailing blanks in filenames and infers the mode from certain leading
513	characters (or a trailing "|"). In older versions of Perl this was the
514	only version of open() and so it is prevalent in old code and books.
515
516	Unless you have a particular reason to use the two argument form you
517	should use the three argument form of open() which does not treat any
518	characters in the filename as special.
519
520	open FILE, "<", " file "; # filename is " file "
521	open FILE, ">", ">file"; # filename is ">file"
522
523	=head2 How can I reliably rename a file?
524	X<rename> X<mv> X<move> X<file, rename> X<ren>
525
526	If your operating system supports a proper mv(1) utility or its
527	functional equivalent, this works:
528
529	rename($old, $new) or system("mv", $old, $new);
530
531	It may be more portable to use the File::Copy module instead.
532	You just copy the file to the new name (checking return
533	values), then delete the old one. This isn't really the same
534	semantically as a rename(), which preserves meta-information like
535	permissions, timestamps, inode info, etc.
536
537	Newer versions of File::Copy export a move() function.
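
A short sketch of move() follows. The scratch directory and file names
are placeholders for the example; as with the copy-then-delete approach,
don't assume every piece of meta-information survives a move across file
systems:

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Temp qw(tempdir);

# Hypothetical scratch directory and file names, used only for this sketch.
my $dir = tempdir(CLEANUP => 1);
my ($old, $new) = ("$dir/old.txt", "$dir/new.txt");

open my $fh, '>', $old or die "can't create $old: $!";
print {$fh} "some data\n";
close $fh or die "can't close $old: $!";

# move() tries rename() first and falls back to copy-and-delete
move($old, $new) or die "can't move $old to $new: $!";
```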
538
539	=head2 How can I lock a file?
540	X<lock> X<file, lock> X<flock>
541
542	Perl's builtin flock() function (see L<perlfunc> for details) will call
543	flock(2) if that exists, fcntl(2) if it doesn't (on perl version 5.004 and
544	later), and lockf(3) if neither of the two previous system calls exists.
545	On some systems, it may even use a different form of native locking.
546	Here are some gotchas with Perl's flock():
547
548	=over 4
549
550	=item 1
551
552	Produces a fatal error if none of the three system calls (or their
553	close equivalent) exists.
554
555	=item 2
556
557	lockf(3) does not provide shared locking, and requires that the
558	filehandle be open for writing (or appending, or read/writing).
559
560	=item 3
561
562	Some versions of flock() can't lock files over a network (e.g. on NFS file
563	systems), so you'd need to force the use of fcntl(2) when you build Perl.
564	But even this is dubious at best. See the flock entry of L<perlfunc>
565	and the F<INSTALL> file in the source distribution for information on
566	building Perl to do this.
567
568	Two potentially non-obvious but traditional flock semantics are that
569	it waits indefinitely until the lock is granted, and that its locks are
570	I<merely advisory>. Such discretionary locks are more flexible, but
571	offer fewer guarantees. This means that files locked with flock() may
572	be modified by programs that do not also use flock(). Cars that stop
573	for red lights get on well with each other, but not with cars that don't
574	stop for red lights. See the perlport manpage, your port's specific
575	documentation, or your system-specific local manpages for details. It's
576	best to assume traditional behavior if you're writing portable programs.
577	(If you're not, you should as always feel perfectly free to write
578	for your own system's idiosyncrasies (sometimes called "features").
579	Slavish adherence to portability concerns shouldn't get in the way of
580	your getting your job done.)
581
582	For more information on file locking, see also
583	L<perlopentut/"File Locking"> if you have it (new for 5.6).
584
585	=back
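
Since a plain flock() waits indefinitely, a program that would rather
give up than wait can ask for a non-blocking lock. This is only a sketch
under the assumption that your system's flock() supports LOCK_NB; the
scratch file stands in for whatever file you actually share:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for a shared data file.
my ($fh, $path) = tempfile(UNLINK => 1);

# LOCK_NB makes flock() return at once instead of waiting for the lock.
my $locked = flock($fh, LOCK_EX | LOCK_NB);
if ($locked) {
    print {$fh} "exclusive access, as far as cooperating programs go\n";
    flock($fh, LOCK_UN) or die "can't unlock $path: $!";
} else {
    warn "$path is locked by someone else: $!\n";
}
close $fh or die "can't close $path: $!";
```

The lock is still merely advisory: it keeps out only those programs that
also call flock() on the same file.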
586
587	=head2 Why can't I just open(FH, "E<gt>file.lock")?
588	X<lock, lockfile race condition>
589
590	A common bit of code B<NOT TO USE> is this:
591
592	sleep(3) while -e "file.lock"; # PLEASE DO NOT USE
593	open(LCK, "> file.lock"); # THIS BROKEN CODE
594
595	This is a classic race condition: you take two steps to do something
596	which must be done in one. That's why computer hardware provides an
597	atomic test-and-set instruction. In theory, this "ought" to work:
598
599	sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
600	or die "can't open file.lock: $!";
601
602	except that lamentably, file creation (and deletion) is not atomic
603	over NFS, so this won't work (at least, not every time) over the net.
604	Various schemes involving link() have been suggested, but
605	these tend to involve busy-wait, which is also subdesirable.
606
607	=head2 I still don't get locking. I just want to increment the number in the file. How can I do this?
608	X<counter> X<file, counter>
609
610	Didn't anyone ever tell you web-page hit counters were useless?
611	They don't count number of hits, they're a waste of time, and they serve
612	only to stroke the writer's vanity. It's better to pick a random number;
613	they're more realistic.
614
615	Anyway, this is what you can do if you can't help yourself.
616
617	use Fcntl qw(:DEFAULT :flock);
618	sysopen(FH, "numfile", O_RDWR|O_CREAT) or die "can't open numfile: $!";
619	flock(FH, LOCK_EX) or die "can't flock numfile: $!";
620	$num = <FH> || 0;
621	seek(FH, 0, 0) or die "can't rewind numfile: $!";
622	truncate(FH, 0) or die "can't truncate numfile: $!";
623	(print FH $num+1, "\n") or die "can't write numfile: $!";
624	close FH or die "can't close numfile: $!";
625
626	Here's a much better web-page hit counter:
627
628	$hits = int( (time() - 850_000_000) / rand(1_000) );
629
630	If the count doesn't impress your friends, then the code might. :-)
631
632	=head2 All I want to do is append a small amount of text to the end of a file. Do I still have to use locking?
633	X<append> X<file, append>
634
635	If you are on a system that correctly implements flock() and you use the
636	example appending code from "perldoc -f flock" everything will be OK
637	even if the OS you are on doesn't implement append mode correctly (if
638	such a system exists). So if you are happy to restrict yourself to OSs
639	that implement flock() (and that's not really much of a restriction)
640	then that is what you should do.
641
642	If you know you are only going to use a system that does correctly
643	implement appending (i.e. not Win32) then you can omit the seek() from
644	the above code.
645
646	If you know you are only writing code to run on an OS and filesystem that
647	does implement append mode correctly (a local filesystem on a modern
648	Unix for example), and you keep the file in block-buffered mode and you
649	write less than one buffer-full of output between each manual flushing
650	of the buffer then each bufferload is almost guaranteed to be written to
651	the end of the file in one chunk without getting intermingled with
652	anyone else's output. You can also use the syswrite() function which is
653	simply a wrapper around your system's write(2) system call.
654
655	There is still a small theoretical chance that a signal will interrupt
656	the system level write() operation before completion. There is also a
657	possibility that some STDIO implementations may call multiple system
658	level write()s even if the buffer was empty to start. There may be some
659	systems where this probability is reduced to zero.
660
661	=head2 How do I randomly update a binary file?
662	X<file, binary patch>
663
664	If you're just trying to patch a binary, in many cases something as
665	simple as this works:
666
667	perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs
668
669	However, if you have fixed sized records, then you might do something more
670	like this:
671
672	$RECSIZE = 220; # size of record, in bytes
673	$recno = 37; # which record to update
674	open(FH, "+<somewhere") || die "can't update somewhere: $!";
675	seek(FH, $recno * $RECSIZE, 0);
676	read(FH, $record, $RECSIZE) == $RECSIZE || die "can't read record $recno: $!";
677	# munge the record
678	seek(FH, -$RECSIZE, 1);
679	print FH $record;
680	close FH;
681
682	Locking and error checking are left as an exercise for the reader.
683	Don't forget them or you'll be quite sorry.
684
685	=head2 How do I get a file's timestamp in perl?
686	X<timestamp> X<file, timestamp>
687
688	If you want to retrieve the time at which the file was last
689	read, written, or had its meta-data (owner, etc) changed,
690	you use the B<-A>, B<-M>, or B<-C> file test operations as
691	documented in L<perlfunc>. These retrieve the age of the
692	file (measured against the start-time of your program) in
693	days as a floating point number. Some platforms may not have
694	all of these times. See L<perlport> for details. To
695	retrieve the "raw" time in seconds since the epoch, you
696	would call the stat function, then use localtime(),
697	gmtime(), or POSIX::strftime() to convert this into
698	human-readable form.
699
700	Here's an example:
701
702	$write_secs = (stat($file))[9];
703	printf "file %s updated at %s\n", $file,
704	scalar localtime($write_secs);
705
706	If you prefer something more legible, use the File::stat module
707	(part of the standard distribution in version 5.004 and later):
708
709	# error checking left as an exercise for reader.
710	use File::stat;
711	use Time::localtime;
712	$date_string = ctime(stat($file)->mtime);
713	print "file $file updated at $date_string\n";
714
715	The POSIX::strftime() approach has the benefit of being,
716	in theory, independent of the current locale. See L<perllocale>
717	for details.
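
A small sketch of the POSIX::strftime() route, with the format string
chosen arbitrarily for the example and a scratch file standing in for
your own:

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use File::Temp qw(tempfile);

# Hypothetical scratch file so there is something to stat.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh or die "can't close $file: $!";

my $write_secs = (stat $file)[9];       # mtime, in seconds since the epoch
defined $write_secs or die "can't stat $file: $!";

my $stamp = strftime("%Y-%m-%d %H:%M:%S", localtime $write_secs);
print "file $file updated at $stamp\n";
```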
718
719	=head2 How do I set a file's timestamp in perl?
720	X<timestamp> X<file, timestamp>
721
722	You use the utime() function documented in L<perlfunc/utime>.
723	By way of example, here's a little program that copies the
724	read and write times from its first argument to all the rest
725	of them.
726
727	if (@ARGV < 2) {
728	die "usage: cptimes timestamp_file other_files ...\n";
729	}
730	$timestamp = shift;
731	($atime, $mtime) = (stat($timestamp))[8,9];
732	utime $atime, $mtime, @ARGV;
733
734	Error checking is, as usual, left as an exercise for the reader.
735
736	The perldoc for utime also has an example that has the same
737	effect as touch(1) on files that I<already exist>.
738
739	Certain file systems have a limited ability to store the times
740	on a file at the expected level of precision. For example, the
741	FAT and HPFS filesystems are unable to create dates on files with
742	a finer granularity than two seconds. This is a limitation of
743	the filesystems, not of utime().
744
745	=head2 How do I print to more than one file at once?
746	X<print, to multiple files>
747
748	To connect one filehandle to several output filehandles,
749	you can use the IO::Tee or Tie::FileHandle::Multiplex modules.
750
751	If you only have to do this once, you can print individually
752	to each filehandle.
753
754	for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
755
756	=head2 How can I read in an entire file all at once?
757	X<slurp> X<file, slurping>
758
759	You can use the File::Slurp module to do it in one step.
760
761	use File::Slurp;
762
763	$all_of_it = read_file($filename); # entire file in scalar
764	@all_lines = read_file($filename); # one line per element
765
766	The customary Perl approach for processing all the lines in a file is to
767	do so one line at a time:
768
769	open (INPUT, $file) || die "can't open $file: $!";
770	while (<INPUT>) {
771	chomp;
772	# do something with $_
773	}
774	close(INPUT) || die "can't close $file: $!";
775
776	This is tremendously more efficient than reading the entire file into
777	memory as an array of lines and then processing it one element at a time,
778	which is often--if not almost always--the wrong approach. Whenever
779	you see someone do this:
780
781	@lines = <INPUT>;
782
783	you should think long and hard about why you need everything loaded at
784	once. It's just not a scalable solution. You might also find it more
785	fun to use the standard Tie::File module, or the DB_File module's
786	$DB_RECNO bindings, which allow you to tie an array to a file so that
787	accessing an element of the array actually accesses the corresponding
788	line in the file.
789
790	You can read the entire filehandle contents into a scalar.
791
792	{
793	local(*INPUT, $/);
794	open (INPUT, $file) || die "can't open $file: $!";
795	$var = <INPUT>;
796	}
797
798	That temporarily undefs your record separator, and will automatically
799	close the file at block exit. If the file is already open, just use this:
800
801	$var = do { local $/; <INPUT> };
802
803	For ordinary files you can also use the read function.
804
805	read( INPUT, $var, -s INPUT );
806
807	The third argument tests the byte size of the data on the INPUT filehandle
808	and reads that many bytes into the buffer $var.
809
810	=head2 How can I read in a file by paragraphs?
811	X<file, reading by paragraphs>
812
813	Use the C<$/> variable (see L<perlvar> for details). You can either
814	set it to C<""> to eliminate empty paragraphs (C<"abc\n\n\n\ndef">,
815	for instance, gets treated as two paragraphs and not three), or
816	C<"\n\n"> to accept empty paragraphs.
817
818	Note that a blank line must have no blanks in it. Thus
819	S<C<"fred\n \nstuff\n\n">> is one paragraph, but C<"fred\n\nstuff\n\n"> is two.
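
To see the difference between the two settings, here is a sketch that
reads the same sample text both ways. It assumes a Perl new enough to
open a filehandle on a string (5.8 or later), and read_records() is a
name invented for the example:

```perl
use strict;
use warnings;

my $text = "abc\n\n\n\ndef\n";

# read_records() is a hypothetical helper: it reads $text with the
# given input record separator and returns the records found.
sub read_records {
    my ($separator) = @_;
    open my $fh, '<', \$text or die "can't open in-memory file: $!";
    local $/ = $separator;
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @paragraph_mode = read_records("");      # runs of blank lines collapse
my @literal_mode   = read_records("\n\n");  # every "\n\n" ends a record

print scalar(@paragraph_mode), " records in paragraph mode\n";
print scalar(@literal_mode), " records with a literal separator\n";
```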
820
821	=head2 How can I read a single character from a file? From the keyboard?
822	X<getc> X<file, reading one character at a time>
823
824	You can use the builtin C<getc()> function for most filehandles, but
825	it won't (easily) work on a terminal device. For STDIN, either use
826	the Term::ReadKey module from CPAN or use the sample code in
827	L<perlfunc/getc>.
828
829	If your system supports the portable operating system programming
830	interface (POSIX), you can use the following code, which you'll note
831	turns off echo processing as well.
832
    #!/usr/bin/perl -w
    use strict;
    $| = 1;
    for (1..4) {
        my $got;
        print "gimme: ";
        $got = getone();
        print "--> $got\n";
    }
    exit;

    BEGIN {
        use POSIX qw(:termios_h);

        my ($term, $oterm, $echo, $noecho, $fd_stdin);

        $fd_stdin = fileno(STDIN);

        $term     = POSIX::Termios->new();
        $term->getattr($fd_stdin);
        $oterm    = $term->getlflag();

        $echo     = ECHO | ECHOK | ICANON;
        $noecho   = $oterm & ~$echo;

        sub cbreak {
            $term->setlflag($noecho);
            $term->setcc(VTIME, 1);
            $term->setattr($fd_stdin, TCSANOW);
        }

        sub cooked {
            $term->setlflag($oterm);
            $term->setcc(VTIME, 0);
            $term->setattr($fd_stdin, TCSANOW);
        }

        sub getone {
            my $key = '';
            cbreak();
            sysread(STDIN, $key, 1);
            cooked();
            return $key;
        }

    }

    END { cooked() }
881
The Term::ReadKey module from CPAN may be easier to use. Recent versions
also include support for non-portable systems.
884
    use Term::ReadKey;
    open(TTY, "</dev/tty") or die "can't open /dev/tty: $!";
    print "Gimme a char: ";
    ReadMode "raw";
    $key = ReadKey 0, *TTY;
    ReadMode "normal";
    printf "\nYou said %s, char number %03d\n",
        $key, ord $key;
893
894	=head2 How can I tell whether there's a character waiting on a filehandle?
895
The very first thing you should do is look into getting the Term::ReadKey
extension from CPAN. As we mentioned earlier, it now even has limited
support for non-portable (read: not open systems, closed, proprietary,
not POSIX, not Unix, etc.) systems.
900
901	You should also check out the Frequently Asked Questions list in
902	comp.unix.* for things like this: the answer is essentially the same.
903	It's very system dependent. Here's one solution that works on BSD
904	systems:
905
906	sub key_ready {
907	my($rin, $nfd);
908	vec($rin, fileno(STDIN), 1) = 1;
909	return $nfd = select($rin,undef,undef,0);
910	}
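
The same poll can be written with the standard IO::Select module, which
hides the vec() bookkeeping. This is only a sketch: input_ready() is a
made-up helper name, and a pipe stands in for the terminal so the
example runs without a keyboard. It assumes a system where select()
works on pipes.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# input_ready() is a hypothetical helper: true if $fh has data waiting.
sub input_ready {
    my ($fh) = @_;
    my $sel   = IO::Select->new($fh);
    my @ready = $sel->can_read(0);   # timeout 0: poll, don't block
    return scalar @ready;
}

pipe(my $reader, my $writer) or die "can't create pipe: $!";
$writer->autoflush(1);

my $before = input_ready($reader);   # nothing written yet
print {$writer} "x";
my $after  = input_ready($reader);   # one byte is now waiting

print "before: $before, after: $after\n";
```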
911
912	If you want to find out how many characters are waiting, there's
913	also the FIONREAD ioctl call to be looked at. The I<h2ph> tool that
914	comes with Perl tries to convert C include files to Perl code, which
915	can be C<require>d. FIONREAD ends up defined as a function in the
916	I<sys/ioctl.ph> file:
917
918	require 'sys/ioctl.ph';
919
920	$size = pack("L", 0);
921	ioctl(FH, FIONREAD(), $size) or die "Couldn't call ioctl: $!\n";
922	$size = unpack("L", $size);
923
924	If I<h2ph> wasn't installed or doesn't work for you, you can
925	I<grep> the include files by hand:
926
    % grep FIONREAD /usr/include/*/*
    /usr/include/asm/ioctls.h:#define FIONREAD      0x541B
929
930	Or write a small C program using the editor of champions:
931
    % cat > fionread.c
    #include <stdio.h>
    #include <sys/ioctl.h>
    int main(void) {
        printf("%#08x\n", (unsigned int) FIONREAD);
        return 0;
    }
    ^D
    % cc -o fionread fionread.c
    % ./fionread
    0x4004667f
941
942	And then hard code it, leaving porting as an exercise to your successor.
943
944	$FIONREAD = 0x4004667f; # XXX: opsys dependent
945
946	$size = pack("L", 0);
947	ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n";
948	$size = unpack("L", $size);
949
950	FIONREAD requires a filehandle connected to a stream, meaning that sockets,
951	pipes, and tty devices work, but I<not> files.
952
953	=head2 How do I do a C<tail -f> in perl?
954	X<tail>
955
956	First try
957
958	seek(GWFILE, 0, 1);
959
960	The statement C<seek(GWFILE, 0, 1)> doesn't change the current position,
961	but it does clear the end-of-file condition on the handle, so that the
962	next <GWFILE> makes Perl try again to read something.
963
964	If that doesn't work (it relies on features of your stdio implementation),
965	then you need something more like this:
966
967	for (;;) {
968	for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
969	# search for some stuff and put it into files
970	}
971	# sleep for a while
972	seek(GWFILE, $curpos, 0); # seek to where we had been
973	}
974
975	If this still doesn't work, look into the POSIX module. POSIX defines
976	the clearerr() method, which can remove the end of file condition on a
977	filehandle. The method: read until end of file, clearerr(), read some
978	more. Lather, rinse, repeat.
979
980	There's also a File::Tail module from CPAN.
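
Here is a minimal sketch of the seek() trick using lexical filehandles.
The scratch file plays the part of a growing log, and read_new_lines()
is a made-up helper name. A real program would call it in a loop with a
sleep() between passes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

# Scratch file standing in for a growing log (hypothetical data).
my ($log, $path) = tempfile(UNLINK => 1);
$log->autoflush(1);
print {$log} "first\n";

open(my $in, '<', $path) or die "can't open $path: $!";

# read_new_lines() is a hypothetical helper: return whatever has been
# appended since the last call.
sub read_new_lines {
    my ($fh) = @_;
    seek($fh, 0, 1);          # clear the EOF condition, keep position
    my @new = <$fh>;
    return @new;
}

my @batch1 = read_new_lines($in);   # picks up "first\n"
print {$log} "second\n";            # the "log" grows
my @batch2 = read_new_lines($in);   # picks up only "second\n"

print "batch 1: @batch1", "batch 2: @batch2";
```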
981
982	=head2 How do I dup() a filehandle in Perl?
983	X<dup>
984
985	If you check L<perlfunc/open>, you'll see that several of the ways
986	to call open() should do the trick. For example:
987
988	open(LOG, ">>/foo/logfile");
989	open(STDERR, ">&LOG");
990
991	Or even with a literal numeric descriptor:
992
993	$fd = $ENV{MHCONTEXTFD};
994	open(MHCONTEXT, "<&=$fd"); # like fdopen(3S)
995
Note that "<&STDIN" makes a copy, but "<&=STDIN" makes
an alias. That means if you close an aliased handle, all
aliases become inaccessible. This is not true with
a copied one.
1000
1001	Error checking, as always, has been left as an exercise for the reader.
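
For the curious, here is a sketch of the copy versus alias distinction
with the error checking filled in. It uses the three-argument form of
open() and a scratch file. The variable names are made up for the
example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($orig, $path) = tempfile(UNLINK => 1);

# ">&" duplicates the descriptor, like dup(2): a new descriptor number.
open(my $copy, '>&', $orig) or die "can't dup: $!";

# ">&=" wraps the same descriptor, like fdopen(3): the same number.
open(my $alias, '>&=', fileno($orig)) or die "can't alias: $!";

my $copy_differs = fileno($copy)  != fileno($orig);
my $alias_same   = fileno($alias) == fileno($orig);

print "orig=", fileno($orig), " copy=", fileno($copy),
      " alias=", fileno($alias), "\n";
```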
1002
1003	=head2 How do I close a file descriptor by number?
1004	X<file, closing file descriptors>
1005
1006	This should rarely be necessary, as the Perl close() function is to be
1007	used for things that Perl opened itself, even if it was a dup of a
1008	numeric descriptor as with MHCONTEXT above. But if you really have
1009	to, you may be able to do this:
1010
    require 'sys/syscall.ph';
    $rc = syscall(&SYS_close, $fd + 0);  # must force numeric
    die "can't sysclose $fd: $!" if $rc == -1;
1014
1015	Or, just use the fdopen(3S) feature of open():
1016
1017	{
1018	local *F;
1019	open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!";
1020	close F;
1021	}
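
If the POSIX module is available, its close() function is another
option, since it takes a descriptor number rather than a Perl
filehandle. This is a sketch, not something the approaches above depend
on. The descriptor is obtained with POSIX::dup() purely to have a number
to close.

```perl
use strict;
use warnings;
use POSIX ();

# Obtain a raw descriptor number to play with by duplicating STDOUT's.
my $fd = POSIX::dup(fileno(STDOUT));
defined $fd or die "can't dup STDOUT: $!";

# POSIX::close takes a descriptor number and returns undef on failure.
my $closed = POSIX::close($fd);
defined $closed or die "can't close fd $fd: $!";

# A second close of the same number should now fail.
my $again = POSIX::close($fd);
print "second close failed as expected\n" unless defined $again;
```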
1022
1023	=head2 Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?
1024	X<filename, DOS issues>
1025
1026	Whoops! You just put a tab and a formfeed into that filename!
1027	Remember that within double quoted strings ("like\this"), the
1028	backslash is an escape character. The full list of these is in
1029	L<perlop/Quote and Quote-like Operators>. Unsurprisingly, you don't
1030	have a file called "c:(tab)emp(formfeed)oo" or
1031	"c:(tab)emp(formfeed)oo.exe" on your legacy DOS filesystem.
1032
1033	Either single-quote your strings, or (preferably) use forward slashes.
1034	Since all DOS and Windows versions since something like MS-DOS 2.0 or so
1035	have treated C</> and C<\> the same in a path, you might as well use the
1036	one that doesn't clash with Perl--or the POSIX shell, ANSI C and C++,
1037	awk, Tcl, Java, or Python, just to mention a few. POSIX paths
1038	are more portable, too.
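
A quick way to convince yourself is to compare string lengths. The
sketch below only builds the strings and never touches the filesystem.

```perl
use strict;
use warnings;

my $broken  = "C:\temp\foo";     # "\t" is a tab, "\f" is a formfeed
my $single  = 'C:\temp\foo';     # single quotes keep the backslashes
my $forward = "C:/temp/foo";     # forward slashes need no escaping

printf "%-8s has %d characters\n", $_->[0], length $_->[1]
    for [broken => $broken], [single => $single], [forward => $forward];
```

The double-quoted version comes out two characters shorter because each
backslash pair collapsed into a single control character.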
1039
=head2 Why doesn't glob("*.*") get all the files?
1041	X<glob>
1042
1043	Because even on non-Unix ports, Perl's glob function follows standard
1044	Unix globbing semantics. You'll need C<glob("*")> to get all (non-hidden)
1045	files. This makes glob() portable even to legacy systems. Your
1046	port may include proprietary globbing functions as well. Check its
1047	documentation for details.
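
The sketch below shows the Unix semantics on a scratch directory with
three made-up file names. It assumes a Unix-like system, and it calls
bsd_glob() from the standard File::Glob module rather than glob() only
so that a scratch path containing spaces is not split into several
patterns.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob);

# Scratch directory with three hypothetical files.
my $dir = tempdir(CLEANUP => 1);
for my $name ('README', 'notes.txt', '.hidden') {
    open(my $fh, '>', "$dir/$name") or die "can't create $dir/$name: $!";
    close $fh;
}

my @dotted  = bsd_glob("$dir/*.*");   # only names containing a dot
my @visible = bsd_glob("$dir/*");     # every non-hidden name
my @hidden  = bsd_glob("$dir/.*");    # names starting with a dot

print scalar(@dotted), " dotted, ", scalar(@visible), " visible\n";
```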
1048
1049	=head2 Why does Perl let me delete read-only files? Why does C<-i> clobber protected files? Isn't this a bug in Perl?
1050
1051	This is elaborately and painstakingly described in the
1052	F<file-dir-perms> article in the "Far More Than You Ever Wanted To
1053	Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz .
1054
1055	The executive summary: learn how your filesystem works. The
1056	permissions on a file say what can happen to the data in that file.
1057	The permissions on a directory say what can happen to the list of
1058	files in that directory. If you delete a file, you're removing its
1059	name from the directory (so the operation depends on the permissions
1060	of the directory, not of the file). If you try to write to the file,
1061	the permissions of the file govern whether you're allowed to.
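
You can watch this happen with a short experiment. The sketch assumes
Unix-style permissions. Other filesystems may behave differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/protected";

open(my $fh, '>', $file) or die "can't create $file: $!";
close $fh;
chmod 0444, $file or die "can't chmod $file: $!";   # read-only file

# The directory is still writable, so the name can be removed even
# though the file's own permissions forbid writing to it.
my $removed = unlink $file;
print $removed ? "deleted\n" : "could not delete: $!\n";
```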
1062
1063	=head2 How do I select a random line from a file?
1064	X<file, selecting a random line>
1065
1066	Here's an algorithm from the Camel Book:
1067
1068	srand;
1069	rand($.) < 1 && ($line = $_) while <>;
1070
1071	This has a significant advantage in space over reading the whole file
1072	in. You can find a proof of this method in I<The Art of Computer
1073	Programming>, Volume 2, Section 3.4.2, by Donald E. Knuth.
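
Spelled out as a subroutine, the one-liner looks something like the
sketch below. pick_random_line() is a made-up name, and the in-memory
filehandle (perl 5.8 or later) is there only to keep the example
self-contained.

```perl
use strict;
use warnings;

# pick_random_line() is a hypothetical helper wrapping the one-liner.
sub pick_random_line {
    my ($fh) = @_;
    my ($pick, $n);
    while (my $line = <$fh>) {
        $n++;
        # Keep line $n with probability 1/$n; an earlier pick survives
        # with probability ($n-1)/$n, so every line is equally likely.
        $pick = $line if rand($n) < 1;
    }
    return $pick;
}

my $text = "red\ngreen\nblue\n";
open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
my $line = pick_random_line($fh);
close $fh;
print "picked: $line";
```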
1074
1075	You can use the File::Random module which provides a function
1076	for that algorithm:
1077
1078	use File::Random qw/random_line/;
1079	my $line = random_line($filename);
1080
1081	Another way is to use the Tie::File module, which treats the entire
1082	file as an array. Simply access a random array element.
1083
1084	=head2 Why do I get weird spaces when I print an array of lines?
1085
1086	Saying
1087
1088	print "@lines\n";
1089
1090	joins together the elements of C<@lines> with a space between them.
1091	If C<@lines> were C<("little", "fluffy", "clouds")> then the above
1092	statement would print
1093
1094	little fluffy clouds
1095
but if each element of C<@lines> was a line of text, ending in a newline
character C<("little\n", "fluffy\n", "clouds\n")> then it would print:

    little
     fluffy
     clouds
1102
1103	If your array contains lines, just print them:
1104
1105	print @lines;
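
The space comes from the list separator variable C<$">, which defaults
to a single space. The following sketch compares the interpolated
string with what a plain print would emit. The variable names are made
up for the example.

```perl
use strict;
use warnings;

my @lines = ("little\n", "fluffy\n", "clouds\n");

my $interpolated = "@lines";          # elements joined with a space
my $listed       = join '', @lines;   # what print @lines would emit

# Emptying the list separator locally removes the extra spaces.
my $no_separator = do { local $" = ''; "@lines" };

print $interpolated;
print $listed;
```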
1106
1107	=head1 AUTHOR AND COPYRIGHT
1108
1109	Copyright (c) 1997-2006 Tom Christiansen, Nathan Torkington, and
1110	other authors as noted. All rights reserved.
1111
1112	This documentation is free; you can redistribute it and/or modify it
1113	under the same terms as Perl itself.
1114
1115	Irrespective of its distribution, all code examples here are in the public
1116	domain. You are permitted and encouraged to use this code and any
1117	derivatives thereof in your own programs for fun or for profit as you
1118	see fit. A simple comment in the code giving credit to the FAQ would
1119	be courteous but is not required.
