=head1 NAME
X<format> X<report> X<chart>

perlform - Perl formats

=head1 DESCRIPTION

Perl has a mechanism to help you generate simple reports and charts. To
facilitate this, Perl helps you code up your output page close to how it
will look when it's printed. It can keep track of things like how many
lines are on a page, what page you're on, when to print page headers,
etc. Keywords are borrowed from FORTRAN: format() to declare and write()
to execute; see their entries in L<perlfunc>. Fortunately, the layout is
much more legible, more like BASIC's PRINT USING statement. Think of it
as a poor man's nroff(1).
X<nroff>

Formats, like packages and subroutines, are declared rather than
executed, so they may occur at any point in your program. (Usually it's
best to keep them all together though.) They have their own namespace
apart from all the other "types" in Perl. This means that if you have a
function named "Foo", it is not the same thing as having a format named
"Foo". However, the default name for the format associated with a given
filehandle is the same as the name of the filehandle. Thus, the default
format for STDOUT is named "STDOUT", and the default format for filehandle
TEMP is named "TEMP". They just look the same. They aren't.

Output record formats are declared as follows:

    format NAME =
    FORMLIST
    .

If the name is omitted, format "STDOUT" is defined. A single "." in
column 1 is used to terminate a format. FORMLIST consists of a sequence
of lines, each of which may be one of three types:

=over 4

=item 1.

A comment, indicated by putting a '#' in the first column.

=item 2.

A "picture" line giving the format for one output line.

=item 3.

An argument line supplying values to plug into the previous picture line.

=back

Picture lines contain output field definitions, intermingled with
literal text. These lines do not undergo any kind of variable interpolation.
Field definitions are made up from a set of characters, for starting and
extending a field to its desired width. This is the complete set of
characters for field definitions:
X<format, picture line>
X<@> X<^> X<< < >> X<< | >> X<< > >> X<#> X<0> X<.> X<...>
X<@*> X<^*> X<~> X<~~>

   @    start of regular field
   ^    start of special field
   <    pad character for left justification
   |    pad character for centering
   >    pad character for right justification
   #    pad character for a right-justified numeric field
   0    instead of first #: pad number with leading zeroes
   .    decimal point within a numeric field
   ...  terminate a text field, show "..." as truncation evidence
   @*   variable width field for a multi-line value
   ^*   variable width field for next line of a multi-line value
   ~    suppress line with all fields empty
   ~~   repeat line until all fields are exhausted

Each field in a picture line starts with either "@" (at) or "^" (caret),
indicating what we'll call, respectively, a "regular" or "special" field.
The choice of pad characters determines whether a field is textual or
numeric. The tilde operators are not part of a field. Let's look at
the various possibilities in detail.

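To tie the pieces above together, here is a minimal sketch of a complete format containing all three line types (a comment, a picture line, and an argument line). The format name C<SUMMARY>, the variables C<$item> and C<$count>, and the in-memory filehandle are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical report data; the format reads these package variables.
our ($item, $count) = ("widgets", 12);

format SUMMARY =
# This comment line produces no output.
Item: @<<<<<<<< Count: @>>
$item, $count
.

# Write to an in-memory filehandle so the result can be inspected.
my $report = '';
open(my $fh, '>', \$report) or die "cannot open in-memory handle: $!";
my $ofh = select($fh);   # per-handle variables act on the selected handle
$~ = "SUMMARY";          # use format SUMMARY for this handle
write;
select($ofh);
close($fh);

print $report;
```

The text field pads "widgets" to nine columns and the right-justified field pads 12 to three, so the literal text between them keeps its place.
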
=head2 Text Fields
X<format, text field>

The length of the field is supplied by padding out the field with multiple
"E<lt>", "E<gt>", or "|" characters to specify a non-numeric field with,
respectively, left justification, right justification, or centering.
For a regular field, the value (up to the first newline) is taken and
printed according to the selected justification, truncating excess characters.
If you terminate a text field with "...", three dots will be shown if
the value is truncated. A special text field may be used to do rudimentary
multi-line text block filling; see L</Using Fill Mode> for details.

Example:

    format STDOUT =
    @<<<<<< @|||||| @>>>>>>
    "left", "middle", "right"
    .

Output:

    left    middle    right

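The justification and truncation rules can also be checked without a filehandle. The sketch below assumes a small helper, C<render()>, which is not part of Perl; it simply wraps formline() and the accumulator C<$^A>. The pictures are single-quoted because an "@" inside double quotes may be taken as the start of an array name.

```perl
use strict;
use warnings;

# Hypothetical helper: render one picture line and return the result.
sub render {
    my ($picture, @values) = @_;
    $^A = "";                        # start with an empty accumulator
    formline($picture, @values);
    my $result = $^A;
    $^A = "";
    return $result;
}

# Three six-column fields: left-justified, centered, right-justified.
my $short = render('@<<<<< @||||| @>>>>>' . "\n", "ab", "cd", "ef");

# A value longer than the field is cut to the field width.
my $long = render('[@<<<<<]' . "\n", "abcdefghij");

print $short;
print $long;
```

Here "ab" is followed by padding, "cd" sits in the middle of its field, "ef" is pushed to the right edge, and the ten-character value is reduced to its first six characters.
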
=head2 Numeric Fields
X<#> X<format, numeric field>

Using "#" as a padding character specifies a numeric field, with
right justification. An optional "." defines the position of the
decimal point. With a "0" (zero) instead of the first "#", the
formatted number will be padded with leading zeroes if necessary.
A special numeric field is blanked out if the value is undefined.
If the resulting value would exceed the width specified, the field is
filled with "#" as overflow evidence.

Example:

    format STDOUT =
    @### @.### @##.### @### @### ^####
    42, 3.1415, undef, 0, 10000, undef
    .

Output:

      42 3.142   0.000    0 ####

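As a rough illustration of the numeric rules above, the following sketch feeds one picture line to formline() and reads the result from C<$^A>. The values are made up; the picture is single-quoted because "@0" inside double quotes could be read as an array. It assumes the decimal point is "." in the running locale.

```perl
use strict;
use warnings;

# Fields: plain integer, fixed decimals, zero-padded, blank-if-undef, overflow.
my $picture = '@### @.### @0## ^### @##' . "\n";

$^A = "";
formline($picture, 42, 3.14159, 7, undef, 10000);
my $line = $^A;
$^A = "";

print $line;
```

The undefined value leaves its caret field blank, and 10000 does not fit in three columns, so that field is filled with "#" characters.
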
=head2 The Field @* for Variable Width Multi-Line Text
X<@*>

The field "@*" can be used for printing multi-line, nontruncated
values; it should (but need not) appear by itself on a line. A final
line feed is chomped off, but all other characters are emitted verbatim.

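A small sketch of "@*" on a line by itself, again using formline() and C<$^A> so the result can be examined. The variable C<$note> is hypothetical.

```perl
use strict;
use warnings;

my $note = "first line\nsecond line\n";   # the final newline is chomped

$^A = "";
formline("Note:\n");
formline('@*' . "\n", $note);             # emits every line of $note
formline("-- end --\n");
my $out = $^A;
$^A = "";

print $out;
```

Because the trailing newline of the value is removed, the newline that ends the picture line is the only one emitted after "second line".
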
=head2 The Field ^* for Variable Width One-line-at-a-time Text
X<^*>

Like "@*", this is a variable width field. The value supplied must be a
scalar variable. Perl puts the first line (up to the first "\n") of the
text into the field, and then chops off the front of the string so that
the next time the variable is referenced, more of the text can be printed.
The variable will I<not> be restored.

Example:

    $text = "line 1\nline 2\nline 3";
    format STDOUT =
    Text: ^*
          $text
    ~~    ^*
          $text
    .

Output:

    Text: line 1
          line 2
          line 3

=head2 Specifying Values
X<format, specifying values>

The values are specified on the following format line in the same order as
the picture fields. The expressions providing the values must be
separated by commas. They are all evaluated in a list context
before the line is processed, so a single list expression could produce
multiple list elements. The expressions may be spread out to more than
one line if enclosed in braces. If so, the opening brace must be the first
token on the first line. If an expression evaluates to a number with a
decimal part, and if the corresponding picture specifies that the decimal
part should appear in the output (that is, any picture except multiple "#"
characters B<without> an embedded "."), the character used for the decimal
point is B<always> determined by the current LC_NUMERIC locale. This
means that, if, for example, the run-time environment happens to specify a
German locale, "," will be used instead of the default ".". See
L<perllocale> and L<"WARNINGS"> for more information.

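The two points about list context and braces can be sketched together. In this illustration the names C<READING>, C<@reading>, and C<$unit> are invented; the array supplies the first three fields by itself, and the braces let the argument list span several lines. The expected output assumes "." as the decimal point.

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
our @reading = ("sensor-1", 17.25, 21.5);
our $unit    = "C";

format READING =
# @reading is one list expression that fills the first three fields.
@<<<<<<<<< @##.## @##.## @<
{
    @reading,
    $unit
}
.

my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";
my $ofh = select($fh);
$~ = "READING";
write;
select($ofh);
close($fh);

print $out;
```
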
=head2 Using Fill Mode
X<format, fill mode>

On text fields the caret enables a kind of fill mode. Instead of an
arbitrary expression, the value supplied must be a scalar variable
that contains a text string. Perl puts the next portion of the text into
the field, and then chops off the front of the string so that the next time
the variable is referenced, more of the text can be printed. (Yes, this
means that the variable itself is altered during execution of the write()
call, and is not restored.) The next portion of text is determined by
a crude line breaking algorithm. You may use the carriage return character
(C<\r>) to force a line break. You can change which characters are legal
to break on by changing the variable C<$:> (that's
$FORMAT_LINE_BREAK_CHARACTERS if you're using the English module) to a
list of the desired characters.

Normally you would use a sequence of fields in a vertical stack associated
with the same scalar variable to print out a block of text. You might wish
to end the final field with the text "...", which will appear in the output
if the text was too long to appear in its entirety.

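The way fill mode consumes its variable can be seen in a short sketch. It calls formline() in a loop instead of stacking fields in a format, and it works on a copy because the variable is emptied as a side effect. The sample text and variable names are made up.

```perl
use strict;
use warnings;

my $original = "the quick brown fox jumps";
my $text     = $original;     # work on a copy: fill mode eats the variable

$^A = "";
while (length $text) {
    # A ten-column caret field takes as many whole words as will fit.
    formline('^<<<<<<<<<' . "\n", $text);
}
my $block = $^A;
$^A = "";

print $block;
```

Each pass takes the words that fit into ten columns and removes them from C<$text>, so the loop ends once the variable is empty while C<$original> keeps the full string.
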
=head2 Suppressing Lines Where All Fields Are Void
X<format, suppressing lines>

Using caret fields can produce lines where all fields are blank. You can
suppress such lines by putting a "~" (tilde) character anywhere in the
line. The tilde will be translated to a space upon output.

=head2 Repeating Format Lines
X<format, repeating lines>

If you put two contiguous tilde characters "~~" anywhere into a line,
the line will be repeated until all the fields on the line are exhausted,
i.e. undefined. For special (caret) text fields this will occur sooner or
later, but if you use a text field of the at variety, the expression you
supply had better not give the same value every time forever! (C<shift(@f)>
is a simple example that would work.) Don't use a regular (at) numeric
field in such lines, because it will never go blank.

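Both tilde forms can be tried with formline(). This is only a sketch: C<$empty> and C<@queue> are hypothetical, and the C<// ""> (which needs Perl 5.10 or later) is there purely to avoid an "uninitialized value" warning when shift() finally returns undef.

```perl
use strict;
use warnings;

my $empty = "";
my @queue = qw(alpha beta gamma);

$^A = "";

# "~": this line is dropped because its only field is empty.
formline('~ ^<<<<<<<' . "\n", $empty);

# "~~": the line repeats, and shift() supplies a new value each time.
# The repetition stops once the field comes up empty.
formline('~~ @<<<<<<<' . "\n", shift(@queue) // "");

my $out = $^A;
$^A = "";

print $out;
```

The tildes themselves come out as spaces, which is why each repeated line starts with three blanks.
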
=head2 Top of Form Processing
X<format, top of form> X<top> X<header>

Top-of-form processing is by default handled by a format with the
same name as the current filehandle with "_TOP" concatenated to it.
It's triggered at the top of each page. See L<perlfunc/write>.

Examples:

    # a report on the /etc/passwd file
    format STDOUT_TOP =
                               Passwd File
    Name                Login    Office   Uid   Gid Home
    ------------------------------------------------------------------
    .
    format STDOUT =
    @<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
    $name,              $login,  $office,$uid,$gid, $home
    .

    # a report from a bug report form
    format STDOUT_TOP =
                               Bug Reports
    @<<<<<<<<<<<<<<<<<<<<<<<       @|||       @>>>>>>>>>>>>>>>>>>>>>>>
    $system,                       $%,        $date
    ------------------------------------------------------------------
    .
    format STDOUT =
    Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
             $subject
    Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
           $index,                       $description
    Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
              $priority,        $date,   $description
    From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
          $from,                         $description
    Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                 $programmer,            $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                         $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                         $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                         $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                         $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<...
                                         $description
    .

It is possible to intermix print()s with write()s on the same output
channel, but you'll have to handle C<$-> (C<$FORMAT_LINES_LEFT>)
yourself.

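To watch top-of-form processing at work, the following sketch shrinks the page to four lines so that page breaks happen quickly. The format names, the variable C<$row>, and the in-memory filehandle are assumptions for the sake of the example. The handle is selected while writing because C<$%> and the other format variables refer to the currently selected handle.

```perl
use strict;
use warnings;

our $row;   # hypothetical record variable read by the body format

format LISTING_TOP =
Page @<
$%
-----
.

format LISTING =
@<<<<<
$row
.

my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";
my $ofh = select($fh);     # $~, $^, $= and $% act on the selected handle
$~ = "LISTING";
$^ = "LISTING_TOP";
$= = 4;                    # 4 lines per page: 2 header lines + 2 records
for my $value (qw(r1 r2 r3 r4 r5)) {
    $row = $value;
    write;
}
select($ofh);
close($fh);

print $out;
```

Every page after the first is preceded by the form feed held in C<$^L>, followed by the header with the new page number.
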
=head2 Format Variables
X<format variables>
X<format, variables>

The current format name is stored in the variable C<$~> (C<$FORMAT_NAME>),
and the current top of form format name is in C<$^> (C<$FORMAT_TOP_NAME>).
The current output page number is stored in C<$%> (C<$FORMAT_PAGE_NUMBER>),
and the number of lines on the page is in C<$=> (C<$FORMAT_LINES_PER_PAGE>).
Whether to autoflush output on this handle is stored in C<$|>
(C<$OUTPUT_AUTOFLUSH>). The string output before each top of page (except
the first) is stored in C<$^L> (C<$FORMAT_FORMFEED>). These variables are
set on a per-filehandle basis, so you'll need to select() into a different
one to affect them:

    select((select(OUTF),
            $~ = "My_Other_Format",
            $^ = "My_Top_Format"
           )[0]);

Pretty ugly, eh? It's a common idiom though, so don't be too surprised
when you see it. You can at least use a temporary variable to hold
the previous filehandle. This is a much better approach in general:
not only does legibility improve, but you also get an intermediate
stage in the expression to single-step through in the debugger:

    $ofh = select(OUTF);
    $~ = "My_Other_Format";
    $^ = "My_Top_Format";
    select($ofh);

If you use the English module, you can even read the variable names:

    use English '-no_match_vars';
    $ofh = select(OUTF);
    $FORMAT_NAME     = "My_Other_Format";
    $FORMAT_TOP_NAME = "My_Top_Format";
    select($ofh);

But you still have those funny select()s. So just use the FileHandle
module. Now, you can access these special variables using lowercase
method names instead:

    use FileHandle;
    format_name     OUTF "My_Other_Format";
    format_top_name OUTF "My_Top_Format";

Much better!

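The method-call style also works on a lexical filehandle. The sketch below uses IO::Handle, the module FileHandle is built on, and an in-memory handle; the format names and the record variables are invented for the example.

```perl
use strict;
use warnings;
use IO::Handle;   # supplies format_name(), format_top_name(), and friends

our ($name, $qty) = ("apple", 3);   # hypothetical record

format STOCK_TOP =
Name       Qty
.

format STOCK =
@<<<<<<<< @>>>
$name, $qty
.

my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";
$fh->format_name("STOCK");          # like setting $~ with $fh selected
$fh->format_top_name("STOCK_TOP");  # like setting $^ with $fh selected
write($fh);
close($fh);

print $out;
```

No select() is needed here because each method switches to the handle, sets the variable, and switches back on its own.
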
=head1 NOTES

Because the values line may contain arbitrary expressions (for at fields,
not caret fields), you can farm out more sophisticated processing
to other functions, like sprintf() or one of your own. For example:

    format Ident =
        @<<<<<<<<<<<<<<<
        &commify($n)
    .

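The commify() function used above is not built in. One possible definition, shown purely as an illustration, inserts a comma before each group of three digits; here it feeds a right-justified field through formline() so the result is easy to inspect.

```perl
use strict;
use warnings;

# Hypothetical helper: insert commas between groups of three digits.
sub commify {
    my $number = shift;
    1 while $number =~ s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $number;
}

$^A = "";
formline('@>>>>>>>>>>>' . "\n", commify(1234567));   # 12-column field
my $out = $^A;
$^A = "";

print $out;
```

The field only sees the string that commify() returns, so the commas count toward the field width like any other character.
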
To get a real at or caret into the field, do this:

    format Ident =
    I have an @ here.
              "@"
    .

To center a whole line of text, do something like this:

    format Ident =
    @|||||||||||||||||||||||||||||||||||||||||||||||
            "Some text line"
    .

There is no built-in way to say "float this to the right hand side
of the page, however wide it is." You have to specify where it goes.
The truly desperate can generate their own format on the fly, based
on the current number of columns, and then eval() it:

    $format  = "format STDOUT = \n"
             . '^' . '<' x $cols . "\n"
             . '$entry' . "\n"
             . "\t^" . "<" x ($cols-8) . "~~\n"
             . '$entry' . "\n"
             . ".\n";
    print $format if $Debugging;
    eval $format;
    die $@ if $@;

Which would generate a format looking something like this:

    format STDOUT =
    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $entry
            ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
    $entry
    .

Here's a little program that's somewhat like fmt(1):

    format =
    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
    $_

    .

    $/ = '';
    while (<>) {
        s/\s*\n\s*/ /g;
        write;
    }

=head2 Footers
X<format, footer> X<footer>

While C<$FORMAT_TOP_NAME> contains the name of the current header format,
there is no corresponding mechanism to automatically do the same thing
for a footer. Not knowing how big a format is going to be until you
evaluate it is one of the major problems. It's on the TODO list.

Here's one strategy: If you have a fixed-size footer, you can get footers
by checking C<$FORMAT_LINES_LEFT> before each write() and printing the
footer yourself if necessary.

Here's another strategy: Open a pipe to yourself, using C<open(MYSELF, "|-")>
(see L<perlfunc/open()>) and always write() to MYSELF instead of STDOUT.
Have your child process massage its STDIN to rearrange headers and footers
however you like. Not very convenient, but doable.

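The first strategy might look roughly like the sketch below. It assumes a one-line footer and one-line records; the constants, the footer() routine, and the format names are all invented for the illustration. After printing the footer it sets C<$-> to 0, which makes the next write() start a new page.

```perl
use strict;
use warnings;

use constant FOOTER_LINES => 1;   # size of the hypothetical fixed footer
use constant RECORD_LINES => 1;   # each record below occupies one line

our $row;

format LOG_TOP =
Report
.

format LOG =
@<<<<<
$row
.

my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";
my $ofh = select($fh);
$~ = "LOG";
$^ = "LOG_TOP";
$= = 5;                           # header + 3 records + footer

# Print the footer, then mark the page as full.
sub footer {
    print "-- end of page ", $%, " --\n";
    $- = 0;
}

for my $value (qw(r1 r2 r3 r4 r5 r6 r7)) {
    # $- is 0 before the first write, so no footer is printed then.
    footer() if $- > 0 && $- < RECORD_LINES + FOOTER_LINES;
    $row = $value;
    write;
}
footer();                         # close the last page
select($ofh);
close($fh);

print $out;
```

Since print() bypasses the line accounting that write() does, the code adjusts C<$-> by hand, which is the bookkeeping mentioned earlier for mixing print() with write().
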
=head2 Accessing Formatting Internals
X<format, internals>

For low-level access to the formatting mechanism, you may use formline()
and access C<$^A> (the C<$ACCUMULATOR> variable) directly.

For example:

    # formline() always returns true; the formatted text lands in $^A
    formline <<'END', 1,2,3;
    @<<<  @|||  @>>>
    END

    print "Wow, I just stored `$^A' in the accumulator!\n";

Or to make an swrite() subroutine, which is to write() what sprintf()
is to printf(), do this:

    use Carp;
    sub swrite {
        croak "usage: swrite PICTURE ARGS" unless @_;
        my $format = shift;
        $^A = "";
        formline($format,@_);
        return $^A;
    }

    $string = swrite(<<'END', 1, 2, 3);
     Check me out
     @<<<  @|||  @>>>
    END
    print $string;

=head1 WARNINGS

The lone dot that ends a format can also prematurely end a mail
message passing through a misconfigured Internet mailer (and based on
experience, such misconfiguration is the rule, not the exception). So
when sending format code through mail, you should indent it so that
the format-ending dot is not on the left margin; this will prevent
SMTP cutoff.

Lexical variables (declared with "my") are not visible within a
format unless the format is declared within the scope of the lexical
variable. (They weren't visible at all before version 5.001.)

Formats are the only part of Perl that unconditionally use information
from a program's locale; if a program's environment specifies an
LC_NUMERIC locale, it is always used to specify the decimal point
character in formatted output. Perl ignores all other aspects of locale
handling unless the C<use locale> pragma is in effect. Formatted output
cannot be controlled by C<use locale> because the pragma is tied to the
block structure of the program, and, for historical reasons, formats
exist outside that block structure. See L<perllocale> for further
discussion of locale handling.

Within strings that are to be displayed in a fixed length text field,
each control character is substituted by a space. (But remember the
special meaning of C<\r> when using fill mode.) This is done to avoid
misalignment when control characters "disappear" on some output media.