=head1 NAME

perlpodspec - Plain Old Documentation: format specification and notes

=head1 DESCRIPTION

This document contains detailed notes on the Pod markup language. Most
people will only have to read L<perlpod|perlpod> to know how to write
in Pod, but this document may answer some incidental questions to do
with parsing and rendering Pod.

In this document, "must" / "must not", "should" /
"should not", and "may" have their conventional (cf. RFC 2119)
meanings: "X must do Y" means that if X doesn't do Y, it's against
this specification, and should really be fixed. "X should do Y"
means that it's recommended, but X may fail to do Y, if there's a
good reason. "X may do Y" is merely a note that X can do Y at
will (although it is up to the reader to detect any connotation of
"and I think it would be I<nice> if X did Y" versus "it wouldn't
really I<bother> me if X did Y").

Notably, when I say "the parser should do Y", the
parser may fail to do Y, if the calling application explicitly
requests that the parser I<not> do Y. I often phrase this as
"the parser should, by default, do Y." This doesn't I<require>
the parser to provide an option for turning off whatever
feature Y is (like expanding tabs in verbatim paragraphs), although
it implies that such an option I<may> be provided.


=head1 Pod Definitions

Pod is embedded in files, typically Perl source files -- although you
can write a file that's nothing but Pod.

A B<line> in a file consists of zero or more non-newline characters,
terminated by either a newline or the end of the file.

A B<newline sequence> is usually a platform-dependent concept, but
Pod parsers should understand it to mean any of CR (ASCII 13), LF
(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
addition to any other system-specific meaning. The first CR/CRLF/LF
sequence in the file may be used as the basis for identifying the
newline sequence for parsing the rest of the file.
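
As a rough illustration (in Python, not taken from any existing Pod
parser; the function name C<split_pod_lines> is hypothetical), all three
newline sequences can be recognized with a single regular expression.
This sketch accepts mixed terminators throughout the file rather than
locking onto the first one it sees, which the paragraph above permits
but does not require.

```python
import re

# CRLF is listed first so that "\r\n" is consumed as one newline
# sequence instead of a CR followed by a separate LF.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def split_pod_lines(text):
    """Split text into lines, accepting CR, LF, and CRLF terminators.

    A final line without a terminator still counts (it is terminated by
    end-of-file), but a trailing terminator does not add an empty line.
    """
    lines = _NEWLINE.split(text)
    if lines and lines[-1] == "":
        lines.pop()  # the text ended with a newline sequence
    return lines


print(split_pod_lines("=pod\r\n\rText\nmore"))  # ['=pod', '', 'Text', 'more']
```
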

A B<blank line> is a line consisting entirely of zero or more spaces
(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
A B<non-blank line> is a line containing one or more characters other
than space or tab (and terminated by a newline or end-of-file).

(I<Note:> Many older Pod parsers did not accept a line consisting of
spaces/tabs and then a newline as a blank line -- the only lines they
considered blank were lines consisting of I<no characters at all>,
terminated by a newline.)
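
A minimal sketch of the two blank-line rules, assuming each line has
already had its newline sequence removed (C<is_blank> and
C<is_blank_legacy> are hypothetical names):

```python
import re

# Only spaces (ASCII 32) and tabs (ASCII 9) may appear on a blank line.
_BLANK = re.compile(r"[ \t]*")


def is_blank(line):
    """Blank per this specification: zero or more spaces/tabs only."""
    return _BLANK.fullmatch(line) is not None


def is_blank_legacy(line):
    """The stricter rule of many older parsers: only empty lines."""
    return line == ""


for sample in ["", " \t ", "  x"]:
    print(repr(sample), is_blank(sample), is_blank_legacy(sample))
```
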

B<Whitespace> is used in this document as a blanket term for spaces,
tabs, and newline sequences. (By itself, this term usually refers
to literal whitespace. That is, sequences of whitespace characters
in Pod source, as opposed to "EE<lt>32>", which is a formatting
code that I<denotes> a whitespace character.)

A B<Pod parser> is a module meant for parsing Pod (regardless of
whether this involves calling callbacks or building a parse tree or
directly formatting it). A B<Pod formatter> (or B<Pod translator>)
is a module or program that converts Pod to some other format (HTML,
plaintext, TeX, PostScript, RTF). A B<Pod processor> might be a
formatter or translator, or might be a program that does something
else with the Pod (like wordcounting it, scanning for index points,
etc.).

Pod content is contained in B<Pod blocks>. A Pod block starts with a
line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
that matches C<m/\A=cut/> -- or up to the end of the file, if there is
no C<m/\A=cut/> line.
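
The block boundaries can be sketched as a line-by-line scan using the
two regexps from the definition above, assuming the input is a list of
lines with newlines already stripped. C<find_pod_blocks> is a
hypothetical helper; when a "=cut" would I<start> a block it simply
stops, whereas a real processor must also emit a warning.

```python
import re

_POD_START = re.compile(r"=[a-zA-Z]")  # m/\A=[a-zA-Z]/ (match() anchors at \A)
_POD_CUT = re.compile(r"=cut")         # m/\A=cut/


def find_pod_blocks(lines):
    """Return (start, end) index pairs, one per Pod block.

    start is the index of the line that opens the block; end is the
    index of the closing "=cut" line, or len(lines) if the block runs
    to the end of the file.
    """
    blocks = []
    start = None
    for i, line in enumerate(lines):
        if start is None:
            if _POD_CUT.match(line):
                # Starting a block with "=cut" is an error: halt here.
                break
            if _POD_START.match(line):
                start = i
        elif _POD_CUT.match(line):
            blocks.append((start, i))
            start = None
    if start is not None:
        blocks.append((start, len(lines)))
    return blocks
```
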

=for comment
 The current perlsyn says:
 [beginquote]
   Note that pod translators should look at only paragraphs beginning
   with a pod directive (it makes parsing easier), whereas the compiler
   actually knows to look for pod escapes even in the middle of a
   paragraph. This means that the following secret stuff will be ignored
   by both the compiler and the translators.
      $a=3;
      =secret stuff
       warn "Neither POD nor CODE!?"
      =cut back
      print "got $a\n";
   You probably shouldn't rely upon the warn() being podded out forever.
   Not all pod translators are well-behaved in this regard, and perhaps
   the compiler will become pickier.
 [endquote]
 I think that those paragraphs should just be removed; paragraph-based
 parsing seems to have been largely abandoned, because of the hassle
 with non-empty blank lines messing up what people meant by "paragraph".
 Even if the "it makes parsing easier" bit were especially true,
 it wouldn't be worth the confusion of having perl and pod2whatever
 actually disagree on what can constitute a Pod block.

Within a Pod block, there are B<Pod paragraphs>. A Pod paragraph
consists of non-blank lines of text, separated by one or more blank
lines.
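
Splitting a Pod block into paragraphs then only needs the blank-line
rule. A minimal sketch, assuming a list of newline-free lines
(C<split_paragraphs> is a hypothetical name):

```python
import re

# Blank line: only spaces and tabs (possibly none at all).
_BLANK = re.compile(r"[ \t]*")


def split_paragraphs(lines):
    """Group non-blank lines into paragraphs; runs of one or more blank
    lines act as separators and are not kept."""
    paragraphs = []
    current = []
    for line in lines:
        if _BLANK.fullmatch(line):
            if current:
                paragraphs.append(current)
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs
```
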

For purposes of Pod processing, there are four types of paragraphs in
a Pod block:

=over

=item *

A command paragraph (also called a "directive"). The first line of
this paragraph must match C<m/\A=[a-zA-Z]/>. Command paragraphs are
typically one line, as in:

  =head1 NOTES

  =item *

But they may span several (non-blank) lines:

  =for comment
  Hm, I wonder what it would look like if
  you tried to write a BNF for Pod from this.

  =head3 Dr. Strangelove, or: How I Learned to
  Stop Worrying and Love the Bomb

I<Some> command paragraphs allow formatting codes in their content
(i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in:

  =head1 Did You Remember to C<use strict;>?

In other words, the Pod processing handler for "head1" will apply the
same processing to "Did You Remember to CE<lt>use strict;>?" that it
would to an ordinary paragraph -- i.e., formatting codes (like
"CE<lt>...>") are parsed and presumably formatted appropriately, and
whitespace in the form of literal spaces and/or tabs is not
significant.

=item *

A B<verbatim paragraph>. The first line of this paragraph must begin
with a literal space or tab, and this paragraph must not be inside a
"=begin I<identifier>", ... "=end I<identifier>" sequence unless
"I<identifier>" begins with a colon (":"). That is, if a paragraph
starts with a literal space or tab, but I<is> inside a
"=begin I<identifier>", ... "=end I<identifier>" region, then it's
a data paragraph, unless "I<identifier>" begins with a colon.

Whitespace I<is> significant in verbatim paragraphs (although, in
processing, tabs are probably expanded).

=item *

An B<ordinary paragraph>. A paragraph is an ordinary paragraph
if its first line matches neither C<m/\A=[a-zA-Z]/> nor
C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>",
... "=end I<identifier>" sequence unless "I<identifier>" begins with
a colon (":").

=item *

A B<data paragraph>. This is a paragraph that I<is> inside a "=begin
I<identifier>" ... "=end I<identifier>" sequence where
"I<identifier>" does I<not> begin with a literal colon (":"). In
some sense, a data paragraph is not part of Pod at all (i.e.,
effectively it's "out-of-band"), since it's not subject to most kinds
of Pod parsing; but it is specified here, since Pod
parsers need to be able to call an event for it, or store it in some
form in a parse tree, or at least just parse I<around> it.

=back

For example: consider the following paragraphs:

  # <- that's the 0th column

  =head1 Foo

  Stuff

    $foo->bar

  =cut

Here, "=head1 Foo" and "=cut" are command paragraphs because the first
line of each matches C<m/\A=[a-zA-Z]/>. "I<[space][space]>$foo->bar"
is a verbatim paragraph, because its first line starts with a literal
whitespace character (and there's no "=begin"..."=end" region around it).

The "=begin I<identifier>" ... "=end I<identifier>" commands stop
paragraphs that they surround from being parsed as ordinary or verbatim
paragraphs, if I<identifier> doesn't begin with a colon. This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
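
The four-way classification can be sketched as follows, assuming each
paragraph is a single string and that "=begin"/"=end" are properly
nested. C<classify> is a hypothetical name; the sketch consults only the
innermost open region, and it does not expand "=for", whose text lives
inside the command paragraph itself.

```python
import re

_COMMAND = re.compile(r"=[a-zA-Z]")   # m/\A=[a-zA-Z]/
_VERBATIM = re.compile(r"[ \t]")      # m/\A[ \t]/


def classify(paragraphs):
    """Label each paragraph 'command', 'verbatim', 'ordinary', or 'data'."""
    open_regions = []  # identifiers of currently open =begin regions
    kinds = []
    for para in paragraphs:
        if _COMMAND.match(para):
            kinds.append("command")
            words = para.split()
            if words[0] == "=begin" and len(words) > 1:
                open_regions.append(words[1])
            elif words[0] == "=end" and open_regions:
                open_regions.pop()
            continue
        # Inside a region whose identifier lacks a leading colon, every
        # non-command paragraph is data, whatever its first character.
        if open_regions and not open_regions[-1].startswith(":"):
            kinds.append("data")
        elif _VERBATIM.match(para):
            kinds.append("verbatim")
        else:
            kinds.append("ordinary")
    return kinds
```

Applied to the example above, the four paragraphs come out as command,
ordinary, verbatim, and command.
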

=head1 Pod Commands

This section is intended to supplement and clarify the discussion in
L<perlpod/"Command Paragraph">. These are the currently recognized
Pod commands:

=over

=item "=head1", "=head2", "=head3", "=head4"

This command indicates that the text in the remainder of the paragraph
is a heading. That text may contain formatting codes. Examples:

  =head1 Object Attributes

  =head3 What B<Not> to Do!

=item "=pod"

This command indicates that this paragraph begins a Pod block. (If we
are already in the middle of a Pod block, this command has no effect at
all.) If there is any text in this command paragraph after "=pod",
it must be ignored. Examples:

  =pod

  This is a plain Pod paragraph.

  =pod This text is ignored.

=item "=cut"

This command indicates that this line is the end of this previously
started Pod block. If there is any text after "=cut" on the line, it must be
ignored. Examples:

  =cut

  =cut The documentation ends here.

  =cut
  # This is the first line of program text.
  sub foo { # This is the second.

It is an error to try to I<start> a Pod block with a "=cut" command. In
that case, the Pod processor must halt parsing of the input file, and
must by default emit a warning.

=item "=over"

This command indicates that this is the start of a list/indent
region. If there is any text following the "=over", it must consist
of only a nonzero positive numeral. The semantics of this numeral are
explained in the L</"About =over...=back Regions"> section, further
below. Formatting codes are not expanded. Examples:

  =over 3

  =over 3.5

  =over

=item "=item"

This command indicates that an item in a list begins here. Formatting
codes are processed. The semantics of the (optional) text in the
remainder of this paragraph are
explained in the L</"About =over...=back Regions"> section, further
below. Examples:

  =item

  =item *

  =item      *

  =item 14

  =item   3.

  =item C<< $thing->stuff(I<dodad>) >>

  =item For transporting us beyond seas to be tried for pretended
  offenses

  =item He is at this time transporting large armies of foreign
  mercenaries to complete the works of death, desolation and
  tyranny, already begun with circumstances of cruelty and perfidy
  scarcely paralleled in the most barbarous ages, and totally
  unworthy the head of a civilized nation.

=item "=back"

This command indicates that this is the end of the region begun
by the most recent "=over" command. It permits no text after the
"=back" command.

=item "=begin formatname"

This marks the following paragraphs (until the matching "=end
formatname") as being for some special kind of processing. Unless
"formatname" begins with a colon, the contained non-command
paragraphs are data paragraphs. But if "formatname" I<does> begin
with a colon, then non-command paragraphs are ordinary paragraphs
or verbatim paragraphs. This is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

It is advised that formatnames match the regexp
C<m/\A:?[-a-zA-Z0-9_]+\z/>. Implementors should anticipate future
expansion in the semantics and syntax of the first parameter
to "=begin"/"=end"/"=for".

=item "=end formatname"

This marks the end of the region opened by the matching
"=begin formatname" command. If "formatname" is not the formatname
of the most recent open "=begin formatname" region, then this
is an error, and must generate an error message. This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=item "=for formatname text..."

This is synonymous with:

  =begin formatname

  text...

  =end formatname

That is, it creates a region consisting of a single paragraph; that
paragraph is to be treated as an ordinary paragraph if "formatname"
begins with a ":"; if "formatname" I<doesn't> begin with a colon,
then "text..." will constitute a data paragraph. There is no way
to use "=for formatname text..." to express "text..." as a verbatim
paragraph.

=item "=encoding encodingname"

This command, which should occur early in the document (at least
before any non-US-ASCII data!), declares that this document is
encoded in the encoding I<encodingname>, which must be
an encoding name that L<Encode> recognizes. (Encode's list
of supported encodings, in L<Encode::Supported>, is useful here.)
If the Pod parser cannot decode the declared encoding, it
should emit a warning and may abort parsing the document
altogether.

A document having more than one "=encoding" line should be
considered an error. Pod processors may silently tolerate this if
the not-first "=encoding" lines are just duplicates of the
first one (e.g., if there's a "=encoding utf8" line, and later on
another "=encoding utf8" line). But Pod processors should complain if
there are contradictory "=encoding" lines in the same document
(e.g., if there is a "=encoding utf8" early in the document and
"=encoding big5" later). Pod processors that recognize BOMs
may also complain if they see an "=encoding" line
that contradicts the BOM (e.g., if a document with a UTF-16LE
BOM has an "=encoding shiftjis" line).

=back
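
A sketch of the region rules above, assuming paragraphs are already
split into strings: C<expand_for> rewrites "=for" into its
"=begin"/"=end" equivalent, and C<check_regions> reports any "=end" that
does not match the innermost open "=begin". Both names and the message
wording are hypothetical.

```python
def expand_for(paragraph):
    """Rewrite '=for NAME text...' as '=begin NAME', 'text...', '=end NAME'."""
    parts = paragraph.split(None, 2)  # ["=for", NAME, rest-of-paragraph]
    if len(parts) < 2 or parts[0] != "=for":
        raise ValueError("not a '=for NAME ...' paragraph")
    name = parts[1]
    body = [parts[2]] if len(parts) == 3 else []
    return ["=begin " + name] + body + ["=end " + name]


def check_regions(paragraphs):
    """Return error messages for unmatched or mismatched =begin/=end."""
    stack, errors = [], []
    for number, para in enumerate(paragraphs, 1):
        if not para.startswith("="):
            continue  # only command paragraphs can open or close regions
        words = para.split()
        if words[0] == "=begin" and len(words) > 1:
            stack.append(words[1])
        elif words[0] == "=end":
            name = words[1] if len(words) > 1 else ""
            if not stack:
                errors.append("paragraph %d: =end %s with no open =begin"
                              % (number, name))
            elif stack[-1] != name:
                errors.append("paragraph %d: =end %s does not match =begin %s"
                              % (number, name, stack[-1]))
            else:
                stack.pop()
    errors.extend("=begin %s is never closed" % name for name in stack)
    return errors
```
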

If a Pod processor sees any command other than the ones listed
above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
or "=w123"), that processor must by default treat this as an
error. It must not process the paragraph beginning with that
command, must by default warn of this as an error, and may
abort the parse. A Pod parser may allow a way for particular
applications to add to the above list of known commands, and to
stipulate, for each additional command, whether formatting
codes should be processed.
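
The following hypothetical C<check_commands> sketches how a processor
might validate command paragraphs: it flags unknown commands, rejects a
non-positive "=over" argument or any text after "=back", and tolerates a
repeated "=encoding" only when it duplicates the first one. The numeral
syntax accepted for "=over" (digits with an optional decimal part) is an
assumption of this sketch.

```python
import re

KNOWN_COMMANDS = {
    "head1", "head2", "head3", "head4", "pod", "cut", "over", "item",
    "back", "begin", "end", "for", "encoding",
}
# Assumed reading of "nonzero positive numeral": digits with an optional
# decimal part (as in "=over 3.5"), whose value is greater than zero.
_NUMERAL = re.compile(r"\d+(?:\.\d+)?")
_COMMAND = re.compile(r"=([a-zA-Z]\S*)\s*(.*)", re.DOTALL)


def check_commands(paragraphs):
    """Return warnings about command paragraphs that break the rules."""
    warnings = []
    encoding = None
    for para in paragraphs:
        m = _COMMAND.match(para)
        if not m:
            continue  # not a command paragraph
        name, arg = m.group(1), m.group(2).strip()
        if name not in KNOWN_COMMANDS:
            warnings.append("unknown command '=%s'; paragraph skipped" % name)
        elif name == "over" and arg and not (
                _NUMERAL.fullmatch(arg) and float(arg) > 0):
            warnings.append("bad =over argument '%s'" % arg)
        elif name == "back" and arg:
            warnings.append("=back permits no text")
        elif name == "encoding":
            if encoding is None:
                encoding = arg
            elif arg != encoding:  # exact duplicates are tolerated
                warnings.append("=encoding %s contradicts =encoding %s"
                                % (arg, encoding))
    return warnings
```
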

Future versions of this specification may add additional
commands.

=head1 Pod Formatting Codes

(Note that in previous drafts of this document and of perlpod,
formatting codes were referred to as "interior sequences", and
this term may still be found in the documentation for Pod parsers,
and in error messages from Pod processors.)

There are two syntaxes for formatting codes:

=over

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by a "<", any number of characters, and ending with the first
matching ">". Examples:

  That's what I<you> think!

  What's C<dump()> for?

  X<C<chmod> and C<unlink()> Under Different Operating Systems>

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by two or more "<"'s, one or more whitespace characters,
any number of characters, one or more whitespace characters,
and ending with the first matching sequence of two or more ">"'s, where
the number of ">"'s equals the number of "<"'s in the opening of this
formatting code. Examples:

  That's what I<< you >> think!

  C<<< open(X, ">>thing.dat") || die $! >>>

  B<< $foo->bar(); >>

With this syntax, the whitespace character(s) after the "CE<lt><<"
and before the ">>" (or whatever letter) are I<not> renderable; they
do not signify whitespace, but are merely part of the formatting codes
themselves. That is, these are all synonymous:

  C<thing>
  C<< thing >>
  C<<           thing >>
  C<<<   thing >>>
  C<<<<
  thing
             >>>>

and so on.

=back

In parsing Pod, a notably tricky part is the correct parsing of
(potentially nested!) formatting codes. Implementors should
consult the code in the C<parse_text> routine in Pod::Parser as an
example of a correct implementation.
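
One way to handle both syntaxes, including nesting and the closing of
unterminated codes at the end of a paragraph, is a single left-to-right
scan with an explicit stack. The Python below is a minimal sketch of
that approach, not a port of C<parse_text>; the function name and the
tree shape (plain strings mixed with C<(letter, children)> tuples) are
assumptions.

```python
import re

# An opener: one capital letter, one or more "<", then optional whitespace.
_OPEN = re.compile(r"([A-Z])(<+)(\s+)?")


def parse_codes(text):
    """Parse the formatting codes in one paragraph of text.

    Returns (tree, warnings). The tree is a list of plain strings and
    (letter, children) tuples. Codes still open at the end of the
    paragraph are closed there, with one warning each.
    """
    root = []
    stack = []     # frames: [letter, number_of_brackets, children]
    warnings = []
    buf = []       # literal characters not yet added to the tree

    def target():
        return stack[-1][2] if stack else root

    def flush():
        if buf:
            target().append("".join(buf))
            buf.clear()

    def close():
        flush()
        letter, _, children = stack.pop()
        target().append((letter, children))

    i = 0
    while i < len(text):
        n = stack[-1][1] if stack else 0
        if n >= 2:
            # Multi-bracket code: closed by whitespace followed by n ">"s.
            m = re.compile(r"\s+" + ">" * n).match(text, i)
            if m:
                close()
                i = m.end()
                continue
        elif n == 1 and text[i] == ">":
            close()  # single-bracket code: the first ">" closes it
            i += 1
            continue
        m = _OPEN.match(text, i)
        if m:
            flush()
            letter, brackets, space = m.groups()
            if len(brackets) >= 2 and space:
                stack.append([letter, len(brackets), []])
                i = m.end()
            else:
                # Single-bracket form: only the first "<" opens the code;
                # any following "<" or whitespace is content.
                stack.append([letter, 1, []])
                i = m.start(2) + 1
            continue
        # Anything else is literal text, including a ">" inside a
        # multi-bracket code.
        buf.append(text[i])
        i += 1

    flush()
    while stack:  # codes cannot span paragraphs: close them here
        warnings.append("Unterminated %s code in paragraph" % stack[-1][0])
        close()
    return root, warnings
```

Note that "->" needs no special treatment: inside a single-bracket code
the ">" simply closes the code, and inside a multi-bracket code it is
literal text unless preceded by whitespace and repeated the right number
of times.
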

=over

=item C<IE<lt>textE<gt>> -- italic text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<BE<lt>textE<gt>> -- bold text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<CE<lt>codeE<gt>> -- code text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<FE<lt>filenameE<gt>> -- style for filenames

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<XE<lt>topic nameE<gt>> -- an index entry

See the brief discussion in L<perlpod/"Formatting Codes">.

This code is unusual in that most formatters completely discard
this code and its content. Other formatters will render it with
invisible codes that can be used in building an index of
the current document.

=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code

Discussed briefly in L<perlpod/"Formatting Codes">.

This code is unusual in that it should have no content. That is,
a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether
or not it complains, the I<potatoes> text should be ignored.

=item C<LE<lt>nameE<gt>> -- a hyperlink

The complicated syntaxes of this code are discussed at length in
L<perlpod/"Formatting Codes">, and implementation details are
discussed below, in L</"About LE<lt>...E<gt> Codes">. Parsing the
contents of LE<lt>content> is tricky. Notably, the content has to be
checked for whether it looks like a URL, or whether it has to be split
on literal "|" and/or "/" (in the right order!), and so on,
I<before> EE<lt>...> codes are resolved.

=item C<EE<lt>escapeE<gt>> -- a character escape

See L<perlpod/"Formatting Codes">, and several points in
L</Notes on Implementing Pod Processors>.

=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces

This formatting code is syntactically simple, but semantically
complex. What it means is that each space in the printable
content of this code signifies a non-breaking space.

Consider:

  C<$x ? $y : $z>

  S<C<$x ? $y : $z>>

Both signify the monospace (c[ode] style) text consisting of
"$x", one space, "?", one space, "$y", one space, ":", one space,
"$z". The difference is that in the latter, with the S code, those
spaces are not "normal" spaces, but instead are non-breaking spaces.

=back
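
To illustrate the rendering semantics of S, X, Z, and E, here is a
hypothetical plain-text renderer that walks a tree of plain strings and
C<(letter, children)> tuples (an assumed representation, not one this
specification mandates). It resolves only a few EE<lt>...> forms and
keeps the text of every other code unchanged, so it is far from a
complete formatter.

```python
import html

# Named escapes Pod defines itself; other names are tried as HTML
# entity names.
_POD_ESCAPES = {"lt": "<", "gt": ">", "verbar": "|", "sol": "/"}


def _resolve_escape(name):
    if name in _POD_ESCAPES:
        return _POD_ESCAPES[name]
    if name.lower().startswith("0x"):
        return chr(int(name, 16))  # hexadecimal, e.g. E<0x41>
    if name.isdigit():
        return chr(int(name))      # decimal, e.g. E<32>
    # html.unescape leaves unknown names (like "zslig") as "&zslig;";
    # a real parser should warn about them instead.
    return html.unescape("&%s;" % name)


def render_plain(tree, nbsp=False):
    """Render strings and (letter, children) tuples as plain text."""
    out = []
    for node in tree:
        if isinstance(node, str):
            # Inside S<...>, every space is a non-breaking space (U+00A0).
            out.append(node.replace(" ", "\u00a0") if nbsp else node)
            continue
        letter, children = node
        if letter in ("X", "Z"):
            continue  # index entries and null codes produce no text
        if letter == "E":
            name = "".join(c for c in children if isinstance(c, str))
            out.append(_resolve_escape(name))
        elif letter == "S":
            out.append(render_plain(children, nbsp=True))
        else:
            # I, B, C, F, L: this sketch just keeps their text.
            out.append(render_plain(children, nbsp))
    return "".join(out)
```
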

If a Pod processor sees any formatting code other than the ones
listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
processor must by default treat this as an error.
A Pod parser may allow a way for particular
applications to add to the above list of known formatting codes;
a Pod parser might even allow a way to stipulate, for each additional
formatting code, whether it requires some form of special processing, as
LE<lt>...> does.
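
A hypothetical check for unknown formatting codes might look like this;
C<extra_codes> stands in for whatever registration mechanism a parser
offers to applications. It looks only for a capital letter followed by
"<", so it does not parse the codes' contents.

```python
import re

# The formatting codes defined by this specification.
KNOWN_CODES = frozenset("IBCFXZLES")


def find_unknown_codes(text, extra_codes=()):
    """Return, in order, the letters of unrecognized formatting codes.

    extra_codes models codes an application has registered with the
    parser; that registration mechanism is hypothetical.
    """
    known = KNOWN_CODES | frozenset(extra_codes)
    return [m.group(1) for m in re.finditer(r"([A-Z])<", text)
            if m.group(1) not in known]
```
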

Future versions of this specification may add additional
formatting codes.

Historical note: A few older Pod processors would not see a ">" as
closing a "CE<lt>" code, if the ">" was immediately preceded by
a "-". This was so that this:

  C<$foo->bar>

would parse as equivalent to this:

  C<$foo-E<gt>bar>

instead of as equivalent to a "C" formatting code containing
only "$foo-", and then a "bar>" outside the "C" formatting code. This
problem has since been solved by the addition of syntaxes like this:

  C<< $foo->bar >>

Compliant parsers must not treat "->" as special.

Formatting codes absolutely cannot span paragraphs. If a code is
opened in one paragraph, and no closing code is found by the end of
that paragraph, the Pod parser must close that formatting code,
and should complain (as in "Unterminated I code in the paragraph
starting at line 123: 'Time objects are not...'"). So these
two paragraphs:

  I<I told you not to do this!

  Don't make me say it again!>

...must I<not> be parsed as two paragraphs in italics (with the I
code starting in one paragraph and ending in another.) Instead,
the first paragraph should generate a warning, but that aside, the
above code must parse as if it were:

  I<I told you not to do this!>

  Don't make me say it again!E<gt>

(In SGMLish jargon, all Pod commands are like block-level
elements, whereas all Pod formatting codes are like inline-level
elements.)


=head1 Notes on Implementing Pod Processors

The following is a long section of miscellaneous requirements
and suggestions to do with Pod processing.

=over

=item *

Pod formatters should tolerate lines in verbatim blocks that are of
any length, even if that means having to break them (possibly several
times, for very long lines) to avoid text running off the side of the
page. Pod formatters may warn of such line-breaking. Such warnings
are particularly appropriate for lines that are over 100 characters
long, which are usually not intentional.

=item *

Pod parsers must recognize I<all> of the three well-known newline
formats: CR, LF, and CRLF. See L<perlport|perlport>.

=item *

Pod parsers should accept input lines that are of any length.

=item *

Since Perl recognizes a Unicode Byte Order Mark at the start of files
as signaling that the file is Unicode encoded as in UTF-16 (whether
big-endian or little-endian) or UTF-8, Pod parsers should do the
same. Otherwise, the character encoding should be understood as
being UTF-8 if the first highbit byte sequence in the file seems
valid as a UTF-8 sequence, or otherwise as Latin-1.

Future versions of this specification may specify
how Pod can accept other encodings. Presumably treatment of other
encodings in Pod parsing would be as in XML parsing: whatever the
encoding declared by a particular Pod file, content is to be
stored in memory as Unicode characters.

=item *

The well-known Unicode Byte Order Marks are as follows: if the
file begins with the two literal byte values 0xFE 0xFF, this is
the BOM for big-endian UTF-16. If the file begins with the two
literal byte values 0xFF 0xFE, this is the BOM for little-endian
UTF-16. If the file begins with the three literal byte values
0xEF 0xBB 0xBF, this is the BOM for UTF-8.

=for comment
use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
0xEF 0xBB 0xBF

=for comment
If toke.c is modified to support UTF-32, add mention of those here.

=item *

A naive but sufficient heuristic for testing the first highbit
byte-sequence in a BOM-less file (whether in code or in Pod!), to see
whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
the first byte in the sequence is in the range 0xC0 - 0xFD
I<and> whether the next byte is in the range
0x80 - 0xBF. If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8. Otherwise the parser should treat the file as being
in Latin-1. In the unlikely circumstance that the first highbit
sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
can cater to our heuristic (as well as any more intelligent heuristic)
by prefacing that line with a comment line containing a highbit
sequence that is clearly I<not> valid as UTF-8. A line consisting
of simply "#", an e-acute, and any non-highbit byte
is sufficient to establish this file's encoding.

=for comment
If/WHEN some brave soul makes these heuristics into a generic
text-file class (or PerlIO layer?), we can presumably delete
mention of these icky details from this file, and can instead
tell people to just use appropriate class/layer.
Auto-recognition of newline sequences would be another desirable
feature of such a class/layer.
HINT HINT HINT.

=for comment
"The probability that a string of characters
in any other encoding appears as valid UTF-8 is low" - RFC2279
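
The BOM check and the UTF-8 heuristic described in the last few items
can be combined into one function. This is a sketch assuming the raw
bytes of the file are available; C<sniff_encoding> is a hypothetical
name, the returned strings are Python codec names, and the byte ranges
are the RFC 2279 ones given above rather than a full UTF-8 validator.

```python
# Byte Order Marks mapped to Python codec names.
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def sniff_encoding(data):
    """Guess a Pod file's encoding from its raw bytes."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    for i, byte in enumerate(data):
        if byte < 0x80:
            continue
        # First highbit byte: an RFC 2279 lead byte followed by a
        # continuation byte means UTF-8; anything else means Latin-1.
        nxt = data[i + 1] if i + 1 < len(data) else None
        if 0xC0 <= byte <= 0xFD and nxt is not None and 0x80 <= nxt <= 0xBF:
            return "utf-8"
        return "latin-1"
    # No highbit bytes at all: plain ASCII decodes identically either way.
    return "latin-1"
```

The last test case shows the "#", e-acute, non-highbit-byte trick: the
Latin-1 e-acute (0xE9) followed by a space is not a valid UTF-8 start,
so a later UTF-8-looking sequence no longer misleads the heuristic.
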

=item *

This document's requirements and suggestions about encodings
do not apply to Pod processors running on non-ASCII platforms,
notably EBCDIC platforms.

=item *

Pod processors must treat a "=for [label] [content...]" paragraph as
meaning the same thing as a "=begin [label]" paragraph, content, and
an "=end [label]" paragraph. (The parser may conflate these two
constructs, or may leave them distinct, in the expectation that the
formatter will nevertheless treat them the same.)

=item *

When rendering Pod to a format that allows comments (i.e., to nearly
any format other than plaintext), a Pod formatter must insert comment
text identifying its name and version number, and the name and
version numbers of any modules it might be using to process the Pod.
Minimal examples:

  %% POD::Pod2PS v3.14159, using POD::Parser v1.92

  <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->

  {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}

  .\" Pod::Man version 3.14159, using POD::Parser version 1.92

Formatters may also insert additional comments, including: the
release date of the Pod formatter program, the contact address for
the author(s) of the formatter, the current time, the name of the input
file, the formatting options in effect, version of Perl used, etc.

Formatters may also choose to note errors/warnings as comments,
besides or instead of emitting them otherwise (as in messages to
STDERR, or C<die>ing).
|
---|
686 |
|
---|
=item *

Pod parsers I<may> emit warnings or error messages ("Unknown E code
EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
suppressing all such STDERR output, and instead allow an option for
reporting errors/warnings in some other way, whether by triggering a
callback, or noting errors in some attribute of the document object,
or some similarly unobtrusive mechanism -- or even by appending a
"Pod Errors" section to the end of the parsed form of the document.

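As a minimal sketch of this requirement, a parser can send every
complaint through a single method, so that printing to STDERR is
merely the default. The class name C<My::PodParser> and its C<new>,
C<complain>, and C<errors> methods are hypothetical, not the interface
of any real Pod parser:

    package My::PodParser;    # hypothetical class, not a real CPAN module
    use strict;
    use warnings;

    sub new {
        my ($class, %opt) = @_;
        return bless {
            on_error => $opt{on_error},    # optional callback: ($line, $message)
            quiet    => $opt{quiet} || 0,  # true: never write to STDERR
            errors   => [],                # every complaint is also kept here
        }, $class;
    }

    # The single place where the parser reports a problem.
    sub complain {
        my ($self, $line, $msg) = @_;
        push @{ $self->{errors} }, [ $line, $msg ];
        if ($self->{on_error}) {
            $self->{on_error}->($line, $msg);
        }
        elsif (!$self->{quiet}) {
            print STDERR "Line $line: $msg\n";    # the default, not the only option
        }
        return;
    }

    # Errors noted "in some attribute of the document object".
    sub errors { return @{ $_[0]{errors} } }

    package main;

    my @seen;
    my $parser = My::PodParser->new(
        on_error => sub { push @seen, "$_[0]: $_[1]" },
    );
    $parser->complain(52, 'Unknown E code E<zslig>!');

Passing C<quiet =E<gt> 1> instead would suppress STDERR output
entirely while still recording each error on the object, for example
for use in a later "Pod Errors" section.
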
=item *

In cases of exceptionally aberrant documents, Pod parsers may abort the
parse. Even then, C<die>ing/C<croak>ing should be avoided; where
possible, the parser library may simply close the input file
and add text like "*** Formatting Aborted ***" to the end of the
(partial) in-memory document.

=item *

In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
are understood (i.e., I<not> verbatim paragraphs, but I<including>
ordinary paragraphs, and command paragraphs that produce renderable
text, like "=head1"), literal whitespace should generally be considered
"insignificant", in that one literal space has the same meaning as any
(nonzero) number of literal spaces, literal newlines, and literal tabs
(as long as this produces no blank lines, since those would terminate
the paragraph). Pod parsers should compact literal whitespace in each
processed paragraph, but may provide an option for overriding this
(since some processing tasks do not require it), or may follow
additional special rules (for example, specially treating
period-space-space or period-newline sequences).

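A minimal sketch of the default compaction, assuming the paragraph's
text is available as a plain string (a real parser would more likely
apply this to the text nodes of its parse tree). The function name
C<compact_whitespace> is hypothetical:

    use strict;
    use warnings;

    # Collapse each run of literal spaces, tabs, and newlines in an
    # ordinary (non-verbatim) paragraph to one space, then trim both ends.
    sub compact_whitespace {
        my ($text) = @_;
        $text =~ s/[ \t\n]+/ /g;
        $text =~ s/\A //;
        $text =~ s/ \z//;
        return $text;
    }

The character class deliberately lists only space, tab, and newline:
Perl's C<\s> can also match NBSP (character 160) in Unicode strings,
and an NBSP must not be collapsed into an ordinary space.
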
=item *

Pod parsers should not, by default, try to coerce apostrophe (') and
quote (") into smart quotes (little 9's, 66's, 99's, etc.), nor try to
turn backtick (`) into anything else but a single backtick character
(distinct from an openquote character!), nor "--" into anything but
two minus signs. They I<must never> do any of those things to text
in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
paragraphs.

=item *

When rendering Pod to a format that has two kinds of hyphens (-), one
that's a non-breaking hyphen, and another that's a breakable hyphen
(as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
generally translate "-" to a non-breaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.

=item *

Pod formatters should make reasonable efforts to keep words of Perl
code from being broken across lines. For example, "Foo::Bar" in some
formatting systems is seen as eligible for being broken across lines
as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should
be avoided where possible, either by disabling all line-breaking in
mid-word, or by wrapping particular words with internal punctuation
in "don't break this across lines" codes (which in some formats may
not be a single code, but might be a matter of inserting non-breaking
zero-width spaces between every pair of characters in a word).

=item *

Pod parsers should, by default, expand tabs in verbatim paragraphs as
they are processed, before passing them to the formatter or other
processor. Parsers may also allow an option for overriding this.

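One way to implement this default, sketched with the core
L<Text::Tabs|Text::Tabs> module, which exports C<expand> and
C<$tabstop>. The tab width of 8 and the wrapper name
C<expand_verbatim> are assumptions of this sketch:

    use strict;
    use warnings;
    use Text::Tabs;    # core module; exports expand() and $tabstop

    $tabstop = 8;      # assumed: conventional tab stops every 8 columns

    # Expand tabs line by line, keeping empty lines (the -1 limit).
    sub expand_verbatim {
        my ($para) = @_;
        return join "\n", expand(split /\n/, $para, -1);
    }
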
=item *

Pod parsers should, by default, remove newlines from the end of
ordinary and verbatim paragraphs before passing them to the
formatter. For example, while the paragraph you're reading now
could be considered, in Pod source, to end with (and contain)
the newline(s) that end it, it should be processed as ending with
(and containing) the period character that ends this sentence.

=item *

Pod parsers, when reporting errors, should make some effort to report
an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!"). Where
this is problematic, the paragraph number should at least be
accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
the CE<lt>interest rate> attribute...'").

=item *

Pod parsers, when processing a series of verbatim paragraphs one
after another, should consider them to be one large verbatim
paragraph that happens to contain blank lines. I.e., these two
lines, which have a blank line between them:

	use Foo;

	print Foo->VERSION

should be unified into one paragraph ("\tuse Foo;\n\n\tprint
Foo->VERSION") before being passed to the formatter or other
processor. Parsers may also allow an option for overriding this.

While this might be too cumbersome to implement in event-based Pod
parsers, it is straightforward for parsers that return parse trees.

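For a parser that collects a list of paragraphs, the merge is a
single pass. This sketch assumes a hypothetical data model in which
each paragraph is a C<[ $type, $text ]> pair; it joins neighbors with
exactly one blank line, so it normalizes the number of blank lines
between them:

    use strict;
    use warnings;

    # Paragraphs are hypothetical [ $type, $text ] pairs, such as
    # [ 'verbatim', "\tuse Foo;" ] or [ 'ordinary', 'Some text.' ].
    sub merge_verbatim {
        my @out;
        for my $para (@_) {
            if (@out && $para->[0] eq 'verbatim' && $out[-1][0] eq 'verbatim') {
                $out[-1][1] .= "\n\n" . $para->[1];    # extend the previous one
            }
            else {
                push @out, [ @$para ];    # copy, so the caller's data is untouched
            }
        }
        return @out;
    }
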
=item *

Pod formatters, where feasible, are advised to avoid splitting short
verbatim paragraphs (under twelve lines, say) across pages.

=item *

Pod parsers must treat a line with only spaces and/or tabs on it as a
"blank line" such as separates paragraphs. (Some older parsers
recognized only two adjacent newlines as a "blank line" but would not
recognize a newline, a space, and a newline, as a blank line. This
is noncompliant behavior.)

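A sketch of paragraph splitting that honors this rule. The function
name is hypothetical; the regular expression treats any run of lines
containing only spaces and/or tabs as a single paragraph break:

    use strict;
    use warnings;

    # Split Pod source into paragraphs. A "blank line" is any line holding
    # only spaces and/or tabs, not merely an empty line.
    sub split_paragraphs {
        my ($source) = @_;
        $source =~ s/\A(?:[ \t]*\n)+//;    # drop leading blank lines
        $source =~ s/\n[ \t\n]*\z//;       # drop the final newline and trailing blank lines
        return grep { length } split /\n(?:[ \t]*\n)+/, $source;
    }
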
=item *

Authors of Pod formatters/processors should make every effort to
avoid writing their own Pod parser. There are already several in
CPAN, with a wide range of interface styles -- and one of them,
Pod::Simple, comes with modern versions of Perl.

=item *

Characters in Pod documents may be conveyed either as literals, or by
number in EE<lt>n> codes, or by an equivalent mnemonic, as in
EE<lt>eacute>, which is exactly equivalent to EE<lt>233>.

Characters in the range 32-126 refer to those well-known US-ASCII
characters (also defined there by Unicode, with the same meaning),
which all Pod formatters must render faithfully. Characters
in the ranges 0-31 and 127-159 should not be used (neither as
literals, nor as EE<lt>number> codes), except for the
literal byte-sequences for newline (13, 13 10, or 10), and tab (9).

Characters in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning). Characters above
255 should be understood to refer to Unicode characters.

=item *

Be warned that some formatters cannot reliably render characters
outside 32-126; and many are able to handle 32-126 and 160-255, but
nothing above 255.

=item *

Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for
less-than and greater-than, Pod parsers must understand "EE<lt>sol>"
for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar,
pipe). Pod parsers should also understand "EE<lt>lchevron>" and
"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e.,
"left-pointing double angle quotation mark" = "left pointing
guillemet" and "right-pointing double angle quotation mark" = "right
pointing guillemet". (These look like little "<<" and ">>", and they
are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>"
and "EE<lt>raquo>".)

=item *

Pod parsers should understand all "EE<lt>html>" codes as defined
in the entity declarations in the most recent XHTML specification at
C<www.W3.org>. Pod parsers must understand at least the entities
that define characters in the range 160-255 (Latin-1). Pod parsers,
when faced with some unknown "EE<lt>I<identifier>>" code,
shouldn't simply replace it with the null string (by default, at least),
but may pass it through as a string consisting of the literal characters
E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the
alternative option of processing such unknown
"EE<lt>I<identifier>>" codes by firing an event especially
for such codes, or by adding a special node-type to the in-memory
document tree. Such "EE<lt>I<identifier>>" codes may have special
meaning to some processors, or some processors may choose to add them
to a special error report.

=item *

Pod parsers must also support the XHTML codes "EE<lt>quot>" for
character 34 (doublequote, "), "EE<lt>amp>" for character 38
(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, ').

=item *

Note that in all cases of "EE<lt>whatever>", I<whatever> (whether
an htmlname, or a number in any base) must consist only of
alphanumeric characters -- that is, I<whatever> must match
C<m/\A\w+\z/>. So "EE<lt> 0 1 2 3 >" is invalid, because
it contains spaces, which aren't alphanumeric characters. This
presumably does not I<need> special treatment by a Pod processor;
" 0 1 2 3 " doesn't look like a number in any base, so it would
presumably be looked up in the table of HTML-like names. Since
there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ",
this will be treated as an error. However, Pod processors may
treat "EE<lt> 0 1 2 3 >" or "EE<lt>e-acute>" as I<syntactically>
invalid, potentially earning a different error message than the
error message (or warning, or event) generated by a merely unknown
(but theoretically valid) htmlname, as in "EE<lt>qacute>"
[sic]. However, Pod parsers are not required to make this
distinction.

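The rules above, together with the numeric forms described in
L<perlpod|perlpod> (a leading "0x" for hexadecimal, a leading "0" for
octal, and decimal otherwise), can be sketched as follows. The tiny
C<%NAME2NUM> table, the function name, and the three status strings
are assumptions of this sketch; a real parser needs the full XHTML
entity set (L<Pod::Escapes|Pod::Escapes> ships one):

    use strict;
    use warnings;

    # Deliberately tiny, hypothetical name table, for illustration only.
    my %NAME2NUM = (
        lt => 60, gt => 62, sol => 47, verbar => 124,
        quot => 34, amp => 38, apos => 39,
        lchevron => 171, rchevron => 187, laquo => 171, raquo => 187,
        nbsp => 160, shy => 173, eacute => 233, euro => 0x20AC,
    );

    # Returns ($status, $text): 'ok' with the character, or 'invalid' /
    # 'unknown' with the literal "E<...>" text passed through unchanged.
    sub resolve_e {
        my ($what) = @_;
        return ('invalid', "E<$what>") unless $what =~ m/\A\w+\z/;
        my $num;
        if    ($what =~ m/\A0[xX]([0-9a-fA-F]+)\z/) { $num = hex $1 }
        elsif ($what =~ m/\A0([0-7]+)\z/)           { $num = oct $1 }
        elsif ($what =~ m/\A([0-9]+)\z/)            { $num = $1 }
        elsif (exists $NAME2NUM{$what})             { $num = $NAME2NUM{$what} }
        else                                        { return ('unknown', "E<$what>") }
        # The number is always a Unicode code point, never a native one.
        return ('ok', chr $num);
    }
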
=item *

Note that EE<lt>number> I<must not> be interpreted as simply
"codepoint I<number> in the current/native character set". It always
means only "the character represented by codepoint I<number> in
Unicode." (This is identical to the semantics of &#I<number>; in XML.)

This will likely require many formatters to have tables mapping from
treatable Unicode codepoints (such as the "\xE9" for the e-acute
character) to the escape sequences or codes necessary for conveying
such sequences in the target output format. A converter to *roff
would, for example, know that "\xE9" (whether conveyed literally, or via
an EE<lt>...> sequence) is to be conveyed as "e\\*'".
Similarly, a program rendering Pod in a Mac OS application window would
presumably need to know that "\xE9" maps to codepoint 142 in the
MacRoman encoding, which (at time of writing) is native to Mac OS. Such
Unicode2whatever mappings are presumably already widely available for
common output formats. (Such mappings may be incomplete! Implementers
are not expected to bend over backwards in an attempt to render
Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
of the other weird things that Unicode can encode.) And
if a Pod document uses a character not found in such a mapping, the
formatter should consider it an unrenderable character.

=item *

If, surprisingly, the implementor of a Pod formatter can't find a
satisfactory pre-existing table mapping from Unicode characters to
escapes in the target format (e.g., a decent table of Unicode
characters to *roff escapes), it will be necessary to build such a
table. If you are in this circumstance, you should begin with the
characters in the range 0x00A0 - 0x00FF, which are mostly the heavily
used accented characters. Then proceed (as patience permits and
fastidiousness compels) through the characters that the (X)HTML
standards groups judged important enough to merit mnemonics
for. These are declared in the (X)HTML specifications at the
www.W3.org site. At time of writing (September 2001), the most recent
entity declaration files are:

  http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
  http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
  http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

Then you can progress through any remaining notable Unicode characters
in the range 0x2000-0x204D (consult the character tables at
www.unicode.org), and whatever else strikes your fancy. For example,
in F<xhtml-symbol.ent>, there is the entry:

  <!ENTITY infin "&#8734;"> <!-- infinity, U+221E ISOtech -->

While the mapping "infin" to the character "\x{221E}" will (hopefully)
have been already handled by the Pod parser, the presence of the
character in this file means that it's important enough to
include in a formatter's table that maps from notable Unicode characters
to the codes necessary for rendering them. So for a Unicode-to-*roff
mapping, for example, this would merit the entry:

  "\x{221E}" => '\(if',

It is eagerly hoped that in the future, increasing numbers of formats
(and formatters) will support Unicode characters directly (as (X)HTML
does with C<&infin;>, C<&#8734;>, or C<&#x221E;>), reducing the need
for idiosyncratic mappings of Unicode-to-I<my_escapes>.

=item *

It is up to individual Pod formatters to display good judgment when
confronted with an unrenderable character (which is distinct from an
unknown EE<lt>thing> sequence that the parser couldn't resolve to
anything, renderable or not). It is good practice to map Latin letters
with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
unaccented US-ASCII letters (like a simple character 101, "e"), but
clearly this is often not feasible, and an unrenderable character may
be represented as "?", or the like. In attempting a sane fallback
(as from EE<lt>233> to "e"), Pod formatters may use the
%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or
L<Text::Unidecode|Text::Unidecode>, if available.

For example, this Pod text:

  magic is enabled if you set C<$Currency> to 'E<euro>'.

may be rendered as:
"magic is enabled if you set C<$Currency> to 'I<?>'" or as
"magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as
"magic is enabled if you set C<$Currency> to '[x20AC]'", etc.

A Pod formatter may also note, in a comment or warning, a list of what
unrenderable characters were encountered.

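A minimal sketch of such fallback handling for a *roff-like target.
Both tables and the function name are hypothetical and deliberately
tiny, and the "[xI<HEX>]" notation follows the last rendering example
above. The sketch does not escape characters that are special to
*roff itself, such as backslash:

    use strict;
    use warnings;

    # Hypothetical, deliberately incomplete tables.
    my %TO_ROFF  = ( "\x{E9}" => "e\\*'" );                  # e-acute
    my %FALLBACK = ( "\x{E8}" => 'e', "\x{FC}" => 'u' );      # e-grave, u-umlaut

    # Use an escape where one is known, a plain-letter fallback where one
    # is known, and "[xHEX]" otherwise. Unrenderable characters are pushed
    # onto @$unrenderable (if given) so they can be reported afterwards.
    sub render_text {
        my ($text, $unrenderable) = @_;
        my $out = '';
        for my $c (split //, $text) {
            if    (ord($c) < 128)        { $out .= $c }
            elsif (exists $TO_ROFF{$c})  { $out .= $TO_ROFF{$c} }
            elsif (exists $FALLBACK{$c}) { $out .= $FALLBACK{$c} }
            else {
                push @$unrenderable, $c if $unrenderable;
                $out .= sprintf '[x%X]', ord $c;
            }
        }
        return $out;
    }

The collected list is what a formatter could then note "in a comment
or warning", as described above.
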
=item *

EE<lt>...> may freely appear in any formatting code (other than
in another EE<lt>...> or in a ZE<lt>>). That is, "XE<lt>The
EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The
EE<lt>euro>1,000,000 Solution|Million::Euros>".

=item *

Some Pod formatters output to formats that implement non-breaking
spaces as an individual character (which I'll call "NBSP"), and
others output to formats that implement non-breaking spaces just as
spaces wrapped in a "don't break this across lines" code. Note that
at the level of Pod, both sorts of codes can occur: Pod can contain an
NBSP character (whether as a literal, or as an "EE<lt>160>" or
"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
such codes are taken to represent non-breaking spaces. Pod
parsers should consider supporting the optional parsing of "SE<lt>foo
IE<lt>barE<gt> baz>" as if it were
"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
optional parsing of groups of words joined by NBSPs as if each group
were in an SE<lt>...> code, so that formatters may use the
representation that maps best to what the output format demands.

=item *

Some processors may find that the C<SE<lt>...E<gt>> code is easiest to
implement by replacing each space in the parse tree under the content
of the S with an NBSP. But note: the replacement should apply I<not> to
spaces in I<all> text, but I<only> to spaces in I<printable> text. (This
distinction may or may not be evident in the particular tree/event
model implemented by the Pod parser.) For example, consider this
unusual case:

  S<L</Autoloaded Functions>>

This means that the space in the middle of the visible link text must
not be broken across lines. In other words, it's the same as this:

  L<"AutoloadedE<160>Functions"/Autoloaded Functions>

However, a misapplied space-to-NBSP replacement could (wrongly)
produce something equivalent to this:

  L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>

...which is almost certainly not going to work as a hyperlink (assuming
this formatter outputs a format supporting hypertext).

Formatters may choose to just not support the S formatting code,
especially in cases where the output format simply has no NBSP
character/code and no code for "don't break this stuff across lines".

=item *

Besides the NBSP character discussed above, implementors are reminded
of the existence of the other "special" character in Latin-1, the
"soft hyphen" character, also known as "discretionary hyphen"
(i.e., C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> =
C<EE<lt>shyE<gt>>). This character expresses an optional hyphenation
point. That is, it normally renders as nothing, but may render as a
"-" if a formatter breaks the word at that point. Pod formatters
should, as appropriate, do one of the following: 1) render this with
a code with the same meaning (e.g., "\-" in RTF), 2) pass it through
in the expectation that whatever renders the output understands this
character as such, or 3) delete it.

For example:

  sigE<shy>action
  manuE<shy>script
  JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi

These signal to a formatter that if it is to hyphenate "sigaction"
or "manuscript", then it should be done as
"sig-I<[linebreak]>action" or "manu-I<[linebreak]>script"
(and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't
show up at all). And if it is
to hyphenate "Jarkko" and/or "Hietaniemi", it can do
so only at the points where there is a C<EE<lt>shyE<gt>> code.

In practice, it is anticipated that this character will not be used
often, but formatters should either support it, or delete it.

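Each of the three permitted treatments is a simple substitution on
U+00AD. In this sketch the function name and the C<$mode> values are
hypothetical, and the replacement code (such as "\-" for RTF) is
supplied by the caller:

    use strict;
    use warnings;

    # Handle U+00AD SOFT HYPHEN (E<shy>) in text bound for an output format.
    # $mode (hypothetical): 'code' replaces it with $code, the output
    # format's own discretionary-hyphen code; 'pass' leaves it in place;
    # anything else deletes it.
    sub handle_shy {
        my ($text, $mode, $code) = @_;
        if    ($mode eq 'code') { $text =~ s/\x{AD}/$code/g }
        elsif ($mode eq 'pass') { }    # rely on the output format
        else                    { $text =~ s/\x{AD}//g }
        return $text;
    }
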
=item *

If you think that you want to add a new command to Pod (like, say, a
"=biblio" command), consider whether you could get the same
effect with a "=for" or "=begin"/"=end" sequence: "=for biblio ..." or
"=begin biblio" ... "=end biblio". Pod processors that don't understand
"=for biblio", etc., will simply ignore it, whereas they may complain
loudly if they see "=biblio".

=item *

Throughout this document, "Pod" has been the preferred spelling for
the name of the documentation format. One may also use "POD" or
"pod". For the documentation that is (typically) in the Pod
format, you may use "pod", or "Pod", or "POD". Understanding these
distinctions is useful, but obsessing over how to spell them usually
is not.

=back

=head1 About LE<lt>...E<gt> Codes

As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...>
code is the most complex of the Pod formatting codes. The points below
will hopefully clarify what it means and how processors should deal
with it.

=over

=item *

In parsing an LE<lt>...> code, Pod parsers must distinguish at least
four attributes:

=over

=item First:

The link-text. If there is none, this must be undef. (E.g., in
"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
link text. Note that link text may contain formatting.)

=item Second:

The possibly inferred link-text -- i.e., if there was no real link
text, then this is the text that we'll infer in its place. (E.g., for
"LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".)

=item Third:

The name or URL, or undef if none. (E.g., in "LE<lt>Perl
Functions|perlfunc>", the name -- also sometimes called the page --
is "perlfunc". In "LE<lt>/CAVEATS>", the name is undef.)

=item Fourth:

The section (AKA "item" in older perlpods), or undef if none. E.g.,
in "LE<lt>Getopt::Std/DESCRIPTION>", "DESCRIPTION" is the section. (Note
that this is not the same as a manpage section like the "5" in "man 5
crontab". "Section Foo" in the Pod sense means the part of the text
that's introduced by the heading or item whose text is "Foo".)

=back

Pod parsers may also note additional attributes including:

=over

=item Fifth:

A flag for whether item 3 (if present) is a URL (like
"http://lists.perl.org" is), in which case there should be no section
attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or
possibly a man page name (like "crontab(5)" is).

=item Sixth:

The raw original LE<lt>...> content, before text is split on
"|", "/", etc., and before EE<lt>...> codes are expanded.

=back

(The above were numbered only for concise reference below. It is not
a requirement that these be passed as an actual list or array.)

For example:

  L<Foo::Bar>
    =>  undef,                         # link text
        "Foo::Bar",                    # possibly inferred link text
        "Foo::Bar",                    # name
        undef,                         # section
        'pod',                         # what sort of link
        "Foo::Bar"                     # original content

  L<Perlport's section on NL's|perlport/Newlines>
    =>  "Perlport's section on NL's",  # link text
        "Perlport's section on NL's",  # possibly inferred link text
        "perlport",                    # name
        "Newlines",                    # section
        'pod',                         # what sort of link
        "Perlport's section on NL's|perlport/Newlines" # orig. content

  L<perlport/Newlines>
    =>  undef,                         # link text
        '"Newlines" in perlport',      # possibly inferred link text
        "perlport",                    # name
        "Newlines",                    # section
        'pod',                         # what sort of link
        "perlport/Newlines"            # original content

  L<crontab(5)/"DESCRIPTION">
    =>  undef,                         # link text
        '"DESCRIPTION" in crontab(5)', # possibly inferred link text
        "crontab(5)",                  # name
        "DESCRIPTION",                 # section
        'man',                         # what sort of link
        'crontab(5)/"DESCRIPTION"'     # original content

  L</Object Attributes>
    =>  undef,                         # link text
        '"Object Attributes"',         # possibly inferred link text
        undef,                         # name
        "Object Attributes",           # section
        'pod',                         # what sort of link
        "/Object Attributes"           # original content

  L<http://www.perl.org/>
    =>  undef,                         # link text
        "http://www.perl.org/",        # possibly inferred link text
        "http://www.perl.org/",        # name
        undef,                         # section
        'url',                         # what sort of link
        "http://www.perl.org/"         # original content

Note that you can distinguish URL-links from anything else by the
fact that they match C<m/\A\w+:[^:\s]\S*\z/>. So
C<LE<lt>http://www.perl.comE<gt>> is a URL, but
C<LE<lt>HTTP::ResponseE<gt>> isn't.

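A sketch of how a parser might derive these attributes from the raw
content of an LE<lt>...> code, returned here as a hash reference rather
than a list. The function and key names are hypothetical, the man-page
test (a name ending in "(I<something>)") is only a heuristic, and the
sketch does not handle the "LE<lt>Some Site Name|scheme:...>" form,
which this document disallows:

    use strict;
    use warnings;

    # Assumptions: the L<...> delimiters are already stripped, a literal
    # "|" never appears in link text (it would be written E<verbar>), and
    # E<...> codes have not been expanded yet.
    sub parse_link {
        my ($raw) = @_;
        my %link = (text => undef, name => undef, section => undef, raw => $raw);
        if ($raw =~ m/\A\w+:[^:\s]\S*\z/) {      # the URL test given above
            @link{qw(name type inferred)} = ($raw, 'url', $raw);
            return \%link;
        }
        my $rest = $raw;
        if ($rest =~ s/\A([^|]*)\|//) {          # optional "text|" part
            $link{text} = length $1 ? $1 : undef;    # "L<|Foo>" has no text
        }
        if ($rest =~ m{\A([^/]*)/(.*)\z}s) {     # "name/section" or "/section"
            $link{name}    = length $1 ? $1 : undef;
            $link{section} = $2;
        }
        else {
            $link{name} = $rest;
        }
        $link{section} =~ s/\A"(.*)"\z/$1/s if defined $link{section};
        $link{type} = defined $link{name} && $link{name} =~ m/\A[^\s()]+\(\w+\)\z/
                    ? 'man' : 'pod';
        $link{inferred} =                        # matches the examples above
            defined $link{text}     ? $link{text}
          : !defined $link{section} ? $link{name}
          : defined $link{name}     ? qq{"$link{section}" in $link{name}}
          :                           qq{"$link{section}"};
        return \%link;
    }
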
=item *

In case of LE<lt>...> codes with no "text|" part in them,
older formatters have exhibited great variation in actually displaying
the link or cross reference. For example, LE<lt>crontab(5)> might render
as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage"
or just "C<crontab(5)>".

Pod processors must now treat "text|"-less links as follows:

  L<name>         =>  L<name|name>
  L</section>     =>  L<"section"|/section>
  L<name/section> =>  L<"section" in name|name/section>

=item *

Note that section names might contain markup. I.e., if a section
starts with:

  =head2 About the C<-M> Operator

or with:

  =item About the C<-M> Operator

then a link to it would look like this:

  L<somedoc/About the C<-M> Operator>

Formatters may choose to ignore the markup for purposes of resolving
the link and use only the renderable characters in the section name,
as in:

  <h1><a name="About_the_-M_Operator">About the <code>-M</code>
  Operator</a></h1>

  ...

  <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
  Operator" in somedoc</a>

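A sketch of the markup-stripping approach, producing the
C<About_the_-M_Operator> anchor used in the example. The function
name is hypothetical; it assumes single-angle-bracket codes only (not
C<CE<lt>E<lt> ... E<gt>E<gt>>) and that EE<lt>...> escapes are
resolved elsewhere:

    use strict;
    use warnings;

    # Turn a section name into an HTML anchor name: discard markup, keep
    # the renderable text, and replace runs of whitespace with "_".
    sub section_anchor {
        my ($section) = @_;
        # Innermost codes first; X<> (index entries) and Z<> vanish, while
        # B, C, F, I, L, and S codes keep their text. Repeat until stable.
        1 while $section =~ s/[XZ]<[^<>]*>//g
             or $section =~ s/[BCFILS]<([^<>]*)>/$1/g;
        $section =~ s/\s+/_/g;
        return $section;
    }
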
=item *

Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>>
links from C<LE<lt>name/itemE<gt>> links (and their targets). These
have been merged syntactically and semantically in the current
specification, and I<section> can refer either to a "=headI<n> Heading
Content" command or to a "=item Item Content" command. This
specification does not specify what the behavior should be in the case
of a given document having several things all seeming to produce the
same I<section> identifier (e.g., in HTML, several things all producing
the same I<anchorname> in <a name="I<anchorname>">...</a>
elements). Where Pod processors can control this behavior, they should
use the first such anchor. That is, C<LE<lt>Foo/BarE<gt>> refers to the
I<first> "Bar" section in Foo.

But for some processors/formats this cannot be easily controlled; as
with the HTML example, the behavior of multiple ambiguous
<a name="I<anchorname>">...</a> is most easily just left up to
browsers to decide.

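Where the processor does control anchor resolution, "first one wins"
can be recorded while headings and items are collected. A sketch,
assuming section identifiers have already been normalized; the
function name is hypothetical, and C<//=> requires Perl 5.10 or later:

    use strict;
    use warnings;

    # Map each section identifier to the position of the first heading or
    # item that produced it; later duplicates are ignored, so a link to
    # "Bar" resolves to the first "Bar" section.
    sub index_sections {
        my %first;
        my $pos = 0;
        for my $title (@_) {
            $first{$title} //= $pos;    # keep the existing (earlier) position
            $pos++;
        }
        return \%first;
    }

Using C<//=> rather than C<||=> matters here: the first position is 0,
which is false, so C<||=> would let a later duplicate overwrite it.
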
=item *

Authors wanting to link to a particular (absolute) URL must do so
only with "LE<lt>scheme:...>" codes (like
LE<lt>http://www.perl.org>), and must not attempt "LE<lt>Some Site
Name|scheme:...>" codes. This restriction avoids many problems
in parsing and rendering LE<lt>...> codes.

=item *

In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes
for formatting or for EE<lt>...> escapes, as in:

  L<B<ummE<234>stuff>|...>

For C<LE<lt>...E<gt>> codes without a "text|" part, only
C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur -- no
other formatting codes. That is, authors should not use
"C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>".

Note, however, that formatting codes and ZE<lt>>'s can occur in any
and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>,
and I<url>).

Authors must not nest LE<lt>...> codes. For example, "LE<lt>The
LE<lt>Foo::Bar> man page>" should be treated as an error.

=item *

Note that Pod authors may use formatting codes inside the "text"
part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">).

In other words, this is valid:

  Go read L<the docs on C<$.>|perlvar/"$.">

Some output formats that do allow rendering "LE<lt>...>" codes as
hypertext might not allow the link-text to be formatted; in
that case, formatters will have to just ignore that formatting.

=item *

At time of writing, C<LE<lt>nameE<gt>> values are of two types:
either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which
might be a real Perl module or program in an @INC / PATH
directory, or a .pod file in those places); or the name of a UNIX
man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>>
is ambiguous between a Pod page called "chmod" and the UNIX man page
"chmod" (in whatever man-section). However, the presence of a string
in parens, as in "crontab(5)", is sufficient to signal that what
is being discussed is not a Pod page, and so is presumably a
UNIX man page. The distinction is of no importance to many
Pod processors, but some processors that render to hypertext formats
may need to distinguish them in order to know how to render a
given C<LE<lt>fooE<gt>> code.

=item *

Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax
(as in "C<LE<lt>Object AttributesE<gt>>"), which was not easily distinguishable
from C<LE<lt>nameE<gt>> syntax. This syntax is no longer in the
specification, and has been replaced by the C<LE<lt>"section"E<gt>> syntax
(where the quotes were formerly optional). Pod parsers should tolerate
the C<LE<lt>sectionE<gt>> syntax, for a while at least. The suggested
heuristic for distinguishing C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>>
is that if it contains any whitespace, it's a I<section>. Pod processors
may warn about this being deprecated syntax.

=back

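The link heuristics above can be sketched in Perl as follows. This is
only an illustration under stated assumptions, not part of the
specification: the subroutine names C<classify_link_name> and
C<anchor_name> are hypothetical, the man-page test assumes the
parenthesized string comes at the end of the name (as in "crontab(5)"),
and the anchor builder strips only simple formatting codes and turns
runs of whitespace into underscores, which is just one way a formatter
might arrive at an anchor such as "About_the_-M_Operator".

  use strict;
  use warnings;

  # Hypothetical helper: guess what the "name" part of an L<...> code
  # refers to, following the heuristics described above.
  sub classify_link_name {
      my ($name) = @_;
      # Assumption: a string in parens at the end, as in "crontab(5)",
      # signals a UNIX man page.
      return 'manpage' if $name =~ m/\(\w+\)\z/;
      # Deprecated L<section> syntax: any whitespace means a section.
      return 'section' if $name =~ m/\s/;
      return 'pod';
  }

  # Hypothetical helper: build an HTML anchor from the renderable
  # characters of a section name.
  sub anchor_name {
      my ($section) = @_;
      my $text = $section;
      # Strip simple codes like C<...> or I<...>, innermost first.
      # E<...> escapes and C<< ... >> forms are NOT handled here.
      1 while $text =~ s/[A-Z]<([^<>]*)>/$1/g;
      $text =~ s/\s+/_/g;
      return $text;
  }

With these definitions, "crontab(5)" is classified as a man page,
"Foo::Bar" as a Pod page, and "Object Attributes" as a section name
written in the deprecated syntax.
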
=head1 About =over...=back Regions

"=over"..."=back" regions are used for various kinds of list-like
structures. (I use the term "region" here simply as a collective
term for everything from the "=over" to the matching "=back".)

=over

=item *

The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ...
"=back" is used for giving the formatter a clue as to how many
"spaces" (ems, or roughly equivalent units) it should tab over,
although many formatters will have to convert this to an absolute
measurement that may not exactly match the size of spaces (or M's)
in the document's base font. Other formatters may have to completely
ignore the number. The lack of any explicit I<indentlevel> parameter is
equivalent to an I<indentlevel> value of 4. Pod processors may
complain if I<indentlevel> is present but is not a positive number
matching C<m/\A(\d*\.)?\d+\z/>.

=item *

Authors of Pod formatters are reminded that "=over" ... "=back" may
map to several different constructs in their output format. For
example, in converting Pod to (X)HTML, it can map to any of
<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or
<dt>.

=item *

Each "=over" ... "=back" region should be one of the following:

=over

=item *

An "=over" ... "=back" region containing only "=item *" commands,
each followed by some number of ordinary/verbatim paragraphs, other
nested "=over" ... "=back" regions, "=for..." paragraphs, and
"=begin"..."=end" regions.

(Pod processors must tolerate a bare "=item" as if it were "=item
*".) Whether "*" is rendered as a literal asterisk, an "o", or as
some kind of real bullet character is left up to the Pod formatter,
and may depend on the level of nesting.

=item *

An "=over" ... "=back" region containing only
C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them)
followed by some number of ordinary/verbatim paragraphs, other nested
"=over" ... "=back" regions, "=for..." paragraphs, and/or
"=begin"..."=end" regions. Note that the numbers must start at 1
in each such region, and must proceed in order and without skipping
numbers.

(Pod processors must tolerate lines like "=item 1" as if they were
"=item 1.", with the period.)

=item *

An "=over" ... "=back" region containing only "=item [text]"
commands, each one (or each group of them) followed by some number of
ordinary/verbatim paragraphs, other nested "=over" ... "=back"
regions, "=for..." paragraphs, and/or "=begin"..."=end" regions.

The "=item [text]" paragraph should not match
C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it
match just C<m/\A=item\s*\z/>.

=item *

An "=over" ... "=back" region containing no "=item" paragraphs at
all, and containing only some number of
ordinary/verbatim paragraphs, and possibly also some nested "=over"
... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
regions. Such an itemless "=over" ... "=back" region in Pod is
equivalent in meaning to a "<blockquote>...</blockquote>" element in
HTML.

=back

Note that with all the above cases, you can determine which type of
"=over" ... "=back" you have by examining the first (non-"=cut",
non-"=pod") Pod paragraph after the "=over" command.

=item *

Pod formatters I<must> tolerate arbitrarily large amounts of text
in the "=item I<text...>" paragraph. In practice, most such
paragraphs are short, as in:

  =item For cutting off our trade with all parts of the world

But they may be arbitrarily long:

  =item For transporting us beyond seas to be tried for pretended
  offenses

  =item He is at this time transporting large armies of foreign
  mercenaries to complete the works of death, desolation and
  tyranny, already begun with circumstances of cruelty and perfidy
  scarcely paralleled in the most barbarous ages, and totally
  unworthy the head of a civilized nation.

=item *

Pod processors should tolerate "=item *" / "=item I<number>" commands
with no accompanying paragraph. The middle item is an example:

  =over

  =item 1

  Pick up dry cleaning.

  =item 2

  =item 3

  Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.

  =back

=item *

No "=over" ... "=back" region can contain headings. Processors may
treat such a heading as an error.

=item *

Note that an "=over" ... "=back" region should have some
content. That is, authors should not have an empty region like this:

  =over

  =back

Pod processors seeing such a contentless "=over" ... "=back" region
may ignore it, or may report it as an error.

=item *

Processors must tolerate an "=over" list that goes off the end of the
document (i.e., which has no matching "=back"), but they may warn
about such a list.

=item *

Authors of Pod formatters should note that this construct:

  =item Neque

  =item Porro

  =item Quisquam Est

  Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
  velit, sed quia non numquam eius modi tempora incidunt ut
  labore et dolore magnam aliquam quaerat voluptatem.

  =item Ut Enim

is semantically ambiguous, in a way that makes formatting decisions
a bit difficult. On the one hand, it could be mention of an item
"Neque", mention of another item "Porro", and mention of another
item "Quisquam Est", with just the last one requiring the explanatory
paragraph "Qui dolorem ipsum quia dolor..."; and then an item
"Ut Enim". In that case, you'd want to format it like so:

  Neque

  Porro

  Quisquam Est
    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

  Ut Enim

But it could equally well be a discussion of three (related or equivalent)
items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph
explaining them all, and then a new item "Ut Enim". In that case, you'd
probably want to format it like so:

  Neque
  Porro
  Quisquam Est
    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

  Ut Enim

But (for the foreseeable future), Pod does not provide any way for Pod
authors to distinguish which grouping is meant by the above
"=item"-cluster structure. So formatters should format it like so:

  Neque

  Porro

  Quisquam Est

    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

  Ut Enim

That is, there should be (at least roughly) equal spacing between
items as between paragraphs (although that spacing may well be less
than the full height of a line of text). This leaves it to the reader
to use (con)textual cues to figure out whether the "Qui dolorem
ipsum..." paragraph applies to the "Quisquam Est" item or to all three
items "Neque", "Porro", and "Quisquam Est". While not an ideal
situation, this is preferable to providing formatting cues that may
actually be contrary to the author's intent.

=back

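The checks described in the list above can be sketched in Perl. The
subroutine names are hypothetical, and the sketch assumes the caller
has already split the document into paragraphs, found the first
(non-"=cut", non-"=pod") paragraph after each "=over", and collected
the numbers from a numbered region's "=item" commands. It illustrates
the rules; it is not a reference implementation.

  use strict;
  use warnings;

  # Hypothetical helper: is an "=over" indentlevel acceptable?
  sub valid_indentlevel {
      my ($level) = @_;
      return 1 unless defined $level;     # plain "=over" means 4
      return 0 unless $level =~ m/\A(\d*\.)?\d+\z/;
      return $level > 0;                  # must be positive
  }

  # Hypothetical helper: classify a region by the first Pod paragraph
  # after its "=over" command.
  sub region_type {
      my ($first_para) = @_;
      return 'blockquote' unless $first_para =~ m/\A=item\b/;
      # A bare "=item" is tolerated as "=item *".
      return 'bullet' if $first_para =~ m/\A=item(?:\s+\*)?\s*\z/;
      return 'number' if $first_para =~ m/\A=item\s+\d+\.?\s*\z/;
      return 'text';
  }

  # Hypothetical helper: numbered items must run 1, 2, 3, ...
  # ("=item 1" is tolerated as "=item 1.").
  sub numbers_in_order {
      my @numbers = @_;
      for my $i (0 .. $#numbers) {
          (my $n = $numbers[$i]) =~ s/\.\z//;
          return 0 unless $n == $i + 1;
      }
      return 1;
  }

Under this sketch, the "Neque" / "Porro" example above would be
classified as a text region, and the dry-cleaning list as a numbered
region whose numbers are in order.
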
=head1 About Data Paragraphs and "=begin/=end" Regions

Data paragraphs are typically used for inlining non-Pod data that is
to be used (typically passed through) when rendering the document to
a specific format:

  =begin rtf

  \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

  =end rtf

The exact same effect could, incidentally, be achieved with a single
"=for" paragraph:

  =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

(Although that is not formally a data paragraph, it has the same
meaning as one, and Pod parsers may parse it as one.)

Another example of a data paragraph:

  =begin html

  I like <em>PIE</em>!

  <hr>Especially pecan pie!

  =end html

If these were ordinary paragraphs, the Pod parser would try to
expand the "EE<lt>/em>" (in the first paragraph) as a formatting
code, just like "EE<lt>lt>" or "EE<lt>eacute>". But since this
is in a "=begin I<identifier>"..."=end I<identifier>" region I<and>
the identifier "html" doesn't have a ":" prefix, the contents
of this region are stored as data paragraphs, instead of being
processed as ordinary paragraphs (or, if they began with spaces
and/or tabs, as verbatim paragraphs).

As a further example: At time of writing, no "biblio" identifier is
supported, but suppose some processor were written to recognize it as
a way of (say) denoting a bibliographic reference (necessarily
containing formatting codes in ordinary paragraphs). The fact that
"biblio" paragraphs were meant for ordinary processing would be
indicated by prefacing each "biblio" identifier with a colon:

  =begin :biblio

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =end :biblio

This would signal to the parser that paragraphs in this begin...end
region are subject to normal handling as ordinary/verbatim paragraphs
(while still tagged as meant only for processors that understand the
"biblio" identifier). The same effect could be had with:

  =for :biblio
  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

The ":" on these identifiers means simply "process this stuff
normally, even though the result will be for some special target".
I suggest that parser APIs report "biblio" as the target identifier,
but also report that it had a ":" prefix. (And similarly, with the
above "html", report "html" as the target identifier, and note the
I<lack> of a ":" prefix.)

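A parser API following that suggestion might split the identifier as
below. The subroutine name C<parse_target> and the hash keys C<target>
and C<colon> are hypothetical; this is a minimal sketch, not the
interface of any existing Pod parser module.

  use strict;
  use warnings;

  # Hypothetical helper: split a "=begin"/"=for" identifier into its
  # target name and a flag recording whether it had a ":" prefix.
  sub parse_target {
      my ($identifier) = @_;
      my $has_colon = ($identifier =~ s/\A://) ? 1 : 0;
      return { target => $identifier, colon => $has_colon };
  }

  # ":biblio" gives target "biblio" with colon 1;
  # "html" gives target "html" with colon 0.
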
Note that a "=begin I<identifier>"..."=end I<identifier>" region where
I<identifier> begins with a colon I<can> contain commands. For example:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =for comment
  hm, check abebooks.com for how much used copies cost.

  =over

  =item

  Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart. [Yes, it's in German.]

  =item

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =back

  =end :biblio

Note, however, that a "=begin I<identifier>"..."=end I<identifier>"
region where I<identifier> does I<not> begin with a colon should not
directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
nor "=item". For example, this may be considered invalid:

  =begin somedata

  This is a data paragraph.

  =head1 Don't do this!

  This is a data paragraph too.

  =end somedata

A Pod processor may signal that the above (specifically the "=head1"
paragraph) is an error. Note, however, that the following should
I<not> be treated as an error:

  =begin somedata

  This is a data paragraph.

  =cut

  # Yup, this isn't Pod anymore.
  sub excl { (rand() > .5) ? "hoo!" : "hah!" }

  =pod

  This is a data paragraph too.

  =end somedata

And this too is valid:

  =begin someformat

  This is a data paragraph.

    And this is a data paragraph.

  =begin someotherformat

  This is a data paragraph too.

    And this is a data paragraph too.

  =begin :yetanotherformat

  =head2 This is a command paragraph!

  This is an ordinary paragraph!

    And this is a verbatim paragraph!

  =end :yetanotherformat

  =end someotherformat

  Another data paragraph!

  =end someformat

The contents of the above "=begin :yetanotherformat" ...
"=end :yetanotherformat" region I<aren't> data paragraphs, because
the immediately containing region's identifier (":yetanotherformat")
begins with a colon. In practice, most regions that contain
data paragraphs will contain I<only> data paragraphs; however,
the above nesting is syntactically valid as Pod, even if it is
rare. Note, though, that the handlers for some formats, like "html",
will accept only data paragraphs, not nested regions; and they may
complain if they see (targeted for them) nested regions, or commands,
other than "=end", "=pod", and "=cut".

Also consider this valid structure:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =over

  =item

  Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart. [Yes, it's in German.]

  =item

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =back

  Buy buy buy!

  =begin html

  <img src='wirth_spokesmodeling_book.png'>

  <hr>

  =end html

  Now now now!

  =end :biblio

There, the "=begin html"..."=end html" region is nested inside
the larger "=begin :biblio"..."=end :biblio" region. Note that the
content of the "=begin html"..."=end html" region is data
paragraph(s), because the immediately containing region's identifier
("html") I<doesn't> begin with a colon.

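The rule that the I<immediately> containing region decides whether a
paragraph is data can be sketched with a stack of open region
identifiers. This is a simplified illustration: the subroutine name
C<label_paragraphs> is hypothetical, the input is assumed to be Pod
paragraphs already split on blank lines, and "=for" paragraphs,
non-Pod code after "=cut", and error reporting are all left out.

  use strict;
  use warnings;

  # Hypothetical sketch: label each paragraph according to the
  # innermost enclosing "=begin" region.
  sub label_paragraphs {
      my @paragraphs = @_;
      my @stack;     # identifiers of open regions, innermost last
      my @labels;
      for my $para (@paragraphs) {
          if ($para =~ m/\A=begin\s+(\S+)/) {
              push @stack, $1;
              push @labels, 'command';
          }
          elsif ($para =~ m/\A=end\b/) {
              pop @stack;
              push @labels, 'command';
          }
          elsif ($para =~ m/\A=[a-zA-Z]/) {
              push @labels, 'command';
          }
          elsif (@stack && $stack[-1] !~ m/\A:/) {
              # Innermost region's identifier lacks a colon.
              push @labels, 'data';
          }
          elsif ($para =~ m/\A[ \t]/) {
              push @labels, 'verbatim';
          }
          else {
              push @labels, 'ordinary';
          }
      }
      return @labels;
  }

Applied to the "=begin :biblio" example above, the two paragraphs in
the "html" region come out as data, while "Buy buy buy!" and
"Now now now!" come out as ordinary paragraphs.
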
Pod parsers, when processing a series of data paragraphs one
after another (within a single region), should consider them to
be one large data paragraph that happens to contain blank lines. So
the content of the above "=begin html"..."=end html" region I<may> be
stored as two data paragraphs (one consisting of
"<img src='wirth_spokesmodeling_book.png'>\n"
and another consisting of "<hr>\n"), but I<should> be stored as
a single data paragraph (consisting of
"<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").

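One way a parser might honor this is to fold adjacent data paragraphs
together, rejoining them with the blank line that separated them. In
this hypothetical sketch (the name C<merge_data_runs> is illustrative
only), each paragraph is an array reference holding a label and its
text, which is assumed to end in a single "\n". Two data paragraphs
with no command between them necessarily belong to the same region.

  use strict;
  use warnings;

  # Hypothetical sketch: merge runs of adjacent 'data' paragraphs.
  # Each element is [ $label, $text ].
  sub merge_data_runs {
      my @items = @_;
      my @out;
      for my $item (@items) {
          my ($label, $text) = @$item;
          if ($label eq 'data' && @out && $out[-1][0] eq 'data') {
              # Restore the blank line that separated the paragraphs.
              $out[-1][1] .= "\n" . $text;
          }
          else {
              push @out, [ $label, $text ];
          }
      }
      return @out;
  }
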
Pod processors should tolerate empty
"=begin I<something>"..."=end I<something>" regions,
empty "=begin :I<something>"..."=end :I<something>" regions, and
contentless "=for I<something>" and "=for :I<something>"
paragraphs. I.e., these should be tolerated:

  =for html

  =begin html

  =end html

  =begin :biblio

  =end :biblio

Incidentally, note that there's no easy way to express a data
paragraph starting with something that looks like a command. Consider:

  =begin stuff

  =shazbot

  =end stuff

There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
paragraph "=shazbot\n". However, you can express a data paragraph consisting
of "=shazbot\n" using this code:

  =for stuff =shazbot

The situation where this is necessary is presumably quite rare.

Note that =end commands must match the currently open =begin command. That
is, they must properly nest. For example, this is valid:

  =begin outer

  X

  =begin inner

  Y

  =end inner

  Z

  =end outer

while this is invalid:

  =begin outer

  X

  =begin inner

  Y

  =end outer

  Z

  =end inner

This latter is improper because when the "=end outer" command is seen, the
currently open region has the formatname "inner", not "outer". (It just
happens that "outer" is the format name of a higher-up region.) This is
an error. Processors must by default report this as an error, and may halt
processing the document containing that error. A corollary of this is that
regions cannot "overlap" -- i.e., the latter block above does not represent
a region called "outer" which contains X and Y, overlapping a region called
"inner" which contains Y and Z. But because it is invalid (as all
apparently overlapping regions would be), it doesn't represent that, or
anything at all.

Similarly, this is invalid:

  =begin thing

  =end hting

This is an error because the region is opened by "thing", and the "=end"
tries to close "hting" [sic].

This is also invalid:

  =begin thing

  =end

This is invalid because every "=end" command must have a formatname
parameter.

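A processor can enforce these rules with a stack of open formatnames,
comparing each "=end" against the innermost open region only. The Perl
below is a minimal sketch with a hypothetical subroutine name,
C<check_nesting>. It takes just the command paragraphs in document
order, ignores everything else, and returns error messages rather than
halting.

  use strict;
  use warnings;

  # Hypothetical sketch: check that "=begin"/"=end" commands nest.
  # Returns a list of error messages (empty if nesting is valid).
  sub check_nesting {
      my @commands = @_;
      my (@stack, @errors);
      for my $cmd (@commands) {
          if ($cmd =~ m/\A=begin\s+(\S+)/) {
              push @stack, $1;
          }
          elsif ($cmd =~ m/\A=end\b\s*(\S*)/) {
              my $name = $1;
              if ($name eq '') {
                  push @errors, "=end without a formatname";
              }
              elsif (!@stack) {
                  push @errors, "=end $name with no open region";
              }
              elsif ($stack[-1] ne $name) {
                  # Only the innermost open region may be closed.
                  push @errors, "=end $name does not match =begin $stack[-1]";
              }
              else {
                  pop @stack;
              }
          }
      }
      return @errors;
  }

For the invalid "outer"/"inner" example above, this sketch reports a
single error, at the "=end outer" command.
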
=head1 SEE ALSO

L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">,
L<podchecker>

=head1 AUTHOR

Sean M. Burke

=cut