=head1 NAME

perldebguts - Guts of Perl debugging

=head1 DESCRIPTION

This is not the perldebug(1) manpage, which tells you how to use
the debugger. This manpage describes low-level details concerning
the debugger's internals, which range from difficult to impossible
to understand for anyone who isn't incredibly intimate with Perl's guts.
Caveat lector.

=head1 Debugger Internals

Perl has special debugging hooks at compile-time and run-time used
to create debugging environments. These hooks are not to be confused
with the I<perl -Dxxx> command described in L<perlrun>, which is
usable only if a special Perl is built per the instructions in the
F<INSTALL> podpage in the Perl source tree.

For example, whenever you call Perl's built-in C<caller> function
from the package C<DB>, the arguments that the corresponding stack
frame was called with are copied to the C<@DB::args> array. These
mechanisms are enabled by calling Perl with the B<-d> switch.
Specifically, the following additional features are enabled
(cf. L<perlvar/$^P>):

=over 4

=item *

Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require
'perl5db.pl'}> if not present) before the first line of your program.

=item *

Each array C<@{"_<$filename"}> holds the lines of $filename for a
file compiled by Perl. The same is also true for C<eval>ed strings
that contain subroutines, or which are currently being executed.
The $filename for C<eval>ed strings looks like C<(eval 34)>.
Code assertions in regexes look like C<(re_eval 19)>.

Values in this array are magical in numeric context: they compare
equal to zero only if the line is not breakable.

=item *

Each hash C<%{"_<$filename"}> contains breakpoints and actions keyed
by line number. Individual entries (as opposed to the whole hash)
are settable. Perl only cares about Boolean true here, although
the values used by F<perl5db.pl> have the form
C<"$break_condition\0$action">.

The same holds for evaluated strings that contain subroutines, or
which are currently being executed. The $filename for C<eval>ed strings
looks like C<(eval 34)> or C<(re_eval 19)>.

=item *

Each scalar C<${"_<$filename"}> contains C<"_<$filename">. This is
also the case for evaluated strings that contain subroutines, or
which are currently being executed. The $filename for C<eval>ed
strings looks like C<(eval 34)> or C<(re_eval 19)>.

=item *

After each C<require>d file is compiled, but before it is executed,
C<DB::postponed(*{"_<$filename"})> is called if the subroutine
C<DB::postponed> exists. Here, the $filename is the expanded name of
the C<require>d file, as found in the values of %INC.

=item *

After each subroutine C<subname> is compiled, the existence of
C<$DB::postponed{subname}> is checked. If this key exists,
C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine
also exists.

=item *

A hash C<%DB::sub> is maintained, whose keys are subroutine names
and whose values have the form C<filename:startline-endline>.
C<filename> has the form C<(eval 34)> for subroutines defined inside
C<eval>s, or C<(re_eval 19)> for those within regex code assertions.

=item *

When the execution of your program reaches a point that can hold a
breakpoint, the C<DB::DB()> subroutine is called if any of the variables
C<$DB::trace>, C<$DB::single>, or C<$DB::signal> is true. These variables
are not C<local>izable. This feature is disabled when executing
inside C<DB::DB()>, including functions called from it
unless C<< $^D & (1<<30) >> is true.

=item *

When execution of the program reaches a subroutine call, a call to
C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the
name of the called subroutine. (This doesn't happen if the subroutine
was compiled in the C<DB> package.)

=back

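The C<caller>/C<@DB::args> behaviour mentioned before the list can be
tried in isolation. The following is a minimal sketch, not part of the
standard debugger: the helper C<DB::args_of_caller> and the subroutine
C<greet> are hypothetical names invented for this illustration, and the
sketch assumes that C<@DB::args> is filled in even when Perl is run
without B<-d>.

```perl
use strict;
use warnings;

{
    # Hypothetical helper. It is compiled into package DB so that
    # caller() copies the inspected frame's arguments into @DB::args.
    package DB;
    no warnings 'once';

    sub args_of_caller {
        # Frame 0 is args_of_caller itself; frame 1 is the subroutine
        # that called it. Asking for frame 1 fills @DB::args with the
        # arguments that subroutine was called with.
        my @frame = caller(1);
        return @frame ? @DB::args : ();
    }
}

package main;

sub greet {
    # Report the arguments this very subroutine received.
    return DB::args_of_caller();
}

my @seen = greet('hello', 42);
print "greet was called with: @seen\n";
```

Copying C<@DB::args> right after the C<caller> call is the cautious
choice, since the array is overwritten by the next such call.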

Note that if C<&DB::sub> needs external data for it to work, no
subroutine call is possible without it. As an example, the standard
debugger's C<&DB::sub> depends on the C<$DB::deep> variable
(it defines how many levels of recursion deep into the debugger you can go
before a mandatory break). If C<$DB::deep> is not defined, subroutine
calls are not possible, even though C<&DB::sub> exists.

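Returning to the C<%DB::sub> hash described in the list above: its
values have the form C<filename:startline-endline>, which is easy to
take apart. This is only a sketch; C<parse_sub_location> is a
hypothetical helper, and anchoring the match at the end of the string
is an assumption made so that file names containing colons still parse.

```perl
use strict;
use warnings;

# Hypothetical helper: split a %DB::sub-style value into its parts.
# The file name may itself contain colons or hyphens, so the line
# range is matched at the very end of the string.
sub parse_sub_location {
    my ($value) = @_;
    my ($file, $start, $end) = $value =~ /\A(.*):(\d+)-(\d+)\z/s
        or return;
    return { file => $file, start => $start, end => $end };
}

my $loc = parse_sub_location('(eval 34):2-5');
printf "%s lines %d..%d\n", @{$loc}{qw(file start end)};
```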

=head2 Writing Your Own Debugger

=head3 Environment Variables

The C<PERL5DB> environment variable can be used to define a debugger.
For example, the minimal "working" debugger (it actually doesn't do anything)
consists of one line:

  sub DB::DB {}

It can easily be defined like this:

  $ PERL5DB="sub DB::DB {}" perl -d your-script

Another brief debugger, slightly more useful, can be created
with only the line:

  sub DB::DB {print ++$i; scalar <STDIN>}

This debugger prints a number which increments for each statement
encountered and waits for you to hit a newline before continuing
to the next statement.

The following debugger is actually useful:

  {
    package DB;
    sub DB  {}
    sub sub {print ++$i, " $sub\n"; &$sub}
  }

It prints the sequence number of each subroutine call and the name of the
called subroutine. Note that C<&DB::sub> is being compiled into the
package C<DB> through the use of the C<package> directive.

When it starts, the debugger reads your rc file (F<./.perldb> or
F<~/.perldb> under Unix), which can set important options.
(A subroutine (C<&afterinit>) can be defined here as well; it is executed
after the debugger completes its own initialization.)

After the rc file is read, the debugger reads the PERLDB_OPTS
environment variable and uses it to set debugger options. The
contents of this variable are treated as if they were the argument
of an C<o ...> debugger command (q.v. in L<perldebug/Options>).

=head3 Debugger internal variables

In addition to the file and subroutine-related variables mentioned above,
the debugger also maintains various magical internal variables.

=over 4

=item *

C<@DB::dbline> is an alias for C<@{"::_<current_file"}>, which
holds the lines of the currently-selected file (compiled by Perl), either
explicitly chosen with the debugger's C<f> command, or implicitly by flow
of execution.

Values in this array are magical in numeric context: they compare
equal to zero only if the line is not breakable.

=item *

C<%DB::dbline> is an alias for C<%{"::_<current_file"}>, which
contains breakpoints and actions keyed by line number in
the currently-selected file, either explicitly chosen with the
debugger's C<f> command, or implicitly by flow of execution.

As previously noted, individual entries (as opposed to the whole hash)
are settable. Perl only cares about Boolean true here, although
the values used by F<perl5db.pl> have the form
C<"$break_condition\0$action">.

=back

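The C<"$break_condition\0$action"> form used for C<%DB::dbline> values
can be split on the NUL byte. The sketch below works on a plain string
rather than on a live debugger hash; C<split_dbline_entry> is a
hypothetical helper, and the sample condition and action are made up
for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: separate the breakpoint condition from the
# action in a perl5db.pl-style "$break_condition\0$action" value.
sub split_dbline_entry {
    my ($value) = @_;
    # Limit of 2: only the first NUL separates condition and action.
    my ($condition, $action) = split /\0/, $value, 2;
    return ($condition, $action);
}

# A made-up entry: break when $x > 10, and run a print action.
my $entry = '$x > 10' . "\0" . 'print "hit\n"';
my ($cond, $act) = split_dbline_entry($entry);
print "condition: $cond\n";
print "action:    $act\n";
```

An entry holding only a true value such as C<1> would yield a
condition and an undefined action under this scheme.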

=head3 Debugger customization functions

Some functions are provided to simplify customization.

=over 4

=item *

C<DB::parse_options(string)> parses debugger options; see
L<perldebug/Options> for a description of the options recognized.

=item *

C<DB::dump_trace(skip[,count])> skips the specified number of frames
and returns a list containing information about the calling frames (all
of them, if C<count> is missing). Each entry is a reference to a hash
with keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine
name, or info about C<eval>), C<args> (C<undef> or a reference to
an array), C<file>, and C<line>.

=item *

C<DB::print_trace(FH, skip[, count[, short]])> prints
formatted info about caller frames. The last two functions may be
convenient as arguments to C<< < >>, C<< << >> commands.

=back

Note that any variables and functions that are not documented in
this manpage (or in L<perldebug>) are considered for internal
use only, and as such are subject to change without notice.

=head1 Frame Listing Output Examples

The C<frame> option can be used to control the output of frame
information. For example, contrast this expression trace:

 $ perl -de 42
 Stack dump during die enabled outside of evals.

 Loading DB routines from perl5db.pl patch level 0.94
 Emacs support available.

 Enter h or `h h' for help.

 main::(-e:1):   0
   DB<1> sub foo { 14 }

   DB<2> sub bar { 3 }

   DB<3> t print foo() * bar()
 main::((eval 172):3):   print foo() * bar();
 main::foo((eval 168):2):
 main::bar((eval 170):2):
 42

with this one, once the C<o>ption C<frame=2> has been set:

   DB<4> o f=2
                frame = '2'
   DB<5> t print foo() * bar()
 3:      foo() * bar()
 entering main::foo
  2:     sub foo { 14 };
 exited main::foo
 entering main::bar
  2:     sub bar { 3 };
 exited main::bar
 42

By way of demonstration, we present below a laborious listing
resulting from setting your C<PERLDB_OPTS> environment variable to
the value C<f=n N>, and running I<perl -d -V> from the command line.
Examples using various values of C<n> are shown to give you a feel
for the difference between settings. Long though it may be, this
is not a complete listing, but only excerpts.

=over 4

=item 1

  entering main::BEGIN
   entering Config::BEGIN
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   Package lib/Config.pm.
   entering Config::TIEHASH
   entering Exporter::import
    entering Exporter::export
  entering Config::myconfig
   entering Config::FETCH
   entering Config::FETCH
   entering Config::FETCH
   entering Config::FETCH

=item 2

  entering main::BEGIN
   entering Config::BEGIN
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   exited Config::BEGIN
   Package lib/Config.pm.
   entering Config::TIEHASH
   exited Config::TIEHASH
   entering Exporter::import
    entering Exporter::export
    exited Exporter::export
   exited Exporter::import
  exited main::BEGIN
  entering Config::myconfig
   entering Config::FETCH
   exited Config::FETCH
   entering Config::FETCH
   exited Config::FETCH
   entering Config::FETCH

=item 4

  in $=main::BEGIN() from /dev/null:0
   in $=Config::BEGIN() from lib/Config.pm:2
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   Package lib/Config.pm.
   in $=Config::TIEHASH('Config') from lib/Config.pm:644
   in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li
  in @=Config::myconfig() from /dev/null:0
   in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
   in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
   in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
   in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
   in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574
   in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574

=item 6

  in $=main::BEGIN() from /dev/null:0
   in $=Config::BEGIN() from lib/Config.pm:2
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   out $=Config::BEGIN() from lib/Config.pm:0
   Package lib/Config.pm.
   in $=Config::TIEHASH('Config') from lib/Config.pm:644
   out $=Config::TIEHASH('Config') from lib/Config.pm:644
   in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
  out $=main::BEGIN() from /dev/null:0
  in @=Config::myconfig() from /dev/null:0
   in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
   out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
   in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
   out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
   in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
   out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
   in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574

=item 14

  in $=main::BEGIN() from /dev/null:0
   in $=Config::BEGIN() from lib/Config.pm:2
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   out $=Config::BEGIN() from lib/Config.pm:0
   Package lib/Config.pm.
   in $=Config::TIEHASH('Config') from lib/Config.pm:644
   out $=Config::TIEHASH('Config') from lib/Config.pm:644
   in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
  out $=main::BEGIN() from /dev/null:0
  in @=Config::myconfig() from /dev/null:0
   in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
   out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
   in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
   out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574

=item 30

  in $=CODE(0x15eca4)() from /dev/null:0
   in $=CODE(0x182528)() from lib/Config.pm:2
    Package lib/Exporter.pm.
   out $=CODE(0x182528)() from lib/Config.pm:0
   scalar context return from CODE(0x182528): undef
   Package lib/Config.pm.
   in $=Config::TIEHASH('Config') from lib/Config.pm:628
   out $=Config::TIEHASH('Config') from lib/Config.pm:628
   scalar context return from Config::TIEHASH: empty hash
   in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
    scalar context return from Exporter::export: ''
   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
   scalar context return from Exporter::import: ''

=back


In all cases shown above, the line indentation shows the call tree.
If bit 2 of C<frame> is set, a line is printed on exit from a
subroutine as well. If bit 4 is set, the arguments are printed
along with the caller info. If bit 8 is set, the arguments are
printed even if they are tied or references. If bit 16 is set, the
return value is printed, too.

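The C<frame> values used in the listings above (2, 4, 6, 14, 30) are
sums of the bits just described. As a rough illustration only, the
following sketch decodes such a value into the effects named in this
section. The table C<@FRAME_BITS> and the function C<describe_frame>
are hypothetical names, not part of F<perl5db.pl>.

```perl
use strict;
use warnings;

# Bit meanings as described in the text above. Names are illustrative.
my @FRAME_BITS = (
    [ 2  => 'print a line on exit from each subroutine' ],
    [ 4  => 'print arguments along with the caller info' ],
    [ 8  => 'print arguments even if they are tied or references' ],
    [ 16 => 'print the return value' ],
);

# Return the descriptions of every bit set in the given frame value.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

for my $n (2, 14, 30) {
    print "frame=$n:\n";
    print "  $_\n" for describe_frame($n);
}
```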

When a package is compiled, a line like this

    Package lib/Carp.pm.

is printed with proper indentation.

=head1 Debugging regular expressions

There are two ways to enable debugging output for regular expressions.

If your perl is compiled with C<-DDEBUGGING>, you may use the
B<-Dr> flag on the command line.

Otherwise, one can C<use re 'debug'>, which has effects at
compile time and run time. It is not lexically scoped.

=head2 Compile-time output

The debugging output at compile time looks like this:

  Compiling REx `[bc]d(ef*g)+h[ij]k$'
  size 45 Got 364 bytes for offset annotations.
  first at 1
  rarest char g at 0
  rarest char d at 0
     1: ANYOF[bc](12)
    12: EXACT <d>(14)
    14: CURLYX[0] {1,32767}(28)
    16:   OPEN1(18)
    18:     EXACT <e>(20)
    20:     STAR(23)
    21:       EXACT <f>(0)
    23:     EXACT <g>(25)
    25:   CLOSE1(27)
    27:   WHILEM[1/1](0)
    28: NOTHING(29)
    29: EXACT <h>(31)
    31: ANYOF[ij](42)
    42: EXACT <k>(44)
    44: EOL(45)
    45: END(0)
  anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
        stclass `ANYOF[bc]' minlen 7
  Offsets: [45]
        1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
        0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
        11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
        0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
  Omitting $` $& $' support.

The first line shows the pre-compiled form of the regex. The second
shows the size of the compiled form (in arbitrary units, usually
4-byte words) and the total number of bytes allocated for the
offset/length table, usually 4+C<size>*8. The next line shows the
label I<id> of the first node that does a match.

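The "usually 4+C<size>*8" rule can be checked against the sample
output, where C<size 45> is reported together with 364 bytes. The
sketch below simply restates that arithmetic; C<offset_table_bytes> is
a hypothetical name, and the formula is only the usual case mentioned
in the text.

```perl
use strict;
use warnings;

# Usual size of the offset/length table, per the rule in the text:
# 4 bytes plus 8 bytes per unit of compiled-regex size.
sub offset_table_bytes {
    my ($size) = @_;
    return 4 + $size * 8;
}

my $bytes = offset_table_bytes(45);
print "size 45 -> $bytes bytes\n";    # 4 + 45*8 = 364
```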

The

  anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
        stclass `ANYOF[bc]' minlen 7

line (split into two lines above) contains optimizer
information. In the example shown, the optimizer found that the match
should contain a substring C<de> at offset 1, plus substring C<gh>
at some offset between 3 and infinity. Moreover, when checking for
these substrings (to abandon impossible matches quickly), Perl will check
for the substring C<gh> before checking for the substring C<de>. The
optimizer may also use the knowledge that the match starts (at the
C<first> I<id>) with a character class, and no string
shorter than 7 characters can possibly match.

The fields of interest which may appear in this line are

=over 4

=item C<anchored> I<STRING> C<at> I<POS>

=item C<floating> I<STRING> C<at> I<POS1..POS2>

See above.

=item C<matching floating/anchored>

Which substring to check first.

=item C<minlen>

The minimal length of the match.

=item C<stclass> I<TYPE>

Type of first matching node.

=item C<noscan>

Don't scan for the found substrings.

=item C<isall>

Means that the optimizer information is all that the regular
expression contains, and thus one does not need to enter the regex engine at
all.

=item C<GPOS>

Set if the pattern contains C<\G>.

=item C<plus>

Set if the pattern starts with a repeated char (as in C<x+y>).

=item C<implicit>

Set if the pattern starts with C<.*>.

=item C<with eval>

Set if the pattern contains eval-groups, such as C<(?{ code })> and
C<(??{ code })>.

=item C<anchored(TYPE)>

If the pattern may match only at a handful of places, with C<TYPE>
being C<BOL>, C<MBOL>, or C<GPOS>. See the table below.

=back

If a substring is known to match at end-of-line only, it may be
followed by C<$>, as in C<floating `k'$>.

The optimizer-specific information is used to avoid entering (a slow) regex
engine on strings that will definitely not match. If the C<isall> flag
is set, a call to the regex engine may be avoided even when the optimizer
found an appropriate place for the match.

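To make the optimizer fields concrete, here is a minimal sketch that
pulls a few of them out of the sample line shown earlier. It assumes
the quoting style of that sample (a backquote and an apostrophe around
each string); C<parse_optimizer_line> and the hash keys it returns are
hypothetical names chosen for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: extract some optimizer fields from one line of
# debugging output, using the format of the sample in this section.
sub parse_optimizer_line {
    my ($line) = @_;
    my %info;

    @info{qw(anchored anchored_at)} = ($1, $2)
        if $line =~ /anchored `(.*?)' at (\d+)/;
    @info{qw(floating float_min float_max)} = ($1, $2, $3)
        if $line =~ /floating `(.*?)' at (\d+)\.\.(\d+)/;
    $info{checking} = $1 if $line =~ /\(checking (\w+)\)/;
    $info{stclass}  = $1 if $line =~ /stclass `(.*?)'/;
    $info{minlen}   = $1 if $line =~ /minlen (\d+)/;

    return \%info;
}

my $sample = q{anchored `de' at 1 floating `gh' at 3..2147483647 }
           . q{(checking floating) stclass `ANYOF[bc]' minlen 7};
my $info = parse_optimizer_line($sample);
print "$_ = $info->{$_}\n" for sort keys %$info;
```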

Above the optimizer section is the list of I<nodes> of the compiled
form of the regex. Each line has format

C<   >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>)

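The node-line format above can be taken apart with a single pattern.
The sketch below is only an approximation tested against the sample
listing: C<parse_node_line> is a hypothetical helper, and it assumes
that the node type consists of capital letters only, so that the C<1>
in C<OPEN1> is treated as optional info.

```perl
use strict;
use warnings;

# Hypothetical helper: split "id: TYPE OPTIONAL-INFO (next-id)" into
# its fields. Assumes TYPE is made of capital letters only.
sub parse_node_line {
    my ($line) = @_;
    my ($id, $type, $info, $next) =
        $line =~ /\A\s*(\d+):\s*([A-Z]+)\s*(.*?)\s*\((\d+)\)\s*\z/
        or return;
    return { id => $id, type => $type, info => $info, next => $next };
}

for my $line ('12: EXACT <d>(14)', '14: CURLYX[0] {1,32767}(28)',
              '16:   OPEN1(18)') {
    my $node = parse_node_line($line);
    print "node $node->{id}: $node->{type} [$node->{info}] ",
          "-> $node->{next}\n";
}
```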

=head2 Types of nodes

Here are the possible types, with short descriptions:

    # TYPE arg-description [num-args] [longjump-len] DESCRIPTION

    # Exit points
    END        no         End of program.
    SUCCEED    no         Return from a subroutine, basically.

    # Anchors:
    BOL        no         Match "" at beginning of line.
    MBOL       no         Same, assuming multiline.
    SBOL       no         Same, assuming singleline.
    EOS        no         Match "" at end of string.
    EOL        no         Match "" at end of line.
    MEOL       no         Same, assuming multiline.
    SEOL       no         Same, assuming singleline.
    BOUND      no         Match "" at any word boundary
    BOUNDL     no         Match "" at any word boundary
    NBOUND     no         Match "" at any word non-boundary
    NBOUNDL    no         Match "" at any word non-boundary
    GPOS       no         Matches where last m//g left off.

    # [Special] alternatives
    ANY        no         Match any one character (except newline).
    SANY       no         Match any one character.
    ANYOF      sv         Match character in (or not in) this class.
    ALNUM      no         Match any alphanumeric character
    ALNUML     no         Match any alphanumeric char in locale
    NALNUM     no         Match any non-alphanumeric character
    NALNUML    no         Match any non-alphanumeric char in locale
    SPACE      no         Match any whitespace character
    SPACEL     no         Match any whitespace char in locale
    NSPACE     no         Match any non-whitespace character
    NSPACEL    no         Match any non-whitespace char in locale
    DIGIT      no         Match any numeric character
    NDIGIT     no         Match any non-numeric character

    # BRANCH  The set of branches constituting a single choice are hooked
    #         together with their "next" pointers, since precedence prevents
    #         anything being concatenated to any individual branch. The
    #         "next" pointer of the last BRANCH in a choice points to the
    #         thing following the whole choice. This is also where the
    #         final "next" pointer of each individual branch points; each
    #         branch starts with the operand node of a BRANCH node.
    #
    BRANCH     node       Match this alternative, or the next...

    # BACK    Normal "next" pointers all implicitly point forward; BACK
    #         exists to make loop structures possible.
    # not used
    BACK       no         Match "", "next" ptr points backward.

    # Literals
    EXACT      sv         Match this string (preceded by length).
    EXACTF     sv         Match this string, folded (prec. by length).
    EXACTFL    sv         Match this string, folded in locale (w/len).

    # Do nothing
    NOTHING    no         Match empty string.
    # A variant of above which delimits a group, thus stops optimizations
    TAIL       no         Match empty string. Can jump here from outside.

    # STAR,PLUS  '?', and complex '*' and '+', are implemented as circular
    #            BRANCH structures using BACK. Simple cases (one character
    #            per match) are implemented with STAR and PLUS for speed
    #            and to minimize recursive plunges.
    #
    STAR       node       Match this (simple) thing 0 or more times.
    PLUS       node       Match this (simple) thing 1 or more times.

    CURLY      sv 2       Match this simple thing {n,m} times.
    CURLYN     no 2       Match next-after-this simple thing
    #                     {n,m} times, set parens.
    CURLYM     no 2       Match this medium-complex thing {n,m} times.
    CURLYX     sv 2       Match this complex thing {n,m} times.

    # This terminator creates a loop structure for CURLYX
    WHILEM     no         Do curly processing and see if rest matches.

    # OPEN,CLOSE,GROUPP  ...are numbered at compile time.
    OPEN       num 1      Mark this point in input as start of #n.
    CLOSE      num 1      Analogous to OPEN.

    REF        num 1      Match some already matched string
    REFF       num 1      Match already matched string, folded
    REFFL      num 1      Match already matched string, folded in loc.

    # grouping assertions
    IFMATCH    off 1 2    Succeeds if the following matches.
    UNLESSM    off 1 2    Fails if the following matches.
    SUSPEND    off 1 1    "Independent" sub-regex.
    IFTHEN     off 1 1    Switch, should be preceded by switcher .
    GROUPP     num 1      Whether the group matched.

    # Support for long regex
    LONGJMP    off 1 1    Jump far away.
    BRANCHJ    off 1 1    BRANCH with long offset.

    # The heavy worker
    EVAL       evl 1      Execute some Perl code.

    # Modifiers
    MINMOD     no         Next operator is not greedy.
    LOGICAL    no         Next opcode should set the flag only.

    # This is not used yet
    RENUM      off 1 1    Group with independently numbered parens.

    # This is not really a node, but an optimized away piece of a "long" node.
    # To simplify debugging output, we mark it as if it were a node
    OPTIMIZED  off        Placeholder for dump.

=for unprinted-credits
Next section M-J. Dominus 20010421

Following the optimizer information is a dump of the offset/length
table, here split across several lines:

  Offsets: [45]
        1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
        0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
        11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
        0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]

The first line here indicates that the offset/length table contains 45
entries. Each entry is a pair of integers, denoted by C<offset[length]>.
Entries are numbered starting with 1, so entry #1 here is C<1[4]> and
entry #12 is C<5[1]>. C<1[4]> indicates that the node labeled C<1:>
(the C<1: ANYOF[bc]>) begins at character position 1 in the
pre-compiled form of the regex, and has a length of 4 characters.
C<5[1]> in position 12
indicates that the node labeled C<12:>
(the C<< 12: EXACT <d> >>) begins at character position 5 in the
pre-compiled form of the regex, and has a length of 1 character.
C<12[1]> in position 14
indicates that the node labeled C<14:>
(the C<< 14: CURLYX[0] {1,32767} >>) begins at character position 12 in the
pre-compiled form of the regex, and has a length of 1 character---that
is, it corresponds to the C<+> symbol in the precompiled regex.

C<0[0]> items indicate that there is no corresponding node.

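The mapping from table entries back to the pattern text can be
reproduced by hand. The following sketch copies the sample table and
regex from this section and looks up the source text for a few node
labels; C<source_for_node> is a hypothetical helper written for this
illustration, not something Perl provides.

```perl
use strict;
use warnings;

my $regex = '[bc]d(ef*g)+h[ij]k$';

# The offset/length table from the sample output, copied verbatim.
my $dump = <<'END';
1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
END

# One [offset, length] pair per entry; entry #1 is $entries[0].
my @entries = map {
    my ($offset, $length) = /\A(\d+)\[(\d+)\]\z/;
    [ $offset, $length ];
} split ' ', $dump;

# Hypothetical helper: the piece of the pattern a node label refers
# to, or nothing for 0[0] entries (no corresponding node).
sub source_for_node {
    my ($id) = @_;
    my ($offset, $length) = @{ $entries[ $id - 1 ] };
    return unless $offset;
    return substr($regex, $offset - 1, $length);   # offsets count from 1
}

for my $id (1, 12, 14) {
    print "node $id: ", scalar(source_for_node($id)), "\n";
}
```

With the sample data this reports C<[bc]> for node 1, C<d> for node 12
and C<+> for node 14, matching the walk-through above.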
677 |
|
---|
678 | =head2 Run-time output
|
---|
679 |
|
---|
680 | First of all, when doing a match, one may get no run-time output even
|
---|
681 | if debugging is enabled. This means that the regex engine was never
|
---|
682 | entered and that all of the job was therefore done by the optimizer.
|
---|
683 |
|
---|
684 | If the regex engine was entered, the output may look like this:
|
---|
685 |
|
---|
 Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
   Setting an EVAL scope, savestack=3
    2 <ab> <cdefg__gh_>    |  1: ANYOF
    3 <abc> <defg__gh_>    | 11: EXACT <d>
    4 <abcd> <efg__gh_>    | 13: CURLYX {1,32767}
    4 <abcd> <efg__gh_>    | 26:   WHILEM
                                0 out of 1..32767  cc=effff31c
    4 <abcd> <efg__gh_>    | 15:     OPEN1
    4 <abcd> <efg__gh_>    | 17:     EXACT <e>
    5 <abcde> <fg__gh_>    | 19:     STAR
                             EXACT <f> can match 1 times out of 32767...
   Setting an EVAL scope, savestack=3
    6 <bcdef> <g__gh__>    | 22:       EXACT <g>
    7 <bcdefg> <__gh__>    | 24:       CLOSE1
    7 <bcdefg> <__gh__>    | 26:       WHILEM
                                    1 out of 1..32767  cc=effff31c
   Setting an EVAL scope, savestack=12
    7 <bcdefg> <__gh__>    | 15:         OPEN1
    7 <bcdefg> <__gh__>    | 17:         EXACT <e>
      restoring \1 to 4(4)..7
                                    failed, try continuation...
    7 <bcdefg> <__gh__>    | 27:         NOTHING
    7 <bcdefg> <__gh__>    | 28:         EXACT <h>
                                    failed...
                                failed...

The most significant information in the output is about the particular I<node>
of the compiled regex that is currently being tested against the target string.
The format of these lines is

C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE>

The I<TYPE> info is indented with respect to the backtracking level.
Other incidental information appears interspersed within.

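As a rough illustration of that line format, the following sketch splits
a node line into its five fields.  C<parse_node_line()> is a hypothetical
helper written for this example, and the pattern assumes that neither
I<PRE-STRING> nor I<POST-STRING> contains a C<< > >> character.

```perl
use strict;
use warnings;

# Hypothetical helper: split one node line of the run-time output into
# its fields.  Returns nothing for the incidental lines in between.
sub parse_node_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^ \s* (\d+)          # STRING-OFFSET
        \s+ <(.*?)>          # PRE-STRING
        \s+ <(.*?)>          # POST-STRING
        \s* \| \s* (\d+) :   # ID
        \s* (.*?) \s* $      # TYPE, with the indentation stripped
    }x;
    return { offset => $1, pre => $2, post => $3, id => $4, type => $5 };
}

# A few lines taken from the sample output above.
my @sample = (
    '   Setting an EVAL scope, savestack=3',
    '    3 <abc> <defg__gh_>    | 11: EXACT <d>',
    '    4 <abcd> <efg__gh_>    | 13: CURLYX {1,32767}',
    '                                failed...',
);

for my $line (@sample) {
    my $node = parse_node_line($line) or next;   # skip incidental lines
    printf "offset %d, node %d (%s), still to match: '%s'\n",
        @{$node}{qw(offset id type post)};
}
```
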
=head1 Debugging Perl memory usage

Perl is a profligate wastrel when it comes to memory use.  There
is a saying that to estimate memory usage of Perl, assume a reasonable
algorithm for memory allocation, multiply that estimate by 10, and
while you still may miss the mark, at least you won't be quite so
astonished.  This is not absolutely true, but may provide a good
grasp of what happens.

Assume that an integer cannot take less than 20 bytes of memory, a
float cannot take less than 24 bytes, and a string cannot take less
than 32 bytes.  (All these figures assume 32-bit architectures; the
results are quite a bit worse on 64-bit architectures.)  If a variable
is accessed in two of the three different ways (which require an integer,
a float, or a string), the memory footprint may increase by yet another
20 bytes.  A sloppy malloc(3) implementation can inflate these
numbers dramatically.

On the opposite end of the scale, a declaration like

  sub foo;

may take up to 500 bytes of memory, depending on which release of Perl
you're running.

Anecdotal estimates of source-to-compiled code bloat suggest an
eightfold increase.  This means that the compiled form of reasonable
(normally commented, properly indented, etc.) code will take
about eight times more space in memory than the code took
on disk.

The B<-DL> command-line switch has been obsolete since about Perl 5.6.0
(it was available only if Perl was built with C<-DDEBUGGING>).
The switch was used to track Perl's memory allocations and possible
memory leaks.  These days the use of malloc debugging tools like
F<Purify> or F<valgrind> is suggested instead.

One way to find out how much memory is being used by Perl data
structures is to install the Devel::Size module from CPAN: it gives
you the minimum number of bytes required to store a particular data
structure.  Please be mindful of the difference between size()
and total_size().

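The difference between the two functions shows up most clearly on a
container.  The sketch below assumes Devel::Size is installed and skips
the measurement otherwise; the variable names are illustrative only, and
the actual byte counts depend on your perl build.

```perl
use strict;
use warnings;

# Devel::Size comes from CPAN, so load it defensively.
my $have_devel_size = eval { require Devel::Size; 1 };

# An array of 1000 short strings to measure.
my @words = map { "word number $_" } 1 .. 1000;

my ($shallow, $deep);
if ($have_devel_size) {
    # size() counts the array structure itself, not its elements.
    $shallow = Devel::Size::size(\@words);
    # total_size() also follows the array into the scalars it holds.
    $deep = Devel::Size::total_size(\@words);
    print "size:       $shallow bytes\n";
    print "total_size: $deep bytes\n";
}
else {
    print "Devel::Size is not installed; nothing measured\n";
}
```
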
If Perl has been compiled using Perl's malloc, you can analyze Perl
memory usage by setting C<$ENV{PERL_DEBUG_MSTATS}>.

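Whether a given perl was built with Perl's malloc can be checked from the
build configuration.  A minimal sketch, assuming the C<usemymalloc> entry
of the standard Config module reflects how the running perl was built:

```perl
use strict;
use warnings;
use Config;

# $Config{usemymalloc} is 'y' when perl was built with its own malloc.
my $uses_perl_malloc = ( ( $Config{usemymalloc} // 'n' ) eq 'y' ) ? 1 : 0;

print $uses_perl_malloc
    ? "This perl uses Perl's malloc; PERL_DEBUG_MSTATS should have an effect\n"
    : "This perl uses the system malloc; PERL_DEBUG_MSTATS is likely ignored\n";
```

The same setting can be read from the shell with C<perl -V:usemymalloc>.
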
=head2 Using C<$ENV{PERL_DEBUG_MSTATS}>

If your perl is using Perl's malloc() and was compiled with the
necessary switches (this is the default), then it will print memory
usage statistics after compiling your code when
C<< $ENV{PERL_DEBUG_MSTATS} > 1 >>, and before termination of the
program when C<< $ENV{PERL_DEBUG_MSTATS} >= 1 >>.  The report format
is similar to the following example:

 $ PERL_DEBUG_MSTATS=2 perl -e "require Carp"
 Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
    14216 free:   130   117    28     7     9   0   2     2   1 0 0
                437    61    36     0     5
    60924 used:   125   137   161    55     7   8   6    16   2 0 1
                 74   109   304    84    20
 Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
 Memory allocation statistics after execution:   (buckets 4(4)..8188(8192)
    30888 free:   245    78    85    13     6   2   1     3   2 0 1
                315   162    39    42    11
   175816 used:   265   176  1112   111    26  22  11    27   2 1 1
                196   178  1066   798    39
 Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.

It is possible to ask for such a statistic at arbitrary points in
your execution using the mstat() function from the standard
Devel::Peek module.

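A minimal sketch of such a call follows.  The labels passed to mstat()
are arbitrary strings chosen for this example.  On a perl that was not
built with Perl's malloc, mstat() should only print a short notice
instead of the statistics.

```perl
use strict;
use warnings;
use Devel::Peek qw(mstat);

# Report allocation statistics before and after building a large list.
# The output goes to the debugging stream (normally STDERR).
mstat("before building the list");
my @list = (1 .. 10_000);
mstat("after building the list");
```
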
Here is some explanation of that format:

=over 4

=item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)>

Perl's malloc() uses bucketed allocations.  Every request is rounded
up to the closest bucket size available, and a bucket is taken from
the pool of buckets of that size.

The line above describes the limits of buckets currently in use.
Each bucket has two sizes: memory footprint and the maximal size
of user data that can fit into this bucket.  In the above
example, the smallest bucket has a usable size of 4.  The biggest
bucket has a usable size of 8188 and a memory footprint of 8192.

In a Perl built for debugging, some buckets may have negative usable
size.  This means that these buckets cannot (and will not) be used.
For larger buckets, the memory footprint may be one page greater
than a power of 2.  If so, the corresponding power of two is
printed in the C<APPROX> field above.

=item Free/Used

The one or two rows of numbers that follow give the number
of buckets of each size between C<SMALLEST> and C<GREATEST>.  In
the first row, the sizes (memory footprints) of buckets are powers
of two, or possibly one page greater.  In the second row, if present,
the memory footprints of the buckets fall between the memory footprints
of neighboring buckets in the row above.

For example, suppose that in the previous example the memory
footprints were

   free:    8    16    32    64   128   256   512  1024  2048  4096  8192
          4    12    24    48    80

With a non-C<DEBUGGING> perl, the buckets starting from C<128> have
a 4-byte overhead, and thus an 8192-long bucket may take up to
8188-byte allocations.

=item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS>

The first two fields give the total amount of memory perl sbrk(2)ed
(ess-broken? :-) and the number of sbrk(2)s used.  The third number is
what perl thinks about continuity of returned chunks.  So long as
this number is positive, malloc() will assume that it is probable
that sbrk(2) will provide continuous memory.

Memory allocated by external libraries is not counted.

=item C<pad: 0>

The amount of sbrk(2)ed memory needed to keep buckets aligned.

=item C<heads: 2192>

Although memory overhead of bigger buckets is kept inside the bucket, for
smaller buckets, it is kept in separate areas.  This field gives the
total size of these areas.

=item C<chain: 0>

malloc() may want to subdivide a bigger bucket into smaller buckets.
If only a part of the deceased bucket is left unsubdivided, the rest
is kept as an element of a linked list.  This field gives the total
size of these chunks.

=item C<tail: 6144>

To minimize the number of sbrk(2)s, malloc() asks for more memory.  This
field gives the size of the yet unused part, which is sbrk(2)ed, but
never touched.

=back

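The fields described above can be cross-checked against the sample
report.  The sketch below is only a consistency check on those sample
numbers, not a description of how malloc() computes them.  It assumes
the memory footprints supposed in the C<Free/Used> example and the
4-byte overhead for buckets of 128 bytes and up; C<usable()> and
C<bytes_in()> are hypothetical helpers written for this illustration.

```perl
use strict;
use warnings;

# Memory footprints supposed in the Free/Used example (an assumption).
my @row1_footprint = (8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192);
my @row2_footprint = (4, 12, 24, 48, 80);

# Assumption from the text: buckets of 128 bytes and up lose 4 bytes.
sub usable {
    my ($footprint) = @_;
    return $footprint >= 128 ? $footprint - 4 : $footprint;
}

# Hypothetical helper: sum bucket counts times usable bucket sizes.
sub bytes_in {
    my ($row1, $row2) = @_;
    my $sum = 0;
    $sum += $row1->[$_] * usable($row1_footprint[$_]) for 0 .. $#$row1;
    $sum += $row2->[$_] * usable($row2_footprint[$_]) for 0 .. $#$row2;
    return $sum;
}

# Bucket counts from the "after compilation" part of the sample report.
my $free = bytes_in([130, 117, 28, 7, 9, 0, 2, 2, 1, 0, 0],
                    [437, 61, 36, 0, 5]);
my $used = bytes_in([125, 137, 161, 55, 7, 8, 6, 16, 2, 0, 1],
                    [74, 109, 304, 84, 20]);

# The "Odd ends" fields from the same report.
my ($pad, $heads, $chain, $tail) = (0, 636, 0, 2048);
my $total = $free + $used + $pad + $heads + $chain + $tail;

print "free=$free used=$used total=$total\n";
```

Under these assumptions the script prints C<free=14216 used=60924
total=77824>, which agrees with the free and used totals and with the
C<Total sbrk()> figure in the sample report.
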
=head1 SEE ALSO

L<perldebug>,
L<perlguts>,
L<perlrun>,
L<re>,
and
L<Devel::DProf>.