=head1 NAME

perlhack - How to hack at the Perl internals

=head1 DESCRIPTION

This document attempts to explain how Perl development takes place,
and ends with some suggestions for people wanting to become bona fide
porters.

The perl5-porters mailing list is where the Perl standard distribution
is maintained and developed. The list can get anywhere from 10 to 150
messages a day, depending on the heatedness of the debate. Most days
there are two or three patches, extensions, features, or bugs being
discussed at a time.

A searchable archive of the list is at either:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

or

    http://archive.develooper.com/[email protected]/

List subscribers (the porters themselves) come in several flavours.
Some are quiet, curious lurkers, who rarely pitch in and instead watch
the ongoing development to ensure they're forewarned of new changes or
features in Perl. Some are representatives of vendors, who are there
to make sure that Perl continues to compile and work on their
platforms. Some patch any reported bug that they know how to fix,
some are actively patching their pet area (threads, Win32, the regexp
engine), while others seem to do nothing but complain. In other
words, it's your usual mix of technical people.

Over this group of porters presides Larry Wall. He has the final word
in what does and does not change in the Perl language. Various
releases of Perl are shepherded by a "pumpking", a porter
responsible for gathering patches and deciding, on a patch-by-patch,
feature-by-feature basis, what will and will not go into the release.
For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
Perl, Jarkko Hietaniemi was the pumpking for the 5.8 release, and
Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release.

In addition, various people are pumpkings for different things. For
instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the
I<Configure> pumpkin up till the 5.8 release. For the 5.10 release,
H.Merijn Brand took over.

Larry sees Perl development along the lines of the US government:
there's the Legislature (the porters), the Executive branch (the
pumpkings), and the Supreme Court (Larry). The legislature can
discuss and submit patches to the executive branch all they like, but
the executive branch is free to veto them. Rarely, the Supreme Court
will side with the executive branch over the legislature, or the
legislature over the executive branch. Mostly, however, the
legislature and the executive branch are supposed to get along and
work out their differences without impeachment or court cases.

You might sometimes see reference to Rule 1 and Rule 2. Larry's power
as Supreme Court is expressed in The Rules:

=over 4

=item 1

Larry is always by definition right about how Perl should behave.
This means he has final veto power on the core functionality.

=item 2

Larry is allowed to change his mind about any matter at a later date,
regardless of whether he previously invoked Rule 1.

=back

Got that? Larry is always right, even when he was wrong. It's rare
to see either Rule exercised, but they are often alluded to.

New features and extensions to the language are contentious, because
the criteria used by the pumpkings, Larry, and other porters to decide
which features should be implemented and incorporated are not codified
in a few small design goals as with some other languages. Instead,
the heuristics are flexible and often difficult to fathom. Here is
one person's list, roughly in decreasing order of importance, of
heuristics that new features have to be weighed against:

=over 4

=item Does concept match the general goals of Perl?

These haven't been written anywhere in stone, but one approximation
is:

    1. Keep it fast, simple, and useful.
    2. Keep features/concepts as orthogonal as possible.
    3. No arbitrary limits (platforms, data sizes, cultures).
    4. Keep it open and exciting to use/patch/advocate Perl everywhere.
    5. Either assimilate new technologies, or build bridges to them.

=item Where is the implementation?

All the talk in the world is useless without an implementation. In
almost every case, the person or people who argue for a new feature
will be expected to be the ones who implement it. Porters capable
of coding new features have their own agendas, and are not available
to implement your (possibly good) idea.

=item Backwards compatibility

It's a cardinal sin to break existing Perl programs. New warnings are
contentious--some say that a program that emits warnings is not
broken, while others say it is. Adding keywords has the potential to
break programs, and changing the meaning of existing token sequences or
functions might break programs.

=item Could it be a module instead?

Perl 5 has extension mechanisms, modules and XS, specifically to avoid
the need to keep changing the Perl interpreter. You can write modules
that export functions, you can give those functions prototypes so they
can be called like built-in functions, you can even write XS code to
mess with the runtime data structures of the Perl interpreter if you
want to implement really complicated things. If it can be done in a
module instead of in the core, it's highly unlikely to be added.

=item Is the feature generic enough?

Is this something that only the submitter wants added to the language,
or would it be broadly useful? Sometimes, instead of adding a feature
with a tight focus, the porters might decide to wait until someone
implements the more generalized feature. For instance, instead of
implementing a "delayed evaluation" feature, the porters are waiting
for a macro system that would permit delayed evaluation and much more.

=item Does it potentially introduce new bugs?

Radical rewrites of large chunks of the Perl interpreter have the
potential to introduce new bugs. The smaller and more localized the
change, the better.

=item Does it preclude other desirable features?

A patch is likely to be rejected if it closes off future avenues of
development. For instance, a patch that placed a true and final
interpretation on prototypes is likely to be rejected because there
are still options for the future of prototypes that haven't been
addressed.

=item Is the implementation robust?

Good patches (tight code, complete, correct) stand more chance of
going in. Sloppy or incorrect patches might be placed on the back
burner until the pumpking has time to fix them, or might be discarded
altogether without further notice.

=item Is the implementation generic enough to be portable?

The worst patches make use of system-specific features. It's highly
unlikely that nonportable additions to the Perl language will be
accepted.

=item Is the implementation tested?

Patches which change behaviour (fixing bugs or introducing new features)
must include regression tests to verify that everything works as expected.
Without tests provided by the original author, how can anyone else changing
perl in the future be sure that they haven't unwittingly broken the behaviour
the patch implements? And without tests, how can the patch's author be
confident that his/her hard work put into the patch won't be accidentally
thrown away by someone in the future?

=item Is there enough documentation?

Patches without documentation are probably ill-thought-out or
incomplete. Nothing can be added without documentation, so submitting
a patch for the appropriate manpages as well as the source code is
always a good idea.

=item Is there another way to do it?

Larry said "Although the Perl Slogan is I<There's More Than One Way
to Do It>, I hesitate to make 10 ways to do something". This is a
tricky heuristic to navigate, though--one man's essential addition is
another man's pointless cruft.

=item Does it create too much work?

Work for the pumpking, work for Perl programmers, work for module
authors, ... Perl is supposed to be easy.

=item Patches speak louder than words

Working code is always preferred to pie-in-the-sky ideas. A patch to
add a feature stands a much higher chance of making it to the language
than does a random feature request, no matter how fervently argued the
request might be. This ties into "Is the feature generic enough?", as
the fact that someone took the time to make the patch demonstrates a
strong desire for the feature.

=back

If you're on the list, you might hear the word "core" bandied
around. It refers to the standard distribution. "Hacking on the
core" means you're changing the C source code to the Perl
interpreter. "A core module" is one that ships with Perl.

=head2 Keeping in sync

The source code to the Perl interpreter, in its different versions, is
kept in a repository managed by a revision control system (currently
Perforce, see http://perforce.com/). The pumpkings and a few others
have access to the repository to check in changes. Periodically the
pumpking for the development version of Perl will release a new
version, so the rest of the porters can see what's changed. The
current state of the main trunk of the repository, and patches that
describe the individual changes that have happened since the last
public release, are available at this location:

    http://public.activestate.com/pub/apc/
    ftp://public.activestate.com/pub/apc/

If you're looking for a particular change, or a change that affected
a particular set of files, you may find the B<Perl Repository Browser>
useful:

    http://public.activestate.com/cgi-bin/perlbrowse

You may also want to subscribe to the perl5-changes mailing list to
receive a copy of each patch that gets submitted to the maintenance
and development "branches" of the perl repository. See
http://lists.perl.org/ for subscription information.

If you are a member of the perl5-porters mailing list, it is a good
idea to keep in touch with the most recent changes, if only to verify
that what you would have posted as a bug report hasn't already been
solved in the most recent available perl development branch, also
known as perl-current, bleading edge perl, bleedperl or bleadperl.

Needless to say, the source code in perl-current is usually in a perpetual
state of evolution. You should expect it to be very buggy. Do B<not> use
it for any purpose other than testing and development.

Keeping in sync with the most recent branch can be done in several ways,
but the most convenient and reliable way is using B<rsync>, available at
ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent
branch by FTP.)

If you choose to keep in sync using rsync, there are two approaches
to doing so:

=over 4

=item rsync'ing the source tree

Presuming you are in the directory where your perl source resides
and you have rsync installed and available, you can "upgrade" to
the bleadperl using:

    # rsync -avz rsync://public.activestate.com/perl-current/ .

This takes care of updating every single item in the source tree to
the latest applied patch level, creating files that are new (to your
distribution) and setting date/time stamps of existing files to
reflect the bleadperl status.

Note that this will not delete any files that were in '.' before
the rsync. Once you are sure that the rsync is running correctly,
run it with the --delete and the --dry-run options like this:

    # rsync -avz --delete --dry-run rsync://public.activestate.com/perl-current/ .

This will I<simulate> an rsync run that also deletes files not
present in the bleadperl master copy. Observe the results from
this run closely. If you are sure that the actual run would delete
no files precious to you, you can remove the '--dry-run' option.

You can then check which patch was the latest one applied by
looking in the file B<.patch>, which will show the number of the
latest patch.

If you have more than one machine to keep in sync, and not all of
them have access to the WAN (so you are not able to rsync all the
source trees to the real source), there are some ways to get around
this problem.

=over 4

=item Using rsync over the LAN

Set up a local rsync server which makes the rsynced source tree
available to the LAN and sync the other machines against this
directory.

From http://rsync.samba.org/README.html :

    "Rsync uses rsh or ssh for communication. It does not need to be
    setuid and requires no special privileges for installation. It
    does not require an inetd entry or a daemon. You must, however,
    have a working rsh or ssh system. Using ssh is recommended for
    its security features."

=item Pushing over NFS

With the other systems mounted over NFS, you can take an
active pushing approach by checking the just-updated tree against
the other not-yet-synced trees. An example would be

    #!/usr/bin/perl -w

    use strict;
    use File::Copy;

    # Record mode, size and mtime for every file listed in MANIFEST.
    my %MF;
    open my $manifest, '<', 'MANIFEST' or die "Cannot open MANIFEST: $!\n";
    while (<$manifest>) {
        my ($file) = m/(\S+)/ or next;              # skip blank lines
        $MF{$file} = [ (stat $file)[2, 7, 9] ];     # mode, size, mtime
    }
    close $manifest;

    my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);

    foreach my $host (keys %remote) {
        unless (-d $remote{$host}) {
            print STDERR "Cannot Xsync for host $host\n";
            next;
        }
        foreach my $file (keys %MF) {
            my $rfile = "$remote{$host}/$file";
            my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
            defined $size or ($mode, $size, $mtime) = (0, 0, 0);
            # Skip files whose size and mtime already match.
            $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
            printf "%4s %-34s %8d %9d %8d %9d\n",
                $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
            unlink $rfile;
            unless (copy($file, $rfile)) {
                warn "Cannot copy $file to $rfile: $!\n";
                next;
            }
            utime time, $MF{$file}[2], $rfile;      # keep the source mtime
            chmod $MF{$file}[0] & 07777, $rfile;    # permission bits only
        }
    }

though this is not perfect. It could be improved by comparing
file checksums before updating. Also, not all NFS implementations
support C<utime> reliably.

=back

=item rsync'ing the patches

The source tree is maintained by the pumpking who applies patches to
the files in the tree. These patches are either created by the
pumpking himself, using C<diff -c> after updating the file manually,
or are patches sent in by posters on the perl5-porters list.
These patches are also saved and rsync'able, so you can apply them
yourself to the source files.

Presuming you are in a directory where your patches reside, you can
get them in sync with

    # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .

This makes sure the latest available patch is downloaded to your
patch directory.

It's then up to you to apply these patches, in numerical order, using
something like

    # last=`ls -t *.gz | sed q`
    # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
    # find . -name '*.gz' -newer $last | sed 's,^\./,,' | sort -n | xargs gzip -dc >blead.patch
    # cd ../perl-current
    # patch -p1 -N <../perl-current-diffs/blead.patch

or, since this is only a hint towards how it works, use CPAN-patchaperl
from Andreas König to have better control over the patching process.

=back

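If you take the 'rsync the patches' route, the patches have to be applied in ascending order of change number, starting just after the level recorded in F<.patch>. The sketch below computes that list from the numbers rather than from file timestamps. It is not part of the Perl distribution: the function names are hypothetical, and it assumes that the saved patch files are named after their change number (F<7583.gz>, F<7584.gz>, ...) and that F<.patch> holds nothing but that number.

```shell
#!/bin/sh
# pending_patches DIR LEVEL
# Print, in ascending numeric order, the change numbers of the saved
# patches in DIR that are newer than LEVEL.
# ASSUMPTION: patch files are named NNNN.gz after their change number.
pending_patches() {
    for p in "$1"/*.gz; do
        [ -e "$p" ] || continue              # the glob matched nothing
        n=$(basename "$p" .gz)
        case $n in
            ''|*[!0-9]*) continue ;;         # not a numbered patch
        esac
        if [ "$n" -gt "$2" ]; then
            echo "$n"
        fi
    done | sort -n
}

# apply_pending DIR
# Run from the top of a perl-current tree: apply every pending patch
# from DIR in order, stopping at the first one that fails.
# ASSUMPTION: .patch contains nothing but the current change number.
# This function does not rewrite .patch itself.
apply_pending() {
    level=$(cat .patch) || return 1
    for n in $(pending_patches "$1" "$level"); do
        echo "applying $n"
        gzip -dc "$1/$n.gz" | patch -p1 -N || return 1
    done
}
```

For example, from inside F<perl-current>, C<apply_pending ../perl-current-diffs> would try to bring the tree up to date. Because C<apply_pending> does not update F<.patch>, you will need to keep track of the new level yourself before the next run.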
=head2 Why rsync the source tree

=over 4

=item It's easier to rsync the source tree

Since you don't have to apply the patches yourself, you are sure all
files in the source tree are in the right state.

=item It's more reliable

While both the rsync-able source and patch areas are automatically
updated every few minutes, keep in mind that applying patches may
sometimes mean careful hand-holding, especially if your version of
the C<patch> program does not understand how to deal with new files,
files with 8-bit characters, or files without trailing newlines.

=back

=head2 Why rsync the patches

=over 4

=item It's easier to rsync the patches

If you have more than one machine that you want to keep in step with
bleadperl, it's easier to rsync the patches only once and then apply
them to all the source trees on the different machines.

Suppose you try to keep pace on 5 different machines, only one of
which has access to the WAN. Rsync'ing all the source trees would then
have to be done 5 times over NFS. Having rsync'ed the patches only
once, you can apply them to all the source trees automatically. Need
we say more? ;-)

=item It's a good reference

If you would like not only to have the most recent development branch,
but also to B<fix> bugs or extend features, you will want to dive
into the sources. If you are a seasoned perl core diver, you don't
need any manuals, tips, roadmaps, perlguts.pod or other aids to find
your way around. But if you are a starter, the patches may help you
find where to start and how to change the bits that bug you.

The file B<Changes> is updated on occasions the pumpking sees as his
own little sync points. On those occasions, he releases a tar-ball of
the current source tree (i.e. [email protected]), which will be an
excellent point to start with when choosing to use the 'rsync the
patches' scheme. Starting with perl@7582, which means a set of source
files on which the latest applied patch is number 7582, you apply all
succeeding patches available from then on (7583, 7584, ...).

You can use the patches later as a kind of search archive.

=over 4

=item Finding a start point

If you want to fix/change the behaviour of function/feature Foo, just
scan the patches for patches that mention Foo either in the subject,
the comments, or the body of the fix. There's a good chance the patch
shows you the files affected by it, which are very likely to be the
starting point of your journey into the guts of perl.

=item Finding how to fix a bug

If you've found I<where> the function/feature Foo misbehaves, but you
don't know how to fix it (but you do know the change you want to
make), you can, again, peruse the patches for similar changes and
look at how others applied the fix.

=item Finding the source of misbehaviour

When you keep in sync with bleadperl, the pumpking would love to
I<see> that the community efforts really work. So after each of his
sync points, you should run 'make test' to check if everything is
still in working order. If it is, you do 'make ok', which will send an
OK report to [email protected]. (If you do not have access to a mailer
from the system on which you just finished a successful 'make test',
you can do 'make okfile', which creates the file C<perl.ok>, which you
can then take to your favourite mailer and send yourself.)

But of course, things will not always go smoothly, and one or more
tests may fail under 'make test'. Before sending in a bug report
(using 'make nok' or 'make nokfile'), check the mailing list to see
whether someone else has already reported the bug, and if so, confirm
it by replying to that message. If not, you might want to trace the
source of that misbehaviour B<before> sending in the bug, which will
help all the other porters in finding the solution.

Here the saved patches come in very handy. You can check the list of
patches to see which patch changed what file and what change caused
the misbehaviour. If you note that in the bug report, it saves
whoever tries to solve it from having to hunt for that point.

=back

If searching the patches is too bothersome, you might consider using
perl's bugtron to find more information about discussions and
ramblings on posted bugs.

If you want to get the best of both worlds, rsync both the source
tree, for convenience, reliability and ease, and the patches, for
reference.

=back

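As a rough aid for the searches described above, the following sketch looks through a directory of saved, gzipped patches. C<patches_mentioning> reports every patch whose text matches a pattern, which helps when hunting for a start point for Foo. C<patches_touching> narrows that to the diff header lines naming a given file, which helps when tracking down which change broke it. Both function names are hypothetical. The sketch assumes patches are stored as F<NNNN.gz>, named after their change number, and that they are ordinary context or unified diffs.

```shell
#!/bin/sh
# patches_mentioning DIR REGEX
# Print, in numeric order, the numbers of the gzipped patches in DIR
# whose text matches the extended regular expression REGEX.
# ASSUMPTION: patches are stored as NNNN.gz, named by change number.
patches_mentioning() {
    for p in "$1"/*.gz; do
        [ -e "$p" ] || continue
        if gzip -dc "$p" 2>/dev/null | grep -Eq -- "$2"; then
            basename "$p" .gz
        fi
    done | sort -n
}

# patches_touching DIR FILE
# Only match the file-name header lines of context ("*** file") or
# unified ("--- file" / "+++ file") diffs, so that FILE=sv.c does not
# also match pp_sv.c. FILE is used as a regex, so "." matches any char.
patches_touching() {
    patches_mentioning "$1" "^(\*\*\*|---|\+\+\+) (.*/)?$2"
}
```

For example, C<patches_touching ../perl-current-diffs toke.c> would list candidate changes to inspect when something in the tokenizer starts misbehaving.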
=head2 Working with the source

Because you cannot use the Perforce client, you cannot easily generate
diffs against the repository, nor will merges occur when you update
via rsync. If you edit a file locally and then rsync against the
latest source, changes made in the remote copy will I<overwrite> your
local versions!

The best way to deal with this is to maintain a tree of symlinks to
the rsync'd source. Then, when you want to edit a file, you remove
the symlink, copy the real file into the other tree, and edit it. You
can then diff your edited file against the original to generate a
patch, and you can safely update the original tree.

Perl's F<Configure> script can generate this tree of symlinks for you.
The following example assumes that you have used rsync to pull a copy
of the Perl source into the F<perl-rsync> directory. In the directory
above that one, you can execute the following commands:

    mkdir perl-dev
    cd perl-dev
    ../perl-rsync/Configure -Dmksymlinks -Dusedevel -D"optimize=-g"

This will start the Perl configuration process. After a few prompts,
you should see something like this:

    Symbolic links are supported.

    Checking how to test for symbolic links...
    Your builtin 'test -h' may be broken.
    Trying external '/usr/bin/test -h'.
    You can test for symbolic links with '/usr/bin/test -h'.

    Creating the symbolic links...
    (First creating the subdirectories...)
    (Then creating the symlinks...)

The specifics may vary based on your operating system, of course.
After you see this, you can abort the F<Configure> script, and you
will see that the directory you are in has a tree of symlinks to the
F<perl-rsync> directories and files.

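If you cannot, or would rather not, run F<Configure> just to get the symlink tree, the same layout can be approximated by hand. The helper below is hypothetical and not part of the Perl distribution: it recreates the directories of the rsync'd tree and symlinks every regular file, and unlike F<Configure> it does nothing else. It assumes the source directory is given as an absolute path, so that the links resolve from anywhere, and that no file name contains a newline.

```shell
#!/bin/sh
# mksymtree SRC DEST
# Recreate the directory layout of SRC under DEST and make every
# regular file in DEST a symlink to its counterpart in SRC.
# SRC must be an absolute path; file names must not contain newlines.
mksymtree() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    (cd "$src" && find . -type d) | while IFS= read -r d; do
        mkdir -p "$dest/${d#./}"
    done
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        f=${f#./}
        ln -s "$src/$f" "$dest/$f"
    done
}
```

Run from the directory above F<perl-rsync>, something like C<mksymtree "$PWD/perl-rsync" perl-dev> would give a tree equivalent in layout to the one F<Configure -Dmksymlinks> produces.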
If you plan to do a lot of work with the Perl source, here are some
Bourne shell script functions that can make your life easier:

    edit() {
        if [ -L "$1" ]; then
            mv "$1" "$1.orig"    # keep the symlink to the pristine file
            cp "$1.orig" "$1"    # cp follows the link: a private copy
        fi
        vi "$1"
    }

    unedit() {
        if [ -L "$1.orig" ]; then
            rm "$1"              # throw away the edited copy
            mv "$1.orig" "$1"    # put the symlink back
        fi
    }

Replace "vi" with your favorite flavor of editor.

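Since C<edit> leaves the pristine symlink behind as F<NAME.orig>, those leftover symlinks double as a record of what you have touched. A small companion function along these lines (the name C<lsedits> is hypothetical) lists the edited files, which is handy before generating a patch or when deciding what to C<unedit>:

```shell
#!/bin/sh
# lsedits: list, relative to the current directory, every file that has
# been edited in the symlink tree, i.e. every NAME whose original
# symlink was moved aside to NAME.orig by edit().
lsedits() {
    find . -name '*.orig' -type l | sed 's,^\./,,; s,\.orig$,,' | sort
}
```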
Here is another function which will quickly generate a patch for the
files which have been edited in your symlink tree:

    mkpatchorig() {
        local diffopts
        for f in `find . -name '*.orig' | sed 's,^\./,,'`
        do
            case `echo $f | sed 's,\.orig$,,;s,.*\.,,'` in
                c)   diffopts=-p ;;
                pod) diffopts='-F^=' ;;
                *)   diffopts= ;;
            esac
            diff -du $diffopts $f `echo $f | sed 's,\.orig$,,'`
        done
    }

This function produces patches which include enough context to make
your changes obvious. This makes it easier for the Perl pumpking(s)
to review them when you send them to the perl5-porters list, and that
means they're more likely to get applied.

This function assumes a GNU diff, and may require some tweaking for
other diff variants.

=head2 Perlbug administration

There is a single remote administrative interface for modifying bug status,
category, open issues etc. using the B<RT> I<bugtracker> system, maintained
by I<Robert Spier>. Become an administrator, and close any bugs you can get
your sticky mitts on:

    http://rt.perl.org

The bugtracker mechanism for B<perl5> bugs in particular is at:

    http://bugs6.perl.org/perlbug

To email the bug system administrators:

    "perlbug-admin" <[email protected]>

=head2 Submitting patches

Always submit patches to I<[email protected]>. If you're
patching a core module and there's an author listed, send the author a
copy (see L<Patching a core module>). This lets other porters review
your patch, which catches a surprising number of errors in patches.
Either use the diff program (available in source code form from
ftp://ftp.gnu.org/pub/gnu/), or use Johan Vromans' I<makepatch>
(available from I<CPAN/authors/id/JV/>). Unified diffs are preferred,
but context diffs are accepted. Do not send RCS-style diffs or diffs
without context lines. More information is given in the
I<Porting/patching.pod> file in the Perl source distribution. Please
patch against the latest B<development> version (e.g., if you're
fixing a bug in the 5.005 track, patch against the latest 5.005_5x
version). Only patches that survive the heat of the development
branch get applied to maintenance versions.

| 606 | Your patch should update the documentation and test suite. See
|
---|
| 607 | L<Writing a test>.
|
---|
| 608 |
|
---|
| 609 | To report a bug in Perl, use the program I<perlbug> which comes with
|
---|
| 610 | Perl (if you can't get Perl to work, send mail to the address
|
---|
| 611 | I<[email protected]> or I<[email protected]>). Reporting bugs through
|
---|
| 612 | I<perlbug> feeds into the automated bug-tracking system, access to
|
---|
| 613 | which is provided through the web at http://bugs.perl.org/ . It
|
---|
| 614 | often pays to check the archives of the perl5-porters mailing list to
|
---|
| 615 | see whether the bug you're reporting has been reported before, and if
|
---|
| 616 | so whether it was considered a bug. See above for the location of
|
---|
| 617 | the searchable archives.
|
---|
| 618 |
|
---|
| 619 | The CPAN testers ( http://testers.cpan.org/ ) are a group of
|
---|
| 620 | volunteers who test CPAN modules on a variety of platforms. Perl
|
---|
| 621 | Smokers ( http://archives.develooper.com/[email protected]/ )
|
---|
| 622 | automatically tests Perl source releases on platforms with various
|
---|
| 623 | configurations. Both efforts welcome volunteers.
|
---|
| 624 |
|
---|
| 625 | It's a good idea to read and lurk for a while before chipping in.
|
---|
| 626 | That way you'll get to see the dynamic of the conversations, learn the
|
---|
| 627 | personalities of the players, and hopefully be better prepared to make
|
---|
| 628 | a useful contribution when do you speak up.
|
---|
| 629 |
|
---|
| 630 | If after all this you still think you want to join the perl5-porters
|
---|
| 631 | mailing list, send mail to I<[email protected]>. To
|
---|
| 632 | unsubscribe, send mail to I<[email protected]>.
|
---|

To hack on the Perl guts, you'll need to read the following things:

=over 3

=item L<perlguts>

This is of paramount importance, since it's the documentation of what
goes where in the Perl source. Read it over a couple of times and it
might start to make sense - don't worry if it doesn't yet, because the
best way to study it is to read it in conjunction with poking at Perl
source, and we'll do that later on.

You might also want to look at Gisle Aas's illustrated perlguts -
there's no guarantee that this will be absolutely up-to-date with the
latest documentation in the Perl core, but the fundamentals will be
right. ( http://gisle.aas.no/perl/illguts/ )

=item L<perlxstut> and L<perlxs>

A working knowledge of XSUB programming is incredibly useful for core
hacking; XSUBs use techniques drawn from the PP code, the portion of the
guts that actually executes a Perl program. It's a lot gentler to learn
those techniques from simple examples and explanation than from the core
itself.

=item L<perlapi>

The documentation for the Perl API explains what some of the internal
functions do, as well as the many macros used in the source.

=item F<Porting/pumpkin.pod>

This is a collection of words of wisdom for a Perl porter; some of it is
only useful to the pumpkin holder, but most of it applies to anyone
wanting to go about Perl development.

=item The perl5-porters FAQ

This should be available from http://simon-cozens.org/writings/p5p-faq ;
alternatively, you can get the FAQ emailed to you by sending mail to
C<[email protected]>. It contains hints on reading perl5-porters,
information on how perl5-porters works and how Perl development in general
works.

=back

=head2 Finding Your Way Around

Perl maintenance can be split into a number of areas, and certain people
(pumpkins) will have responsibility for each area. These areas sometimes
correspond to files or directories in the source kit. Among the areas are:

=over 3

=item Core modules

Modules shipped as part of the Perl core live in the F<lib/> and F<ext/>
subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
contains the core XS modules.

=item Tests

There are tests for nearly all the modules, built-ins and major bits
of functionality. Test files all have a .t suffix. Module tests live
in the F<lib/> and F<ext/> directories next to the module being
tested. Others live in F<t/>. See L<Writing a test>.

=item Documentation

Documentation maintenance includes looking after everything in the
F<pod/> directory (as well as contributing new documentation) and
the documentation for the modules in core.

=item Configure

The configure process is the way we make Perl portable across the
myriad of operating systems it supports. Responsibility for the
configure, build and installation process, as well as the overall
portability of the core code, rests with the configure pumpkin - others
help out with individual operating systems.

The files involved are the operating system directories (F<win32/>,
F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h>
and F<Makefile>, as well as the metaconfig files which generate
F<Configure>. (metaconfig isn't included in the core distribution.)

=item Interpreter

And of course, there's the core of the Perl interpreter itself. Let's
have a look at that in a little more detail.

=back

Before we leave looking at the layout, though, don't forget that
F<MANIFEST> contains not only the file names in the Perl distribution,
but short descriptions of what's in them, too. For an overview of the
important files, try this:

    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST

=head2 Elements of the interpreter

The work of the interpreter has two main stages: compiling the code
into the internal representation, or bytecode, and then executing it.
L<perlguts/Compiled code> explains exactly how the compilation stage
happens.

Here is a short breakdown of perl's operation:

=over 3

=item Startup

The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.

First, F<perlmain.c> allocates some memory and constructs a Perl
interpreter:

    1 PERL_SYS_INIT3(&argc,&argv,&env);
    2
    3 if (!PL_do_undump) {
    4     my_perl = perl_alloc();
    5     if (!my_perl)
    6         exit(1);
    7     perl_construct(my_perl);
    8     PL_perl_destruct_level = 0;
    9 }

Line 1 is a macro, and its definition is dependent on your operating
system. Line 3 references C<PL_do_undump>, a global variable - all
global variables in Perl start with C<PL_>. This tells you whether the
current running program was created with the C<-u> flag to perl and then
F<undump>, which means it's going to be false in any sane context.

Line 4 calls a function in F<perl.c> to allocate memory for a Perl
interpreter. It's quite a simple function, and the guts of it look like
this:

    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

Here you see an example of Perl's system abstraction, which we'll see
later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
own C<malloc> as defined in F<malloc.c> if you selected that option at
configure time.
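
The same kind of compile-time abstraction can be sketched in a few lines
of C. This is a minimal sketch, not Perl's code: C<USE_MY_MALLOC>,
C<MyMem_malloc>, C<MyMem_free>, C<ToyInterpreter> and C<toy_alloc> are
all hypothetical names. The point is that the allocation macro is bound
once, when the code is compiled, so callers such as C<toy_alloc> never
need to know which allocator was chosen.

```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical configure-time switch: compile with -DUSE_MY_MALLOC to
       route allocations through a private allocator instead of malloc(). */
    #ifdef USE_MY_MALLOC
    size_t my_alloc_count = 0;
    static void *my_malloc(size_t n)
    {
        my_alloc_count++;          /* a private allocator could pool here */
        return malloc(n);
    }
    #  define MyMem_malloc(n) my_malloc(n)
    #else
    #  define MyMem_malloc(n) malloc(n)
    #endif
    #define MyMem_free(p) free(p)

    /* Toy stand-in for PerlInterpreter; the real struct is far larger. */
    typedef struct {
        int  destruct_level;
        long op_count;
    } ToyInterpreter;

    /* Loosely shaped like perl_alloc(): get memory through the abstraction
       and start every field at zero. */
    ToyInterpreter *toy_alloc(void)
    {
        ToyInterpreter *interp =
            (ToyInterpreter *)MyMem_malloc(sizeof(ToyInterpreter));
        if (interp)
            memset(interp, 0, sizeof *interp);
        return interp;
    }
```

Switching allocators then means changing a single compiler flag rather
than every call site.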

Next, in line 7, we construct the interpreter; this sets up all the
special variables that Perl needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
    if (!exitstatus) {
        exitstatus = perl_run(my_perl);
    }

C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
in F<perl.c>, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls C<yyparse> to
parse it.

=item Parsing

The aim of this stage is to take the Perl source, and turn it into an op
tree. We'll see what one of those looks like later. Strictly speaking,
there are three things going on here.

C<yyparse>, the parser, lives in F<perly.c>, although you're better off
reading the original YACC input in F<perly.y>. (Yes, Virginia, there
B<is> a YACC grammar for Perl!) The job of the parser is to take your
code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so on.
The main point of entry to the lexer is C<yylex>, and that and its
associated routines can be found in F<toke.c>. Perl isn't much like
other computer languages; it's highly context-sensitive at times, and it
can be tricky to work out what sort of token something is, or where a
token ends. As such, there's a lot of interplay between the tokeniser
and the parser, which can get pretty frightening if you're not used to it.
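
To make the lexer's job concrete, here is a toy tokeniser in C for
simple input such as C<$a=$b+$c>. The token kinds and the C<next_token>
function are hypothetical, and the sketch deliberately ignores what makes
F<toke.c> hard: for instance, whether a C</> starts a pattern match or
means division depends in Perl on what the parser expects next, which a
context-free scanner like this one cannot know.

```c
    #include <assert.h>
    #include <ctype.h>
    #include <stddef.h>

    /* Hypothetical token kinds; Perl's real lexer distinguishes far more
       (barewords, subroutines, core functions, quote-like operators...). */
    typedef enum { TOK_VAR, TOK_OP, TOK_NUM, TOK_END, TOK_ERROR } TokKind;

    typedef struct {
        TokKind     kind;
        const char *start;   /* first character of the token */
        size_t      len;     /* number of characters in the token */
    } Token;

    /* Scan one token starting at *src and advance *src past it. */
    Token next_token(const char **src)
    {
        const char *p = *src;
        Token t = { TOK_END, p, 0 };

        while (*p == ' ' || *p == '\t')          /* skip blanks */
            p++;
        t.start = p;

        if (*p == '\0') {
            t.kind = TOK_END;
        } else if (*p == '$' && (isalpha((unsigned char)p[1]) || p[1] == '_')) {
            p++;                                 /* the sigil */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            t.kind = TOK_VAR;
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p))
                p++;
            t.kind = TOK_NUM;
        } else if (*p == '=' || *p == '+' || *p == '-' || *p == '*' || *p == '/') {
            p++;
            t.kind = TOK_OP;
        } else {
            p++;                                 /* consume the bad character */
            t.kind = TOK_ERROR;
        }

        t.len = (size_t)(p - t.start);
        *src = p;
        return t;
    }
```

Calling C<next_token> repeatedly on C<$a=$b+$c> yields a variable, an
operator, a variable, an operator, a variable, and then the end marker;
deciding what those tokens I<mean> together is the parser's job.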

As the parser understands a Perl program, it builds up a tree of
operations for the interpreter to perform during execution. The routines
which construct and link together the various operations are to be found
in F<op.c>, and will be examined later.

=item Optimization

Now the parsing stage is complete, and the finished tree represents
the operations that the Perl interpreter needs to perform to execute our
program. Next, Perl does a dry run over the tree looking for
optimizations: constant expressions such as C<3 + 4> will be computed
now, and the optimizer will also see if any multiple operations can be
replaced with a single one. For instance, to fetch the variable C<$foo>,
instead of grabbing the glob C<*foo> and looking at the scalar
component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question. The main optimizer is C<peep>
in F<op.c>, and many ops have their own optimizing functions.
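
Constant folding can be sketched on a toy expression tree. The node
layout and the C<mk> and C<fold> functions below are hypothetical and
much simpler than Perl's ops; the sketch shows only the idea that a
binary node whose operands are both constants can be replaced by its
computed value before the program runs, so C<3 + 4> becomes C<7>.

```c
    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical node kinds for a toy expression tree. */
    typedef enum { N_CONST, N_VAR, N_ADD, N_MUL } NodeKind;

    typedef struct Node {
        NodeKind     kind;
        long         value;          /* used by N_CONST */
        char         name;           /* used by N_VAR, e.g. 'x' */
        struct Node *left, *right;   /* used by N_ADD and N_MUL */
    } Node;

    Node *mk(NodeKind kind, long value, char name, Node *l, Node *r)
    {
        Node *n = malloc(sizeof *n);
        if (!n)
            abort();
        n->kind = kind;
        n->value = value;
        n->name = name;
        n->left = l;
        n->right = r;
        return n;
    }

    /* Fold bottom-up: once both operands of a binary node are constants,
       compute the result now and turn the node itself into a constant. */
    Node *fold(Node *n)
    {
        if (n->kind != N_ADD && n->kind != N_MUL)
            return n;
        n->left  = fold(n->left);
        n->right = fold(n->right);
        if (n->left->kind == N_CONST && n->right->kind == N_CONST) {
            long v = (n->kind == N_ADD) ? n->left->value + n->right->value
                                        : n->left->value * n->right->value;
            free(n->left);
            free(n->right);
            n->kind = N_CONST;
            n->value = v;
            n->left = n->right = NULL;
        }
        return n;
    }
```

An expression such as C<x * (3 + 4)> is only partly folded: the inner
addition becomes the constant C<7>, while the multiplication stays,
because C<x> is not known until run time.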

=item Running

Now we're finally ready to go: we have compiled Perl byte code, and all
that's left to do is run it. The actual execution is done by the
C<runops_standard> function in F<run.c>; more specifically, it's done by
these three innocent-looking lines:

    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
        PERL_ASYNC_CHECK();
    }

You may be more comfortable with the Perl version of that:

    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

Well, maybe not. Anyway, each op contains a function pointer, which
stipulates the function which will actually carry out the operation.
This function will return the next op in the sequence - this allows for
things like C<if> which choose the next op dynamically at run time.
The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
execution if required.
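
The shape of that loop can be reproduced with a toy op type. In this
minimal sketch the C<Op> struct, the C<pp_const>, C<pp_add> and
C<pp_cond> functions, and the small value stack are all hypothetical;
each op's function does its work and returns the next op, and
C<pp_cond> shows how an op can choose its successor at run time, as
described above for C<if>.

```c
    #include <assert.h>
    #include <stddef.h>

    typedef struct Op Op;
    typedef Op *(*PPFunc)(Op *self);

    struct Op {
        PPFunc pp;      /* does the work, returns the next op (cf. op_ppaddr) */
        Op    *next;    /* usual successor */
        Op    *other;   /* alternative successor for branching ops */
        long   arg;     /* operand for pp_const */
    };

    long toy_stack[16];
    int  toy_sp = 0;    /* number of values on the toy stack */

    Op *pp_const(Op *o)
    {
        assert(toy_sp < 16);
        toy_stack[toy_sp++] = o->arg;
        return o->next;
    }

    Op *pp_add(Op *o)
    {
        long right = toy_stack[--toy_sp];
        toy_stack[toy_sp - 1] += right;
        return o->next;
    }

    /* Pops a value and picks the successor at run time, as an "if" must. */
    Op *pp_cond(Op *o)
    {
        return toy_stack[--toy_sp] ? o->next : o->other;
    }

    /* The runops loop: run the current op, which returns the next one;
       an op returning NULL ends the program. */
    void runops(Op *op)
    {
        while ((op = op->pp(op)) != NULL) {
            /* perl calls PERL_ASYNC_CHECK() here to handle signals */
        }
    }
```

For example, chaining C<pp_const(2)>, C<pp_const(3)>, C<pp_add> and a
C<pp_cond> whose two successors push different values runs the "true"
branch, since 2 + 3 is non-zero; no op ever calls another op directly.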

The actual functions called are known as PP code, and they're spread
between four files: F<pp_hot.c> contains the "hot" code, which is most
often used and highly optimized, F<pp_sys.c> contains all the
system-specific functions, F<pp_ctl.c> contains the functions which
implement control structures (C<if>, C<while> and the like) and F<pp.c>
contains everything else. These are, if you like, the C code for Perl's
built-in functions and operators.

Note that each C<pp_> function is expected to return a pointer to the next
op. Calls to perl subs (and eval blocks) are handled within the same
runops loop, and do not consume extra space on the C stack. For example,
C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or C<CxEVAL> block
struct onto the context stack, which contains the address of the op
following the sub call or eval. They then return the first op of that sub
or eval block, and so execution continues with that sub or block. Later, a
C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB> or C<CxEVAL>,
retrieves the return op from it, and returns it.
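
The following sketch models how a sub call can stay inside one runops
loop. It is a simplification under stated assumptions: the array
C<cx_retop> stands in for the context stack, each entry playing the role
of a C<CxSUB> that remembers only the return op, and C<pp_entersub> and
C<pp_leavesub> here are toy versions, not Perl's. Because C<runops>
never recurses, calling the sub twice uses no extra C stack.

```c
    #include <assert.h>
    #include <stddef.h>

    typedef struct Op Op;
    struct Op {
        Op  *(*pp)(Op *self);
        Op   *next;      /* execution-order successor */
        Op   *body;      /* for entersub: first op of the called sub */
        int   tag;       /* recorded in the trace when pp_mark runs */
    };

    #define CXSTACK_MAX 32
    Op  *cx_retop[CXSTACK_MAX];   /* each entry plays the role of a CxSUB */
    int  cx_ix = 0;

    int trace[64];
    int trace_len = 0;

    /* An ordinary op: note that it ran, then continue. */
    Op *pp_mark(Op *o)
    {
        trace[trace_len++] = o->tag;
        return o->next;
    }

    Op *pp_entersub(Op *o)
    {
        assert(cx_ix < CXSTACK_MAX);
        cx_retop[cx_ix++] = o->next;   /* push: where to resume afterwards */
        return o->body;                /* continue inside the sub */
    }

    Op *pp_leavesub(Op *o)
    {
        (void)o;
        assert(cx_ix > 0);
        return cx_retop[--cx_ix];      /* pop and resume the caller */
    }

    void runops(Op *op)
    {
        while ((op = op->pp(op)) != NULL)
            ;
    }
```

Because the sub body ends in C<pp_leavesub>, the same body ops can be
shared by every call site; the context stack, not the C stack, records
where each call should return.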

=item Exception handling

Perl's exception handling (i.e. C<die> etc.) is built on top of the low-level
C<setjmp()>/C<longjmp()> C-library functions. These basically provide a
way to capture the current PC and SP registers and later restore them; i.e.
a C<longjmp()> continues at the point in code where a previous C<setjmp()>
was done, with anything further up on the C stack being lost. This is why
code should always save values using C<SAVE_FOO> rather than in auto
variables.

The perl core wraps C<setjmp()> etc in the macros C<JMPENV_PUSH> and
C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit>, and
C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while
C<die> within C<eval> does a C<JMPENV_JUMP(3)>.
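
A minimal sketch of this guard-and-jump pattern, using only standard
C<setjmp()>/C<longjmp()>. C<JmpEnv>, C<top_env>, C<jmpenv_jump>,
C<body_that_dies> and C<guarded_call> are hypothetical names; the real
C<JMPENV_PUSH> and C<JMPENV_JUMP> macros do more bookkeeping. The codes
follow the convention just described: 2 for C<exit> or an uncaught
C<die>, 3 for a C<die> caught by an C<eval>. The sketch calls
C<setjmp()> as a C<switch> controlling expression because the C standard
permits it only in a few such contexts, and it sets C<env> and C<result>
before the C<setjmp()> and leaves them alone until the jump, because
non-C<volatile> automatic variables changed in between have
indeterminate values after C<longjmp()>.

```c
    #include <assert.h>
    #include <setjmp.h>
    #include <stddef.h>

    typedef struct JmpEnv {
        jmp_buf        buf;
        struct JmpEnv *prev;    /* enclosing guard */
    } JmpEnv;

    JmpEnv *top_env = NULL;     /* innermost guard, cf. PL_top_env */
    int cleanup_ran = 0;

    /* Unwind straight to the innermost guard with the given code. */
    void jmpenv_jump(int code)
    {
        assert(top_env != NULL);
        longjmp(top_env->buf, code);
    }

    /* Stand-in for code that dies several C frames below the guard. */
    void body_that_dies(int code)
    {
        jmpenv_jump(code);
    }

    /* A guard: push a frame, run the body, and dispatch on the code that
       arrives via longjmp(). Returns that code (0 if the body returned). */
    int guarded_call(int code)
    {
        JmpEnv env;
        int result = 0;

        env.prev = top_env;
        top_env = &env;
        switch (setjmp(env.buf)) {
        case 0:                 /* direct return from setjmp(): run body */
            body_that_dies(code);
            break;
        case 2:                 /* exit, or die outside any eval */
            cleanup_ran = 1;    /* e.g. pop stacks, run END blocks */
            result = 2;
            break;
        case 3:                 /* die inside an eval */
            result = 3;
            break;
        }
        top_env = env.prev;     /* pop this guard */
        return result;
    }
```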

Each entry point to perl, such as C<perl_parse()>, C<perl_run()> and
C<call_sv(cv, G_EVAL)>, does a C<JMPENV_PUSH>, then enters a runops
loop or whatever, and handles possible exception returns. For a 2 return,
final cleanup is performed, such as popping stacks and calling C<CHECK> or
C<END> blocks. Amongst other things, this is how scope cleanup still
occurs during an C<exit>.

If a C<die> can find a C<CxEVAL> block on the context stack, then the
stack is popped to that level and the return op in that block is assigned
to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed. This normally
passes control back to the guard. In the case of C<perl_run> and
C<call_sv>, a non-null C<PL_restartop> triggers re-entry to the runops
loop. This is the normal way that C<die> or C<croak> is handled within an
C<eval>.

Sometimes ops are executed within an inner runops loop, such as in tie,
sort or overload code. In this case, something like

    sub FETCH { eval { die } }

would cause a longjmp right back to the guard in C<perl_run>, popping both
runops loops, which is clearly incorrect. One way to avoid this is for the
tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in the inner
runops loop, but for efficiency reasons, perl in fact just sets a flag,
using C<CATCH_SET(TRUE)>. The C<pp_require>, C<pp_entereval> and
C<pp_entertry> ops check this flag, and if true, they call C<docatch>,
which does a C<JMPENV_PUSH> and starts a new runops level to execute the
code, rather than doing it on the current loop.

As a further optimisation, on exit from the eval block in the C<FETCH>,
execution of the code following the block is still carried on in the inner
loop. When an exception is raised, C<docatch> compares the C<JMPENV>
level of the C<CxEVAL> with C<PL_top_env> and if they differ, just
re-throws the exception. In this way any inner loops get popped.

Here's an example.

    1: eval { tie @a, 'A' };
    2: sub A::TIEARRAY {
    3:     eval { die };
    4:     die;
    5: }

To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH> then
enters a runops loop. This loop executes the eval and tie ops on line 1,
with the eval pushing a C<CxEVAL> onto the context stack.

The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops loop
to execute the body of C<TIEARRAY>. When it executes the entertry op on
line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch> which
does a C<JMPENV_PUSH> and starts a third runops loop, which then executes
the die op. At this point the C call stack looks like this:

    Perl_pp_die
    Perl_runops      # third loop
    S_docatch_body
    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

and the context and data stacks, as shown by C<-Dstv>, look like:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
                      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
                      retop=(null)
      CX 1: EVAL   => *
                      retop=nextstate

The die pops the top C<CxEVAL> off the context stack, sets
C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns to
the top C<docatch>. This then starts another third-level runops loop,
which executes the nextstate, pushmark and die ops on line 4. At the point
that the second C<pp_die> is called, the C call stack looks exactly like
that above, even though we are no longer within an inner eval; this is
because of the optimization mentioned earlier. However, the context stack
now looks like this, i.e. with the top C<CxEVAL> popped:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
                      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
                      retop=(null)

The die on line 4 pops the context stack back down to the C<CxEVAL>,
leaving it as:

    STACK 0: MAIN
      CX 0: BLOCK  =>

As usual, C<PL_restartop> is extracted from the C<CxEVAL>, and a
C<JMPENV_JUMP(3)> done, which pops the C stack back to the docatch:

    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

In this case, because the C<JMPENV> level recorded in the C<CxEVAL>
differs from the current one, C<docatch> just does a C<JMPENV_JUMP(3)>
and the C stack unwinds to:

    perl_run
    main

Because C<PL_restartop> is non-null, C<run_body> starts a new runops loop
and execution continues.

=back

=head2 Internal Variable Types

You should by now have had a look at L<perlguts>, which tells you about
Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
that now.

These variables are used not only to represent Perl-space variables, but
also any constants in the code, as well as some structures completely
internal to Perl. The symbol table, for instance, is an ordinary Perl
hash. Your code is represented by an SV as it's read into the parser;
any program files you call are opened via ordinary Perl filehandles, and
so on.

The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a
Perl program. Let's see, for instance, how Perl treats the constant
C<"hello">.

    % perl -MDevel::Peek -e 'Dump("hello")'
    1 SV = PV(0xa041450) at 0xa04ecbc
    2   REFCNT = 1
    3   FLAGS = (POK,READONLY,pPOK)
    4   PV = 0xa0484e0 "hello"\0
    5   CUR = 5
    6   LEN = 6

Reading C<Devel::Peek> output takes a bit of practice, so let's go
through it line by line.

Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
memory. SVs themselves are very simple structures, but they contain a
pointer to a more complex structure. In this case, it's a PV, a
structure which holds a string value, at location C<0xa041450>. Line 2
is the reference count; there are no other references to this data, so
it's 1.

Line 3 shows the flags for this SV - it's OK to use it as a PV, it's a
read-only SV (because it's a constant) and the data is a PV internally.
Next we've got the contents of the string, starting at location
C<0xa0484e0>.

Line 5 gives us the current length of the string - note that this does
B<not> include the null terminator. Line 6 is not the length of the
string, but the length of the currently allocated buffer; as the string
grows, Perl automatically extends the available storage via a macro
called C<SvGROW>.
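
The relationship between C<CUR> and C<LEN> can be sketched with a small
C buffer type. C<StrBuf>, C<strbuf_grow> and C<strbuf_init> are
hypothetical; C<strbuf_grow> only mirrors the idea behind C<SvGROW>
(reallocate when the buffer is too small, otherwise leave it alone) and
allocates exactly the size requested, whereas Perl's own allocation
policy may differ.

```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* cur counts the characters in use (excluding the trailing NUL);
       len is the size of the allocated buffer, as with CUR and LEN. */
    typedef struct {
        char   *pv;
        size_t  cur;
        size_t  len;
    } StrBuf;

    /* Ensure at least `need` bytes are allocated, reallocating only when
       the current buffer is too small. Returns the (possibly moved)
       buffer address. */
    char *strbuf_grow(StrBuf *sb, size_t need)
    {
        if (sb->len < need) {
            char *p = realloc(sb->pv, need);   /* realloc(NULL, n) == malloc(n) */
            if (!p)
                abort();
            sb->pv  = p;
            sb->len = need;
        }
        return sb->pv;
    }

    void strbuf_init(StrBuf *sb, const char *s)
    {
        size_t n = strlen(s);
        sb->pv  = NULL;
        sb->len = 0;
        strbuf_grow(sb, n + 1);          /* room for the NUL as well */
        memcpy(sb->pv, s, n + 1);
        sb->cur = n;                     /* "hello" gives cur 5, len 6 */
    }
```

Initialising with C<"hello"> gives C<cur> 5 and C<len> 6, matching the
C<CUR> and C<LEN> lines in the dump; asking for less room than C<len>
later is a no-op.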

You can get at any of these quantities from C very easily; just add
C<Sv> to the name of the field shown in the snippet, and you've got a
macro which will return the value: C<SvCUR(sv)> returns the current
length of the string, C<SvREFCNT(sv)> returns the reference count,
C<SvPV(sv, len)> returns the string itself with its length, and so on.
More macros to manipulate these properties can be found in L<perlguts>.

Let's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c>:

     1  void
     2  Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
     3  {
     4      STRLEN tlen;
     5      char *junk;

     6      junk = SvPV_force(sv, tlen);
     7      SvGROW(sv, tlen + len + 1);
     8      if (ptr == junk)
     9          ptr = SvPVX(sv);
    10      Move(ptr,SvPVX(sv)+tlen,len,char);
    11      SvCUR(sv) += len;
    12      *SvEND(sv) = '\0';
    13      (void)SvPOK_only_UTF8(sv);          /* validate pointer */
    14      SvTAINT(sv);
    15  }

This is a function which adds a string, C<ptr>, of length C<len> onto
the end of the PV stored in C<sv>. The first thing we do in line 6 is
make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
macro to force a PV. As a side effect, C<tlen> gets set to the current
length of the PV, and the PV itself is returned to C<junk>.

In line 7, we make sure that the SV will have enough room to accommodate
the old string, the new string and the null terminator. If C<LEN> isn't
big enough, C<SvGROW> will reallocate space for us.

Now, if C<junk> is the same as the string we're trying to add (that is,
the SV is being appended to itself), we must fetch the string's address
from the SV again, because C<SvGROW> may have moved the buffer;
C<SvPVX> is the address of the PV in the SV.

Line 10 does the actual catenation: the C<Move> macro moves a chunk of
memory around: we move the string C<ptr> to the end of the PV - that's
the start of the PV plus its current length. We're moving C<len> bytes
of type C<char>. After doing so, we need to tell Perl we've extended the
string, by altering C<CUR> to reflect the new length. C<SvEND> is a
macro which gives us the end of the string, and we store a C<"\0"> there
to keep the string null-terminated.

Line 13 manipulates the flags; since we've changed the PV, any IV or NV
values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF-8-aware
version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
and turns on POK. The final C<SvTAINT> is a macro which marks the SV as
tainted if taint mode is turned on and the data involved in the current
operation is tainted.
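
Lines 7 to 13 of C<sv_catpvn> can be mirrored in a self-contained
sketch. C<ToySV>, the C<F_IOK>/C<F_NOK>/C<F_POK> bits, C<toy_setpv> and
C<toy_catpvn> are hypothetical stand-ins for Perl's real structures,
flags and macros; the C<SvPV_force> and C<SvTAINT> steps are omitted.
One deliberate difference: the sketch checks whether C<ptr> aliases the
buffer I<before> reallocating, which avoids comparing against an address
that C<realloc()> may already have released.

```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical flag bits named after IOK, NOK and POK; the values are
       not Perl's real ones. */
    enum { F_IOK = 1, F_NOK = 2, F_POK = 4 };

    typedef struct {
        unsigned flags;
        long     iv;          /* cached integer, meaningful only with F_IOK */
        char    *pv;          /* string buffer */
        size_t   cur, len;    /* as CUR and LEN above */
    } ToySV;

    void toy_setpv(ToySV *sv, const char *s)
    {
        size_t n = strlen(s);
        sv->pv = malloc(n + 1);
        if (!sv->pv)
            abort();
        memcpy(sv->pv, s, n + 1);
        sv->cur = n;
        sv->len = n + 1;
    }

    /* Append len bytes from ptr, following lines 7-13 of sv_catpvn. */
    void toy_catpvn(ToySV *sv, const char *ptr, size_t len)
    {
        size_t tlen = sv->cur;
        int self = (ptr == sv->pv);             /* appending to ourselves? */

        if (sv->len < tlen + len + 1) {         /* the SvGROW step */
            char *p = realloc(sv->pv, tlen + len + 1);
            if (!p)
                abort();
            sv->pv  = p;
            sv->len = tlen + len + 1;
        }
        if (self)
            ptr = sv->pv;                       /* buffer may have moved */
        memmove(sv->pv + tlen, ptr, len);       /* the Move step */
        sv->cur += len;
        sv->pv[sv->cur] = '\0';                 /* keep it NUL-terminated */
        /* like SvPOK_only: drop the numeric flags, keep only POK */
        sv->flags = (sv->flags & ~(unsigned)(F_IOK | F_NOK)) | F_POK;
    }
```

After the equivalent of C<$a=10; $a.="6";> the toy scalar holds
C<"106"> with only C<F_POK> set, so the stale integer 10 can no longer
be mistaken for its value; appending the scalar to itself then yields
C<"106106">.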

AVs and HVs are more complicated, but SVs are by far the most common
variable type being thrown around. Having seen something of how we
manipulate these, let's go on and look at how the op tree is
constructed.

=head2 Op Trees

First, what is the op tree, anyway? The op tree is the parsed
representation of your program, as we saw in our section on parsing, and
it's the sequence of operations that Perl goes through to execute your
program, as we saw in L</Running>.

An op is a fundamental operation that Perl can perform: all the built-in
functions and operators are ops, and there is a series of ops which
deal with concepts the interpreter needs internally - entering and
leaving a block, ending a statement, fetching a variable, and so on.

The op tree is connected in two ways: you can imagine that there are two
"routes" through it, two orders in which you can traverse the tree.
First, parse order reflects how the parser understood the code, and
secondly, execution order tells perl what order to perform the
operations in.
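
The two routes can be modelled with a toy op that has both child
pointers and a C<next> pointer. C<ToyOp>, C<walk_parse> and C<walk_exec>
are hypothetical; the sketch builds C<$a = $b + $c> with the assignment
at the root, its first child the addition and its last child C<$a>, and
links C<next> so that both operands are fetched before they are used.

```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* A stripped-down op: child links give the parse-order tree, and
       next gives the execution order (cf. op_next). */
    typedef struct ToyOp {
        const char   *name;
        struct ToyOp *first, *last;   /* children; first == last for unary */
        struct ToyOp *next;           /* op to run after this one */
    } ToyOp;

    /* Parse order: a node, then its children. `out` must be big enough. */
    void walk_parse(const ToyOp *o, char *out)
    {
        if (!o)
            return;
        strcat(out, o->name);
        strcat(out, " ");
        walk_parse(o->first, out);
        if (o->last != o->first)
            walk_parse(o->last, out);
    }

    /* Execution order: just follow the next pointers. */
    void walk_exec(const ToyOp *o, char *out)
    {
        for (; o; o = o->next) {
            strcat(out, o->name);
            strcat(out, " ");
        }
    }
```

Walking the children gives C<= + $b $c $a>, while following C<next>
from C<$b> gives C<$b $c + $a =>: the same ops, visited in two different
orders.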
|
---|
| 1144 |
|
---|
| 1145 | The easiest way to examine the op tree is to stop Perl after it has
|
---|
| 1146 | finished parsing, and get it to dump out the tree. This is exactly what
|
---|
| 1147 | the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
|
---|
| 1148 | and L<B::Debug|B::Debug> do.
|
---|
| 1149 |
|
---|
| 1150 | Let's have a look at how Perl sees C<$a = $b + $c>:
|
---|
| 1151 |
|
---|
| 1152 | % perl -MO=Terse -e '$a=$b+$c'
|
---|
| 1153 | 1 LISTOP (0x8179888) leave
|
---|
| 1154 | 2 OP (0x81798b0) enter
|
---|
| 1155 | 3 COP (0x8179850) nextstate
|
---|
| 1156 | 4 BINOP (0x8179828) sassign
|
---|
| 1157 | 5 BINOP (0x8179800) add [1]
|
---|
| 1158 | 6 UNOP (0x81796e0) null [15]
|
---|
| 1159 | 7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b
|
---|
| 1160 | 8 UNOP (0x81797e0) null [15]
|
---|
| 1161 | 9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c
|
---|
| 1162 | 10 UNOP (0x816b4f0) null [15]
|
---|
| 1163 | 11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a

Let's start in the middle, at line 4. This is a BINOP, a binary
operator, which is at location C<0x8179828>. The specific operator in
question is C<sassign> - scalar assignment - and you can find the code
which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
binary operator, it has two children: the add operator, providing the
result of C<$b+$c>, is uppermost on line 5, and the left-hand side is on
line 10.

Line 10 is the null op: this does exactly nothing. What is that doing
there? If you see the null op, it's a sign that something has been
optimized away after parsing. As we mentioned in L</Optimization>,
the optimization stage sometimes converts two operations into one, for
example when fetching a scalar variable. When this happens, instead of
rewriting the op tree and cleaning up the dangling pointers, it's easier
just to replace the redundant operation with the null op. Originally,
the tree would have looked like this:

    10         UNOP (0x816b4f0) rv2sv [15]
    11             SVOP (0x816dcf0) gv GV (0x80fa460) *a

That is, fetch the C<a> entry from the main symbol table, and then look
at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
happens to do both these things.
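
To make the "replace it with a null op rather than unlink it" idea concrete, here is a toy model in C. Everything in it (C<struct toy_op>, C<toy_op_null>, C<toy_fuse_gvsv> and the C<TOY_*> types) is made up for illustration and is far simpler than perl's real C<OP>. It loosely follows what perl's peephole optimiser does with a C<gv> feeding an C<rv2sv>: the child becomes a C<gvsv>, the execution chain is relinked to skip the parent, and the parent is nulled in place with its old type remembered in C<targ>, much as perl's C<op_null()> keeps it in C<op_targ>. That appears to be where the C<[15]> after each C<null> in the listing comes from.

```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical, much-simplified ops; real perl OPs have many more fields. */
    enum toy_optype { TOY_NULL, TOY_GV, TOY_GVSV, TOY_RV2SV };

    struct toy_op {
        enum toy_optype type;
        int targ;                /* old type of a nulled op, like op_targ */
        struct toy_op *first;    /* first child, or NULL */
        struct toy_op *next;     /* next op in execution order */
    };

    /* Turn an op into a do-nothing null op, remembering what it used to be. */
    static void toy_op_null(struct toy_op *o)
    {
        o->targ = (int)o->type;
        o->type = TOY_NULL;
    }

    /* "rv2sv over gv" becomes a single gvsv. The rv2sv stays in the tree
       as a null op, but the execution chain now jumps straight past it. */
    static void toy_fuse_gvsv(struct toy_op *o)
    {
        struct toy_op *gv = o->first;

        if (o->type != TOY_RV2SV || gv == NULL || gv->type != TOY_GV)
            return;
        assert(gv->next == o);   /* the gv ran just before its parent */
        gv->type = TOY_GVSV;     /* the child now fetches the scalar itself */
        gv->next = o->next;      /* skip the parent at run time */
        toy_op_null(o);          /* no pointers to clean up: o stays put */
    }
```

Nothing had to be freed or unlinked: the null op keeps its place in the tree, so any pointers to it stay valid, and only the execution order changes.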

The right-hand side, starting at line 5, is similar to what we've just
seen: we have the C<add> op (C<pp_add>, also in F<pp_hot.c>) adding
together two C<gvsv>s.

Now, what's this about?

    1  LISTOP (0x8179888) leave
    2      OP (0x81798b0) enter
    3      COP (0x8179850) nextstate

C<enter> and C<leave> are scoping ops, and their job is to perform any
housekeeping every time you enter and leave a block: lexical variables
are tidied up, unreferenced variables are destroyed, and so on. Every
program will have those first three lines: C<leave> is a list, and its
children are all the statements in the block. Statements are delimited
by C<nextstate>, so a block is a sequence of C<nextstate> ops, each
followed by the ops to be performed for that statement (in the listing
above, C<sassign> sits alongside C<nextstate> as a child of C<leave>).
C<enter> is a single op which functions as a marker.

That's how Perl parsed the program, from top to bottom:

                     Program
                        |
                    Statement
                        |
                        =
                       / \
                      /   \
                     $a   +
                         / \
                       $b   $c

However, it's impossible to B<perform> the operations in this order:
you have to find the values of C<$b> and C<$c> before you add them
together, for instance. So, the other thread that runs through the op
tree is the execution order: each op has a field C<op_next> which points
to the next op to be run, so following these pointers tells us how perl
executes the code. We can traverse the tree in this order using
the C<exec> option to C<B::Terse>:

    % perl -MO=Terse,exec -e '$a=$b+$c'
    1  OP (0x8179928) enter
    2  COP (0x81798c8) nextstate
    3  SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b
    4  SVOP (0x8179798) gvsv GV (0x80efeb0) *c
    5  BINOP (0x8179878) add [1]
    6  SVOP (0x816dd38) gvsv GV (0x80fa468) *a
    7  BINOP (0x81798a0) sassign
    8  LISTOP (0x8179900) leave

This probably makes more sense for a human: enter a block, start a
statement. Get the values of C<$b> and C<$c>, and add them together.
Find C<$a>, and assign one to the other. Then leave.
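
The relationship between the two threads can be sketched in C. The C<struct toy_op> below is hypothetical, although its C<op_first>, C<op_sibling> and C<op_next> fields are named after perl's own. C<toy_link> threads C<op_next> through the tree so that every op's children run before the op itself (a postorder walk), and C<toy_exec_order> follows the resulting chain. The null ops and everything else a real op carries are left out, and this is only the idea, not how F<op.c> does its linking in detail.

```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical op: just a label and the three links we care about. */
    struct toy_op {
        const char *name;
        struct toy_op *op_first;     /* first child (tree order) */
        struct toy_op *op_sibling;   /* next child of the same parent */
        struct toy_op *op_next;      /* next op to run (execution order) */
    };

    /* Link the subtree rooted at o in postorder: children first, then o.
       Returns the first op to run; o itself always runs last. */
    static struct toy_op *toy_link(struct toy_op *o)
    {
        struct toy_op *start = NULL, *prev = NULL, *kid;

        for (kid = o->op_first; kid != NULL; kid = kid->op_sibling) {
            struct toy_op *kid_start = toy_link(kid);
            if (prev != NULL)
                prev->op_next = kid_start;   /* previous subtree runs into this one */
            else
                start = kid_start;
            prev = kid;                      /* a subtree's root runs last */
        }
        if (prev != NULL)
            prev->op_next = o;
        else
            start = o;                       /* a leaf runs on its own */
        o->op_next = NULL;
        return start;
    }

    /* Follow op_next from o, writing the op names into buf. */
    static void toy_exec_order(const struct toy_op *o, char *buf, size_t len)
    {
        assert(len > 0);
        buf[0] = '\0';
        for (; o != NULL; o = o->op_next) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s", used ? " " : "", o->name);
        }
    }
```

Building the tree for C<$a=$b+$c> by hand (C<leave> over C<enter>, C<nextstate> and C<sassign>; C<sassign> over C<add> and C<$a>; C<add> over C<$b> and C<$c>) and walking the chain gives C<enter nextstate gvsv(b) gvsv(c) add gvsv(a) sassign leave>, the same order as the C<exec> listing above.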

The way Perl builds up these op trees in the parsing process can be
unravelled by examining F<perly.y>, the YACC grammar. Let's take the
piece we need to construct the tree for C<$a = $b + $c>:

    1 term : term ASSIGNOP term
    2                { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
    3         | term ADDOP term
    4                { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

If you're not used to reading BNF grammars, this is how it works: you're
fed certain things by the tokeniser, which generally end up in upper
case. Here, C<ADDOP> is provided when the tokeniser sees C<+> in your
code. C<ASSIGNOP> is provided when C<=> is used for assigning. These are
"terminal symbols", because you can't get any simpler than them.

The grammar, lines one and three of the snippet above, tells you how to
build up more complex forms. These complex forms, "non-terminal symbols",
are generally placed in lower case. C<term> here is a non-terminal
symbol, representing a single expression.

The grammar gives you the following rule: you can make the thing on the
left of the colon if you see all the things on the right in sequence.
This is called a "reduction", and the aim of parsing is to completely
reduce the input. There are several different ways you can perform a
reduction, separated by vertical bars: so, C<term> followed by C<=>
followed by C<term> makes a C<term>, and C<term> followed by C<+>
followed by C<term> can also make a C<term>.

So, if you see two terms with an C<=> or C<+> between them, you can
turn them into a single expression. When you do this, you execute the
code in the block on the next line: if you see C<=>, you'll do the code
in line 2. If you see C<+>, you'll do the code in line 4. It's this code
which contributes to the op tree.

            | term ADDOP term
                   { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

What this does is create a new binary op and feed it a number of
variables. The variables refer to the tokens: C<$1> is the first token
matched by the rule, C<$2> the second, and so on - think regular
expression backreferences. C<$$> is the op returned from this reduction.
So, we call C<newBINOP> to create a new binary operator. The first
parameter to C<newBINOP>, a function in F<op.c>, is the op type. It's an
addition operator, so we want the type to be C<ADDOP>. We could specify
this directly, but it's right there as the second token in the input, so
we use C<$2>. The second parameter is the op's flags: 0 means "nothing
special". Then the things to add: the left- and right-hand sides of our
expression, in scalar context.
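
To see what the two actions in the grammar actually build, here is a toy version of them in C. C<toy_new_binop>, C<toy_new_assignop>, C<toy_scalar> and the C<TOY_*> names are hypothetical stand-ins for C<newBINOP>, C<newASSIGNOP> and C<scalar> in F<op.c>, which do a great deal more (checking, constant folding, and so on); the toy assignment also drops C<newASSIGNOP>'s op-type argument. One detail is mirrored on purpose: for a scalar assignment the value (the right-hand side) becomes the first child and the variable being assigned to the last, which matches the C<B::Terse> listing earlier, where the C<add> sits above the C<$a> branch.

```c
    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical op types and flag; perl's real ones are in opcode.h/op.h. */
    enum toy_type { TOY_GVSV, TOY_ADD, TOY_SASSIGN };
    #define TOY_WANT_SCALAR 0x1

    struct toy_op {
        enum toy_type type;
        int flags;
        const char *name;               /* variable name, for TOY_GVSV only */
        struct toy_op *first, *last;    /* children of a binary op */
    };

    static struct toy_op *toy_new_op(enum toy_type type, int flags)
    {
        struct toy_op *o = calloc(1, sizeof *o);   /* children start out NULL */
        assert(o != NULL);
        o->type = type;
        o->flags = flags;
        return o;
    }

    /* What each of $a, $b and $c turns into in this toy. */
    static struct toy_op *toy_gvsv(const char *name)
    {
        struct toy_op *o = toy_new_op(TOY_GVSV, 0);
        o->name = name;
        return o;
    }

    /* Stand-in for scalar(): mark a subtree as wanted in scalar context. */
    static struct toy_op *toy_scalar(struct toy_op *o)
    {
        o->flags |= TOY_WANT_SCALAR;
        return o;
    }

    /* Stand-in for newBINOP(type, flags, first, last). */
    static struct toy_op *toy_new_binop(enum toy_type type, int flags,
                                        struct toy_op *first, struct toy_op *last)
    {
        struct toy_op *o = toy_new_op(type, flags);
        o->first = first;
        o->last = last;
        return o;
    }

    /* Stand-in for the scalar case of newASSIGNOP: value first, target last. */
    static struct toy_op *toy_new_assignop(int flags, struct toy_op *left,
                                           struct toy_op *right)
    {
        return toy_new_binop(TOY_SASSIGN, flags, toy_scalar(right), toy_scalar(left));
    }

    static void toy_free(struct toy_op *o)
    {
        if (o == NULL)
            return;
        toy_free(o->first);
        toy_free(o->last);
        free(o);
    }
```

Note the order of the calls: the parser can only reduce C<$a = term> once C<$b + $c> has already been reduced to a C<term>, so the tree is built from the bottom up.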

=head2 Stacks

When perl executes something like C<addop>, how does it pass on its
results to the next op? The answer is: through the use of stacks. Perl
has a number of stacks to store things it's currently working on, and
we'll look at the three most important ones here.

=over 3

=item Argument stack

Arguments are passed to PP code and returned from PP code using the
argument stack, C<ST>. The typical way to handle arguments is to pop
them off the stack, deal with them how you wish, and then push the result
back onto the stack. This is how, for instance, the cosine operator
works:

      NV value;
      value = POPn;
      value = Perl_cos(value);
      XPUSHn(value);

We'll see a more tricky example of this when we consider Perl's macros
below. C<POPn> gives you the NV (floating point value) of the top SV on
the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push
the result back as an NV. The C<X> in C<XPUSHn> means that the stack
should be extended if necessary - it can't be necessary here, because we
know there's room for one more item on the stack, since we've just
removed one! The C<XPUSH*> macros at least guarantee safety.

Alternatively, you can fiddle with the stack directly: C<SP> gives you
the first element in your portion of the stack, and C<TOP*> gives you
the top SV/IV/NV/etc. on the stack. So, for instance, to do unary
negation of an integer:

     SETi(-TOPi);

Just set the integer value of the top stack entry to its negation.

Argument stack manipulation in the core is exactly the same as it is in
XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
description of the macros used in stack manipulation.

=item Mark stack

I say "your portion of the stack" above because PP code doesn't
necessarily get the whole stack to itself: if your function calls
another function, you'll only want to expose the arguments aimed for the
called function, and not (necessarily) let it get at your own data. The
way we do this is to have a "virtual" bottom-of-stack, exposed to each
function. The mark stack keeps bookmarks to locations in the argument
stack usable by each function. For instance, when dealing with a tied
variable (internally, something with "P" magic), Perl has to call
methods for accesses to the tied variables. However, we need to separate
the arguments exposed to the method from the arguments exposed to the
original function - the store or fetch or whatever it may be. Here's
roughly how the tied C<push> is implemented; see C<av_push> in F<av.c>:

     1  PUSHMARK(SP);
     2  EXTEND(SP,2);
     3  PUSHs(SvTIED_obj((SV*)av, mg));
     4  PUSHs(val);
     5  PUTBACK;
     6  ENTER;
     7  call_method("PUSH", G_SCALAR|G_DISCARD);
     8  LEAVE;
     9  POPSTACK;

The lines which concern the stacks themselves are the first, fifth and
last lines: the first saves away the current position of the argument
stack on the mark stack, the fifth writes our local stack pointer back
to the global one, and the last switches back to the argument stack
that was in use before.

Let's examine the whole implementation, for practice:

     1  PUSHMARK(SP);

Push the current state of the stack pointer onto the mark stack. This is
so that when we've finished adding items to the argument stack, Perl
knows how many things we've added recently.

     2  EXTEND(SP,2);
     3  PUSHs(SvTIED_obj((SV*)av, mg));
     4  PUSHs(val);

We're going to add two more items onto the argument stack (C<EXTEND>
first makes sure there is room for them): when you have a tied array,
the C<PUSH> subroutine receives the object and the value to be pushed,
and that's exactly what we have here - the tied object, retrieved with
C<SvTIED_obj>, and the value, the SV C<val>.

     5  PUTBACK;

Next we tell Perl to make the change to the global stack pointer: C<dSP>
only gave us a local copy, not a reference to the global.

     6  ENTER;
     7  call_method("PUSH", G_SCALAR|G_DISCARD);
     8  LEAVE;

C<ENTER> and C<LEAVE> localise a block of code - they make sure that all
variables are tidied up, everything that has been localised gets
its previous value returned, and so on. Think of them as the C<{> and
C<}> of a Perl block.

To actually do the magic method call, we have to call a subroutine in
Perl space: C<call_method> takes care of that, and it's described in
L<perlcall>. We call the C<PUSH> method in scalar context, and we're
going to discard its return value.

     9  POPSTACK;

Finally, C<POPSTACK> returns us to the argument stack we were using
before. In F<av.c> it is paired with a C<PUSHSTACKi> call, left out of
the excerpt above, which gave the method call a fresh stack of its own.
The mark we pushed in line 1 has already been popped again by the
method call when it collected its arguments.

=item Save stack

C doesn't have a concept of dynamic scope like Perl's C<local>, so perl
provides one. We've seen that C<ENTER> and C<LEAVE> are used as scoping
braces; the save stack implements the C equivalent of, for example:

    {
        local $foo = 42;
        ...
    }

See L<perlguts/Localising Changes> for how to use the save stack.

=back
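
The three stacks above can be modelled in a few lines of C. This is a toy, not perl's API: every name in it (C<push_n>, C<pop_n>, C<top_n>, C<push_mark>, C<pop_mark>, C<save_int>, C<toy_enter>, C<toy_leave> and friends) is made up, the argument stack holds plain C<double>s rather than C<SV*>s, and doubling a number stands in for C<cos()> so that the sketch doesn't need the maths library. It shows the pop-compute-push pattern, the in-place C<SETi(-TOPi)> style, a mark that lets a callee see only its own arguments, and a save stack that puts old values back when a scope is left.

```c
    #include <assert.h>

    #define TOY_STACK_MAX 64

    /* Argument stack: arg_stack[0 .. nitems-1], top item last. */
    static double arg_stack[TOY_STACK_MAX];
    static int nitems = 0;

    /* Mark stack: each entry is an item count on the argument stack. */
    static int mark_stack[TOY_STACK_MAX];
    static int nmarks = 0;

    /* Save stack: where a value lived and what it used to be. */
    struct toy_saved { int *where; int old; };
    static struct toy_saved save_stack[TOY_STACK_MAX];
    static int nsaved = 0;

    static void push_n(double v) { assert(nitems < TOY_STACK_MAX); arg_stack[nitems++] = v; }
    static double pop_n(void)    { assert(nitems > 0); return arg_stack[--nitems]; }
    static double *top_n(void)   { assert(nitems > 0); return &arg_stack[nitems - 1]; }

    /* Pop an operand, compute, push the result (like the cosine example). */
    static void toy_double(void)
    {
        double value = pop_n();
        push_n(value * 2);
    }

    /* Work on the top entry in place, like SETi(-TOPi). */
    static void toy_negate(void)
    {
        double *top = top_n();
        *top = -*top;
    }

    /* Bookmark the current top of the argument stack (like PUSHMARK). */
    static void push_mark(void) { assert(nmarks < TOY_STACK_MAX); mark_stack[nmarks++] = nitems; }
    static int pop_mark(void)   { assert(nmarks > 0); return mark_stack[--nmarks]; }

    /* A "called function": it consumes only the items above its mark. */
    static double toy_callee_sum(void)
    {
        int base = pop_mark();
        double total = 0;

        while (nitems > base)
            total += pop_n();
        return total;
    }

    /* Save stack: entering a scope remembers a floor, leaving unwinds to it. */
    static int toy_enter(void) { return nsaved; }

    static void save_int(int *where)
    {
        assert(nsaved < TOY_STACK_MAX);
        save_stack[nsaved].where = where;
        save_stack[nsaved].old = *where;
        nsaved++;
    }

    static void toy_leave(int scope_floor)
    {
        while (nsaved > scope_floor) {
            nsaved--;
            *save_stack[nsaved].where = save_stack[nsaved].old;
        }
    }
```

The mark is what stops C<toy_callee_sum> from eating the caller's own items: without it, the callee would have no way to tell where its arguments begin.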

=head2 Millions of Macros

One thing you'll notice about the Perl source is that it's full of
macros. Some have called the pervasive use of macros the hardest thing
to understand; others find it adds to clarity. Let's take an example,
the code which implements the addition operator:

   1  PP(pp_add)
   2  {
   3      dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
   4      {
   5        dPOPTOPnnrl_ul;
   6        SETn( left + right );
   7        RETURN;
   8      }
   9  }

Every line here (apart from the braces, of course) contains a macro. The
first line sets up the function declaration as Perl expects for PP code;
line 3 sets up variable declarations for the argument stack and the
target, the return value of the operation. Finally, line 3's
C<tryAMAGICbin> checks whether the addition operation is overloaded; if
so, the appropriate subroutine is called.

Line 5 is another variable declaration - all variable declarations start
with C<d> - which takes two NVs from the top of the argument stack (hence
C<nn>) and puts them into the variables C<right> and C<left>, hence the
C<rl>. Only the right operand is actually popped; the left one stays on
the stack, and C<SETn> later puts the result in its place. These are the
two operands to the addition operator. Next, we call C<SETn> to set the
NV of the return value to the result of adding the two values. This
done, we return - the C<RETURN> macro makes sure that our return value
is properly handled, and we pass the next operator to run back to the
main run loop.

Most of these macros are explained in L<perlapi>, and some of the more
important ones are explained in L<perlxs> as well. Pay special attention
to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on
the C<[pad]THX_?> macros.
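
To get a feel for how these macros fit together, here is a toy version of C<pp_add> written in the same style. The macros (C<dSP_TOY>, C<POPn_TOY>, C<TOPn_TOY>, C<SETn_TOY>, C<PUSHn_TOY>, C<PUTBACK_TOY>, C<RETURN_TOY>, C<dPOPTOPnnrl_TOY>) are hypothetical, much-simplified cousins of the real ones: the stack holds C<NV>s directly rather than C<SV*>s, there is no target or overloading, and "the next op" is just an index. As with perl's C<SP>, the stack pointer points I<at> the topmost item; slot 0 is left unused so that an empty stack has somewhere to point.

```c
    #include <assert.h>

    typedef double NV;

    #define TOY_STACK_SIZE 16
    static NV toy_stack[TOY_STACK_SIZE];  /* slot 0 unused: empty when sp == toy_stack */
    static NV *toy_sp = toy_stack;        /* points AT the topmost item */
    static int toy_op_index = 0;          /* stand-in for PL_op */

    #define dSP_TOY          NV *sp = toy_sp     /* local copy of the stack pointer */
    #define POPn_TOY         (*sp--)
    #define TOPn_TOY         (*sp)
    #define SETn_TOY(v)      (*sp = (v))         /* overwrite the top item */
    #define PUSHn_TOY(v)     (*++sp = (v))
    #define PUTBACK_TOY      (toy_sp = sp)       /* publish the local copy */
    #define RETURN_TOY       return (PUTBACK_TOY, toy_op_index + 1)
    #define dPOPTOPnnrl_TOY  NV right = POPn_TOY; NV left = TOPn_TOY

    /* Push one value; used to set up the operands. */
    static void toy_push(NV v)
    {
        dSP_TOY;
        assert(sp < toy_stack + TOY_STACK_SIZE - 1);   /* room for one more */
        PUSHn_TOY(v);
        PUTBACK_TOY;
    }

    /* The same shape as pp_add: declare, pop/peek, set the result, return. */
    static int toy_pp_add(void)
    {
        dSP_TOY;
        assert(sp - toy_stack >= 2);     /* need two operands */
        {
            dPOPTOPnnrl_TOY;             /* right is popped; left stays put */
            SETn_TOY(left + right);      /* so the result replaces left */
            RETURN_TOY;
        }
    }
```

After pushing C<6.0> and C<2.5>, C<toy_pp_add()> leaves a single C<8.5> on the stack and hands back the index of the next op.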

=head2 The .i Targets

You can expand the macros in a F<foo.c> file by saying

    make foo.i

which will expand the macros using cpp. Don't be scared by the results.

=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

    ./Configure -d -D optimize=-g
    make

C<-g> is a flag to the C compiler to have it produce debugging
information which will allow us to step through a running program.
F<Configure> will also turn on the C<DEBUGGING> compilation symbol which
enables all the internal debugging code in Perl. There are a whole bunch
of things you can debug with this: L<perlrun> lists them all, and the
best way to find out about them is to play about with them. The most
useful options are probably

    l  Context (loop) stack processing
    t  Trace execution
    o  Method and overloading resolution
    c  String/numeric conversions

Some of the functionality of the debugging code can be achieved using XS
modules.

    -Dr => use re 'debug'
    -Dx => use O 'Debug'

=head2 Using a source-level debugger

If the debugging output of C<-D> doesn't help you, it's time to step
through perl's execution with a source-level debugger.

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to any
debugger, but check the manual of the one you're using.

=back

To fire up the debugger, type

    gdb ./perl

You'll want to do that in your Perl source tree so the debugger can read
the source code. You should see the copyright message, followed by the
prompt.

    (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item run [args]

Run the program with the given arguments.

=item break function_name

=item break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or the given
line in the named source file.

=item step

Steps through the program a line at a time.

=item next

Steps through the program a line at a time, without descending into
functions.

=item continue

Run until the next breakpoint.

=item finish

Run until the end of the current function, then stop again.

=item 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.

=item print

Execute the given C code and print its results. B<WARNING>: Perl makes
heavy use of macros, and F<gdb> does not necessarily support macros
(see later L</"gdb macro support">). You'll have to substitute them
yourself, or invoke cpp on the source code files
(see L</"The .i Targets">). So, for instance, you can't say

    print SvPV_nolen(sv)

but you have to say

    print Perl_sv_2pv_nolen(sv)

=back

You may find it helpful to have a "macro dictionary", which you can
produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
recursively apply those macros for you.

=head2 gdb macro support

Recent versions of F<gdb> have fairly good macro support, but
in order to use it you'll need to compile perl with macro definitions
included in the debugging information. Using F<gcc> version 3.1, this
means configuring with C<-Doptimize=-g3>. Other compilers might use a
different switch (if they support debugging macros at all).

=head2 Dumping Perl Data Structures

One way to get around this macro hell is to use the dumping functions in
F<dump.c>; these work a little like an internal
L<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures
that you can't get at from Perl. Let's take an example. We'll use the
C<$a = $b + $c> we used before, but give it a bit of context:
C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around?

What about C<pp_add>, the function we examined earlier to implement the
C<+> operator:

    (gdb) break Perl_pp_add
    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.

Notice we use C<Perl_pp_add> and not C<pp_add> - see L<perlguts/Internal Functions>.
With the breakpoint in place, we can run our program:

    (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'

Lots of junk will go past as gdb reads in the relevant source files and
libraries, and then:

    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
    309         dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
    (gdb) step
    311           dPOPTOPnnrl_ul;
    (gdb)

We looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul>
arranges for two C<NV>s to be placed into C<left> and C<right> - let's
slightly expand it:

    #define dPOPTOPnnrl_ul  NV right = POPn; \
                            SV *leftsv = TOPs; \
                            NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0

C<POPn> takes the SV from the top of the stack and obtains its NV either
directly (if C<SvNOK> is set) or by calling the C<sv_2nv> function.
C<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses
C<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from
C<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>.

Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to
convert it. If we step again, we'll find ourselves there:

    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
    1669        if (!sv)
    (gdb)

We can now use C<Perl_sv_dump> to investigate the SV:

    SV = PV(0xa057cc0) at 0xa0675d0
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0xa06a510 "6XXXX"\0
    CUR = 5
    LEN = 6
    $1 = void

We know we're going to get C<6> from this, so let's finish the
subroutine:

    (gdb) finish
    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
    0x462669 in Perl_pp_add () at pp_hot.c:311
    311           dPOPTOPnnrl_ul;
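
Why C<6>? When a string like C<"6XXXX"> is used as a number, the leading part that looks like a number is used and the rest is ignored (with warnings enabled you'd also see an C<Argument "6XXXX" isn't numeric> warning). The sketch below imitates just that common case with C's C<strtod>. C<toy_str2nv> and its C<fully_numeric> flag are hypothetical, and C<strtod> accepts some things perl treats differently, such as hexadecimal strings and some C<inf>/C<nan> spellings, so this is an approximation rather than what C<sv_2nv> really does.

```c
    #include <assert.h>
    #include <stdlib.h>

    /* Rough stand-in for numifying a string: take the longest leading
       numeric prefix (strtod returns 0.0 if there isn't one). *fully_numeric
       says whether the whole string was consumed, i.e. roughly whether perl
       would consider it a clean number. */
    static double toy_str2nv(const char *pv, int *fully_numeric)
    {
        char *end;
        double nv;

        assert(pv != NULL && fully_numeric != NULL);
        nv = strtod(pv, &end);
        *fully_numeric = (end != pv && *end == '\0');
        return nv;
    }
```

With this, C<toy_str2nv("6XXXX", &ok)> gives C<6.0> with C<ok> false, and adding C<toy_str2nv("2.3", &ok)> gives the roughly C<8.3> we'd expect C<$a> to end up with.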

We can also dump out this op: the current op is always stored in
C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
similar output to L<B::Debug|B::Debug>.

    {
    13  TYPE = add  ===> 14
        TARG = 1
        FLAGS = (SCALAR,KIDS)
        {
            TYPE = null  ===> (12)
              (was rv2sv)
            FLAGS = (SCALAR,KIDS)
            {
    11          TYPE = gvsv  ===> 12
                FLAGS = (SCALAR)
                GV = main::b
            }
        }

# finish this later #

=head2 Patching

All right, we've now had a look at how to navigate the Perl sources and
some things you'll need to know when fiddling with them. Let's now get
on and create a simple patch. Here's something Larry suggested: if a
C<U> is the first active format during a C<pack> (for example,
C<pack "U3C8", @stuff>), then the resulting string should be treated as
UTF-8 encoded.

How do we prepare to fix this up? First we locate the code in question -
the C<pack> happens at runtime, so it's going to be in one of the F<pp>
files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be
altering this file, let's copy it to F<pp.c~>.

[Well, it was in F<pp.c> when this tutorial was written. It has now been
split off with C<pp_unpack> to its own file, F<pp_pack.c>.]

Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then
loop over the pattern, taking each format character in turn into
C<datumtype>. Then for each possible format character, we swallow up
the other arguments in the pattern (a field width, an asterisk, and so
on) and convert the next chunk of input into the specified format,
adding it onto the output SV C<cat>.

How do we know if the C<U> is the first format in the C<pat>? Well, if
we have a pointer to the start of C<pat> then, if we see a C<U>, we can
test whether we're still at the start of the string. So, here's where
C<pat> is set up:

    STRLEN fromlen;
    register char *pat = SvPVx(*++MARK, fromlen);
    register char *patend = pat + fromlen;
    register I32 len;
    I32 datumtype;
    SV *fromstr;

We'll have another string pointer in there:

      STRLEN fromlen;
      register char *pat = SvPVx(*++MARK, fromlen);
      register char *patend = pat + fromlen;
    + char *patcopy;
      register I32 len;
      I32 datumtype;
      SV *fromstr;

And just before we start the loop, we'll set C<patcopy> to be the start
of C<pat>:

      items = SP - MARK;
      MARK++;
      sv_setpvn(cat, "", 0);
    + patcopy = pat;
      while (pat < patend) {

Now if we see a C<U> which was at the start of the string, we turn on
the C<UTF8> flag for the output SV, C<cat>:

    +     if (datumtype == 'U' && pat==patcopy+1)
    +         SvUTF8_on(cat);
          if (datumtype == '#') {
              while (pat < patend && *pat != '\n')
                  pat++;

Remember that it has to be C<patcopy+1> because the first character of
the string is the C<U> which has been swallowed into C<datumtype>!

Oops, we forgot one thing: what if there are spaces at the start of the
pattern? C<pack(" U*", @stuff)> will have C<U> as the first active
character, even though it's not the first thing in the pattern. In this
case, we have to advance C<patcopy> along with C<pat> when we see spaces:

    if (isSPACE(datumtype))
        continue;

needs to become

    if (isSPACE(datumtype)) {
        patcopy++;
        continue;
    }
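
Putting the pieces together, here is the detection logic pulled out into a small stand-alone C function, so it can be tried without building perl. C<u_is_first_active> is hypothetical: it walks the template the way the patched loop does, with C's C<isspace> standing in for perl's C<isSPACE>, and keeps C<patcopy> in step across spaces. It ignores everything else C<pp_pack> parses (counts, C<*>, comments), so later characters such as the C<0> in C<C0U*> are simply looked at and skipped.

```c
    #include <assert.h>
    #include <ctype.h>
    #include <string.h>

    /* Returns 1 if 'U' is the first active format in a pack template. */
    static int u_is_first_active(const char *pat)
    {
        const char *patend, *patcopy;

        assert(pat != NULL);
        patend = pat + strlen(pat);
        patcopy = pat;
        while (pat < patend) {
            int datumtype = (unsigned char)*pat++;

            if (isspace(datumtype)) {
                patcopy++;               /* a space isn't an active format */
                continue;
            }
            if (datumtype == 'U' && pat == patcopy + 1)
                return 1;                /* pat has already moved past the U */
        }
        return 0;
    }
```

Since C<patcopy> only moves on spaces, C<pat - patcopy> is the number of non-space characters consumed so far, so the test fires exactly when the C<U> is the first of them, wherever the spaces were.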
|
---|
| 1755 |
|
---|
| 1756 | OK. That's the C part done. Now we must do two additional things before
|
---|
| 1757 | this patch is ready to go: we've changed the behaviour of Perl, and so
|
---|
| 1758 | we must document that change. We must also provide some more regression
|
---|
| 1759 | tests to make sure our patch works and doesn't create a bug somewhere
|
---|
| 1760 | else along the line.
|
---|
| 1761 |
|
---|
| 1762 | The regression tests for each operator live in F<t/op/>, and so we
|
---|
| 1763 | make a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our
|
---|
| 1764 | tests to the end. First, we'll test that the C<U> does indeed create
|
---|
| 1765 | Unicode strings.
|
---|
| 1766 |
|
---|
| 1767 | t/op/pack.t has a sensible ok() function, but if it didn't we could
|
---|
| 1768 | use the one from t/test.pl.
|
---|
| 1769 |
|
---|
| 1770 | require './test.pl';
|
---|
| 1771 | plan( tests => 159 );
|
---|
| 1772 |
|
---|
| 1773 | so instead of this:
|
---|
| 1774 |
|
---|
| 1775 | print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000);
|
---|
| 1776 | print "ok $test\n"; $test++;
|
---|
| 1777 |
|
---|
| 1778 | we can write the more sensible (see L<Test::More> for a full
|
---|
| 1779 | explanation of is() and other testing functions).
|
---|
| 1780 |
|
---|
| 1781 | is( "1.20.300.4000", sprintf "%vd", pack("U*",1,20,300,4000),
|
---|
| 1782 | "U* produces unicode" );
|
---|
| 1783 |
|
---|
| 1784 | Now we'll test that we got that space-at-the-beginning business right:
|
---|
| 1785 |
|
---|
| 1786 | is( "1.20.300.4000", sprintf "%vd", pack(" U*",1,20,300,4000),
|
---|
| 1787 | " with spaces at the beginning" );
|
---|
| 1788 |
|
---|
| 1789 | And finally we'll test that we don't make Unicode strings if C<U> is B<not>
|
---|
| 1790 | the first active format:
|
---|
| 1791 |
|
---|
| 1792 | isnt( v1.20.300.4000, sprintf "%vd", pack("C0U*",1,20,300,4000),
|
---|
| 1793 | "U* not first isn't unicode" );
|
---|
| 1794 |
|
---|
| 1795 | Mustn't forget to change the number of tests which appears at the top,
|
---|
| 1796 | or else the automated tester will get confused. This will either look
|
---|
| 1797 | like this:
|
---|
| 1798 |
|
---|
| 1799 | print "1..156\n";
|
---|
| 1800 |
|
---|
| 1801 | or this:
|
---|
| 1802 |
|
---|
| 1803 | plan( tests => 156 );
|
---|

We now compile up Perl, and run it through the test suite. Our new
tests pass, hooray!

Finally, the documentation. The job is never done until the paperwork is
over, so let's describe the change we've just made. The relevant place
is F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert
this text in the description of C<pack>:

    =item *

    If the pattern begins with a C<U>, the resulting string will be treated
    as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
    with an initial C<U0>, and the bytes that follow will be interpreted as
    Unicode characters. If you don't want this to happen, you can begin your
    pattern with C<C0> (or anything else) to force Perl not to UTF-8 encode your
    string, and then follow this with a C<U*> somewhere in your pattern.

All done. Now let's create the patch. F<Porting/patching.pod> tells us
that if we're making major changes, we should copy the entire directory
to somewhere safe before we begin fiddling, and then do

    diff -ruN old new > patch

However, we know which files we've changed, and we can simply do this:

    diff -u pp.c~ pp.c > patch
    diff -u t/op/pack.t~ t/op/pack.t >> patch
    diff -u pod/perlfunc.pod~ pod/perlfunc.pod >> patch

We end up with a patch looking a little like this:

    --- pp.c~ Fri Jun 02 04:34:10 2000
    +++ pp.c Fri Jun 16 11:37:25 2000
    @@ -4375,6 +4375,7 @@
         register I32 items;
         STRLEN fromlen;
         register char *pat = SvPVx(*++MARK, fromlen);
    +    char *patcopy;
         register char *patend = pat + fromlen;
         register I32 len;
         I32 datumtype;
    @@ -4405,6 +4406,7 @@
    ...

And finally, we submit it, with our rationale, to perl5-porters. Job
done!

=head2 Patching a core module

This works just like patching anything else, with an extra
consideration. Many core modules also live on CPAN. If this is so,
patch the CPAN version instead of the core and send the patch off to
the module maintainer (with a copy to p5p). This will help the module
maintainer keep the CPAN version in sync with the core version without
constantly scanning p5p.

The list of maintainers of core modules is usefully documented in
F<Porting/Maintainers.pl>.

=head2 Adding a new function to the core

If, as part of a patch to fix a bug, or just because you have an
especially good idea, you decide to add a new function to the core,
discuss your ideas on p5p well before you start work. It may be that
someone else has already attempted to do what you are considering and
can give lots of good advice or even provide you with bits of code
that they already started (but never finished).

You have to follow all of the advice given above for patching. It is
extremely important to test any addition thoroughly and add new tests
to explore all boundary conditions that your new function is expected
to handle. If your new function is used only by one source file (e.g. toke.c),
then it should probably be named S_your_function (for static); on the
other hand, if you expect it to be accessible from other functions in
Perl, you should name it Perl_your_function. See L<perlguts/Internal Functions>
for more details.

The location of any new code is also an important consideration. Don't
just create a new top level .c file and put your code there; you would
have to make changes to Configure (so the Makefile is created properly),
as well as possibly lots of include files. This is strictly pumpking
business.

It is better to add your function to one of the existing top level
source code files, but your choice is complicated by the nature of
the Perl distribution. Only the files that are marked as compiled
static are located in the perl executable. Everything else is located
in the shared library (or DLL if you are running under WIN32). So,
for example, if a function was only used by functions located in
toke.c, then your code can go in toke.c. If, however, you want to call
the function from universal.c, then you should put your code in another
location, for example util.c.

In addition to writing your C code, you will need to create an
appropriate entry in F<embed.pl> describing your function, then run
'make regen_headers' to create the entries in the numerous header
files that perl needs to compile correctly. See L<perlguts/Internal Functions>
for information on the various options that you can set in F<embed.pl>.
You will forget to do this a few (or many) times and you will get
warnings during the compilation phase. Make sure that you mention
this when you post your patch to P5P; the pumpking needs to know this.

When you write your new code, please be conscious of existing code
conventions used in the perl source files. See L<perlstyle> for
details. Although most of the guidelines discussed seem to focus on
Perl code, rather than C, they all apply (except when they don't ;).
See also the F<Porting/patching.pod> file in the Perl source distribution
for lots of details about both formatting and submitting patches of
your changes.

Lastly, TEST TEST TEST TEST TEST any code before posting to p5p.
Test on as many platforms as you can find. Test as many perl
Configure options as you can (e.g. MULTIPLICITY). If you have
profiling or memory tools, see L</"EXTERNAL TOOLS FOR DEBUGGING PERL">
below for how to use them to further test your code. Remember that
most of the people on P5P are doing this on their own time and
don't have the time to debug your code.

=head2 Writing a test

Every module and built-in function has an associated test file (or
should...). If you add or change functionality, you have to write a
test. If you fix a bug, you have to write a test so that bug never
comes back. If you alter the docs, it would be nice to test what the
new documentation says.

In short, if you submit a patch you probably also have to patch the
tests.

For modules, the test file is right next to the module itself.
F<lib/strict.t> tests F<lib/strict.pm>. This is a recent innovation,
so there are some snags (and it would be wonderful for you to brush
them out), but it basically works that way. Everything else lives in
F<t/>.

=over 3

=item F<t/base/>

Testing of the absolute basic functionality of Perl. Things like
C<if>, basic file reads and writes, simple regexes, etc. These are
run first in the test suite and if any of them fail, something is
I<really> broken.

=item F<t/cmd/>

These test the basic control structures, C<if/else>, C<while>,
subroutines, etc.

=item F<t/comp/>

Tests basic issues of how Perl parses and compiles itself.

=item F<t/io/>

Tests for built-in IO functions, including command line arguments.

=item F<t/lib/>

The old home for the module tests; you shouldn't put anything new in
here. There are still some bits and pieces hanging around in here
that need to be moved. Perhaps you could move them? Thanks!

=item F<t/op/>

Tests for perl's built-in functions that don't fit into any of the
other directories.

=item F<t/pod/>

Tests for POD directives. There are still some tests for the Pod
modules hanging around in here that need to be moved out into F<lib/>.

=item F<t/run/>

Testing features of how perl actually runs, including exit codes and
handling of PERL* environment variables.

=item F<t/uni/>

Tests for the core support of Unicode.

=item F<t/win32/>

Windows-specific tests.

=item F<t/x2p/>

A test suite for the s2p converter.

=back

The core uses the same testing style as the rest of Perl, a simple
"ok/not ok" run through Test::Harness, but there are a few special
considerations.

There are three ways to write a test in the core: Test::More,
F<t/test.pl>, and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">. The
decision of which to use depends on what part of the test suite you're
working on. This is a measure to prevent a high-level failure (such
as Config.pm breaking) from causing basic functionality tests to fail.

=over 4

=item t/base t/comp

Since we don't know if require works, or even subroutines, use ad hoc
tests for these two. Step carefully to avoid using the feature being
tested.

=item t/cmd t/run t/io t/op

Now that basic require() and subroutines are tested, you can use the
F<t/test.pl> library, which emulates the important features of Test::More
while using a minimum of core features.

You can also conditionally use certain libraries like Config, but be
sure to skip the test gracefully if it's not there.

=item t/lib ext lib

Now that the core of Perl is tested, Test::More can be used. You can
also use the full suite of core modules in the tests.

=back
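
For F<t/base> and F<t/comp>, where neither C<require> nor subroutines can
be trusted yet, a test prints its plan and its results by hand. Here is a
minimal sketch of that ad hoc style using the C<print "not " unless ...>
idiom; the three checks are illustrative and are not copied from an
existing core test.

```perl

    # Ad hoc style: no require, no subroutines, no modules.
    print "1..3\n";

    my $x = 1 + 1;
    print "not " unless $x == 2;
    print "ok 1\n";

    my $s = "abc";
    print "not " unless length($s) == 3;
    print "ok 2\n";

    # Test numbers are maintained by hand; each test emits exactly one line.
    print "not " unless $s =~ /b/;
    print "ok 3\n";

```

Because the C<ok N> line is always printed and only the C<not > prefix
depends on the check, the numbering stays correct even when a check fails.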

When you say "make test" Perl uses the F<t/TEST> program to run the
test suite (except under Win32, where it uses F<t/harness> instead).
All tests are run from the F<t/> directory, B<not> the directory
which contains the test. This causes some problems with the tests
in F<lib/>, so here's some opportunity for some patching.

You must be triply conscious of cross-platform concerns. This usually
boils down to using File::Spec and avoiding things like C<fork()> and
C<system()> unless absolutely necessary.
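
As a sketch of the File::Spec advice, a test can build paths with the
native separator instead of hard-coding C</>. The file names below are
only examples. Because tests run from F<t/>, the path is built relative
to the parent directory.

```perl

    use strict;
    use warnings;
    use File::Spec;

    # Build "../lib/strict.pm" (or the local equivalent) portably.
    my $path = File::Spec->catfile(File::Spec->updir, 'lib', 'strict.pm');

    # Split it back apart; the components survive the round trip.
    my ($volume, $dirs, $file) = File::Spec->splitpath($path);
    my @dirs = File::Spec->splitdir($dirs);

    print "path: $path\nfile: $file\n";

```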

=head2 Special Make Test Targets

There are various special make targets that can be used to test Perl
slightly differently than the standard "test" target. Not all of them
are expected to give a 100% success rate. Many of them have several
aliases, and many of them are not available on certain operating
systems.

=over 4

=item coretest

Run F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests).

(Not available on Win32)

=item test.deparse

Run all the tests through B::Deparse. Not all tests will succeed.

(Not available on Win32)

=item test.taintwarn

Run all tests with the B<-t> command-line switch. Not all tests
are expected to succeed (until they're specifically fixed, of course).

(Not available on Win32)

=item minitest

Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>,
F<t/op>, and F<t/uni> tests.

=item test.valgrind check.valgrind utest.valgrind ucheck.valgrind

(Only on Linux) Run all the tests using the memory leak + naughty
memory access tool "valgrind". The log files will be named
F<testname.valgrind>.

=item test.third check.third utest.third ucheck.third

(Only on Tru64) Run all the tests using the memory leak + naughty
memory access tool "Third Degree". The log files will be named
F<perl.3log.testname>.

=item test.torture torturetest

Run all the usual tests and some extra tests. As of Perl 5.8.0 the
only extra tests are Abigail's JAPHs, F<t/japh/abigail.t>.

You can also run the torture test with F<t/harness> by giving the
C<-torture> argument to F<t/harness>.

=item utest ucheck test.utf8 check.utf8

Run all the tests with C<-Mutf8>. Not all tests will succeed.

(Not available on Win32)

=item minitest.utf16 test.utf16

Runs the tests with UTF-16 encoded scripts, encoded with different
versions of this encoding.

C<make utest.utf16> runs the test suite with a combination of C<-utf8> and
C<-utf16> arguments to F<t/TEST>.

(Not available on Win32)

=item test_harness

Run the test suite with the F<t/harness> controlling program, instead of
F<t/TEST>. F<t/harness> is more sophisticated, and uses the
L<Test::Harness> module, thus using this test target supposes that perl
mostly works. The main advantage for our purposes is that it prints a
detailed summary of failed tests at the end. Also, unlike F<t/TEST>, it
doesn't redirect stderr to stdout.

Note that under Win32 F<t/harness> is always used instead of F<t/TEST>, so
there is no special "test_harness" target.

Under Win32's "test" target you may use the TEST_SWITCHES and TEST_FILES
environment variables to control the behaviour of F<t/harness>. This means
you can say

    nmake test TEST_FILES="op/*.t"
    nmake test TEST_SWITCHES="-torture" TEST_FILES="op/*.t"

=item test-notty test_notty

Sets PERL_SKIP_TTY_TEST to true before running the normal test suite.

=back

=head2 Running tests by hand

You can run part of the test suite by hand by using one of the following
commands from the F<t/> directory:

    ./perl -I../lib TEST list-of-.t-files

or

    ./perl -I../lib harness list-of-.t-files

(If you don't specify test scripts, the whole test suite will be run.)

=head3 Using t/harness for testing

If you use C<harness> for testing you have several command line options
available to you. The arguments are as follows, and are in the order
that they must appear if used together.

    harness -v -torture -re=pattern LIST OF FILES TO TEST
    harness -v -torture -re LIST OF PATTERNS TO MATCH

If C<LIST OF FILES TO TEST> is omitted the file list is obtained from
the manifest. The file list may include shell wildcards which will be
expanded out.

=over 4

=item -v

Run the tests under verbose mode so you can see what tests were run,
and debug output.

=item -torture

Run the torture tests as well as the normal set.

=item -re=PATTERN

Filter the file list so that all the test files run match PATTERN.
Note that this form is distinct from the B<-re LIST OF PATTERNS> form below
in that it allows the file list to be provided as well.

=item -re LIST OF PATTERNS

Filter the file list so that all the test files run match
/(LIST|OF|PATTERNS)/. Note that with this form the patterns
are joined by '|' and you cannot supply a list of files; instead,
the test files are obtained from the MANIFEST.

=back
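
To make the C<-re LIST OF PATTERNS> behaviour concrete, here is a small
sketch of the filtering it describes: the patterns are joined with C<|>
into one alternation, and only the file names that match are kept. This
is an illustration written for this document, not the code in
F<t/harness>; C<filter_tests> is a hypothetical name and the file list is
made up.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: keep the test files matching any of the patterns,
    # as if they had been joined into /(LIST|OF|PATTERNS)/.
    sub filter_tests {
        my ($files, @patterns) = @_;
        my $re = join '|', @patterns;
        return grep { /(?:$re)/ } @$files;
    }

    my @files  = qw(op/pack.t op/sprintf.t io/open.t uni/sprintf.t);
    my @picked = filter_tests(\@files, 'pack', 'sprintf');
    print "@picked\n";    # op/pack.t op/sprintf.t uni/sprintf.t

```

Note that the patterns match anywhere in the path, so C<sprintf> selects
tests from both F<op/> and F<uni/>.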

You can run an individual test by a command similar to

    ./perl -I../lib path/to/foo.t

except that the harnesses set up some environment variables that may
affect the execution of the test:

=over 4

=item PERL_CORE=1

indicates that we're running this test as part of the perl core test suite.
This is useful for modules that have a dual life on CPAN.

=item PERL_DESTRUCT_LEVEL=2

is set to 2 if it isn't set already (see L</PERL_DESTRUCT_LEVEL>).

=item PERL

(used only by F<t/TEST>) if set, overrides the path to the perl executable
that should be used to run the tests (the default being F<./perl>).

=item PERL_SKIP_TTY_TEST

if set, tells the test suite to skip the tests that need a terminal. It's
actually set automatically by the Makefile, but can also be forced
artificially by running 'make test_notty'.

=back
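
A test that needs a terminal might honour PERL_SKIP_TTY_TEST along these
lines. This is a sketch assuming plain TAP output. The C<skip_reason>
helper is hypothetical, and also skipping when STDIN is not a terminal
is an extra precaution chosen for this example, not something the
harnesses require.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: return a reason to skip terminal tests, or undef.
    # PERL_SKIP_TTY_TEST comes from the harness; the -t check is an assumption.
    sub skip_reason {
        my ($env, $have_tty) = @_;
        return 'PERL_SKIP_TTY_TEST is set' if $env->{PERL_SKIP_TTY_TEST};
        return 'no terminal on STDIN'      unless $have_tty;
        return;
    }

    my $reason = skip_reason(\%ENV, -t STDIN);
    if (defined $reason) {
        print "1..0 # Skip $reason\n";    # TAP way of skipping a whole file
    }
    else {
        print "1..1\n";
        print "ok 1\n";                   # the real terminal tests go here
    }

```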

=head1 EXTERNAL TOOLS FOR DEBUGGING PERL

Sometimes it helps to use external tools while debugging and
testing Perl. This section tries to guide you through using
some common testing and debugging tools with Perl. This is
meant as a guide to interfacing these tools with Perl, not
as any kind of guide to the use of the tools themselves.

B<NOTE 1>: Running under memory debuggers such as Purify, valgrind, or
Third Degree greatly slows down the execution: seconds become minutes,
minutes become hours. For example, as of Perl 5.8.1,
F<ext/Encode/t/Unicode.t> takes extraordinarily long to complete under
e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more
than six hours, even on a snappy computer; the test must be
doing something that is quite unfriendly for memory debuggers. If you
don't feel like waiting, you can simply kill the perl process.

B<NOTE 2>: To minimize the number of memory leak false alarms (see
L</PERL_DESTRUCT_LEVEL> for more information), you have to have the
environment variable PERL_DESTRUCT_LEVEL set to 2. The F<TEST>
and harness scripts do that automatically. But if you are running
some of the tests manually, use this for csh-like shells:

    setenv PERL_DESTRUCT_LEVEL 2

and for Bourne-type shells:

    PERL_DESTRUCT_LEVEL=2
    export PERL_DESTRUCT_LEVEL

or in UNIXy environments you can also use the C<env> command:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...

B<NOTE 3>: There are known memory leaks when there are compile-time
errors within eval or require; seeing C<S_doeval> in the call stack
is a good sign of these. Fixing these leaks is non-trivial,
unfortunately, but they must be fixed eventually.

=head2 Rational Software's Purify

Purify is a commercial tool that is helpful in identifying
memory overruns, wild pointers, memory leaks and other such
badness. Perl must be compiled in a specific way for
optimal testing with Purify. Purify is available under
Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.

=head2 Purify on Unix

On Unix, Purify creates a new Perl binary. To get the most
benefit out of Purify, you should build the perl to be Purify'ed
using:

    sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
        -Uusemymalloc -Dusemultiplicity

where these arguments mean:

=over 4

=item -Accflags=-DPURIFY

Disables Perl's arena memory allocation functions, as well as
forcing use of memory allocation functions derived from the
system malloc.

=item -Doptimize='-g'

Adds debugging information so that you see the exact source
statements where the problem occurs. Without this flag, all
you will see is the source filename of where the error occurred.

=item -Uusemymalloc

Disable Perl's malloc so that Purify can more closely monitor
allocations and leaks. Using Perl's malloc will make Purify
report most leaks in the "potential" leaks category.

=item -Dusemultiplicity

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=back

Once you've compiled a perl suitable for Purify'ing, then you
can just:

    make pureperl

which creates a binary named 'pureperl' that has been Purify'ed.
This binary is used in place of the standard 'perl' binary
when you want to debug Perl memory problems.

As an example, to show any memory leaks produced during the
standard Perl test suite you would create and run the Purify'ed
perl as:

    make pureperl
    cd t
    ../pureperl -I../lib harness

which would run the test suite under F<pureperl> and report any memory
problems.

Purify outputs messages in "Viewer" windows by default. If
you don't have a windowing environment or if you simply
want the Purify output to unobtrusively go to a log file
instead of to the interactive window, use the following
options to output to the log file "perl.log":

    setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
        -log-file=perl.log -append-logfile=yes"

If you plan to use the "Viewer" windows, then you only need this option:

    setenv PURIFYOPTIONS "-chain-length=25"

In Bourne-type shells:

    PURIFYOPTIONS="..."
    export PURIFYOPTIONS

or if you have the "env" utility:

    env PURIFYOPTIONS="..." ../pureperl ...

=head2 Purify on NT

Purify on Windows NT instruments the Perl binary 'perl.exe'
on the fly. There are several options in the makefile you
should change to get the most use out of Purify:

=over 4

=item DEFINES

You should add -DPURIFY to the DEFINES line so the DEFINES
line looks something like:

    DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1

to disable Perl's arena memory allocation functions, as
well as to force use of memory allocation functions derived
from the system malloc.

=item USE_MULTI = define

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=item #PERL_MALLOC = define

Disable Perl's malloc (by commenting out this line) so that Purify can
more closely monitor allocations and leaks. Using Perl's malloc will
make Purify report most leaks in the "potential" leaks category.

=item CFG = Debug

Adds debugging information so that you see the exact source
statements where the problem occurs. Without this flag, all
you will see is the source filename of where the error occurred.

=back

As an example, to show any memory leaks produced during the
standard Perl test suite you would create and run Purify as:

    cd win32
    make
    cd ../t
    purify ../perl -I../lib harness

which would instrument Perl in memory, run the test suite,
then finally report any memory problems.

=head2 valgrind

The excellent valgrind tool can be used to find both memory leaks
and illegal memory accesses. As of August 2003 it unfortunately works
only on x86 (ELF) Linux. The special "test.valgrind" target can be used
to run the tests under valgrind. Found errors and memory leaks are
logged in files named F<testname.valgrind>.

As system libraries (most notably glibc) also trigger errors,
valgrind allows you to suppress such errors using suppression files. The
default suppression file that comes with valgrind already catches a lot
of them. Some additional suppressions are defined in F<t/perl.supp>.

To get valgrind and for more information see

    http://developer.kde.org/~sewardj/

=head2 Compaq's/Digital's/HP's Third Degree

Third Degree is a tool for memory leak detection and memory access checks.
It is one of the many tools in the ATOM toolkit. The toolkit is only
available on Tru64 (formerly known as Digital UNIX, and before that as
DEC OSF/1).

When building Perl, you must first run Configure with -Doptimize=-g
and -Uusemymalloc flags; after that you can use the make targets
"perl.third" and "test.third". (Perl must be compiled with the
C<-g> flag, so you may need to re-Configure.)

The short story is that with "atom" you can instrument the Perl
executable to create a new executable called F<perl.third>. When the
instrumented executable is run, it creates a log of dubious memory
traffic in a file called F<perl.3log>. See the manual pages of atom and
third for more information. The most extensive Third Degree
documentation is available in the Compaq "Tru64 UNIX Programmer's
Guide", chapter "Debugging Programs with Third Degree".

The "test.third" target leaves a lot of files named F<foo_bar.3log> in
the F<t/> subdirectory. There is a problem with these files: Third Degree
is so effective that it also finds problems in the system libraries.
Therefore you should use the F<Porting/thirdclean> script to clean up
the F<*.3log> files.

There are also leaks that, for a given definition of a leak,
aren't. See L</PERL_DESTRUCT_LEVEL> for more information.

=head2 PERL_DESTRUCT_LEVEL

If you want to run any of the tests yourself manually using e.g.
valgrind, or the pureperl or perl.third executables, please note that
by default perl B<does not> explicitly clean up all the memory it has
allocated (such as global memory arenas) but instead lets the exit()
of the whole program "take care" of such allocations, also known as
"global destruction of objects".

There is a way to tell perl to do complete cleanup: set the
environment variable PERL_DESTRUCT_LEVEL to a non-zero value.
The t/TEST wrapper does set this to 2, and this is what you
need to do too, if you don't want to see the "global leaks".
For example, for "third-degreed" Perl:

    env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t

(Note: the mod_perl apache module also uses this environment variable
for its own purposes and extends its semantics. Refer to the mod_perl
documentation for more information. Also, spawned threads do the
equivalent of setting this variable to the value 1.)
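
If you drive such manual runs from a script, you can mirror what the
harnesses do: set PERL_DESTRUCT_LEVEL to 2 only when it isn't already
set, then launch the test. A sketch follows. The valgrind invocation
and the F<t/foo/bar.t> path are placeholders, and C<command_for> is a
hypothetical helper.

```perl

    use strict;
    use warnings;

    # Default PERL_DESTRUCT_LEVEL to 2, but keep any value already chosen.
    $ENV{PERL_DESTRUCT_LEVEL} = 2 unless defined $ENV{PERL_DESTRUCT_LEVEL};

    # Hypothetical helper: the command list for one test run.
    sub command_for {
        my ($test) = @_;
        return ('valgrind', './perl', '-Ilib', $test);
    }

    my @cmd = command_for('t/foo/bar.t');
    print "PERL_DESTRUCT_LEVEL=$ENV{PERL_DESTRUCT_LEVEL} @cmd\n";
    # system(@cmd) would run it; the child process inherits %ENV.

```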
| 2464 |
|
---|
| 2465 | If, at the end of a run you get the message I<N scalars leaked>, you can
|
---|
| 2466 | recompile with C<-DDEBUG_LEAKING_SCALARS>, which will cause
|
---|
| 2467 | the addresses of all those leaked SVs to be dumped; it also converts
|
---|
| 2468 | C<new_SV()> from a macro into a real function, so you can use your
|
---|
| 2469 | favourite debugger to discover where those pesky SVs were allocated.
|
---|
| 2470 |
|
---|
=head2 Profiling

Depending on your platform, there are various ways of profiling Perl.

There are two commonly used techniques of profiling executables:
I<statistical time-sampling> and I<basic-block counting>.

The first method periodically takes samples of the CPU program
counter, and since the program counter can be correlated with the code
generated for functions, we get a statistical view of which
functions the program is spending its time in. The caveats are that very
small/fast functions have a lower probability of showing up in the
profile, and that periodically interrupting the program (this is
usually done rather frequently, on the scale of milliseconds) imposes
an additional overhead that may skew the results. The first problem
can be alleviated by running the code for longer (in general this is a
good idea for profiling); the second problem is usually kept in check
by the profiling tools themselves.

The second method divides up the generated code into I<basic blocks>.
Basic blocks are sections of code that are entered only at the
beginning and exited only at the end. For example, a conditional jump
ends a basic block, and the code at each of its possible destinations
starts a new one. Basic block profiling usually works by
I<instrumenting> the code by adding I<enter basic block #nnnn>
book-keeping code to the generated code. During the execution of the
code the basic block counters are then updated appropriately. The
caveat is that the added extra code can skew the results: again, the
profiling tools usually try to factor their own effects out of the
results.

=head2 Gprof Profiling

gprof is a profiling tool available on many UNIX platforms;
it uses I<statistical time-sampling>.

You can build a profiled version of perl called F<perl.gprof> by
invoking the make target "perl.gprof" (what is required is that Perl
must be compiled using the C<-pg> flag; you may need to re-Configure).
Running the profiled version of Perl will create an output file called
F<gmon.out>, which contains the profiling data collected
during the execution.

The gprof tool can then display the collected data in various ways.
Usually gprof understands the following options:

=over 4

=item -a

Suppress statically defined functions from the profile.

=item -b

Suppress the verbose descriptions in the profile.

=item -e routine

Exclude the given routine and its descendants from the profile.

=item -f routine

Display only the given routine and its descendants in the profile.

=item -s

Generate a summary file called F<gmon.sum> which then may be given
to subsequent gprof runs to accumulate data over several runs.

=item -z

Display routines that have zero usage.

=back

For a more detailed explanation of the available commands and output
formats, see your own local documentation of gprof.

=head2 GCC gcov Profiling

Starting from GCC 3.0, I<basic block profiling> is officially available
for the GNU CC.

You can build a profiled version of perl called F<perl.gcov> by
invoking the make target "perl.gcov" (what is required is that Perl must
be compiled using gcc with the flags C<-fprofile-arcs
-ftest-coverage>; you may need to re-Configure).

Running the profiled version of Perl will cause profile output to be
generated. For each source file an accompanying F<.da> file will be
created.

To display the results you use the F<gcov> utility (which should
be installed if you have gcc 3.0 or newer installed). F<gcov> is
run on source code files, like this:

    gcov sv.c

which will cause F<sv.c.gcov> to be created. The F<.gcov> files
contain the source code annotated with how many times each line was
executed; lines that were never executed are marked with a run of "#"
characters.

Useful options of F<gcov> include C<-b>, which will summarise the
basic block, branch, and function call coverage, and C<-c>, which
reports branch frequencies as actual counts instead of percentages.
For more information on the use of F<gcov> and basic block profiling
with gcc, see the latest GNU CC manual; as of GCC 3.0 see

    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html

and its section titled "8. gcov: a Test Coverage Program"

    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132

=head2 Pixie Profiling

Pixie is a profiling tool available on IRIX and Tru64 (aka Digital
UNIX aka DEC OSF/1) platforms. Pixie does its profiling using
I<basic-block counting>.

You can build a profiled version of perl called F<perl.pixie> by
invoking the make target "perl.pixie" (what is required is that Perl
must be compiled using the C<-g> flag; you may need to re-Configure).

In Tru64 a file called F<perl.Addrs> will also be silently created;
this file contains the addresses of the basic blocks. Running the
profiled version of Perl will create a new file called F<perl.Counts>,
which contains the counts for the basic blocks for that particular
program execution.

To display the results you use the F<prof> utility. The exact
incantation depends on your operating system: C<prof perl.Counts> in
IRIX, and C<prof -pixie -all -L. perl> in Tru64.

In IRIX the following prof options are available:

=over 4

=item -h

Reports the most heavily used lines in descending order of use.
Useful for finding the hotspot lines.

=item -l

Groups lines by procedure, with procedures sorted in descending order of use.
Within a procedure, lines are listed in source order.
Useful for finding the hotspots of procedures.

=back

In Tru64 the following options are available:

=over 4

=item -p[rocedures]

Procedures sorted in descending order by the number of cycles executed
in each procedure. Useful for finding the hotspot procedures.
(This is the default option.)

=item -h[eavy]

Lines sorted in descending order by the number of cycles executed in
each line. Useful for finding the hotspot lines.

=item -i[nvocations]

The called procedures are sorted in descending order by number of calls
made to the procedures. Useful for finding the most used procedures.

=item -l[ines]

Grouped by procedure, sorted by cycles executed per procedure.
Useful for finding the hotspots of procedures.

=item -testcoverage

Lines for which the compiler emitted code, but that code was never
executed.

=item -z[ero]

Unexecuted procedures.

=back

For further information, see your system's manual pages for pixie and prof.

=head2 Miscellaneous tricks

=over 4

=item *

Those debugging perl with the DDD frontend over gdb may find the
following useful:

You can extend the data conversion shortcuts menu, so for example you
can display an SV's IV value with one click, without doing any typing.
To do that, simply edit the F<~/.ddd/init> file and add, after:

    ! Display shortcuts.
    Ddd*gdbDisplayShortcuts: \
    /t ()   // Convert to Bin\n\
    /d ()   // Convert to Dec\n\
    /x ()   // Convert to Hex\n\
    /o ()   // Convert to Oct\n\

the following two lines:

    ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
    ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx

so now you can do ivx and pvx lookups, or you can plug in the
sv_peek "conversion" there:

    Perl_sv_peek(my_perl, (SV*)()) // sv_peek

(The C<my_perl> argument is for threaded builds.)
Just remember that every line except the last one should end with C<\n\>.

Alternatively, edit the init file interactively via:
3rd mouse button -> New Display -> Edit Menu

Note: you can define up to 20 conversion shortcuts in the gdb
section.

=item *

If you see a memory area in a debugger that is mysteriously full of
0xabababab, you may be seeing the effect of the C<Poison()> macro; see
L<perlclib>.

=back

=head2 CONCLUSION

We've had a brief look around the Perl source, an overview of the stages
F<perl> goes through when it's running your code, and how to use a
debugger to poke at the Perl guts. We took a very simple problem and
demonstrated how to solve it fully: with documentation, regression
tests, and finally a patch for submission to p5p. Lastly, we talked
about how to use external tools to debug and test Perl.

I'd now suggest you read over those references again, and then, as soon
as possible, get your hands dirty. The best way to learn is by doing,
so:

=over 3

=item *

Subscribe to perl5-porters, follow the patches and try to understand
them; don't be afraid to ask if there's a portion you're not clear on.
Who knows, you may unearth a bug in the patch...

=item *

Keep up to date with the bleeding edge Perl distributions and get
familiar with the changes. Try to get an idea of what areas people are
working on and the changes they're making.

=item *

Do read the README associated with your operating system, e.g. F<README.aix>
on the IBM AIX OS. Don't hesitate to supply patches to that README if
you find anything missing or changed in a new OS release.

=item *

Find an area of Perl that seems interesting to you, and see if you can
work out how it works. Scan through the source, and step over it in the
debugger. Play, poke, investigate, fiddle! You'll probably get to
understand not just your chosen area but a much wider range of F<perl>'s
activity as well, and probably sooner than you'd think.

=back

=over 3

=item I<The Road goes ever on and on, down from the door where it began.>

=back

If you can do these things, you've started on the long road to Perl porting.
Thanks for wanting to help make Perl better - and happy hacking!

=head1 AUTHOR

This document was written by Nathan Torkington, and is maintained by
the perl5-porters mailing list.
