=head1 NAME

perlcompile - Introduction to the Perl Compiler-Translator

=head1 DESCRIPTION

Perl has always had a compiler: your source is compiled into an
internal form (a parse tree) which is then optimized before being
run. Since version 5.005, Perl has shipped with a module
capable of inspecting the optimized parse tree (C<B>), and this has
been used to write many useful utilities, including a module that lets
you turn your Perl into C source code that can be compiled into a
native executable.

The C<B> module provides access to the parse tree, and other modules
("back ends") do things with the tree. Some write it out as
bytecode, C source code, or a semi-human-readable text. Another
traverses the parse tree to build a cross-reference of which
subroutines, formats, and variables are used where. Another checks
your code for dubious constructs. Yet another back end dumps the
parse tree back out as Perl source, acting as a source code beautifier
or deobfuscator.

Because its original purpose was to be a way to produce C code
corresponding to a Perl program, and in turn a native executable, the
C<B> module and its associated back ends are known as "the
compiler", even though they don't really compile anything.
Different parts of the compiler are more accurately a "translator",
or an "inspector", but people want Perl to have a "compiler
option", not an "inspector gadget". What can you do?

This document covers the use of the Perl compiler: which modules
it comprises, how to use the most important of the back end modules,
what problems there are, and how to work around them.

=head2 Layout

The compiler back ends are in the C<B::> hierarchy, and the front-end
(the module that you, the user of the compiler, will sometimes
interact with) is the C<O> module. Some back ends (e.g., C<B::C>) have
programs (e.g., I<perlcc>) to hide the modules' complexity.

Here are the important back ends to know about, with their status
expressed as a number from 0 (outline for later implementation) to
10 (if there's a bug in it, we're very surprised):

=over 4

=item B::Bytecode

Stores the parse tree in a machine-independent format, suitable
for later reloading through the ByteLoader module. Status: 5 (some
things work, some things don't, some things are untested).

=item B::C

Creates a C source file containing code to rebuild the parse tree
and resume the interpreter. Status: 6 (many things work adequately,
including programs using Tk).

=item B::CC

Creates a C source file corresponding to the run time code path in
the parse tree. This is the closest to a Perl-to-C translator there
is, but the code it generates is almost incomprehensible because it
translates the parse tree into a giant switch structure that
manipulates Perl structures. The eventual goal is to reduce (given
sufficient type information in the Perl program) some of the
Perl data structure manipulations into manipulations of C-level
ints, floats, etc. Status: 5 (some things work, including
uncomplicated Tk examples).

=item B::Lint

Complains if it finds dubious constructs in your source code. Status:
6 (it works adequately, but only has a very limited number of areas
that it checks).

=item B::Deparse

Recreates the Perl source, making an attempt to format it coherently.
Status: 8 (it works nicely, but a few obscure things are missing).

=item B::Xref

Reports on the declaration and use of subroutines and variables.
Status: 8 (it works nicely, but still has a few lingering bugs).

=back

=head1 Using The Back Ends

The following sections describe how to use the various compiler back
ends. They're presented roughly in order of maturity, so that the
most stable and proven back ends are described first, and the most
experimental and incomplete back ends are described last.

The O module automatically enables the B<-c> flag to Perl, which
prevents Perl from executing your code once it has been compiled.
This is why all the back ends print:

    myperlprogram syntax OK

before producing any other output.

=head2 The Cross Referencing Back End

The cross referencing back end (B::Xref) produces a report on your program,
breaking down declarations and uses of subroutines and variables (and
formats) by file and subroutine. For instance, here's part of the
report from the I<pod2man> program that comes with Perl:

    Subroutine clear_noremap
      Package (lexical)
        $ready_to_print   i1069, 1079
      Package main
        $&                1086
        $.                1086
        $0                1086
        $1                1087
        $2                1085, 1085
        $3                1085, 1085
        $ARGV             1086
        %HTML_Escapes     1085, 1085

This shows the variables used in the subroutine C<clear_noremap>. The
variable C<$ready_to_print> is a my() (lexical) variable,
B<i>ntroduced (first declared with my()) on line 1069, and used on
line 1079. The variable C<$&> from the main package is used on 1086,
and so on.

A line number may be prefixed by a single letter:

=over 4

=item i

Lexical variable introduced (declared with my()) for the first time.

=item &

Subroutine or method call.

=item s

Subroutine defined.

=item r

Format defined.

=back

The most useful option the cross referencer has is to save the report
to a separate file. For instance, to save the report on
I<myperlprogram> to the file I<report>:

    $ perl -MO=Xref,-oreport myperlprogram

=head2 The Decompiling Back End

The Deparse back end turns your Perl source back into Perl source. It
can reformat along the way, making it useful as a de-obfuscator. The
most basic way to use it is:

    $ perl -MO=Deparse myperlprogram

You'll notice immediately that Perl has no idea of how to paragraph
your code. You'll have to separate chunks of code from each other
with newlines by hand. However, watch what it will do with
one-liners:

    $ perl -MO=Deparse -e '$op=shift||die "usage: $0
    code [...]";chomp(@ARGV=<>)unless@ARGV; for(@ARGV){$was=$_;eval$op;
    die$@ if$@; rename$was,$_ unless$was eq $_}'
    -e syntax OK
    $op = shift @ARGV || die("usage: $0 code [...]");
    chomp(@ARGV = <ARGV>) unless @ARGV;
    foreach $_ (@ARGV) {
        $was = $_;
        eval $op;
        die $@ if $@;
        rename $was, $_ unless $was eq $_;
    }

The decompiler has several options for the code it generates. For
instance, you can set the size of each indent from 4 (as above) to
2 with:

    $ perl -MO=Deparse,-si2 myperlprogram

The B<-p> option adds parentheses where normally they are omitted:

    $ perl -MO=Deparse -e 'print "Hello, world\n"'
    -e syntax OK
    print "Hello, world\n";
    $ perl -MO=Deparse,-p -e 'print "Hello, world\n"'
    -e syntax OK
    print("Hello, world\n");

See L<B::Deparse> for more information on the formatting options.

=head2 The Lint Back End

The lint back end (B::Lint) inspects programs for poor style. One
programmer's bad style is another programmer's useful tool, so options
let you select what is complained about.

To run the style checker across your source code:

    $ perl -MO=Lint myperlprogram

To disable context checks and undefined subroutines:

    $ perl -MO=Lint,no-context,no-undefined-subs myperlprogram

See L<B::Lint> for information on the options.

=head2 The Simple C Back End

This module saves the internal compiled state of your Perl program
to a C source file, which can be turned into a native executable
for that particular platform using a C compiler. The resulting
program links against the Perl interpreter library, so it
will not save you disk space (unless you build Perl with a shared
library) or program size. It may, however, save you startup time.

The C<perlcc> tool generates such executables by default.

    perlcc myperlprogram.pl

=head2 The Bytecode Back End

This back end is only useful if you also have a way to load and
execute the bytecode that it produces. The ByteLoader module provides
this functionality.

To turn a Perl program into executable byte code, you can use C<perlcc>
with the C<-B> switch:

    perlcc -B myperlprogram.pl

The byte code is machine independent, so once you have a compiled
module or program, it is as portable as Perl source (assuming that
the user of the module or program has a modern-enough Perl interpreter
to decode the byte code).

See L<B::Bytecode> for information on options to control the
optimization and nature of the code generated by the Bytecode module.

=head2 The Optimized C Back End

The optimized C back end will turn your Perl program's run time
code-path into an equivalent (but optimized) C program that manipulates
the Perl data structures directly. The program will still link against
the Perl interpreter library, to allow for eval(), C<s///e>,
C<require>, etc.

The C<perlcc> tool generates such executables when using the C<-O>
switch. To compile a Perl program (ending in C<.pl>
or C<.p>):

    perlcc -O myperlprogram.pl

To produce a shared library from a Perl module (ending in C<.pm>):

    perlcc -O Myperlmodule.pm

For more information, see L<perlcc> and L<B::CC>.

=head1 Module List for the Compiler Suite

=over 4

=item B

This module is the introspective ("reflective" in Java terms)
module, which allows a Perl program to inspect its innards. The
back end modules all use this module to gain access to the compiled
parse tree. You, the user of a back end module, will not need to
interact with B.

=item O

This module is the front-end to the compiler's back ends. Normally
called something like this:

    $ perl -MO=Deparse myperlprogram

This is like saying C<use O 'Deparse'> in your Perl program.

=item B::Asmdata

This module is used by the B::Assembler module, which is in turn used
by the B::Bytecode module, which stores a parse-tree as
bytecode for later loading. It's not a back end itself, but rather a
component of a back end.

=item B::Assembler

This module turns a parse-tree into data suitable for storing
and later decoding back into a parse-tree. It's not a back end
itself, but rather a component of a back end. It's used by the
I<assemble> program that produces bytecode.

=item B::Bblock

This module is used by the B::CC back end. It walks "basic blocks".
A basic block is a series of operations which is known to execute from
start to finish, with no possibility of branching or halting.

=item B::Bytecode

This module is a back end that generates bytecode from a
program's parse tree. This bytecode is written to a file, from where
it can later be reconstructed back into a parse tree. The goal is to
do the expensive program compilation once, save the interpreter's
state into a file, and then restore the state from the file when the
program is to be executed. See L</"The Bytecode Back End">
for details about usage.

=item B::C

This module writes out C code corresponding to the parse tree and
other interpreter internal structures. You compile the corresponding
C file, and get an executable file that will restore the internal
structures, after which the Perl interpreter will begin running the
program. See L</"The Simple C Back End"> for details about usage.

=item B::CC

This module writes out C code corresponding to your program's
operations. Unlike the B::C module, which merely stores the
interpreter and its state in a C program, the B::CC module makes a
C program that does not involve the interpreter. As a consequence,
programs translated into C by B::CC can execute faster than normal
interpreted programs. See L</"The Optimized C Back End"> for
details about usage.

=item B::Concise

This module prints a concise (but complete) version of the Perl parse
tree. Its output is more customizable than that of B::Terse or
B::Debug (and it can emulate them). This module is useful for people who
are writing their own back end, or who are learning about the Perl
internals. It's not useful to the average programmer.

=item B::Debug

This module dumps the Perl parse tree in verbose detail to STDOUT.
It's useful for people who are writing their own back end, or who
are learning about the Perl internals. It's not useful to the
average programmer.

=item B::Deparse

This module produces Perl source code from the compiled parse tree.
It is useful for debugging and deconstructing other people's code,
and also as a pretty-printer for your own source. See
L</"The Decompiling Back End"> for details about usage.

=item B::Disassembler

This module turns bytecode back into a parse tree. It's not a back
end itself, but rather a component of a back end. It's used by the
I<disassemble> program that comes with the bytecode.

=item B::Lint

This module inspects the compiled form of your source code for things
which, while some people frown on them, aren't necessarily bad enough
to justify a warning. For instance, use of an array in scalar context
without explicitly saying C<scalar(@array)> is something that Lint
can identify. See L</"The Lint Back End"> for details about usage.

=item B::Showlex

This module prints out the my() variables used in a function or a
file. To get a list of the my() variables used in the subroutine
mysub() defined in the file myperlprogram:

    $ perl -MO=Showlex,mysub myperlprogram

To get a list of the my() variables used in the file myperlprogram:

    $ perl -MO=Showlex myperlprogram

[BROKEN]

=item B::Stackobj

This module is used by the B::CC module. It's not a back end itself,
but rather a component of a back end.

=item B::Stash

This module is used by the L<perlcc> program, which compiles a module
into an executable. B::Stash prints the symbol tables in use by a
program, and is used to prevent B::CC from producing C code for the
B::* and O modules. It's not a back end itself, but rather a
component of a back end.

=item B::Terse

This module prints the contents of the parse tree, but without as much
information as B::Debug. For comparison, C<print "Hello, world.">
produced 96 lines of output from B::Debug, but only 6 from B::Terse.

This module is useful for people who are writing their own back end,
or who are learning about the Perl internals. It's not useful to the
average programmer.

=item B::Xref

This module prints a report on where the variables, subroutines, and
formats are defined and used within a program and the modules it
loads. See L</"The Cross Referencing Back End"> for details about
usage.

=back

=head1 KNOWN PROBLEMS

The simple C back end currently only saves typeglobs with alphanumeric
names.

The optimized C back end outputs code for more modules than it should
(e.g., DirHandle). It also has little hope of properly handling
C<goto LABEL> outside the running subroutine (C<goto &sub> is okay).
C<goto LABEL> currently does not work at all in this back end.
It also creates a huge initialization function that gives
C compilers headaches. Splitting the initialization function gives
better results. Other problems include: unsigned math does not
work correctly; some opcodes are handled incorrectly by the default
opcode handling mechanism.

BEGIN{} blocks are executed while compiling your code. Any external
state that is initialized in BEGIN{}, such as open files or
database connections, does not behave properly in the compiled
program, because that state is not saved along with the parse tree.
To work around this, Perl has an INIT{} block that corresponds to code
being executed before your program begins running but after your
program has finished being compiled. Execution order: BEGIN{},
(possible save of state through compiler back end), INIT{}, program
runs, END{}.

=head1 AUTHOR

This document was originally written by Nathan Torkington, and is now
maintained by the perl5-porters mailing list
I<[email protected]>.

=cut