=head1 NAME

perlsec - Perl security

=head1 DESCRIPTION

Perl is designed to make it easy to program securely even when running
with extra privileges, like setuid or setgid programs. Unlike most
command line shells, which are based on multiple substitution passes on
each line of the script, Perl uses a more conventional evaluation scheme
with fewer hidden snags. Additionally, because the language has more
builtin functionality, it can rely less upon external (and possibly
untrustworthy) programs to accomplish its purposes.

Perl automatically enables a set of special security checks, called I<taint
mode>, when it detects its program running with differing real and effective
user or group IDs. The setuid bit in Unix permissions is mode 04000, the
setgid bit mode 02000; either or both may be set. You can also enable taint
mode explicitly by using the B<-T> command line flag. This flag is
I<strongly> suggested for server programs and any program run on behalf of
someone else, such as a CGI script. Once taint mode is on, it's on for
the remainder of your script.

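A program can check at run time whether taint mode is active. The sketch
below assumes Perl 5.8 or later, where the read-only variable C<${^TAINT}>
reports the state; the helper name C<taint_state()> is made up for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: describe the current taint state.
# ${^TAINT} is 1 under -T, -1 under -t (warnings only), and 0 otherwise.
sub taint_state {
    return ${^TAINT} > 0 ? 'on'
         : ${^TAINT} < 0 ? 'warn-only'
         :                 'off';
}

print "taint mode: ", taint_state(), "\n";
```

A server program might refuse to start unless this reports C<on>.
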
While in this mode, Perl takes special precautions called I<taint
checks> to prevent both obvious and subtle traps. Some of these checks
are reasonably simple, such as verifying that path directories aren't
writable by others; careful programmers have always used checks like
these. Other checks, however, are best supported by the language itself,
and it is these checks especially that contribute to making a set-id Perl
program more secure than the corresponding C program.

You may not use data derived from outside your program to affect
something else outside your program--at least, not by accident. All
command line arguments, environment variables, locale information (see
L<perllocale>), results of certain system calls (C<readdir()>,
C<readlink()>, the variable of C<shmread()>, the messages returned by
C<msgrcv()>, the password, gcos and shell fields returned by the
C<getpwxxx()> calls), and all file input are marked as "tainted".
Tainted data may not be used directly or indirectly in any command
that invokes a sub-shell, nor in any command that modifies files,
directories, or processes, B<with the following exceptions>:

=over 4

=item *

Arguments to C<print> and C<syswrite> are B<not> checked for taintedness.

=item *

Symbolic methods

    $obj->$method(@args);

and symbolic sub references

    &{$foo}(@args);
    $foo->(@args);

are not checked for taintedness. This requires extra care
unless you want external data to affect your control flow. Unless
you carefully limit what these symbolic values are, people are able
to call functions B<outside> your Perl code, such as C<POSIX::system>,
in which case they are able to run arbitrary external code.

=back

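One way to limit symbolic values is to look them up in a fixed table
instead of calling them by name. The following is a minimal sketch; the
action names and the C<run_action()> helper are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical handlers; only these names may be selected by outside data.
my %dispatch = (
    list   => sub { return "listing @_" },
    status => sub { return "status ok" },
);

sub run_action {
    my ($name, @args) = @_;
    my $handler = $dispatch{$name}
        or die "Unknown action '$name'\n";
    return $handler->(@args);
}

print run_action('list', 'a', 'b'), "\n";
```

Because the name is only ever used as a hash key, a request for something
like C<POSIX::system> fails the lookup rather than reaching any code.
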
For efficiency reasons, Perl takes a conservative view of
whether data is tainted. If an expression contains tainted data,
any subexpression may be considered tainted, even if the value
of the subexpression is not itself affected by the tainted data.

Because taintedness is associated with each scalar value, some
elements of an array or hash can be tainted and others not.
The keys of a hash are never tainted.

For example:

    $arg = shift;               # $arg is tainted
    $hid = $arg . 'bar';        # $hid is also tainted
    $line = <>;                 # Tainted
    $line = <STDIN>;            # Also tainted
    open FOO, "/home/me/bar" or die $!;
    $line = <FOO>;              # Still tainted
    $path = $ENV{'PATH'};       # Tainted, but see below
    $data = 'abc';              # Not tainted

    system "echo $arg";         # Insecure
    system "/bin/echo", $arg;   # Considered insecure
                                # (Perl doesn't know about /bin/echo)
    system "echo $hid";         # Insecure
    system "echo $data";        # Insecure until PATH set

    $path = $ENV{'PATH'};       # $path now tainted

    $ENV{'PATH'} = '/bin:/usr/bin';
    delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};

    $path = $ENV{'PATH'};       # $path now NOT tainted
    system "echo $data";        # Is secure now!

    open(FOO, "< $arg");        # OK - read-only file
    open(FOO, "> $arg");        # Not OK - trying to write

    open(FOO,"echo $arg|");     # Not OK
    open(FOO,"-|")
        or exec 'echo', $arg;   # Also not OK

    $shout = `echo $arg`;       # Insecure, $shout now tainted

    unlink $data, $arg;         # Insecure
    umask $arg;                 # Insecure

    exec "echo $arg";           # Insecure
    exec "echo", $arg;          # Insecure
    exec "sh", '-c', $arg;      # Very insecure!

    @files = <*.c>;             # insecure (uses readdir() or similar)
    @files = glob('*.c');       # insecure (uses readdir() or similar)

    # In Perl releases older than 5.6.0 the <*.c> and glob('*.c') would
    # have used an external program to do the filename expansion; but in
    # either case the result is tainted since the list of filenames comes
    # from outside of the program.

    $bad = ($arg, 23);          # $bad will be tainted
    $arg, `true`;               # Insecure (although it isn't really)

If you try to do something insecure, you will get a fatal error saying
something like "Insecure dependency" or "Insecure $ENV{PATH}".

The exception to the principle of "one tainted value taints the whole
expression" is with the ternary conditional operator C<?:>. Since code
with a ternary conditional

    $result = $tainted_value ? "Untainted" : "Also untainted";

is effectively

    if ( $tainted_value ) {
        $result = "Untainted";
    } else {
        $result = "Also untainted";
    }

it doesn't make sense for C<$result> to be tainted.

=head2 Laundering and Detecting Tainted Data

To test whether a variable contains tainted data, whose use would
thus trigger an "Insecure dependency" message, you can use the
C<tainted()> function of the Scalar::Util module, available from your
nearest CPAN mirror and included in Perl starting with release 5.8.0.
Or you may be able to use the following C<is_tainted()> function.

    sub is_tainted {
        local $@;   # don't clobber the caller's $@
        return ! eval { eval("#" . substr(join("", @_), 0, 0)); 1 };
    }

This function makes use of the fact that the presence of tainted data
anywhere within an expression renders the entire expression tainted. It
would be inefficient for every operator to test every argument for
taintedness. Instead, the slightly more efficient and conservative
approach is used that if any tainted value has been accessed within the
same expression, the whole expression is considered tainted.

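The propagation rules can be observed with C<Scalar::Util::tainted()>.
This is only a sketch: it assumes C<Scalar::Util> is installed, and the
variable names are illustrative. Run without B<-T>, every value is
reported as clean, because nothing is tainted outside taint mode.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

my $path    = $ENV{PATH};                                 # tainted under -T
my $derived = "prefix:" . (defined $path ? $path : '');   # inherits the taint
my $choice  = $path ? "yes" : "no";                       # ternary result: clean
my $literal = 'abc';                                      # never tainted

for my $pair ([path => $path], [derived => $derived],
              [choice => $choice], [literal => $literal]) {
    my ($name, $value) = @$pair;
    printf "%-8s %s\n", $name, tainted($value) ? 'tainted' : 'clean';
}
```
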
But testing for taintedness gets you only so far. Sometimes you just
have to clear your data's taintedness. Values may be untainted by using
them as keys in a hash; otherwise the only way to bypass the tainting
mechanism is by referencing subpatterns from a regular expression match.
Perl presumes that if you reference a substring using $1, $2, etc., you
knew what you were doing when you wrote the pattern. That means using
a bit of thought--don't just blindly untaint anything, or you defeat the
entire mechanism. It's better to verify that the variable has only good
characters (for certain values of "good") rather than checking whether it
has any bad characters. That's because it's far too easy to miss bad
characters that you never thought of.

Here's a test to make sure that the data contains nothing but "word"
characters (alphabetics, numerics, and underscores), a hyphen, an at sign,
or a dot.

    if ($data =~ /^([-\@\w.]+)$/) {
        $data = $1;                     # $data now untainted
    } else {
        die "Bad data in '$data'";      # log this somewhere
    }

This is fairly secure because C</\w+/> doesn't normally match shell
metacharacters, nor are dot, dash, or at going to mean something special
to the shell. Use of C</.+/> would have been insecure in theory because
it lets everything through, but Perl doesn't check for that. The lesson
is that when untainting, you must be exceedingly careful with your patterns.
Apart from the hash-key case mentioned above, laundering data using a
regular expression is the I<only> mechanism for untainting dirty data,
unless you use the strategy detailed below to fork a child of lesser
privilege.

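The same test can be packaged as a small reusable helper. The sketch below
uses a made-up name, C<untaint_name()>, and anchors the pattern with
C<\A> and C<\z>, which is slightly stricter than C<^> and C<$> because
C<\z> does not tolerate a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: return an untainted copy of $value if it consists
# only of word characters, hyphens, at signs, and dots; otherwise undef.
sub untaint_name {
    my ($value) = @_;
    return undef unless defined $value;
    # \A and \z anchor the whole string; unlike $, \z rejects a trailing newline.
    return $value =~ /\A([-\@\w.]+)\z/ ? $1 : undef;
}

for my $candidate ('report-2.txt', 'user@example.com', 'x; rm -rf /', "name\n") {
    my $clean = untaint_name($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected\n";
}
```

Returning C<undef> instead of dying leaves the choice of logging or
aborting to the caller.
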
The example does not untaint C<$data> if C<use locale> is in effect,
because the characters matched by C<\w> are determined by the locale.
Perl considers that locale definitions are untrustworthy because they
contain data from outside the program. If you are writing a
locale-aware program, and want to launder data with a regular expression
containing C<\w>, put C<no locale> ahead of the expression in the same
block. See L<perllocale/SECURITY> for further discussion and examples.

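Placing C<no locale> inside the laundering block might look like the
following sketch. The helper name C<launder_word()> is hypothetical, and
the rest of the program is assumed to run under C<use locale>.

```perl
use strict;
use warnings;
use locale;    # the surrounding program is locale-aware

# Hypothetical helper: launder a value made only of word characters.
sub launder_word {
    my ($data) = @_;
    no locale;    # within this block, \w no longer depends on the locale
    return $data =~ /\A(\w+)\z/ ? $1 : undef;
}

my $word = launder_word('report_2024');
print defined $word ? "laundered: $word\n" : "rejected\n";
```
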
=head2 Switches On the "#!" Line

When you make a script executable, in order to make it usable as a
command, the system will pass switches to perl from the script's #!
line. Perl checks that any command line switches given to a setuid
(or setgid) script actually match the ones set on the #! line. Some
Unix and Unix-like environments impose a one-switch limit on the #!
line, so you may need to use something like C<-wU> instead of C<-w -U>
under such systems. (This issue should arise only in Unix or
Unix-like environments that support #! and setuid or setgid scripts.)

=head2 Taint mode and @INC

When the taint mode (C<-T>) is in effect, the "." directory is removed
from C<@INC>, and the environment variables C<PERL5LIB> and C<PERLLIB>
are ignored by Perl. You can still adjust C<@INC> from outside the
program by using the C<-I> command line option as explained in
L<perlrun>. The two environment variables are ignored because
they are obscured, and a user running a program could be unaware that
they are set, whereas the C<-I> option is clearly visible and
therefore permitted.

Another way to modify C<@INC> without modifying the program is to use
the C<lib> pragma, e.g.:

    perl -Mlib=/foo program

The benefit of using C<-Mlib=/foo> over C<-I/foo> is that the former
will automagically remove any duplicated directories, while the latter
will not.

Note that if a tainted string is added to C<@INC>, the following
problem will be reported:

    Insecure dependency in require while running with -T switch

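If a library directory really must come from outside the program, it can
be laundered before it reaches C<@INC>. This is a sketch under stated
assumptions: C<MYAPP_LIB> is a made-up environment variable,
C<safe_lib_dir()> is a hypothetical helper, and the pattern still admits
C<..> components, so it is a starting point rather than a complete check.

```perl
use strict;
use warnings;

# Hypothetical helper: return an untainted copy of $dir if it is an
# absolute path built from a conservative character set, else undef.
sub safe_lib_dir {
    my ($dir) = @_;
    return undef unless defined $dir;
    return $dir =~ m{\A(/[\w./-]+)\z} ? $1 : undef;
}

# MYAPP_LIB is a made-up environment variable used only for illustration.
my $lib = safe_lib_dir($ENV{MYAPP_LIB});
unshift @INC, $lib if defined $lib;
```
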
=head2 Cleaning Up Your Path

For "Insecure C<$ENV{PATH}>" messages, you need to set C<$ENV{'PATH'}> to
a known value, and each directory in the path must be absolute and
non-writable by others than its owner and group. You may be surprised to
get this message even if the pathname to your executable is fully
qualified. This is I<not> generated because you didn't supply a full path
to the program; instead, it's generated because you never set your PATH
environment variable, or you didn't set it to something that was safe.
Because Perl can't guarantee that the executable in question isn't itself
going to turn around and execute some other program that is dependent on
your PATH, it makes sure you set the PATH.

The PATH isn't the only environment variable which can cause problems.
Because some shells may use the variables IFS, CDPATH, ENV, and
BASH_ENV, Perl checks that those are either empty or untainted when
starting subprocesses. You may wish to add something like this to your
set-id and taint-checking scripts.

    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer

It's also possible to get into trouble with other operations that don't
care whether they use tainted values. Make judicious use of the file
tests in dealing with any user-supplied filenames. When possible, do
opens and such B<after> properly dropping any special user (or group!)
privileges. Perl doesn't prevent you from opening tainted filenames for
reading, so be careful what you print out. The tainting mechanism is
intended to prevent stupid mistakes, not to remove the need for thought.

Perl does not call the shell to expand wild cards when you pass C<system>
and C<exec> explicit parameter lists instead of strings with possible shell
wildcards in them. Unfortunately, the C<open>, C<glob>, and
backtick functions provide no such alternate calling convention, so more
subterfuge will be required.

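The list form of C<system> can be sketched as follows. The example assumes
a Unix-like system with an C<echo> program in F</bin> or F</usr/bin>. It
also uses the list form of a pipe C<open>, which, on Perl 5.8 and later on
such systems, likewise runs the command without a shell and so covers
many uses of backticks; treat that part as an assumption about your Perl
version rather than as something the text above promises.

```perl
use strict;
use warnings;

# Start from a known environment, as described above.
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

my $arg = '*.c; echo oops';    # stand-in for data from outside the program

# List form: no shell is involved, so $arg reaches echo as one literal argument.
system('echo', 'system got:', $arg) == 0
    or die "echo failed: $?";

# List-form pipe open (Perl 5.8 and later) plays the role of backticks.
open(my $kid, '-|', 'echo', 'pipe got:', $arg)
    or die "Can't run echo: $!";
my @lines = <$kid>;
close $kid or die "echo exited with status $?";
print @lines;
```

Under taint mode a genuinely tainted C<$arg> would still be rejected here;
the list form removes the shell, not the need to launder the data.
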
Perl provides a reasonably safe way to open a file or pipe from a setuid
or setgid program: just create a child process with reduced privilege that
does the dirty work for you. First, fork a child using the special
C<open> syntax that connects the parent and child by a pipe. Now the
child resets its ID set and any other per-process attributes, like
environment variables, umasks, current working directories, back to the
originals or known safe values. Then the child process, which no longer
has any special permissions, does the C<open> or other system call.
Finally, the child passes the data it managed to access back to the
parent. Because the file or pipe was opened in the child while running
under less privilege than the parent, it's not apt to be tricked into
doing something it shouldn't.

Here's a way to do backticks reasonably safely. Notice how the C<exec> is
not called with a string that the shell could expand. This is by far the
best way to call something that might be subjected to shell escapes: just
never call the shell at all.

    use English '-no_match_vars';
    die "Can't fork: $!" unless defined($pid = open(KID, "-|"));
    if ($pid) {           # parent
        while (<KID>) {
            # do something
        }
        close KID;
    } else {
        my @temp     = ($EUID, $EGID);
        my $orig_uid = $UID;
        my $orig_gid = $GID;
        $EUID = $UID;
        $EGID = $GID;
        # Drop privileges
        $UID  = $orig_uid;
        $GID  = $orig_gid;
        # Make sure privs are really gone
        ($EUID, $EGID) = @temp;
        # $GID and $EGID may be space-separated group lists, hence "eq"
        die "Can't drop privileges"
            unless $UID == $EUID && $GID eq $EGID;
        $ENV{PATH} = "/bin:/usr/bin"; # Minimal PATH.
        # Consider sanitizing the environment even more.
        exec 'myprog', 'arg1', 'arg2'
            or die "can't exec myprog: $!";
    }

A similar strategy would work for wildcard expansion via C<glob>, although
you can use C<readdir> instead.

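The C<readdir> alternative might look like this sketch. The helper name
C<c_files()> is invented, and the filter only roughly mimics what
C<< <*.c> >> returns (names ending in C<.c> that do not start with a dot).

```perl
use strict;
use warnings;

# Hypothetical helper: return the names in $dir that end in ".c",
# roughly what <*.c> would produce, without running a shell.
sub c_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open directory '$dir': $!";
    my @names = sort grep { /\.c\z/ && !/\A\./ } readdir $dh;
    closedir $dh;
    return @names;    # still tainted under -T: the names come from outside
}

print "$_\n" for c_files('.');
```
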
Taint checking is most useful when, although you trust yourself not to have
written a program to give away the farm, you don't necessarily trust those
who end up using it not to try to trick it into doing something bad. This
is the kind of security checking that's useful for set-id programs and
programs launched on someone else's behalf, like CGI programs.

This is quite different, however, from not even trusting the writer of the
code not to try to do something evil. That's the kind of trust needed
when someone hands you a program you've never seen before and says, "Here,
run this." For that kind of safety, check out the Safe module,
included standard in the Perl distribution. This module allows the
programmer to set up special compartments in which all system operations
are trapped and namespace access is carefully controlled.

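A compartment can be sketched in a few lines. This example assumes the
Safe module's default operator mask, which is expected to allow plain
arithmetic and to trap C<system>; the variable names are illustrative.

```perl
use strict;
use warnings;
use Safe;

my $compartment = Safe->new;    # default operator mask

# Plain arithmetic is permitted by the default mask.
my $sum = $compartment->reval('2 + 3 * 4');
print "sum: $sum\n";

# system() is not permitted by default, so reval() returns undef
# and the reason is left in $@.
my $out = $compartment->reval('system("echo", "hi")');
print "blocked: $@" unless defined $out;
```
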
=head2 Security Bugs

Beyond the obvious problems that stem from giving special privileges to
systems as flexible as scripts, on many versions of Unix, set-id scripts
are inherently insecure right from the start. The problem is a race
condition in the kernel. Between the time the kernel opens the file to
see which interpreter to run and when the (now-set-id) interpreter turns
around and reopens the file to interpret it, the file in question may have
changed, especially if you have symbolic links on your system.

Fortunately, sometimes this kernel "feature" can be disabled.
Unfortunately, there are two ways to disable it. The system can simply
outlaw scripts with any set-id bit set, which doesn't help much.
Alternatively, it can simply ignore the set-id bits on scripts. If the
latter is true, Perl can emulate the setuid and setgid mechanism when it
notices the otherwise useless setuid/gid bits on Perl scripts. It does
this via a special executable called F<suidperl> that is automatically
invoked for you if it's needed.

However, if the kernel set-id script feature isn't disabled, Perl will
complain loudly that your set-id script is insecure. You'll need to
either disable the kernel set-id script feature, or put a C wrapper around
the script. A C wrapper is just a compiled program that does nothing
except call your Perl program. Compiled programs are not subject to the
kernel bug that plagues set-id scripts. Here's a simple wrapper, written
in C:

    #include <stdio.h>
    #include <unistd.h>
    #define REAL_PATH "/path/to/script"
    int main(int ac, char **av)
    {
        execv(REAL_PATH, av);
        perror(REAL_PATH);      /* reached only if execv() failed */
        return 1;
    }

Compile this wrapper into a binary executable and then make I<it> rather
than your script setuid or setgid.

In recent years, vendors have begun to supply systems free of this
inherent security bug. On such systems, when the kernel passes the name
of the set-id script to open to the interpreter, rather than using a
pathname subject to meddling, it instead passes F</dev/fd/3>. This is a
special file already opened on the script, so that there can be no race
condition for evil scripts to exploit. On these systems, Perl should be
compiled with C<-DSETUID_SCRIPTS_ARE_SECURE_NOW>. The F<Configure>
program that builds Perl tries to figure this out for itself, so you
should never have to specify this yourself. Most modern releases of
SysVr4 and BSD 4.4 use this approach to avoid the kernel race condition.

Prior to release 5.6.1 of Perl, bugs in the code of F<suidperl> could
introduce a security hole.

=head2 Protecting Your Programs

There are a number of ways to hide the source to your Perl programs,
with varying levels of "security".

First of all, however, you I<can't> take away read permission, because
the source code has to be readable in order to be compiled and
interpreted. (That doesn't mean that a CGI script's source is
readable by people on the web, though.) So you have to leave the
permissions at the socially friendly 0755 level. This lets only
people on your local system see your source.

Some people mistakenly regard this as a security problem. If your program does
insecure things, and relies on people not knowing how to exploit those
insecurities, it is not secure. It is often possible for someone to
determine the insecure things and exploit them without viewing the
source. Security through obscurity, the name for hiding your bugs
instead of fixing them, is little security indeed.

You can try using encryption via source filters (Filter::* from CPAN,
or Filter::Util::Call and Filter::Simple since Perl 5.8).
But crackers might be able to decrypt it. You can try using the byte
code compiler and interpreter described below, but crackers might be
able to de-compile it. You can try using the native-code compiler
described below, but crackers might be able to disassemble it. These
pose varying degrees of difficulty to people wanting to get at your
code, but none can definitively conceal it (this is true of every
language, not just Perl).

If you're concerned about people profiting from your code, then the
bottom line is that nothing but a restrictive licence will give you
legal security. License your software and pepper it with threatening
statements like "This is unpublished proprietary software of XYZ Corp.
Your access to it does not give you permission to use it blah blah
blah." You should see a lawyer to be sure your licence's wording will
stand up in court.

=head2 Unicode

Unicode is a new and complex technology and one may easily overlook
certain security pitfalls. See L<perluniintro> for an overview and
L<perlunicode> for details, and L<perlunicode/"Security Implications
of Unicode"> for security implications in particular.

=head2 Algorithmic Complexity Attacks

Certain internal algorithms used in the implementation of Perl can
be attacked by choosing the input carefully to consume large amounts
of either time or space or both. This can lead to so-called
I<Denial of Service> (DoS) attacks.

=over 4

=item *

Hash Function - the algorithm used to "order" hash elements has been
changed several times during the development of Perl, mainly to be
reasonably fast. In Perl 5.8.1 the security aspect was also taken
into account.

In Perls before 5.8.1 one could rather easily generate data that, as
hash keys, would cause Perl to consume large amounts of time because
the internal structure of hashes would badly degenerate. In Perl 5.8.1
the hash function is randomly perturbed by a pseudorandom seed which
makes generating such naughty hash keys harder.
See L<perlrun/PERL_HASH_SEED> for more information.

The random perturbation is done by default, but if one wants for some
reason to emulate the old behaviour, one can set the environment variable
PERL_HASH_SEED to zero (or any other integer). One possible reason
for wanting to emulate the old behaviour is that in the new behaviour
consecutive runs of Perl will order hash keys differently, which may
confuse some applications (like Data::Dumper: the outputs of two
different runs are no longer identical).

B<Perl has never guaranteed any ordering of the hash keys>, and the
ordering has already changed several times during the lifetime of
Perl 5. Also, the ordering of hash keys has always been, and
continues to be, affected by the insertion order.

Also note that while the order of the hash elements might be
randomised, this "pseudoordering" should B<not> be used for
applications like shuffling a list randomly (use List::Util::shuffle()
for that, see L<List::Util>, a standard core module since Perl 5.8.0;
or the CPAN module Algorithm::Numerical::Shuffle), or for generating
permutations (use e.g. the CPAN modules Algorithm::Permute or
Algorithm::FastPermute), or for any cryptographic applications.

=item *

Regular expressions - Perl's regular expression engine is a so-called
NFA (Non-deterministic Finite Automaton), which among other things means
that it can rather easily consume large amounts of both time and space if
the regular expression may match in several ways. Careful crafting of the
regular expressions can help, but quite often there really isn't much
one can do (the book "Mastering Regular Expressions" is required
reading, see L<perlfaq2>). Running out of space manifests itself by
Perl running out of memory.

=item *

Sorting - the quicksort algorithm used in Perls before 5.8.0 to
implement the sort() function is very easy to trick into misbehaving
so that it consumes a lot of time. Nothing more is required than
re-sorting a list already sorted. Starting from Perl 5.8.0 a different
sorting algorithm, mergesort, is used. Mergesort is insensitive to
its input data, so it cannot be similarly fooled.

=back

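For the hash-ordering point in the list above, a program that needs
repeatable output can sort its keys, and one that needs a random order
can use C<List::Util::shuffle()>. The following is a minimal sketch with
made-up data.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my %count = (apple => 3, pear => 1, plum => 7);

# Sort the keys whenever output must be identical from run to run.
for my $fruit (sort keys %count) {
    print "$fruit=$count{$fruit}\n";
}

# Use shuffle() for a random order instead of relying on hash order.
my @random_order = shuffle(keys %count);
print "shuffled: @random_order\n";
```
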
See L<http://www.cs.rice.edu/~scrosby/hash/> for more information,
and any computer science textbook on algorithmic complexity.

=head1 SEE ALSO

L<perlrun> for its description of cleaning up environment variables.