=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.73 $, $Date: 2005/12/31 00:54:37 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers
in binary. Digital (as in powers of two) computers cannot
store all numbers exactly. Some real numbers lose precision
in the process. This is a problem with how computers store
numbers and affects all computer languages, not just Perl.

L<perlnumber> shows the gory details of number
representations and conversions.

To limit the number of decimal places in your numbers, you
can use the printf or sprintf function. See
L<perlop/"Floating-point Arithmetic"> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

=head2 Why is int() broken?

Your int() is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the above item "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of above as 'three' is really more like
2.9999999999999995559.

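A minimal sketch of the usual workaround, assuming IEEE 754
double-precision floats (the default on most Perl builds): round to the
nearest integer with sprintf() instead of truncating with int(). The
variable names are illustrative only.

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;    # mathematically 1, stored as slightly less

my $truncated = int($value);              # truncates toward zero
my $rounded   = sprintf("%.0f", $value);  # rounds to the nearest integer

printf "stored value: %.17g\n", $value;
print  "int():        $truncated\n";
print  "sprintf():    $rounded\n";
```

Which of the two you want depends on whether the value is supposed to be
a whole number that has picked up a small error, or a genuine fraction
you mean to truncate.
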
=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur as
literals in your program. Octal literals in perl must start with a
leading "0" and hexadecimal literals must start with a leading "0x".
If they are read in from somewhere and assigned, no automatic
conversion takes place. You must explicitly use oct() or hex() if you
want the values converted to decimal. oct() interprets hex ("0x350"),
octal ("0350" or even without the leading "0", like "377") and binary
("0b1010") numbers, while hex() only converts hexadecimal ones, with
or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
The inverse mapping from decimal to octal can be done with either the
"%o" or "%O" sprintf() formats.

This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which by widespread tradition typically take
permissions in octal.

    chmod(644,  $file);   # WRONG
    chmod(0644, $file);   # right

Note the mistake in the first line was specifying the decimal literal
644, rather than the intended octal literal 0644. The problem can
be seen with:

    printf("%#o",644);   # prints 01204

Surely you had not intended C<chmod(01204, $file);> - did you? If you
want to use numeric literals as arguments to chmod() et al. then please
try to express them as octal constants, that is with a leading zero and
with the following digits restricted to the set 0..7.

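The difference between a literal and a string read at run time can be
sketched as follows. The input value is made up for illustration; only
the builtin oct() and hex() functions are used.

```perl
use strict;
use warnings;

# A permission string as it might arrive from a config file or user input.
my $input = "0644";

my $as_number = $input + 0;    # 644 decimal: the leading zero is ignored
my $mode      = oct($input);   # 420 decimal, which is 0644 octal

printf "numified: %d (octal %#o)\n", $as_number, $as_number;
printf "oct():    %d (octal %#o)\n", $mode, $mode;

# oct() also understands hex and binary prefixes; hex() handles hex only.
my @converted = ( oct("0x1f"), oct("0b101"), oct("755"), hex("ff") );
print "@converted\n";
```

Passing C<$mode> rather than C<$as_number> to chmod() gives the
permissions the string was meant to describe.
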
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that int() merely truncates toward 0. For rounding to a
certain number of digits, sprintf() or printf() is usually the easiest
route.

    printf("%.3f", 3.1415926535);   # prints 3.142

The POSIX module (part of the standard Perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions.

    use POSIX;
    $ceil  = ceil(3.5);    # 4
    $floor = floor(3.5);   # 3

In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
module. With 5.004, the Math::Trig module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do this.
Perl numbers whose absolute values are integers under 2**31 (on 32 bit
machines) will work pretty much like mathematical integers. Other numbers
are not guaranteed.

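As one possible shape for a hand-written rounding function, here is a
sketch that rounds halves away from zero. The name round_half_away() is
hypothetical (it is not a Perl builtin), and the rule it implements is
only one of several a specification might call for.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Round half away from zero to $places decimal places (default 0).
sub round_half_away {
    my ($number, $places) = @_;
    my $factor = 10 ** ($places || 0);
    my $scaled = $number * $factor;
    return ( $scaled >= 0 ? floor($scaled + 0.5) : ceil($scaled - 0.5) )
        / $factor;
}

print round_half_away( 2.5), "\n";
print round_half_away(-2.5), "\n";
print round_half_away(3.14159, 2), "\n";
```

The scaled value is still a binary floating-point number, so inputs that
cannot be stored exactly may land on the other side of the half-way
point. For money, keeping amounts as integer counts of the smallest unit
(cents, for example) avoids that problem.
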
=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below
are a few examples of approaches to making common conversions
between number representations. This is intended to be representative
rather than exhaustive.

Some of the examples below use the Bit::Vector module from CPAN.
The reasons you might choose Bit::Vector over the perl built-in
functions are that it works with numbers of ANY size, that it is
optimized for speed on some operations, and that for at least some
programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using perl's built-in conversion of 0x notation:

    $dec = 0xDEADBEEF;

Using the hex function:

    $dec = hex("DEADBEEF");

Using pack:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using sprintf:

    $hex = sprintf("%X", 3735928559);   # upper case A-F
    $hex = sprintf("%x", 3735928559);   # lower case a-f

Using unpack:

    $hex = unpack("H*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And Bit::Vector supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32);   # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built-in conversion of numbers with leading zeros:

    $dec = 033653337357;   # note the leading 0!

Using the oct function:

    $dec = oct("33653337357");

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using sprintf:

    $oct = sprintf("%o", 3735928559);

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the 0b notation:

    $number = 0b10110110;

Using oct:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using pack and ord:

    $decimal = ord(pack('B8', '10110110'));

Using pack and unpack for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using sprintf (perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using unpack:

    $bin = unpack("B*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

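As a hint for those remaining transformations, one approach is to go
through a plain number: parse the source notation with hex() or oct(),
then format the result with sprintf(). This sketch uses only builtins;
the variable names are illustrative.

```perl
use strict;
use warnings;

my $hex_to_oct = sprintf("%o", hex("ff"));           # hex -> oct
my $bin_to_hex = sprintf("%x", oct("0b10110110"));   # bin -> hex
my $oct_to_bin = sprintf("%b", oct("0377"));         # oct -> bin
my $hex_to_bin = sprintf("%b", hex("b6"));           # hex -> bin

print "$hex_to_oct $bin_to_hex $oct_to_bin $hex_to_bin\n";
```

This only works for values that fit in a native integer; for larger
numbers something like Bit::Vector is still needed.
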
=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators depends on whether they're
used on numbers or strings. The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>). The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }

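A short sketch of the number-versus-string difference, assuming the
classic operator behavior (that is, without the C<bitwise> feature of
newer perls enabled). Adding 0 is one common way to force a value to be
treated as a number.

```perl
use strict;
use warnings;

my ($left, $right) = ("11", "3");   # strings, e.g. read from a file

my $string_and  = $left & $right;               # AND of the characters
my $numeric_and = ($left + 0) & ($right + 0);   # AND of the numbers

print "string:  '$string_and'\n";
print "numeric: $numeric_and\n";
```
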
=head2 How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
or the PDL extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.
For example, this loop does not build a list of half a million
integers before it starts:

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

=head2 How can I output Roman numerals?

Get the http://www.cpan.org/modules/by-module/Roman module.

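If installing a module is not an option, the conversion is short enough
to write by hand. The following is a sketch only: to_roman() is a
hypothetical helper written for this example and is not the interface of
the CPAN module. It assumes a positive integer argument.

```perl
use strict;
use warnings;

sub to_roman {
    my ($number) = @_;
    # Values in descending order, including the subtractive pairs.
    my @table = (
        [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
        [ 100,  'C' ], [ 90,  'XC' ], [ 50,  'L' ], [ 40,  'XL' ],
        [ 10,   'X' ], [ 9,   'IX' ], [ 5,   'V' ], [ 4,   'IV' ],
        [ 1,    'I' ],
    );
    my $roman = '';
    for my $pair (@table) {
        my ($value, $symbol) = @$pair;
        while ($number >= $value) {
            $roman  .= $symbol;
            $number -= $value;
        }
    }
    return $roman;
}

print to_roman($_), "\n" for 4, 1987, 2005;
```
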
=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once; doing so makes your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection at http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the Math::TrulyRandom module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at http://www.nr.com/ .

=head2 How do I get a random number between X and Y?

C<rand($x)> returns a number such that
C<< 0 <= rand($x) < $x >>. Thus what you want to have perl
figure out is a random number in the range from 0 to the
difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you
want a random number between 0 and 5 that you can then add
to 10.

    my $number = 10 + int rand( 15-10+1 );

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_in(50,120)>.

    sub random_int_in ($$) {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + int rand(1 + $max - $min);
    }

=head1 Data: Dates

=head2 How do I find the day or week of the year?

The localtime function returns the day of the year. Without an
argument localtime uses the current time. The value counts from 0,
so 1 January is day 0.

    $day_of_year = (localtime)[7];

The POSIX module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day or week of the year for any date, use the Time::Local
module to get a time in epoch seconds for the argument to localtime.

    use POSIX qw/strftime/;
    use Time::Local;
    my @date         = localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) );
    my $day_of_year  = strftime "%j", @date;
    my $week_of_year = strftime "%W", @date;

The Date::Calc module provides two functions to calculate these. They
are not exported by default, so ask for them by name.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

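The two core-module approaches number days differently, which this
sketch tries to make visible. It uses timegm() and gmtime() instead of
timelocal() and localtime() only so that the result does not depend on
the local time zone; the variable names are illustrative.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# 18 December 1987; months are zero-based, so December is 11.
my @date = gmtime( timegm( 0, 0, 0, 18, 11, 1987 ) );

my $yday_zero_based = $date[7];                  # 0 for 1 January
my $day_of_year     = strftime( "%j", @date );   # 001 for 1 January
my $week_of_year    = strftime( "%W", @date );   # weeks start on Monday

print "$yday_zero_based $day_of_year $week_of_year\n";
```
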
=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the POSIX module's strftime() function has
been extended in a non-standard way to use a C<%C> format,
which they sometimes claim is the "century". It isn't,
because on most such systems, this is only the first two
digits of the four-digit year, and thus cannot be used to
reliably determine the current century or millennium.

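The arithmetic in those functions is easier to check when it is applied
to a plain four-digit year. The two helpers below are hypothetical
variants written for this sketch; they use the same convention as the
functions above, in which the year 2000 still belongs to the 20th
century and the 2nd millennium.

```perl
use strict;
use warnings;

sub century_of    { my ($year) = @_; return int( ($year + 99) / 100 ) }
sub millennium_of { my ($year) = @_; return 1 + int( ($year - 1) / 1000 ) }

for my $year (1900, 1999, 2000, 2001) {
    printf "%d: century %d, millennium %d\n",
        $year, century_of($year), millennium_of($year);
}
```
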
=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract. Life
isn't always that simple though. If you want to work with formatted
dates, the Date::Manip, Date::Calc, or DateTime modules can help you.

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
Time::Local module. Otherwise, you should look into the Date::Calc
and Date::Manip modules from CPAN.

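For a fixed format, the split-and-convert step might look like the
sketch below. parse_timestamp() is a hypothetical helper, the
"YYYY-MM-DD HH:MM:SS" layout is an assumption, and timegm() is used
(treating the string as UTC) so that the result does not depend on the
local time zone; substitute timelocal() for local times.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

sub parse_timestamp {
    my ($string) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $string =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or return;    # not in the expected format
    # Time::Local wants a zero-based month.
    return timegm( $sec, $min, $hour, $mday, $mon - 1, $year );
}

my $epoch = parse_timestamp("2001-11-13 01:00:00");
print "$epoch\n";
```

Two values obtained this way can be subtracted to get the difference
between the dates in seconds.
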
=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the Time::JulianDay module available on CPAN. Ensure that
you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the DateTime module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day)

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

=head2 How do I find yesterday's date?

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its Today_and_Now
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.

Long answer: The question betrays a misunderstanding of the issue.
Perl is just as Y2K compliant as your pencil, no more and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines). The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number. It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: "Perl doesn't
break Y2K, people do." See http://www.perl.org/about/y2k.html for
a longer exposition.

=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

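Applied to a concrete string, the substitution behaves as sketched
below. The sample value is made up; note that an escaped backslash
collapses to a single backslash rather than disappearing.

```perl
use strict;
use warnings;

# Single quotes keep the backslashes; '\\\\' is two literal backslashes.
my $escaped = 'file\ name\ with\ spaces\\\\and\ a\ backslash';

# Copy the string, then strip one level of backslash escaping.
(my $plain = $escaped) =~ s/\\(.)/$1/g;

print "$escaped\n";
print "$plain\n";
```
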
=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself.

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

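The pair-wise substitution and a run-wise variant give different
results once a character repeats more than twice. The sketch below
compares them; the C<\1+> form is an assumption about what "runs of
characters" would need, and the sample text is made up.

```perl
use strict;
use warnings;

my $text = 'bookkeeper   aaa';

(my $pairs = $text) =~ s/(.)\1/$1/g;     # each pair becomes one character
(my $runs  = $text) =~ s/(.)\1+/$1/g;    # each run becomes one character

print "pairs: '$pairs'\n";
print "runs:  '$runs'\n";
```

With three spaces or three letters in a row, the pair-wise form leaves
two behind, while the run-wise form leaves one.
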
=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces.

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime . ".\n";

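The same idioms can be tried with a function whose result is
predictable, which makes the context difference easier to see. add() is
a hypothetical function used only for this sketch.

```perl
use strict;
use warnings;

sub add { my ($x, $y) = @_; return $x + $y }

my $list_form   = "Sum: @{[ add(1, 2) ]}";    # anonymous array, list context
my $scalar_form = "Sum: ${\ add(1, 2) }";     # reference to the return value
my $forced      = "Reversed: ${\ scalar reverse 'abc' }";   # scalar context

print "$list_form\n$scalar_form\n$forced\n";
```
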
=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program. Starting with perl 5.8, Text::Balanced is
part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy of Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

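For the common case of balanced brackets, Text::Balanced is probably
the least surprising option. A minimal sketch using its
extract_bracketed() function follows; the sample text is made up.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(outer (inner) text) and the rest";

# In list context extract_bracketed() returns the balanced substring,
# the remainder of the input, and any skipped prefix.
my ($extracted, $remainder) = extract_bracketed( $text, "()" );

print "extracted: $extracted\n";
print "remainder: $remainder\n";
```

The extracted piece can be fed back through the same call (minus its
outer brackets) to walk further into the nesting.
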
=head2 How do I reverse a string?

Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

686 | =head2 How do I reformat a paragraph?
|
---|
687 |
|
---|
688 | Use Text::Wrap (part of the standard Perl distribution):
|
---|
689 |
|
---|
690 | use Text::Wrap;
|
---|
691 | print wrap("\t", ' ', @paragraphs);
|
---|
692 |
|
---|
693 | The paragraphs you give to Text::Wrap should not contain embedded
|
---|
694 | newlines. Text::Wrap doesn't justify the lines (flush-right).
|
---|
695 |
|
---|
696 | Or use the CPAN module Text::Autoformat. Formatting files can be easily
|
---|
697 | done by making a shell alias, like so:
|
---|
698 |
|
---|
699 | alias fmt="perl -i -MText::Autoformat -n0777 \
|
---|
700 | -e 'print autoformat $_, {all=>1}' $*"
|
---|
701 |
|
---|
702 | See the documentation for Text::Autoformat to appreciate its many
|
---|
703 | capabilities.
|
---|
704 |
|
---|
705 | =head2 How can I access or change N characters of a string?
|
---|
706 |
|
---|
707 | You can access the first characters of a string with substr().
|
---|
708 | To get the first character, for example, start at position 0
|
---|
709 | and grab the string of length 1.
|
---|
710 |
|
---|
711 |
|
---|
712 | $string = "Just another Perl Hacker";
|
---|
713 | $first_char = substr( $string, 0, 1 ); # 'J'
|
---|
714 |
|
---|
715 | To change part of a string, you can use the optional fourth
|
---|
716 | argument which is the replacement string.
|
---|
717 |
|
---|
718 | substr( $string, 13, 4, "Perl 5.8.0" );
|
---|
719 |
|
---|
720 | You can also use substr() as an lvalue.
|
---|
721 |
|
---|
722 | substr( $string, 13, 4 ) = "Perl 5.8.0";
|
---|
723 |
|
---|
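Two further details may be useful: a negative offset counts from the end of the string, and the four-argument form returns the text it replaced. A short sketch (the replacement word is arbitrary):

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

my $last_word = substr( $string, -6 );              # last six characters: "Hacker"
my $old       = substr( $string, 13, 4, "Camel" );  # replaces "Perl" and returns it

print "$last_word\n$old\n$string\n";
```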
724 | =head2 How do I change the Nth occurrence of something?
|
---|
725 |
|
---|
726 | You have to keep track of N yourself. For example, let's say you want
|
---|
727 | to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
|
---|
728 | C<"whosoever"> or C<"whomsoever">, case insensitively. These
|
---|
729 | all assume that $_ contains the string to be altered.
|
---|
730 |
|
---|
731 | $count = 0;
|
---|
732 | s{((whom?)ever)}{
|
---|
733 | ++$count == 5 # is it the 5th?
|
---|
734 | ? "${2}soever" # yes, swap
|
---|
735 | : $1 # renege and leave it there
|
---|
736 | }ige;
|
---|
737 |
|
---|
738 | In the more general case, you can use the C</g> modifier in a C<while>
|
---|
739 | loop, keeping count of matches.
|
---|
740 |
|
---|
741 | $WANT = 3;
|
---|
742 | $count = 0;
|
---|
743 | $_ = "One fish two fish red fish blue fish";
|
---|
744 | while (/(\w+)\s+fish\b/gi) {
|
---|
745 | if (++$count == $WANT) {
|
---|
746 | print "The third fish is a $1 one.\n";
|
---|
747 | }
|
---|
748 | }
|
---|
749 |
|
---|
750 | That prints out: C<"The third fish is a red one."> You can also use a
|
---|
751 | repetition count and repeated pattern like this:
|
---|
752 |
|
---|
753 | /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
|
---|
754 |
|
---|
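The counting technique can be wrapped in a small helper. This is a sketch only; C<replace_nth> is a hypothetical name, not a built-in or a module function:

```perl
use strict;
use warnings;

# Replace only the $n-th match of $pattern in $text; return the new string.
sub replace_nth {
    my ($text, $pattern, $n, $replacement) = @_;
    my $count = 0;
    # The outer parentheses make $1 the whole match, so every other
    # occurrence can be put back unchanged.
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

my $out = replace_nth("One fish two fish red fish blue fish", qr/fish/, 3, "herring");
print "$out\n";
```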
755 | =head2 How can I count the number of occurrences of a substring within a string?
|
---|
756 |
|
---|
757 | There are a number of ways, with varying efficiency. If you want a
|
---|
758 | count of a certain single character (X) within a string, you can use the
|
---|
759 | C<tr///> operator like so:
|
---|
760 |
|
---|
761 | $string = "ThisXlineXhasXsomeXx'sXinXit";
|
---|
762 | $count = ($string =~ tr/X//);
|
---|
763 | print "There are $count X characters in the string";
|
---|
764 |
|
---|
765 | This is fine if you are just looking for a single character. However,
|
---|
766 | if you are trying to count multiple character substrings within a
|
---|
767 | larger string, C<tr///> won't work. What you can do is wrap a while()
|
---|
768 | loop around a global pattern match. For example, let's count negative
|
---|
769 | integers:
|
---|
770 |
|
---|
771 | $string = "-9 55 48 -2 23 -76 4 14 -44";
|
---|
772 | $count = 0; while ($string =~ /-\d+/g) { $count++ }
|
---|
773 | print "There are $count negative numbers in the string";
|
---|
774 |
|
---|
775 | Another version uses a global match in list context, then assigns the
|
---|
776 | result to a scalar, producing a count of the number of matches.
|
---|
777 |
|
---|
778 | $count = () = $string =~ /-\d+/g;
|
---|
779 |
|
---|
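As a rough illustration of the approaches above, plus an index() loop for counting a literal substring without a regular expression (the sample data and variable names are invented):

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

my $negatives = () = $string =~ /-\d+/g;   # the empty list forces list context
my $digits    = ( $string =~ tr/0-9// );   # tr/// also counts character ranges

# Count non-overlapping occurrences of a literal substring with index().
my ($haystack, $needle) = ("banana", "an");
my ($count, $pos) = (0, 0);
while ( ( $pos = index($haystack, $needle, $pos) ) != -1 ) {
    $count++;
    $pos += length $needle;                # continue after this occurrence
}

print "$negatives negatives, $digits digits, $count occurrences of '$needle'\n";
```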
780 | =head2 How do I capitalize all the words on one line?
|
---|
781 |
|
---|
782 | To make the first letter of each word upper case:
|
---|
783 |
|
---|
784 | $line =~ s/\b(\w)/\U$1/g;
|
---|
785 |
|
---|
786 | This has the strange effect of turning "C<don't do it>" into "C<Don'T
|
---|
787 | Do It>". Sometimes you might want this. Other times you might need a
|
---|
788 | more thorough solution (Suggested by brian d foy):
|
---|
789 |
|
---|
790 | $string =~ s/ (
|
---|
791 | (^\w) #at the beginning of the line
|
---|
792 | | # or
|
---|
793 | (\s\w) #preceded by whitespace
|
---|
794 | )
|
---|
795 | /\U$1/xg;
|
---|
796 | $string =~ s/([\w']+)/\u\L$1/g;
|
---|
797 |
|
---|
798 | To make the whole line upper case:
|
---|
799 |
|
---|
800 | $line = uc($line);
|
---|
801 |
|
---|
802 | To force each word to be lower case, with the first letter upper case:
|
---|
803 |
|
---|
804 | $line =~ s/(\w+)/\u\L$1/g;
|
---|
805 |
|
---|
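To see why the apostrophe matters, here is a small comparison of the word-boundary version with one that treats an apostrophe as part of the word. It is a sketch on a made-up input:

```perl
use strict;
use warnings;

my $line = "don't DO it";

# \b also matches between the apostrophe and the "t", so the "t" is upcased.
( my $naive = $line ) =~ s/\b(\w)/\U$1/g;

# Treat apostrophes as part of the word, lowercase it, then upcase the first letter.
( my $better = $line ) =~ s/([\w']+)/\u\L$1/g;

print "$naive\n$better\n";
```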
806 | You can (and probably should) enable locale awareness of those
|
---|
807 | characters by placing a C<use locale> pragma in your program.
|
---|
808 | See L<perllocale> for endless details on locales.
|
---|
809 |
|
---|
810 | This is sometimes referred to as putting something into "title
|
---|
811 | case", but that's not quite accurate. Consider the proper
|
---|
812 | capitalization of the movie I<Dr. Strangelove or: How I Learned to
|
---|
813 | Stop Worrying and Love the Bomb>, for example.
|
---|
814 |
|
---|
815 | Damian Conway's L<Text::Autoformat> module provides some smart
|
---|
816 | case transformations:
|
---|
817 |
|
---|
818 | use Text::Autoformat;
|
---|
819 | my $x = "Dr. Strangelove or: How I Learned to Stop ".
|
---|
820 | "Worrying and Love the Bomb";
|
---|
821 |
|
---|
822 | print $x, "\n";
|
---|
823 | for my $style (qw( sentence title highlight ))
|
---|
824 | {
|
---|
825 | print autoformat($x, { case => $style }), "\n";
|
---|
826 | }
|
---|
827 |
|
---|
828 | =head2 How can I split a [character] delimited string except when inside [character]?
|
---|
829 |
|
---|
830 | Several modules can handle this sort of parsing: Text::Balanced,
|
---|
831 | Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
|
---|
832 |
|
---|
833 | Take the example case of trying to split a string that is
|
---|
834 | comma-separated into its different fields. You can't use C<split(/,/)>
|
---|
835 | because you shouldn't split if the comma is inside quotes. For
|
---|
836 | example, take a data line like this:
|
---|
837 |
|
---|
838 | SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
|
---|
839 |
|
---|
840 | Due to the restriction of the quotes, this is a fairly complex
|
---|
841 | problem. Thankfully, we have Jeffrey Friedl, author of
|
---|
842 | I<Mastering Regular Expressions>, to handle these for us. He
|
---|
843 | suggests (assuming your string is contained in $text):
|
---|
844 |
|
---|
845 | @new = ();
|
---|
846 | push(@new, $+) while $text =~ m{
|
---|
847 | "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
|
---|
848 | | ([^,]+),?
|
---|
849 | | ,
|
---|
850 | }gx;
|
---|
851 | push(@new, undef) if substr($text,-1,1) eq ',';
|
---|
852 |
|
---|
853 | If you want to represent quotation marks inside a
|
---|
854 | quotation-mark-delimited field, escape them with backslashes (eg,
|
---|
855 | C<"like \"this\"">).
|
---|
856 |
|
---|
857 | Alternatively, the Text::ParseWords module (part of the standard Perl
|
---|
858 | distribution) lets you say:
|
---|
859 |
|
---|
860 | use Text::ParseWords;
|
---|
861 | @new = quotewords(",", 0, $text);
|
---|
862 |
|
---|
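Applied to the sample data line above, a sketch with Text::ParseWords might look like this. The second argument of 0 asks quotewords() to drop the quotation marks:

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};

# Commas inside quoted fields are kept as part of the field.
my @new = quotewords(",", 0, $text);

print "[$_]\n" for @new;
```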
863 | There's also a Text::CSV (Comma-Separated Values) module on CPAN.
|
---|
864 |
|
---|
865 | =head2 How do I strip blank space from the beginning/end of a string?
|
---|
866 |
|
---|
867 | (contributed by brian d foy)
|
---|
868 |
|
---|
869 | A substitution can do this for you. For a single line, you want to
|
---|
870 | replace all the leading or trailing whitespace with nothing. You
|
---|
871 | can do that with a pair of substitutions.
|
---|
872 |
|
---|
873 | s/^\s+//;
|
---|
874 | s/\s+$//;
|
---|
875 |
|
---|
876 | You can also write that as a single substitution, although it turns
|
---|
877 | out the combined statement is slower than the separate ones. That
|
---|
878 | might not matter to you, though.
|
---|
879 |
|
---|
880 | s/^\s+|\s+$//g;
|
---|
881 |
|
---|
882 | In this regular expression, the alternation matches either at the
|
---|
883 | beginning or the end of the string since the alternation has a lower
|
---|
884 | precedence than the anchors. With the C</g> flag, the substitution
|
---|
885 | makes all possible matches, so it gets both. Remember, the trailing
|
---|
886 | newline matches the C<\s+>, and the C<$> anchor can match to the
|
---|
887 | physical end of the string, so the newline disappears too. Just add
|
---|
888 | the newline to the output, which has the added benefit of preserving
|
---|
889 | "blank" (consisting entirely of whitespace) lines which the C<^\s+>
|
---|
890 | would remove all by itself.
|
---|
891 |
|
---|
892 | while( <> )
|
---|
893 | {
|
---|
894 | s/^\s+|\s+$//g;
|
---|
895 | print "$_\n";
|
---|
896 | }
|
---|
897 |
|
---|
898 | For a multi-line string, you can apply the regular expression
|
---|
899 | to each logical line in the string by adding the C</m> flag (for
|
---|
900 | "multi-line"). With the C</m> flag, the C<$> matches I<before> an
|
---|
901 | embedded newline, so it doesn't remove it. It still removes the
|
---|
902 | newline at the end of the string.
|
---|
903 |
|
---|
904 | $string =~ s/^\s+|\s+$//gm;
|
---|
905 |
|
---|
906 | Remember that lines consisting entirely of whitespace will disappear,
|
---|
907 | since the first part of the alternation can match the entire string
|
---|
908 | and replace it with nothing. If you need to keep embedded blank lines,
|
---|
909 | you have to do a little more work. Instead of matching any whitespace
|
---|
910 | (since that includes a newline), just match the other whitespace.
|
---|
911 |
|
---|
912 | $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
|
---|
913 |
|
---|
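If you do this often, the pair of substitutions can live in a small subroutine. C<trim> is a hypothetical helper name used for this sketch, not a built-in:

```perl
use strict;
use warnings;

# Return a copy of the argument without leading or trailing whitespace.
sub trim {
    my $s = shift;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

my $clean = trim("  \thello world \n");
print "[$clean]\n";
```

Working on a copy leaves the caller's string untouched, which is usually less surprising than modifying the argument in place.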
914 | =head2 How do I pad a string with blanks or pad a number with zeroes?
|
---|
915 |
|
---|
916 | In the following examples, C<$pad_len> is the length to which you wish
|
---|
917 | to pad the string, C<$text> or C<$num> contains the string to be padded,
|
---|
918 | and C<$pad_char> contains the padding character. You can use a single
|
---|
919 | character string constant instead of the C<$pad_char> variable if you
|
---|
920 | know what it is in advance. And in the same way you can use an integer in
|
---|
921 | place of C<$pad_len> if you know the pad length in advance.
|
---|
922 |
|
---|
923 | The simplest method uses the C<sprintf> function. It can pad on the left
|
---|
924 | or right with blanks and on the left with zeroes and it will not
|
---|
925 | truncate the result. The C<pack> function can only pad strings on the
|
---|
926 | right with blanks and it will truncate the result to a maximum length of
|
---|
927 | C<$pad_len>.
|
---|
928 |
|
---|
929 | # Left padding a string with blanks (no truncation):
|
---|
930 | $padded = sprintf("%${pad_len}s", $text);
|
---|
931 | $padded = sprintf("%*s", $pad_len, $text); # same thing
|
---|
932 |
|
---|
933 | # Right padding a string with blanks (no truncation):
|
---|
934 | $padded = sprintf("%-${pad_len}s", $text);
|
---|
935 | $padded = sprintf("%-*s", $pad_len, $text); # same thing
|
---|
936 |
|
---|
937 | # Left padding a number with 0 (no truncation):
|
---|
938 | $padded = sprintf("%0${pad_len}d", $num);
|
---|
939 | $padded = sprintf("%0*d", $pad_len, $num); # same thing
|
---|
940 |
|
---|
941 | # Right padding a string with blanks using pack (will truncate):
|
---|
942 | $padded = pack("A$pad_len",$text);
|
---|
943 |
|
---|
944 | If you need to pad with a character other than blank or zero you can use
|
---|
945 | one of the following methods. They all generate a pad string with the
|
---|
946 | C<x> operator and combine that with C<$text>. These methods do
|
---|
947 | not truncate C<$text>.
|
---|
948 |
|
---|
949 | Left and right padding with any character, creating a new string:
|
---|
950 |
|
---|
951 | $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
|
---|
952 | $padded = $text . $pad_char x ( $pad_len - length( $text ) );
|
---|
953 |
|
---|
954 | Left and right padding with any character, modifying C<$text> directly:
|
---|
955 |
|
---|
956 | substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
|
---|
957 | $text .= $pad_char x ( $pad_len - length( $text ) );
|
---|
958 |
|
---|
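A short sketch putting the methods side by side, with arbitrary sample values, so you can compare which ones pad and which one truncates:

```perl
use strict;
use warnings;

my ($text, $pad_len) = ("camel", 8);

my $left   = sprintf( "%*s",  $pad_len, $text );   # blanks on the left
my $right  = sprintf( "%-*s", $pad_len, $text );   # blanks on the right
my $zeroes = sprintf( "%0*d", 5, 42 );             # zeroes on the left
my $packed = pack( "A3", $text );                  # pack truncates to 3 characters
my $dots   = "." x ( $pad_len - length($text) ) . $text;   # any pad character

print "[$_]\n" for $left, $right, $zeroes, $packed, $dots;
```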
959 | =head2 How do I extract selected columns from a string?
|
---|
960 |
|
---|
961 | Use substr() or unpack(), both documented in L<perlfunc>.
|
---|
962 | If you prefer thinking in terms of columns instead of widths,
|
---|
963 | you can use this kind of thing:
|
---|
964 |
|
---|
965 | # determine the unpack format needed to split Linux ps output
|
---|
966 | # arguments are cut columns
|
---|
967 | my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
|
---|
968 |
|
---|
969 | sub cut2fmt {
|
---|
970 | my(@positions) = @_;
|
---|
971 | my $template = '';
|
---|
972 | my $lastpos = 1;
|
---|
973 | for my $place (@positions) {
|
---|
974 | $template .= "A" . ($place - $lastpos) . " ";
|
---|
975 | $lastpos = $place;
|
---|
976 | }
|
---|
977 | $template .= "A*";
|
---|
978 | return $template;
|
---|
979 | }
|
---|
980 |
|
---|
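The returned template is then handed to unpack(). A sketch with an invented three-column line (the column positions and data are assumptions for illustration only):

```perl
use strict;
use warnings;

# Same helper as above: turn 1-based cut columns into an unpack template.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ( $place - $lastpos ) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}

my $fmt  = cut2fmt(8, 14);               # fields start at columns 1, 8, and 14
my $line = "alice  pts/0 vim notes";
my ($user, $tty, $command) = unpack( $fmt, $line );

print "$fmt\n$user|$tty|$command\n";
```

The C<A> format strips trailing spaces from each field, which is usually what you want for column-aligned output.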
981 | =head2 How do I find the soundex value of a string?
|
---|
982 |
|
---|
983 | (contributed by brian d foy)
|
---|
984 |
|
---|
985 | You can use the Text::Soundex module. If you want to do fuzzy or close
|
---|
986 | matching, you might also try the String::Approx, Text::Metaphone,
|
---|
987 | and Text::DoubleMetaphone modules.
|
---|
988 |
|
---|
989 | =head2 How can I expand variables in text strings?
|
---|
990 |
|
---|
991 | Let's assume that you have a string that contains placeholder
|
---|
992 | variables.
|
---|
993 |
|
---|
994 | $text = 'this has a $foo in it and a $bar';
|
---|
995 |
|
---|
996 | You can use a substitution with a double evaluation. The
|
---|
997 | first /e turns C<$1> into C<$foo>, and the second /e turns
|
---|
998 | C<$foo> into its value. You may want to wrap this in an
|
---|
999 | C<eval>: if you try to get the value of an undeclared variable
|
---|
1000 | while running under C<use strict>, you get a fatal error.
|
---|
1001 |
|
---|
1002 | eval { $text =~ s/(\$\w+)/$1/eeg };
|
---|
1003 | die if $@;
|
---|
1004 |
|
---|
1005 | It's probably better in the general case to treat those
|
---|
1006 | variables as entries in some special hash. For example:
|
---|
1007 |
|
---|
1008 | %user_defs = (
|
---|
1009 | foo => 23,
|
---|
1010 | bar => 19,
|
---|
1011 | );
|
---|
1012 | $text =~ s/\$(\w+)/$user_defs{$1}/g;
|
---|
1013 |
|
---|
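With the hash approach, a name that has no entry becomes an empty string. One possible variation, sketched here with an extra C<$baz> placeholder added for illustration, leaves unknown placeholders alone:

```perl
use strict;
use warnings;

my %user_defs = (
    foo => 23,
    bar => 19,
);

my $text = 'this has a $foo in it and a $bar but no $baz';

# Substitute only the names the hash knows; put anything else back untouched.
$text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$text\n";
```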
1014 | =head2 What's wrong with always quoting "$vars"?
|
---|
1015 |
|
---|
1016 | The problem is that those double-quotes force stringification--
|
---|
1017 | coercing numbers and references into strings--even when you
|
---|
1018 | don't want them to be strings. Think of it this way: double-quote
|
---|
1019 | expansion is used to produce new strings. If you already
|
---|
1020 | have a string, why do you need more?
|
---|
1021 |
|
---|
1022 | If you get used to writing odd things like these:
|
---|
1023 |
|
---|
1024 | print "$var"; # BAD
|
---|
1025 | $new = "$old"; # BAD
|
---|
1026 | somefunc("$var"); # BAD
|
---|
1027 |
|
---|
1028 | You'll be in trouble. Those should (in 99.8% of the cases) be
|
---|
1029 | the simpler and more direct:
|
---|
1030 |
|
---|
1031 | print $var;
|
---|
1032 | $new = $old;
|
---|
1033 | somefunc($var);
|
---|
1034 |
|
---|
1035 | Otherwise, besides slowing you down, you're going to break code when
|
---|
1036 | the thing in the scalar is actually neither a string nor a number, but
|
---|
1037 | a reference:
|
---|
1038 |
|
---|
1039 | func(\@array);
|
---|
1040 | sub func {
|
---|
1041 | my $aref = shift;
|
---|
1042 | my $oref = "$aref"; # WRONG
|
---|
1043 | }
|
---|
1044 |
|
---|
1045 | You can also get into subtle problems on those few operations in Perl
|
---|
1046 | that actually do care about the difference between a string and a
|
---|
1047 | number, such as the magical C<++> autoincrement operator or the
|
---|
1048 | syscall() function.
|
---|
1049 |
|
---|
1050 | Stringification also destroys arrays.
|
---|
1051 |
|
---|
1052 | @lines = `command`;
|
---|
1053 | print "@lines"; # WRONG - extra blanks
|
---|
1054 | print @lines; # right
|
---|
1055 |
|
---|
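A small sketch of what stringification does to a reference and to an array (the data is made up; the address in the stringified reference varies from run to run):

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;

my $copy = $aref;      # still a reference to @array
my $str  = "$aref";    # just a string such as "ARRAY(0x...)"

print "copy is a reference\n"   if ref $copy;
print "str is a plain string\n" unless ref $str;

my @lines  = ("one\n", "two\n");
my $joined = "@lines"; # elements joined with a space: "one\n two\n"
print $joined;
```

The string in C<$str> cannot be dereferenced under C<use strict>, so the array is unreachable through it.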
1056 | =head2 Why don't my E<lt>E<lt>HERE documents work?
|
---|
1057 |
|
---|
1058 | Check for these three things:
|
---|
1059 |
|
---|
1060 | =over 4
|
---|
1061 |
|
---|
1062 | =item There must be no space after the E<lt>E<lt> part.
|
---|
1063 |
|
---|
1064 | =item There (probably) should be a semicolon at the end.
|
---|
1065 |
|
---|
1066 | =item You can't (easily) have any space in front of the tag.
|
---|
1067 |
|
---|
1068 | =back
|
---|
1069 |
|
---|
1070 | If you want to indent the text in the here document, you
|
---|
1071 | can do this:
|
---|
1072 |
|
---|
1073 | # all in one
|
---|
1074 | ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
|
---|
1075 | your text
|
---|
1076 | goes here
|
---|
1077 | HERE_TARGET
|
---|
1078 |
|
---|
1079 | But the HERE_TARGET must still be flush against the margin.
|
---|
1080 | If you want that indented also, you'll have to quote
|
---|
1081 | in the indentation.
|
---|
1082 |
|
---|
1083 | ($quote = <<' FINIS') =~ s/^\s+//gm;
|
---|
1084 | ...we will have peace, when you and all your works have
|
---|
1085 | perished--and the works of your dark master to whom you
|
---|
1086 | would deliver us. You are a liar, Saruman, and a corrupter
|
---|
1087 | of men's hearts. --Theoden in /usr/src/perl/taint.c
|
---|
1088 | FINIS
|
---|
1089 | $quote =~ s/\s+--/\n--/;
|
---|
1090 |
|
---|
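For reference, here is the first idiom with its indentation intact, as a minimal sketch. The terminator C<END_TEXT> sits flush against the margin and the body is indented by four spaces:

```perl
use strict;
use warnings;

# Assign the here document, then strip leading whitespace from every line.
( my $text = <<'END_TEXT' ) =~ s/^\s+//gm;
    your text
    goes here
END_TEXT

print $text;
```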
1091 | A nice general-purpose fixer-upper function for indented here documents
|
---|
1092 | follows. It expects to be called with a here document as its argument.
|
---|
1093 | It looks to see whether each line begins with a common substring, and
|
---|
1094 | if so, strips that substring off. Otherwise, it takes the amount of leading
|
---|
1095 | whitespace found on the first line and removes that much off each
|
---|
1096 | subsequent line.
|
---|
1097 |
|
---|
1098 | sub fix {
|
---|
1099 | local $_ = shift;
|
---|
1100 | my ($white, $leader); # common whitespace and common leading string
|
---|
1101 | if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
|
---|
1102 | ($white, $leader) = ($2, quotemeta($1));
|
---|
1103 | } else {
|
---|
1104 | ($white, $leader) = (/^(\s+)/, '');
|
---|
1105 | }
|
---|
1106 | s/^\s*?$leader(?:$white)?//gm;
|
---|
1107 | return $_;
|
---|
1108 | }
|
---|
1109 |
|
---|
1110 | This works with leading special strings, dynamically determined:
|
---|
1111 |
|
---|
1112 | $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
|
---|
1113 | @@@ int
|
---|
1114 | @@@ runops() {
|
---|
1115 | @@@ SAVEI32(runlevel);
|
---|
1116 | @@@ runlevel++;
|
---|
1117 | @@@ while ( op = (*op->op_ppaddr)() );
|
---|
1118 | @@@ TAINT_NOT;
|
---|
1119 | @@@ return 0;
|
---|
1120 | @@@ }
|
---|
1121 | MAIN_INTERPRETER_LOOP
|
---|
1122 |
|
---|
1123 | Or with a fixed amount of leading whitespace, with remaining
|
---|
1124 | indentation correctly preserved:
|
---|
1125 |
|
---|
1126 | $poem = fix<<EVER_ON_AND_ON;
|
---|
1127 | Now far ahead the Road has gone,
|
---|
1128 | And I must follow, if I can,
|
---|
1129 | Pursuing it with eager feet,
|
---|
1130 | Until it joins some larger way
|
---|
1131 | Where many paths and errands meet.
|
---|
1132 | And whither then? I cannot say.
|
---|
1133 | --Bilbo in /usr/src/perl/pp_ctl.c
|
---|
1134 | EVER_ON_AND_ON
|
---|
1135 |
|
---|
1136 | =head1 Data: Arrays
|
---|
1137 |
|
---|
1138 | =head2 What is the difference between a list and an array?
|
---|
1139 |
|
---|
1140 | An array has a changeable length. A list does not. An array is something
|
---|
1141 | you can push or pop, while a list is a set of values. Some people make
|
---|
1142 | the distinction that a list is a value while an array is a variable.
|
---|
1143 | Subroutines are passed and return lists, you put things into list
|
---|
1144 | context, you initialize arrays with lists, and you foreach() across
|
---|
1145 | a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
|
---|
1146 | in scalar context behave like the number of elements in them, subroutines
|
---|
1147 | access their arguments through the array C<@_>, and push/pop/shift only work
|
---|
1148 | on arrays.
|
---|
1149 |
|
---|
1150 | As a side note, there's no such thing as a list in scalar context.
|
---|
1151 | When you say
|
---|
1152 |
|
---|
1153 | $scalar = (2, 5, 7, 9);
|
---|
1154 |
|
---|
1155 | you're using the comma operator in scalar context, so it uses the scalar
|
---|
1156 | comma operator. There never was a list there at all! This causes the
|
---|
1157 | last value to be returned: 9.
|
---|
1158 |
|
---|
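The following sketch contrasts the cases. It leaves out C<use warnings> on purpose, since the comma-operator line would otherwise draw "Useless use of a constant" warnings:

```perl
use strict;

my @array = (2, 5, 7, 9);

my $count   = @array;              # array in scalar context: number of elements
my $last    = (2, 5, 7, 9);        # comma operator in scalar context: last value
my ($first) = (2, 5, 7, 9);        # list assignment: first value
my $howmany = () = (2, 5, 7, 9);   # list assignment in scalar context: a count

print "$count $last $first $howmany\n";
```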
1159 | =head2 What is the difference between $array[1] and @array[1]?
|
---|
1160 |
|
---|
1161 | The former is a scalar value; the latter an array slice, making
|
---|
1162 | it a list with one (scalar) value. You should use $ when you want a
|
---|
1163 | scalar value (most of the time) and @ when you want a list with one
|
---|
1164 | scalar value in it (very, very rarely; nearly never, in fact).
|
---|
1165 |
|
---|
1166 | Sometimes it doesn't make a difference, but sometimes it does.
|
---|
1167 | For example, compare:
|
---|
1168 |
|
---|
1169 | $good[0] = `some program that outputs several lines`;
|
---|
1170 |
|
---|
1171 | with
|
---|
1172 |
|
---|
1173 | @bad[0] = `same program that outputs several lines`;
|
---|
1174 |
|
---|
1175 | The C<use warnings> pragma and the B<-w> flag will warn you about these
|
---|
1176 | matters.
|
---|
1177 |
|
---|
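The difference is the context imposed on the right-hand side. A sketch using wantarray() to make it visible (C<context> is a made-up helper for the demonstration):

```perl
use strict;

# Report the context in which the subroutine was called.
sub context { return wantarray ? "list" : "scalar" }

my (@good, @bad);
$good[0] = context();   # scalar assignment: right side runs in scalar context
@bad[0]  = context();   # slice assignment: right side runs in list context

print "$good[0] $bad[0]\n";
```

With backticks, scalar context returns all the output as one string, while list context returns one line per element, so the slice keeps only the first line.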
1178 | =head2 How can I remove duplicate elements from a list or array?
|
---|
1179 |
|
---|
1180 | (contributed by brian d foy)
|
---|
1181 |
|
---|
1182 | Use a hash. When you think the words "unique" or "duplicated", think
|
---|
1183 | "hash keys".
|
---|
1184 |
|
---|
1185 | If you don't care about the order of the elements, you could just
|
---|
1186 | create the hash then extract the keys. It's not important how you
|
---|
1187 | create that hash: just that you use C<keys> to get the unique
|
---|
1188 | elements.
|
---|
1189 |
|
---|
1190 | my %hash = map { $_, 1 } @array;
|
---|
1191 | # or a hash slice: @hash{ @array } = ();
|
---|
1192 | # or a foreach: $hash{$_} = 1 foreach ( @array );
|
---|
1193 |
|
---|
1194 | my @unique = keys %hash;
|
---|
1195 |
|
---|
1196 | You can also go through each element and skip the ones you've seen
|
---|
1197 | before. Use a hash to keep track. The first time the loop sees an
|
---|
1198 | element, that element has no key in C<%seen>. The C<next> statement
|
---|
1199 | creates the key and immediately uses its value, which is C<undef>, so
|
---|
1200 | the loop continues to the C<push> and increments the value for that
|
---|
1201 | key. The next time the loop sees that same element, its key exists in
|
---|
1202 | the hash I<and> the value for that key is true (since it's not 0 or
|
---|
1203 | undef), so the C<next> skips that iteration and the loop goes to the next
|
---|
1204 | element.
|
---|
1205 |
|
---|
1206 | my @unique = ();
|
---|
1207 | my %seen = ();
|
---|
1208 |
|
---|
1209 | foreach my $elem ( @array )
|
---|
1210 | {
|
---|
1211 | next if $seen{ $elem }++;
|
---|
1212 | push @unique, $elem;
|
---|
1213 | }
|
---|
1214 |
|
---|
1215 | You can write this more briefly using a grep, which does the
|
---|
1216 | same thing.
|
---|
1217 |
|
---|
1218 | my %seen = ();
|
---|
1219 | my @unique = grep { ! $seen{ $_ }++ } @array;
|
---|
1220 |
|
---|
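Wrapped in a subroutine, the grep version keeps the first occurrence of each element in its original position. C<unique> is a hypothetical name chosen for this sketch:

```perl
use strict;
use warnings;

# Return the arguments with duplicates removed, preserving first-seen order.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @unique = unique(qw(b a b c a));
print "@unique\n";
```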
1221 | =head2 How can I tell whether a certain element is contained in a list or array?
|
---|
1222 |
|
---|
1223 | (portions of this answer contributed by Anno Siegel)
|
---|
1224 |
|
---|
1225 | Hearing the word "in" is an I<in>dication that you probably should have
|
---|
1226 | used a hash, not a list or array, to store your data. Hashes are
|
---|
1227 | designed to answer this question quickly and efficiently. Arrays aren't.
|
---|
1228 |
|
---|
1229 | That being said, there are several ways to approach this. If you
|
---|
1230 | are going to make this query many times over arbitrary string values,
|
---|
1231 | the fastest way is probably to invert the original array and maintain a
|
---|
1232 | hash whose keys are the first array's values.
|
---|
1233 |
|
---|
1234 | @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
|
---|
1235 | %is_blue = ();
|
---|
1236 | for (@blues) { $is_blue{$_} = 1 }
|
---|
1237 |
|
---|
1238 | Now you can check whether $is_blue{$some_color}. It might have been a
|
---|
1239 | good idea to keep the blues all in a hash in the first place.
|
---|
1240 |
|
---|
1241 | If the values are all small integers, you could use a simple indexed
|
---|
1242 | array. This kind of an array will take up less space:
|
---|
1243 |
|
---|
1244 | @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
|
---|
1245 | @is_tiny_prime = ();
|
---|
1246 | for (@primes) { $is_tiny_prime[$_] = 1 }
|
---|
1247 | # or simply @is_tiny_prime[@primes] = (1) x @primes;
|
---|
1248 |
|
---|
1249 | Now you check whether $is_tiny_prime[$some_number].
|
---|
1250 |
|
---|
1251 | If the values in question are integers instead of strings, you can save
|
---|
1252 | quite a lot of space by using bit strings instead:
|
---|
1253 |
|
---|
1254 | @articles = ( 1..10, 150..2000, 2017 );
|
---|
1255 | undef $read;
|
---|
1256 | for (@articles) { vec($read,$_,1) = 1 }
|
---|
1257 |
|
---|
1258 | Now check whether C<vec($read,$n,1)> is true for some C<$n>.
|
---|
1259 |
|
---|
1260 | These methods guarantee fast individual tests but require a re-organization
|
---|
1261 | of the original list or array. They only pay off if you have to test
|
---|
1262 | multiple values against the same array.
|
---|
1263 |
|
---|
1264 | If you are testing only once, the standard module List::Util exports
|
---|
1265 | the function C<first> for this purpose. It works by stopping once it
|
---|
1266 | finds the element. It's written in C for speed, and its Perl equivalent
|
---|
1267 | looks like this subroutine:
|
---|
1268 |
|
---|
1269 | sub first (&@) {
|
---|
1270 | my $code = shift;
|
---|
1271 | foreach (@_) {
|
---|
1272 | return $_ if &{$code}();
|
---|
1273 | }
|
---|
1274 | undef;
|
---|
1275 | }
|
---|
1276 |
|
---|
1277 | If speed is of little concern, the common idiom uses grep in scalar context
|
---|
1278 | (which returns the number of items that passed its condition) to traverse the
|
---|
1279 | entire list. This does have the benefit of telling you how many matches it
|
---|
1280 | found, though.
|
---|
1281 |
|
---|
1282 | my $is_there = grep $_ eq $whatever, @array;
|
---|
1283 |
|
---|
1284 | If you want to actually extract the matching elements, simply use grep in
|
---|
1285 | list context.
|
---|
1286 |
|
---|
1287 | my @matches = grep $_ eq $whatever, @array;
|
---|
1288 |
|
---|
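A compact sketch of three of these approaches on the same data. Note that C<first> returns the element itself, so test the result with defined() if the element could be a false value such as 0:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;

# Build the lookup hash in one step with a hash slice.
my %is_blue;
@is_blue{@blues} = (1) x @blues;

my $hit   = first { $_ eq 'teal' } @blues;   # stops at the first match
my $count = grep  { $_ eq 'teal' } @blues;   # scans the whole list

print "teal is blue\n" if $is_blue{teal};
print "found $hit, matched $count time(s)\n" if defined $hit;
```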
1289 | =head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
|
---|
1290 |
|
---|
1291 | Use a hash. Here's code to do both and more. It assumes that
|
---|
1292 | each element is unique in a given array:
|
---|
1293 |
|
---|
1294 | @union = @intersection = @difference = ();
|
---|
1295 | %count = ();
|
---|
1296 | foreach $element (@array1, @array2) { $count{$element}++ }
|
---|
1297 | foreach $element (keys %count) {
|
---|
1298 | push @union, $element;
|
---|
1299 | push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
|
---|
1300 | }
|
---|
1301 |
|
---|
1302 | Note that this is the I<symmetric difference>, that is, all elements in
|
---|
1303 | either A or in B but not in both. Think of it as an xor operation.
|
---|
1304 |
|
---|
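If you want the one-sided difference instead (elements of the first array that are not in the second), a lookup hash built from the second array is enough. A sketch with made-up data:

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e);

my %in_second = map { $_ => 1 } @array2;

my @only_first = grep { !$in_second{$_} } @array1;   # in @array1 but not @array2
my @both       = grep {  $in_second{$_} } @array1;   # intersection, in @array1 order

print "only in first: @only_first\nin both: @both\n";
```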
1305 | =head2 How do I test whether two arrays or hashes are equal?
|
---|
1306 |
|
---|
1307 | The following code works for single-level arrays. It uses a stringwise
|
---|
1308 | comparison, and does not distinguish defined versus undefined empty
|
---|
1309 | strings. Modify if you have other needs.
|
---|
1310 |
|
---|
1311 | $are_equal = compare_arrays(\@frogs, \@toads);
|
---|
1312 |
|
---|
1313 | sub compare_arrays {
|
---|
1314 | my ($first, $second) = @_;
|
---|
1315 | no warnings; # silence spurious -w undef complaints
|
---|
1316 | return 0 unless @$first == @$second;
|
---|
1317 | for (my $i = 0; $i < @$first; $i++) {
|
---|
1318 | return 0 if $first->[$i] ne $second->[$i];
|
---|
1319 | }
|
---|
1320 | return 1;
|
---|
1321 | }
|
---|
1322 |
|
---|
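The same idea carries over to single-level hashes. This is only a sketch: C<compare_hashes> is a hypothetical helper, and like the array version it compares values as strings:

```perl
use strict;
use warnings;

# True if both hashes have the same keys and stringwise-equal values.
sub compare_hashes {
    my ($first, $second) = @_;
    no warnings 'uninitialized';            # treat undef like an empty string
    return 0 unless keys %$first == keys %$second;
    for my $key (keys %$first) {
        return 0 unless exists $second->{$key};
        return 0 if $first->{$key} ne $second->{$key};
    }
    return 1;
}

my %frogs = ( legs => 4, tail => 0 );
my %toads = ( legs => 4, tail => 0 );
my %newts = ( legs => 4, tail => 1 );

print compare_hashes(\%frogs, \%toads) ? "same\n" : "different\n";
print compare_hashes(\%frogs, \%newts) ? "same\n" : "different\n";
```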
1323 | For multilevel structures, you may wish to use an approach more
|
---|
1324 | like this one. It uses the CPAN module FreezeThaw:
|
---|
1325 |
|
---|
1326 | use FreezeThaw qw(cmpStr);
|
---|
1327 | @a = @b = ( "this", "that", [ "more", "stuff" ] );
|
---|
1328 |
|
---|
1329 | printf "a and b contain %s arrays\n",
|
---|
1330 | cmpStr(\@a, \@b) == 0
|
---|
1331 | ? "the same"
|
---|
1332 | : "different";
|
---|
1333 |
|
---|
1334 | This approach also works for comparing hashes. Here
|
---|
1335 | we'll demonstrate two different answers:
|
---|
1336 |
|
---|
1337 | use FreezeThaw qw(cmpStr cmpStrHard);
|
---|
1338 |
|
---|
1339 | %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
|
---|
1340 | $a{EXTRA} = \%b;
|
---|
1341 | $b{EXTRA} = \%a;
|
---|
1342 |
|
---|
1343 | printf "a and b contain %s hashes\n",
|
---|
1344 | cmpStr(\%a, \%b) == 0 ? "the same" : "different";
|
---|
1345 |
|
---|
1346 | printf "a and b contain %s hashes\n",
|
---|
1347 | cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
|
---|
1348 |
|
---|
1349 |
|
---|
1350 | The first reports that both of those hashes contain the same data,
|
---|
1351 | while the second reports that they do not. Which you prefer is left as
|
---|
1352 | an exercise to the reader.
|
---|
1353 |
|
---|
1354 | =head2 How do I find the first array element for which a condition is true?
|
---|
1355 |
|
---|
1356 | To find the first array element which satisfies a condition, you can
|
---|
1357 | use the first() function in the List::Util module, which comes with
|
---|
1358 | Perl 5.8. This example finds the first element that contains "Perl".
|
---|
1359 |
|
---|
1360 | use List::Util qw(first);
|
---|
1361 |
|
---|
1362 | my $element = first { /Perl/ } @array;
|
---|
1363 |
|
---|
1364 | If you cannot use List::Util, you can make your own loop to do the
|
---|
1365 | same thing. Once you find the element, you stop the loop with last.
|
---|
1366 |
|
---|
1367 | my $found;
|
---|
1368 | foreach ( @array )
|
---|
1369 | {
|
---|
1370 | if( /Perl/ ) { $found = $_; last }
|
---|
1371 | }
|
---|
1372 |
|
---|
1373 | If you want the array index, you can iterate through the indices
|
---|
1374 | and check the array element at each index until you find one
|
---|
1375 | that satisfies the condition.
|
---|
1376 |
|
---|
1377 | my( $found, $index ) = ( undef, -1 );
|
---|
1378 | for( my $i = 0; $i < @array; $i++ )
|
---|
1379 | {
|
---|
1380 | if( $array[$i] =~ /Perl/ )
|
---|
1381 | {
|
---|
1382 | $found = $array[$i];
|
---|
1383 | $index = $i;
|
---|
1384 | last;
|
---|
1385 | }
|
---|
1386 | }
|
---|
1387 |
|
---|
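If List::Util is available, one way to get the index is to run first() over the indices instead of the elements. A sketch with sample data:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( "awk", "sed", "Perl 5", "Perl 6" );

# first() returns the first index whose element matches, or undef if none does.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index ? "found at index $index\n" : "not found\n";
```

Test the result with defined(), since index 0 is a valid answer but a false value.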
1388 | =head2 How do I handle linked lists?
|
---|
1389 |
|
---|
1390 | In general, you usually don't need a linked list in Perl, since with
|
---|
1391 | regular arrays, you can push and pop or shift and unshift at either end,
|
---|
1392 | or you can use splice to add and/or remove an arbitrary number of elements at
|
---|
1393 | arbitrary points. Both pop and shift are O(1) operations on Perl's
|
---|
1394 | dynamic arrays. In the absence of shifts and pops, push in general
|
---|
1395 | needs to reallocate on the order of every log(N) times, and unshift will
|
---|
1396 | need to copy pointers each time.
|
---|
1397 |
|
---|
1398 | If you really, really wanted, you could use structures as described in
|
---|
1399 | L<perldsc> or L<perltoot> and do just what the algorithm book tells you
|
---|
1400 | to do. For example, imagine a list node like this:
|
---|
1401 |
|
---|
1402 | $node = {
|
---|
1403 | VALUE => 42,
|
---|
1404 | LINK => undef,
|
---|
1405 | };
|
---|
1406 |
|
---|
1407 | You could walk the list this way:
|
---|
1408 |
|
---|
1409 | print "List: ";
|
---|
1410 | for ($node = $head; $node; $node = $node->{LINK}) {
|
---|
1411 | print $node->{VALUE}, " ";
|
---|
1412 | }
|
---|
1413 | print "\n";
|
---|
1414 |
|
---|
1415 | You could add to the list this way:
|
---|
1416 |
|
---|
1417 | my ($head, $tail);
|
---|
1418 | $tail = append($head, 1); # grow a new head
|
---|
1419 | for $value ( 2 .. 10 ) {
|
---|
1420 | $tail = append($tail, $value);
|
---|
1421 | }
|
---|
1422 |
|
---|
1423 | sub append {
|
---|
1424 | my($list, $value) = @_;
|
---|
1425 | my $node = { VALUE => $value };
|
---|
1426 | if ($list) {
|
---|
1427 | $node->{LINK} = $list->{LINK};
|
---|
1428 | $list->{LINK} = $node;
|
---|
1429 | } else {
|
---|
1430 | $_[0] = $node; # replace caller's version
|
---|
1431 | }
|
---|
1432 | return $node;
|
---|
1433 | }
|
---|
1434 |
|
---|
1435 | But again, Perl's built-in arrays are virtually always good enough.
|
---|
1436 |
|
---|
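As a rough illustration of that point, here is a sketch of the usual list operations done with a plain array and the built-ins. The variable names and values are made up for the example.

```perl
use strict;
use warnings;

my @list = ( 1 .. 5 );

push    @list, 6;            # append at the tail
unshift @list, 0;            # prepend at the head
my $last  = pop   @list;     # remove from the tail (6)
my $first = shift @list;     # remove from the head (0)

# insert two elements before index 2, removing nothing
splice @list, 2, 0, 'a', 'b';

# remove one element at index 4 and keep what was removed
my ($removed) = splice @list, 4, 1;

print "@list\n";             # prints "1 2 a b 4 5"
print "removed $removed\n";  # prints "removed 3"
```
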
1437 | =head2 How do I handle circular lists?
|
---|
1438 |
|
---|
1439 | Circular lists could be handled in the traditional fashion with linked
|
---|
1440 | lists, or you could just do something like this with an array:
|
---|
1441 |
|
---|
1442 | unshift(@array, pop(@array)); # the last shall be first
|
---|
1443 | push(@array, shift(@array)); # and vice versa
|
---|
1444 |
|
---|
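If you would rather not rearrange the array at all, one possible alternative is to keep a position and wrap it with the modulus operator. This is only a sketch; C<ring_at> is a hypothetical helper, and it assumes the array is not empty.

```perl
use strict;
use warnings;

my @ring = qw(north east south west);

# Return the element $steps positions away from index $start,
# wrapping around at either end. Assumes @$ring is not empty.
sub ring_at {
    my ( $ring, $start, $steps ) = @_;
    return $ring->[ ( $start + $steps ) % @$ring ];
}

print ring_at( \@ring, 3,  1 ), "\n";    # one step past the last element: north
print ring_at( \@ring, 0, -1 ), "\n";    # one step before the first element: west
```

This relies on Perl's C<%> returning a non-negative result for a positive right operand, which holds as long as C<use integer> is not in effect.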
1445 | =head2 How do I shuffle an array randomly?
|
---|
1446 |
|
---|
1447 | If you have either Perl 5.8.0 or later installed, or
|
---|
1448 | Scalar-List-Utils 1.03 or later installed, you can say:
|
---|
1449 |
|
---|
1450 | use List::Util 'shuffle';
|
---|
1451 |
|
---|
1452 | @shuffled = shuffle(@list);
|
---|
1453 |
|
---|
1454 | If not, you can use a Fisher-Yates shuffle.
|
---|
1455 |
|
---|
1456 | sub fisher_yates_shuffle {
|
---|
1457 | my $deck = shift; # $deck is a reference to an array
|
---|
1458 | my $i = @$deck;
|
---|
1459 | while (--$i > 0) { # also terminates for an empty array
|
---|
1460 | my $j = int rand ($i+1);
|
---|
1461 | @$deck[$i,$j] = @$deck[$j,$i];
|
---|
1462 | }
|
---|
1463 | }
|
---|
1464 |
|
---|
1465 | # shuffle my mpeg collection
|
---|
1466 | #
|
---|
1467 | my @mpeg = <audio/*/*.mp3>;
|
---|
1468 | fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
|
---|
1469 | print @mpeg;
|
---|
1470 |
|
---|
1471 | Note that the above implementation shuffles an array in place,
|
---|
1472 | unlike the List::Util::shuffle() which takes a list and returns
|
---|
1473 | a new shuffled list.
|
---|
1474 |
|
---|
1475 | You've probably seen shuffling algorithms that work using splice,
|
---|
1476 | randomly picking another element to swap the current element with:
|
---|
1477 |
|
---|
1478 | srand;
|
---|
1479 | @new = ();
|
---|
1480 | @old = 1 .. 10; # just a demo
|
---|
1481 | while (@old) {
|
---|
1482 | push(@new, splice(@old, rand @old, 1));
|
---|
1483 | }
|
---|
1484 |
|
---|
1485 | This is bad because splice is already O(N), and since you do it N times,
|
---|
1486 | you just invented a quadratic algorithm; that is, O(N**2). This does
|
---|
1487 | not scale, although Perl is so efficient that you probably won't notice
|
---|
1488 | this until you have rather largish arrays.
|
---|
1489 |
|
---|
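To see the difference between shuffling a copy and shuffling in place, here is a small sketch using C<List::Util::shuffle>. The C<@deck> contents are made up; the check at the end only confirms that a shuffle is a permutation of its input.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @deck     = ( 1 .. 10 );
my @shuffled = shuffle(@deck);    # @deck itself is left untouched

# A shuffle is a permutation: same elements, possibly in another order.
my $same_elements = "@{[ sort { $a <=> $b } @shuffled ]}" eq "@deck";

print "original: @deck\n";
print "shuffled: @shuffled\n";
print "still the same ten elements\n" if $same_elements;
```
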
1490 | =head2 How do I process/modify each element of an array?
|
---|
1491 |
|
---|
1492 | Use C<for>/C<foreach>:
|
---|
1493 |
|
---|
1494 | for (@lines) {
|
---|
1495 | s/foo/bar/; # change that word
|
---|
1496 | tr/XZ/ZX/; # swap those letters
|
---|
1497 | }
|
---|
1498 |
|
---|
1499 | Here's another; let's compute spherical volumes:
|
---|
1500 |
|
---|
1501 | for (@volumes = @radii) { # @volumes has changed parts
|
---|
1502 | $_ **= 3;
|
---|
1503 | $_ *= (4/3) * 3.14159; # this will be constant folded
|
---|
1504 | }
|
---|
1505 |
|
---|
1506 | which can also be done with map(), which is made to transform
|
---|
1507 | one list into another:
|
---|
1508 |
|
---|
1509 | @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
|
---|
1510 |
|
---|
1511 | If you want to do the same thing to modify the values of the
|
---|
1512 | hash, you can use the C<values> function. As of Perl 5.6
|
---|
1513 | the values are not copied, so if you modify $orbit (in this
|
---|
1514 | case), you modify the value.
|
---|
1515 |
|
---|
1516 | for $orbit ( values %orbits ) {
|
---|
1517 | ($orbit **= 3) *= (4/3) * 3.14159;
|
---|
1518 | }
|
---|
1519 |
|
---|
1520 | Prior to perl 5.6 C<values> returned copies of the values,
|
---|
1521 | so older perl code often contains constructions such as
|
---|
1522 | C<@orbits{keys %orbits}> instead of C<values %orbits> where
|
---|
1523 | the hash is to be modified.
|
---|
1524 |
|
---|
1525 | =head2 How do I select a random element from an array?
|
---|
1526 |
|
---|
1527 | Use the rand() function (see L<perlfunc/rand>):
|
---|
1528 |
|
---|
1529 | $index = rand @array;
|
---|
1530 | $element = $array[$index];
|
---|
1531 |
|
---|
1532 | Or, simply:
|
---|
1533 | my $element = $array[ rand @array ];
|
---|
1534 |
|
---|
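The snippets above work because an array subscript is truncated to an integer. A sketch that makes the truncation explicit, with made-up array contents:

```perl
use strict;
use warnings;

my @array = qw(red green blue);

# rand @array returns a fractional number from 0 up to, but not
# including, the number of elements; int() drops the fraction.
my $index   = int rand @array;
my $element = $array[$index];

print "picked index $index: $element\n";
```
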
1535 | =head2 How do I permute N elements of a list?
|
---|
1536 |
|
---|
1537 | Use the List::Permutor module on CPAN. If the list is
|
---|
1538 | actually an array, try the Algorithm::Permute module (also
|
---|
1539 | on CPAN). It's written in XS code and is very efficient.
|
---|
1540 |
|
---|
1541 | use Algorithm::Permute;
|
---|
1542 | my @array = 'a'..'d';
|
---|
1543 | my $p_iterator = Algorithm::Permute->new ( \@array );
|
---|
1544 | while (my @perm = $p_iterator->next) {
|
---|
1545 | print "next permutation: (@perm)\n";
|
---|
1546 | }
|
---|
1547 |
|
---|
1548 | For even faster execution, you could do:
|
---|
1549 |
|
---|
1550 | use Algorithm::Permute;
|
---|
1551 | my @array = 'a'..'d';
|
---|
1552 | Algorithm::Permute::permute {
|
---|
1553 | print "next permutation: (@array)\n";
|
---|
1554 | } @array;
|
---|
1555 |
|
---|
1556 | Here's a little program that generates all permutations of
|
---|
1557 | all the words on each line of input. The algorithm embodied
|
---|
1558 | in the permute() function is discussed in Volume 4 (still
|
---|
1559 | unpublished) of Knuth's I<The Art of Computer Programming>
|
---|
1560 | and will work on any list:
|
---|
1561 |
|
---|
1562 | #!/usr/bin/perl -n
|
---|
1563 | # Fischer-Krause ordered permutation generator
|
---|
1564 |
|
---|
1565 | sub permute (&@) {
|
---|
1566 | my $code = shift;
|
---|
1567 | my @idx = 0..$#_;
|
---|
1568 | while ( $code->(@_[@idx]) ) {
|
---|
1569 | my $p = $#idx;
|
---|
1570 | --$p while $idx[$p-1] > $idx[$p];
|
---|
1571 | my $q = $p or return;
|
---|
1572 | push @idx, reverse splice @idx, $p;
|
---|
1573 | ++$q while $idx[$p-1] > $idx[$q];
|
---|
1574 | @idx[$p-1,$q]=@idx[$q,$p-1];
|
---|
1575 | }
|
---|
1576 | }
|
---|
1577 |
|
---|
1578 | permute {print"@_\n"} split;
|
---|
1579 |
|
---|
1580 | =head2 How do I sort an array by (anything)?
|
---|
1581 |
|
---|
1582 | Supply a comparison function to sort() (described in L<perlfunc/sort>):
|
---|
1583 |
|
---|
1584 | @list = sort { $a <=> $b } @list;
|
---|
1585 |
|
---|
1586 | The default sort function is cmp, string comparison, which would
|
---|
1587 | sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
|
---|
1588 | the numerical comparison operator.
|
---|
1589 |
|
---|
1590 | If you have a complicated function needed to pull out the part you
|
---|
1591 | want to sort on, then don't do it inside the sort function. Pull it
|
---|
1592 | out first, because the sort BLOCK can be called many times for the
|
---|
1593 | same element. Here's an example of how to pull out the first word
|
---|
1594 | after the first number on each item, and then sort those words
|
---|
1595 | case-insensitively.
|
---|
1596 |
|
---|
1597 | @idx = ();
|
---|
1598 | for (@data) {
|
---|
1599 | ($item) = /\d+\s*(\S+)/;
|
---|
1600 | push @idx, uc($item);
|
---|
1601 | }
|
---|
1602 | @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
|
---|
1603 |
|
---|
1604 | which could also be written this way, using a trick
|
---|
1605 | that's come to be known as the Schwartzian Transform:
|
---|
1606 |
|
---|
1607 | @sorted = map { $_->[0] }
|
---|
1608 | sort { $a->[1] cmp $b->[1] }
|
---|
1609 | map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
|
---|
1610 |
|
---|
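Read from the bottom up, the transform has three steps. Here is the same code as a runnable sketch with some made-up sample data:

```perl
use strict;
use warnings;

my @data = ( '10 pear x', '3 Apple y', '7 banana z' );

my @sorted =
    map  { $_->[0] }                             # 3. unwrap the original item
    sort { $a->[1] cmp $b->[1] }                 # 2. compare the cached keys
    map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] }    # 1. pair each item with its key
    @data;

print "$_\n" for @sorted;
# prints:
#   3 Apple y
#   7 banana z
#   10 pear x
```

The key for each item is computed exactly once, in step 1, no matter how many comparisons the sort makes.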
1611 | If you need to sort on several fields, the following paradigm is useful.
|
---|
1612 |
|
---|
1613 | @sorted = sort { field1($a) <=> field1($b) ||
|
---|
1614 | field2($a) cmp field2($b) ||
|
---|
1615 | field3($a) cmp field3($b)
|
---|
1616 | } @data;
|
---|
1617 |
|
---|
1618 | This can be conveniently combined with precalculation of keys as given
|
---|
1619 | above.
|
---|
1620 |
|
---|
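For instance, assuming each record is a hash reference with hypothetical C<age>, C<last>, and C<first> fields, a three-level sort might be sketched like this:

```perl
use strict;
use warnings;

my @data = (
    { age => 30, last => 'Smith', first => 'Bob'   },
    { age => 25, last => 'Jones', first => 'Carol' },
    { age => 30, last => 'Smith', first => 'Alice' },
    { age => 30, last => 'Adams', first => 'Dave'  },
);

# Each || falls through to the next comparison only on a tie (0).
my @sorted = sort {
       $a->{age}   <=> $b->{age}
    || $a->{last}  cmp $b->{last}
    || $a->{first} cmp $b->{first}
} @data;

print "$_->{age} $_->{last} $_->{first}\n" for @sorted;
# prints:
#   25 Jones Carol
#   30 Adams Dave
#   30 Smith Alice
#   30 Smith Bob
```
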
1621 | See the F<sort> article in the "Far More Than You Ever Wanted
|
---|
1622 | To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
|
---|
1623 | more about this approach.
|
---|
1624 |
|
---|
1625 | See also the question below on sorting hashes.
|
---|
1626 |
|
---|
1627 | =head2 How do I manipulate arrays of bits?
|
---|
1628 |
|
---|
1629 | Use pack() and unpack(), or else vec() and the bitwise operations.
|
---|
1630 |
|
---|
1631 | For example, this sets $vec to have bit N set if $ints[N] was set:
|
---|
1632 |
|
---|
1633 | $vec = '';
|
---|
1634 | foreach(@ints) { vec($vec,$_,1) = 1 }
|
---|
1635 |
|
---|
1636 | Here's how, given a vector in $vec, you can
|
---|
1637 | get those bits into your @ints array:
|
---|
1638 |
|
---|
1639 | sub bitvec_to_list {
|
---|
1640 | my $vec = shift;
|
---|
1641 | my @ints;
|
---|
1642 | # Find null-byte density then select best algorithm
|
---|
1643 | if ($vec =~ tr/\0// / length $vec > 0.95) {
|
---|
1644 | use integer;
|
---|
1645 | my $i;
|
---|
1646 | # This method is faster with mostly null-bytes
|
---|
1647 | while($vec =~ /[^\0]/g ) {
|
---|
1648 | $i = -9 + 8 * pos $vec;
|
---|
1649 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
1650 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
1651 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
1652 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
1653 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
1654 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
1655 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
1656 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
1657 | }
|
---|
1658 | } else {
|
---|
1659 | # This method is a fast general algorithm
|
---|
1660 | use integer;
|
---|
1661 | my $bits = unpack "b*", $vec;
|
---|
1662 | push @ints, 0 if $bits =~ s/^(\d)// && $1;
|
---|
1663 | push @ints, pos $bits while($bits =~ /1/g);
|
---|
1664 | }
|
---|
1665 | return \@ints;
|
---|
1666 | }
|
---|
1667 |
|
---|
1668 | This method gets faster the more sparse the bit vector is.
|
---|
1669 | (Courtesy of Tim Bunce and Winfried Koenig.)
|
---|
1670 |
|
---|
1671 | You can make the while loop a lot shorter with this suggestion
|
---|
1672 | from Benjamin Goldberg:
|
---|
1673 |
|
---|
1674 | while($vec =~ /[^\0]+/g ) {
|
---|
1675 | push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
|
---|
1676 | }
|
---|
1677 |
|
---|
1678 | Or use the CPAN module Bit::Vector:
|
---|
1679 |
|
---|
1680 | $vector = Bit::Vector->new($num_of_bits);
|
---|
1681 | $vector->Index_List_Store(@ints);
|
---|
1682 | @ints = $vector->Index_List_Read();
|
---|
1683 |
|
---|
1684 | Bit::Vector provides efficient methods for bit vectors, sets of small integers,
|
---|
1685 | and "big int" math.
|
---|
1686 |
|
---|
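For small vectors, a plain round trip with vec() may be all you need. This sketch sets a few made-up bit positions and reads them back by testing every bit in the string:

```perl
use strict;
use warnings;

my @ints = ( 1, 3, 23 );

# Set bit N for every N in @ints; the string grows as needed.
my $vec = '';
vec( $vec, $_, 1 ) = 1 for @ints;

# Read the set bits back by testing every bit position.
my @back = grep { vec( $vec, $_, 1 ) } 0 .. 8 * length($vec) - 1;

print "bytes: ", length($vec), ", bits set: @back\n";
# prints "bytes: 3, bits set: 1 3 23"
```
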
1687 | Here's a more extensive illustration using vec():
|
---|
1688 |
|
---|
1689 | # vec demo
|
---|
1690 | $vector = "\xff\x0f\xef\xfe";
|
---|
1691 | print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
|
---|
1692 | unpack("N", $vector), "\n";
|
---|
1693 | $is_set = vec($vector, 23, 1);
|
---|
1694 | print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
|
---|
1695 | pvec($vector);
|
---|
1696 |
|
---|
1697 | set_vec(1,1,1);
|
---|
1698 | set_vec(3,1,1);
|
---|
1699 | set_vec(23,1,1);
|
---|
1700 |
|
---|
1701 | set_vec(3,1,3);
|
---|
1702 | set_vec(3,2,3);
|
---|
1703 | set_vec(3,4,3);
|
---|
1704 | set_vec(3,4,7);
|
---|
1705 | set_vec(3,8,3);
|
---|
1706 | set_vec(3,8,7);
|
---|
1707 |
|
---|
1708 | set_vec(0,32,17);
|
---|
1709 | set_vec(1,32,17);
|
---|
1710 |
|
---|
1711 | sub set_vec {
|
---|
1712 | my ($offset, $width, $value) = @_;
|
---|
1713 | my $vector = '';
|
---|
1714 | vec($vector, $offset, $width) = $value;
|
---|
1715 | print "offset=$offset width=$width value=$value\n";
|
---|
1716 | pvec($vector);
|
---|
1717 | }
|
---|
1718 |
|
---|
1719 | sub pvec {
|
---|
1720 | my $vector = shift;
|
---|
1721 | my $bits = unpack("b*", $vector);
|
---|
1722 | my $i = 0;
|
---|
1723 | my $BASE = 8;
|
---|
1724 |
|
---|
1725 | print "vector length in bytes: ", length($vector), "\n";
|
---|
1726 | @bytes = unpack("A8" x length($vector), $bits);
|
---|
1727 | print "bits are: @bytes\n\n";
|
---|
1728 | }
|
---|
1729 |
|
---|
1730 | =head2 Why does defined() return true on empty arrays and hashes?
|
---|
1731 |
|
---|
1732 | The short story is that you should probably only use defined on scalars or
|
---|
1733 | functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
|
---|
1734 | in the 5.004 release or later of Perl for more detail.
|
---|
1735 |
|
---|
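What most people actually want is a test for emptiness, and evaluating the aggregate in boolean context does that. A minimal sketch with made-up variables:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An aggregate in boolean context is true only when it is non-empty.
print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, 'x';
$hash{key} = 'value';

print "array has ", scalar(@array),     " element(s)\n" if @array;
print "hash has ",  scalar(keys %hash), " key(s)\n"     if %hash;
```
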
1736 | =head1 Data: Hashes (Associative Arrays)
|
---|
1737 |
|
---|
1738 | =head2 How do I process an entire hash?
|
---|
1739 |
|
---|
1740 | Use the each() function (see L<perlfunc/each>) if you don't care
|
---|
1741 | whether it's sorted:
|
---|
1742 |
|
---|
1743 | while ( ($key, $value) = each %hash) {
|
---|
1744 | print "$key = $value\n";
|
---|
1745 | }
|
---|
1746 |
|
---|
1747 | If you want it sorted, you'll have to use foreach() on the result of
|
---|
1748 | sorting the keys as shown in an earlier question.
|
---|
1749 |
|
---|
1750 | =head2 What happens if I add or remove keys from a hash while iterating over it?
|
---|
1751 |
|
---|
1752 | (contributed by brian d foy)
|
---|
1753 |
|
---|
1754 | The easy answer is "Don't do that!"
|
---|
1755 |
|
---|
1756 | If you iterate through the hash with each(), you can delete the key
|
---|
1757 | most recently returned without worrying about it. If you delete or add
|
---|
1758 | other keys, the iterator may skip or double up on them since perl
|
---|
1759 | may rearrange the hash table. See the
|
---|
1760 | entry for C<each()> in L<perlfunc>.
|
---|
1761 |
|
---|
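A sketch of the two patterns that stay on safe ground, using a made-up hash: deleting only the key each() just returned, and iterating over a snapshot of the keys when you need to add entries.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3, d => 4 );

# Deleting the key that each() most recently returned is safe.
while ( my ( $key, $value ) = each %hash ) {
    delete $hash{$key} if $value % 2;    # drop entries with odd values
}

# keys() builds its list up front, so the loop is not disturbed
# by the entries added inside it.
for my $key ( keys %hash ) {
    $hash{"${key}_copy"} = $hash{$key};
}

print join( ", ", sort keys %hash ), "\n";    # prints "b, b_copy, d, d_copy"
```
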
1762 | =head2 How do I look up a hash element by value?
|
---|
1763 |
|
---|
1764 | Create a reverse hash:
|
---|
1765 |
|
---|
1766 | %by_value = reverse %by_key;
|
---|
1767 | $key = $by_value{$value};
|
---|
1768 |
|
---|
1769 | That's not particularly efficient. It would be more space-efficient
|
---|
1770 | to use:
|
---|
1771 |
|
---|
1772 | while (($key, $value) = each %by_key) {
|
---|
1773 | $by_value{$value} = $key;
|
---|
1774 | }
|
---|
1775 |
|
---|
1776 | If your hash could have repeated values, the methods above will only find
|
---|
1777 | one of the associated keys. This may or may not worry you. If it does
|
---|
1778 | worry you, you can always reverse the hash into a hash of arrays instead:
|
---|
1779 |
|
---|
1780 | while (($key, $value) = each %by_key) {
|
---|
1781 | push @{$key_list_by_value{$value}}, $key;
|
---|
1782 | }
|
---|
1783 |
|
---|
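With some made-up data, the hash-of-arrays version can be exercised like this:

```perl
use strict;
use warnings;

my %by_key = ( apple => 'red', cherry => 'red', banana => 'yellow' );

my %key_list_by_value;
while ( my ( $key, $value ) = each %by_key ) {
    push @{ $key_list_by_value{$value} }, $key;
}

# The order in which each() returns pairs is not defined, so sort
# both levels to get a stable report.
for my $value ( sort keys %key_list_by_value ) {
    my @keys = sort @{ $key_list_by_value{$value} };
    print "$value: @keys\n";
}
# prints:
#   red: apple cherry
#   yellow: banana
```
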
1784 | =head2 How can I know how many entries are in a hash?
|
---|
1785 |
|
---|
1786 | If you mean how many keys, then all you have to do is
|
---|
1787 | use the keys() function in a scalar context:
|
---|
1788 |
|
---|
1789 | $num_keys = keys %hash;
|
---|
1790 |
|
---|
1791 | The keys() function also resets the iterator, which means that you may
|
---|
1792 | see strange results if you use this between uses of other hash operators
|
---|
1793 | such as each().
|
---|
1794 |
|
---|
1795 | =head2 How do I sort a hash (optionally by value instead of key)?
|
---|
1796 |
|
---|
1797 | (contributed by brian d foy)
|
---|
1798 |
|
---|
1799 | To sort a hash, start with the keys. In this example, we give the list of
|
---|
1800 | keys to the sort function which then compares them ASCIIbetically (which
|
---|
1801 | might be affected by your locale settings). The output list has the keys
|
---|
1802 | in ASCIIbetical order. Once we have the keys, we can go through them to
|
---|
1803 | create a report which lists the keys in ASCIIbetical order.
|
---|
1804 |
|
---|
1805 | my @keys = sort { $a cmp $b } keys %hash;
|
---|
1806 |
|
---|
1807 | foreach my $key ( @keys )
|
---|
1808 | {
|
---|
1809 | printf "%-20s %6d\n", $key, $hash{$key};
|
---|
1810 | }
|
---|
1811 |
|
---|
1812 | We could get more fancy in the C<sort()> block though. Instead of
|
---|
1813 | comparing the keys, we can compute a value with them and use that
|
---|
1814 | value as the comparison.
|
---|
1815 |
|
---|
1816 | For instance, to make our report order case-insensitive, we use
|
---|
1817 | the C<\L> sequence in a double-quoted string to make everything
|
---|
1818 | lowercase. The C<sort()> block then compares the lowercased
|
---|
1819 | values to determine in which order to put the keys.
|
---|
1820 |
|
---|
1821 | my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
|
---|
1822 |
|
---|
1823 | Note: if the computation is expensive or the hash has many elements,
|
---|
1824 | you may want to look at the Schwartzian Transform to cache the
|
---|
1825 | computation results.
|
---|
1826 |
|
---|
1827 | If we want to sort by the hash value instead, we use the hash key
|
---|
1828 | to look it up. We still get out a list of keys, but this time they
|
---|
1829 | are ordered by their value.
|
---|
1830 |
|
---|
1831 | my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
|
---|
1832 |
|
---|
1833 | From there we can get more complex. If the hash values are the same,
|
---|
1834 | we can provide a secondary sort on the hash key.
|
---|
1835 |
|
---|
1836 | my @keys = sort {
|
---|
1837 | $hash{$a} <=> $hash{$b}
|
---|
1838 | or
|
---|
1839 | "\L$a" cmp "\L$b"
|
---|
1840 | } keys %hash;
|
---|
1841 |
|
---|
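Here is that last sort applied to a small made-up hash, so you can see the tie between two equal values being broken by the key:

```perl
use strict;
use warnings;

my %hash = ( pear => 3, Apple => 7, fig => 3, banana => 1 );

my @keys = sort {
    $hash{$a} <=> $hash{$b}
        or
    "\L$a" cmp "\L$b"
} keys %hash;

printf "%-10s %3d\n", $_, $hash{$_} for @keys;
# the keys come out in the order: banana fig pear Apple
```
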
1842 | =head2 How can I always keep my hash sorted?
|
---|
1843 |
|
---|
1844 | You can look into using the DB_File module and tie() using the
|
---|
1845 | $DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
|
---|
1846 | The Tie::IxHash module from CPAN might also be instructive.
|
---|
1847 |
|
---|
1848 | =head2 What's the difference between "delete" and "undef" with hashes?
|
---|
1849 |
|
---|
1850 | Hashes contain pairs of scalars: the first is the key, the
|
---|
1851 | second is the value. The key will be coerced to a string,
|
---|
1852 | although the value can be any kind of scalar: string,
|
---|
1853 | number, or reference. If a key $key is present in
|
---|
1854 | %hash, C<exists($hash{$key})> will return true. The value
|
---|
1855 | for a given key can be C<undef>, in which case
|
---|
1856 | C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
|
---|
1857 | will return true. This corresponds to (C<$key>, C<undef>)
|
---|
1858 | being in the hash.
|
---|
1859 |
|
---|
1860 | Pictures help... here's the %hash table:
|
---|
1861 |
|
---|
1862 | keys values
|
---|
1863 | +------+------+
|
---|
1864 | | a | 3 |
|
---|
1865 | | x | 7 |
|
---|
1866 | | d | 0 |
|
---|
1867 | | e | 2 |
|
---|
1868 | +------+------+
|
---|
1869 |
|
---|
1870 | And these conditions hold
|
---|
1871 |
|
---|
1872 | $hash{'a'} is true
|
---|
1873 | $hash{'d'} is false
|
---|
1874 | defined $hash{'d'} is true
|
---|
1875 | defined $hash{'a'} is true
|
---|
1876 | exists $hash{'a'} is true (Perl5 only)
|
---|
1877 | grep ($_ eq 'a', keys %hash) is true
|
---|
1878 |
|
---|
1879 | If you now say
|
---|
1880 |
|
---|
1881 | undef $hash{'a'}
|
---|
1882 |
|
---|
1883 | your table now reads:
|
---|
1884 |
|
---|
1885 |
|
---|
1886 | keys values
|
---|
1887 | +------+------+
|
---|
1888 | | a | undef|
|
---|
1889 | | x | 7 |
|
---|
1890 | | d | 0 |
|
---|
1891 | | e | 2 |
|
---|
1892 | +------+------+
|
---|
1893 |
|
---|
1894 | and these conditions now hold; changes in caps:
|
---|
1895 |
|
---|
1896 | $hash{'a'} is FALSE
|
---|
1897 | $hash{'d'} is false
|
---|
1898 | defined $hash{'d'} is true
|
---|
1899 | defined $hash{'a'} is FALSE
|
---|
1900 | exists $hash{'a'} is true (Perl5 only)
|
---|
1901 | grep ($_ eq 'a', keys %hash) is true
|
---|
1902 |
|
---|
1903 | Notice the last two: you have an undef value, but a defined key!
|
---|
1904 |
|
---|
1905 | Now, consider this:
|
---|
1906 |
|
---|
1907 | delete $hash{'a'}
|
---|
1908 |
|
---|
1909 | your table now reads:
|
---|
1910 |
|
---|
1911 | keys values
|
---|
1912 | +------+------+
|
---|
1913 | | x | 7 |
|
---|
1914 | | d | 0 |
|
---|
1915 | | e | 2 |
|
---|
1916 | +------+------+
|
---|
1917 |
|
---|
1918 | and these conditions now hold; changes in caps:
|
---|
1919 |
|
---|
1920 | $hash{'a'} is false
|
---|
1921 | $hash{'d'} is false
|
---|
1922 | defined $hash{'d'} is true
|
---|
1923 | defined $hash{'a'} is false
|
---|
1924 | exists $hash{'a'} is FALSE (Perl5 only)
|
---|
1925 | grep ($_ eq 'a', keys %hash) is FALSE
|
---|
1926 |
|
---|
1927 | See, the whole entry is gone!
|
---|
1928 |
|
---|
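The tables above can be checked with a short script. This sketch uses the same keys and values and records what exists() and defined() report at each stage:

```perl
use strict;
use warnings;

my %hash = ( a => 3, x => 7, d => 0, e => 2 );

undef $hash{a};
my $exists_after_undef  = exists  $hash{a} ? 1 : 0;    # 1: the key is still there
my $defined_after_undef = defined $hash{a} ? 1 : 0;    # 0: its value is undef

delete $hash{a};
my $exists_after_delete = exists $hash{a} ? 1 : 0;     # 0: the whole entry is gone

print "after undef:  exists=$exists_after_undef defined=$defined_after_undef\n";
print "after delete: exists=$exists_after_delete\n";
```
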
1929 | =head2 Why don't my tied hashes make the defined/exists distinction?
|
---|
1930 |
|
---|
1931 | This depends on the tied hash's implementation of EXISTS().
|
---|
1932 | For example, there isn't the concept of undef with hashes
|
---|
1933 | that are tied to DBM* files. It also means that exists() and
|
---|
1934 | defined() do the same thing with a DBM* file, and what they
|
---|
1935 | end up doing is not what they do with ordinary hashes.
|
---|
1936 |
|
---|
1937 | =head2 How do I reset an each() operation part-way through?
|
---|
1938 |
|
---|
1939 | Using C<keys %hash> in scalar context returns the number of keys in
|
---|
1940 | the hash I<and> resets the iterator associated with the hash. You may
|
---|
1941 | need to do this if you use C<last> to exit a loop early so that when you
|
---|
1942 | re-enter it, the hash iterator has been reset.
|
---|
1943 |
|
---|
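A minimal sketch of that situation, with a made-up hash: the first loop is abandoned after one pair, and the call to keys() makes the second loop start from the beginning again.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

# Leave the loop early; the iterator stays where it stopped.
while ( my ($key) = each %hash ) {
    last;
}

my $num_keys = keys %hash;    # scalar context: counts keys and resets the iterator

# This loop now sees every key, not just the remaining ones.
my $count = 0;
while ( my ($key) = each %hash ) {
    $count++;
}

print "hash has $num_keys keys, second loop saw $count\n";
```
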
1944 | =head2 How can I get the unique keys from two hashes?
|
---|
1945 |
|
---|
1946 | First you extract the keys from the hashes into lists, then solve
|
---|
1947 | the "removing duplicates" problem described above. For example:
|
---|
1948 |
|
---|
1949 | %seen = ();
|
---|
1950 | for $element (keys(%foo), keys(%bar)) {
|
---|
1951 | $seen{$element}++;
|
---|
1952 | }
|
---|
1953 | @uniq = keys %seen;
|
---|
1954 |
|
---|
1955 | Or more succinctly:
|
---|
1956 |
|
---|
1957 | @uniq = keys %{{%foo,%bar}};
|
---|
1958 |
|
---|
1959 | Or if you really want to save space:
|
---|
1960 |
|
---|
1961 | %seen = ();
|
---|
1962 | while (defined ($key = each %foo)) {
|
---|
1963 | $seen{$key}++;
|
---|
1964 | }
|
---|
1965 | while (defined ($key = each %bar)) {
|
---|
1966 | $seen{$key}++;
|
---|
1967 | }
|
---|
1968 | @uniq = keys %seen;
|
---|
1969 |
|
---|
1970 | =head2 How can I store a multidimensional array in a DBM file?
|
---|
1971 |
|
---|
1972 | Either stringify the structure yourself (no fun), or else
|
---|
1973 | get the MLDBM (which uses Data::Dumper) module from CPAN and layer
|
---|
1974 | it on top of either DB_File or GDBM_File.
|
---|
1975 |
|
---|
1976 | =head2 How can I make my hash remember the order I put elements into it?
|
---|
1977 |
|
---|
1978 | Use the Tie::IxHash module from CPAN.
|
---|
1979 |
|
---|
1980 | use Tie::IxHash;
|
---|
1981 | tie my %myhash, 'Tie::IxHash';
|
---|
1982 | for (my $i=0; $i<20; $i++) {
|
---|
1983 | $myhash{$i} = 2*$i;
|
---|
1984 | }
|
---|
1985 | my @keys = keys %myhash;
|
---|
1986 | # @keys = (0,1,2,3,...)
|
---|
1987 |
|
---|
1988 | =head2 Why does passing a subroutine an undefined element in a hash create it?
|
---|
1989 |
|
---|
1990 | If you say something like:
|
---|
1991 |
|
---|
1992 | somefunc($hash{"nonesuch key here"});
|
---|
1993 |
|
---|
1994 | Then that element "autovivifies"; that is, it springs into existence
|
---|
1995 | whether you store something there or not. That's because functions
|
---|
1996 | get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
|
---|
1997 | it has to be ready to write it back into the caller's version.
|
---|
1998 |
|
---|
1999 | This has been fixed as of Perl 5.004.
|
---|
2000 |
|
---|
2001 | Normally, merely accessing a key's value for a nonexistent key does
|
---|
2002 | I<not> cause that key to be forever there. This is different from
|
---|
2003 | awk's behavior.
|
---|
2004 |
|
---|
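You can check both claims on your own perl with a sketch like the following. The stub C<somefunc> is hypothetical and simply ignores its argument.

```perl
use strict;
use warnings;

my %hash;

sub somefunc { return }    # hypothetical function that ignores its argument

# Passing a nonexistent element to a function does not create it.
somefunc( $hash{"nonesuch key here"} );
my $after_call = exists $hash{"nonesuch key here"} ? 1 : 0;

# Neither does merely reading it.
my $value = $hash{"another missing key"};
my $after_read = exists $hash{"another missing key"} ? 1 : 0;

print "after call: $after_call, after read: $after_read\n";
# prints "after call: 0, after read: 0"
```
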
2005 | =head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
|
---|
2006 |
|
---|
2007 | Usually a hash ref, perhaps like this:
|
---|
2008 |
|
---|
2009 | $record = {
|
---|
2010 | NAME => "Jason",
|
---|
2011 | EMPNO => 132,
|
---|
2012 | TITLE => "deputy peon",
|
---|
2013 | AGE => 23,
|
---|
2014 | SALARY => 37_000,
|
---|
2015 | PALS => [ "Norbert", "Rhys", "Phineas"],
|
---|
2016 | };
|
---|
2017 |
|
---|
2018 | References are documented in L<perlref> and L<perlreftut>.
|
---|
2019 | Examples of complex data structures are given in L<perldsc> and
|
---|
2020 | L<perllol>. Examples of structures and object-oriented classes are
|
---|
2021 | in L<perltoot>.
|
---|
2022 |
|
---|
2023 | =head2 How can I use a reference as a hash key?
|
---|
2024 |
|
---|
2025 | (contributed by brian d foy)
|
---|
2026 |
|
---|
2027 | Hash keys are strings, so you can't really use a reference as the key.
|
---|
2028 | When you try to do that, perl turns the reference into its stringified
|
---|
2029 | form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get back
|
---|
2030 | the reference from the stringified form, at least without doing some
|
---|
2031 | extra work on your own. Also remember that hash keys must be unique, but
|
---|
2032 | two different variables can store the same reference (and those variables
|
---|
2033 | can change later).
|
---|
2034 |
|
---|
2035 | The Tie::RefHash module, which is distributed with perl, might be what
|
---|
2036 | you want. It handles that extra work.
|
---|
2037 |
|
---|
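A short sketch of Tie::RefHash in use, with a made-up array reference as the key. Note that the keys come back as real references rather than strings:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %hash, 'Tie::RefHash';

my $array_ref = [ 1, 2, 3 ];
$hash{$array_ref} = 'some data about the array';

for my $key ( keys %hash ) {
    # $key is still a usable reference here
    print ref($key), ": @$key => $hash{$key}\n";
}
# prints "ARRAY: 1 2 3 => some data about the array"
```
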
2038 | =head1 Data: Misc
|
---|
2039 |
|
---|
2040 | =head2 How do I handle binary data correctly?
|
---|
2041 |
|
---|
2042 | Perl is binary clean, so this shouldn't be a problem. For example,
|
---|
2043 | this works fine (assuming the files are found):
|
---|
2044 |
|
---|
2045 | if (`cat /vmunix` =~ /gzip/) {
|
---|
2046 | print "Your kernel is GNU-zip enabled!\n";
|
---|
2047 | }
|
---|
2048 |
|
---|
2049 | On less elegant (read: Byzantine) systems, however, you have
|
---|
2050 | to play tedious games with "text" versus "binary" files. See
|
---|
2051 | L<perlfunc/"binmode"> or L<perlopentut>.
|
---|
2052 |
|
---|
2053 | If you're concerned about 8-bit textual data, then see L<perllocale>.
|
---|
2054 |
|
---|
2055 | If you want to deal with multibyte characters, however, there are
|
---|
2056 | some gotchas. See the section on Regular Expressions.
|
---|
2057 |
|
---|
2058 | =head2 How do I determine whether a scalar is a number/whole/integer/float?
|
---|
2059 |
|
---|
2060 | Assuming that you don't care about IEEE notations like "NaN" or
|
---|
2061 | "Infinity", you probably just want to use a regular expression.
|
---|
2062 |
|
---|
2063 | if (/\D/) { print "has nondigits\n" }
|
---|
2064 | if (/^\d+$/) { print "is a whole number\n" }
|
---|
2065 | if (/^-?\d+$/) { print "is an integer\n" }
|
---|
2066 | if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
|
---|
2067 | if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
|
---|
2068 | if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
|
---|
2069 | if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
|
---|
2070 | { print "a C float\n" }
|
---|
2071 |
|
---|
2072 | There are also some commonly used modules for the task.
|
---|
2073 | L<Scalar::Util> (distributed with 5.8) provides access to perl's
|
---|
2074 | internal function C<looks_like_number> for determining
|
---|
2075 | whether a variable looks like a number. L<Data::Types>
|
---|
2076 | exports functions that validate data types using both the
|
---|
2077 | above and other regular expressions. Thirdly, there is
|
---|
2078 | C<Regexp::Common> which has regular expressions to match
|
---|
2079 | various types of numbers. Those three modules are available
|
---|
2080 | from the CPAN.
|
---|
2081 |
|
---|
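As a sketch of the Scalar::Util route, this loop runs C<looks_like_number> over a few made-up candidate strings and reports the verdict for each. It deliberately lists no expected results for the edge cases; run it on your own perl to see them.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $candidate ( '42', '-3.14', '1e5', 'abc', '12abc', '' ) {
    my $verdict = looks_like_number($candidate) ? 'number' : 'not a number';
    print "'$candidate' => $verdict\n";
}
```

Unlike the simpler regular expressions above, this accepts whatever perl itself would treat as numeric, including exponent notation.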
2082 | If you're on a POSIX system, Perl supports the C<POSIX::strtod>
|
---|
2083 | function. Its semantics are somewhat cumbersome, so here's a C<getnum>
|
---|
2084 | wrapper function for more convenient access. This function takes
|
---|
2085 | a string and returns the number it found, or C<undef> for input that
|
---|
2086 | isn't a C float. The C<is_numeric> function is a front end to C<getnum>
|
---|
2087 | if you just want to say, "Is this a float?"
|
---|
2088 |
|
---|
2089 | sub getnum {
|
---|
2090 | use POSIX qw(strtod);
|
---|
2091 | my $str = shift;
|
---|
2092 | $str =~ s/^\s+//;
|
---|
2093 | $str =~ s/\s+$//;
|
---|
2094 | $! = 0;
|
---|
2095 | my($num, $unparsed) = strtod($str);
|
---|
2096 | if (($str eq '') || ($unparsed != 0) || $!) {
|
---|
2097 | return undef;
|
---|
2098 | } else {
|
---|
2099 | return $num;
|
---|
2100 | }
|
---|
2101 | }
|
---|
2102 |
|
---|
2103 | sub is_numeric { defined getnum($_[0]) }
|
---|
2104 |
|
---|
2105 | Or you could check out the L<String::Scanf> module on the CPAN
|
---|
2106 | instead. The POSIX module (part of the standard Perl distribution) provides
|
---|
2107 | the C<strtod> and C<strtol> functions for converting strings to doubles and longs,
|
---|
2108 | respectively.
|
---|
2109 |
|
---|
2110 | =head2 How do I keep persistent data across program calls?
|
---|
2111 |
|
---|
2112 | For some specific applications, you can use one of the DBM modules.
|
---|
2113 | See L<AnyDBM_File>. More generically, you should consult the FreezeThaw
|
---|
2114 | or Storable modules from CPAN. Starting from Perl 5.8 Storable is part
|
---|
2115 | of the standard distribution. Here's one example using Storable's C<store>
|
---|
2116 | and C<retrieve> functions:
|
---|
2117 |
|
---|
2118 | use Storable;
|
---|
2119 | store(\%hash, "filename");
|
---|
2120 |
|
---|
2121 | # later on...
|
---|
2122 | $href = retrieve("filename"); # by ref
|
---|
2123 | %hash = %{ retrieve("filename") }; # direct to hash
|
---|
2124 |
|
---|
2125 | =head2 How do I print out or copy a recursive data structure?
|
---|
2126 |
|
---|
2127 | The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
|
---|
2128 | for printing out data structures. The Storable module on CPAN (or the
|
---|
2129 | 5.8 release of Perl) provides a function called C<dclone> that recursively
|
---|
2130 | copies its argument.
|
---|
2131 |
|
---|
2132 | use Storable qw(dclone);
|
---|
2133 | $r2 = dclone($r1);
|
---|
2134 |
|
---|
2135 | Where $r1 can be a reference to any kind of data structure you'd like.
|
---|
2136 | It will be deeply copied. Because C<dclone> takes and returns references,
|
---|
2137 | you'd have to add extra punctuation if you had a hash of arrays that
|
---|
2138 | you wanted to copy.
|
---|
2139 |
|
---|
2140 | %newhash = %{ dclone(\%oldhash) };
|
---|
2141 |
|
---|
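To see why the deep copy matters, this sketch compares it with a plain hash assignment. The hash contents are made up for the example.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ 'apple', 'pear' ], nuts => ['almond'] );

my %shallow = %oldhash;                    # copies only the top level
my %deep    = %{ dclone( \%oldhash ) };    # copies the nested arrays too

push @{ $oldhash{fruits} }, 'fig';

print "shallow: @{ $shallow{fruits} }\n";    # prints "shallow: apple pear fig"
print "deep:    @{ $deep{fruits} }\n";       # prints "deep:    apple pear"
```

The shallow copy still shares its inner arrays with C<%oldhash>, so the later push shows up in it; the C<dclone> copy does not.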
2142 | =head2 How do I define methods for every class/object?
|
---|
2143 |
|
---|
2144 | Use the UNIVERSAL class (see L<UNIVERSAL>).
|
---|
2145 |
|
---|
2146 | =head2 How do I verify a credit card checksum?
|
---|
2147 |
|
---|
2148 | Get the Business::CreditCard module from CPAN.
|
---|
2149 |
|
---|
2150 | =head2 How do I pack arrays of doubles or floats for XS code?
|
---|
2151 |
|
---|
2152 | The kgbpack.c code in the PGPLOT module on CPAN does just this.
|
---|
2153 | If you're doing a lot of float or double processing, consider using
|
---|
2154 | the PDL module from CPAN instead--it makes number-crunching easy.
|
---|
2155 |
|
---|
2156 | =head1 AUTHOR AND COPYRIGHT
|
---|
2157 |
|
---|
2158 | Copyright (c) 1997-2006 Tom Christiansen, Nathan Torkington, and
|
---|
2159 | other authors as noted. All rights reserved.
|
---|
2160 |
|
---|
2161 | This documentation is free; you can redistribute it and/or modify it
|
---|
2162 | under the same terms as Perl itself.
|
---|
2163 |
|
---|
2164 | Irrespective of its distribution, all code examples in this file
|
---|
2165 | are hereby placed into the public domain. You are permitted and
|
---|
2166 | encouraged to use this code in your own programs for fun
|
---|
2167 | or for profit as you see fit. A simple comment in the code giving
|
---|
2168 | credit would be courteous but is not required.
|
---|