=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.73 $, $Date: 2005/12/31 00:54:37 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers
in binary. Digital (as in powers of two) computers cannot
store all numbers exactly. Some real numbers lose precision
in the process. This is a problem with how computers store
numbers and affects all computer languages, not just Perl.

L<perlnumber> shows the gory details of number
representations and conversions.

To limit the number of decimal places in your numbers, you
can use the printf or sprintf function. See the
"Floating Point Arithmetic" section of L<perlop> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

=head2 Why is int() broken?

Your int() is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the above item "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of above as 'three' is really more like
2.9999999999999995559.

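If what you want is the nearest integer rather than truncation, one
option is to round before converting. The following is a minimal
sketch, not the only approach: it assumes the value is meant to be a
whole number and uses C<sprintf> with C<%.0f> to round it. The variable
names are illustrative.

```perl
use strict;
use warnings;

# Mathematically 1, but typically stored as slightly less than 1.
my $value = 0.6 / 0.2 - 2;

my $truncated = int($value);              # truncates toward zero
my $rounded   = sprintf("%.0f", $value);  # rounds to the nearest integer

print "truncated: $truncated, rounded: $rounded\n";
```

Rounding first is only appropriate when you know the true value is
close to an integer; it hides the representation error rather than
removing it.
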
=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur as
literals in your program. Octal literals in perl must start with a
leading "0" and hexadecimal literals must start with a leading "0x".
If they are read in from somewhere and assigned, no automatic
conversion takes place. You must explicitly use oct() or hex() if you
want the values converted to decimal. oct() interprets hex ("0x350"),
octal ("0350" or even without the leading "0", like "377") and binary
("0b1010") numbers, while hex() only converts hexadecimal ones, with
or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
The inverse mapping from decimal to octal can be done with either the
"%o" or "%O" sprintf() formats.

This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which by widespread tradition typically take
permissions in octal.

    chmod(644,  $file);   # WRONG
    chmod(0644, $file);   # right

Note the mistake in the first line was specifying the decimal literal
644, rather than the intended octal literal 0644. The problem can
be seen with:

    printf("%#o",644);    # prints 01204

Surely you had not intended C<chmod(01204, $file);> - did you? If you
want to use numeric literals as arguments to chmod() et al. then please
try to express them as octal constants, that is with a leading zero and
with the following digits restricted to the set 0..7.

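The same pitfall applies to permission strings that arrive at run time.
As a small sketch, assume a mode string such as C<"0644"> has been read
from a configuration file (the variable names here are made up for the
example); plain numeric conversion treats it as decimal, while oct()
gives the intended value:

```perl
use strict;
use warnings;

# A permission string as it might arrive from a config file or user input.
my $input = "0644";

my $wrong = $input + 0;    # numeric conversion reads the text as decimal 644
my $mode  = oct($input);   # oct() honours the leading zero: decimal 420

printf "decimal %d is octal %#o\n", $mode, $mode;
# chmod($mode, $file) would now request rw-r--r-- on a hypothetical $file
```
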
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that int() merely truncates toward 0. For rounding to a
certain number of digits, sprintf() or printf() is usually the easiest
route.

    printf("%.3f", 3.1415926535);   # prints 3.142

The POSIX module (part of the standard Perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions.

    use POSIX;
    $ceil  = ceil(3.5);    # 4
    $floor = floor(3.5);   # 3

In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
module. With 5.004, the Math::Trig module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do this.
Perl numbers whose absolute values are integers under 2**31 (on 32 bit
machines) will work pretty much like mathematical integers. Other numbers
are not guaranteed.

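One way to implement your own rounding, sketched here under the
assumption that amounts are kept as whole cents and are non-negative,
is to do the arithmetic on integers so that half-way cases are decided
exactly. The helper name C<div_round_half_up> and the tax rate are
hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Divide two non-negative integers, rounding halves up.
# (Hypothetical helper; not part of Perl or any module.)
sub div_round_half_up {
    my ($numerator, $denominator) = @_;
    use integer;    # integer division inside this sub only
    return (2 * $numerator + $denominator) / (2 * $denominator);
}

my $price_cents = 1995;    # 19.95 stored as whole cents

# 8.25% tax, expressed as 825 / 10_000 to stay in integers.
my $tax_cents = div_round_half_up($price_cents * 825, 10_000);

printf "tax: %d.%02d\n", $tax_cents / 100, $tax_cents % 100;
```

Because every intermediate value is an integer, the result does not
depend on how 0.05 or 19.95 happen to be stored in binary.
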
=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below
are a few examples of approaches to making common conversions
between number representations. This is intended to be representational
rather than exhaustive.

Some of the examples below use the Bit::Vector module from CPAN.
The reason you might choose Bit::Vector over the perl built in
functions is that it works with numbers of ANY size, that it is
optimized for speed on some operations, and for at least some
programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using perl's built in conversion of 0x notation:

    $dec = 0xDEADBEEF;

Using the hex function:

    $dec = hex("DEADBEEF");

Using pack:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using sprintf:

    $hex = sprintf("%X", 3735928559);   # upper case A-F
    $hex = sprintf("%x", 3735928559);   # lower case a-f

Using unpack:

    $hex = unpack("H*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And Bit::Vector supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32);   # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built in conversion of numbers with leading zeros:

    $dec = 033653337357;   # note the leading 0!

Using the oct function:

    $dec = oct("33653337357");

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using sprintf:

    $oct = sprintf("%o", 3735928559);

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the 0b notation:

    $number = 0b10110110;

Using oct:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using pack and ord:

    $decimal = ord(pack('B8', '10110110'));

Using pack and unpack for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using sprintf (perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using unpack:

    $bin = unpack("B*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

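As a hint for those remaining transformations, one possible approach
(a sketch only, using the built-in functions already shown) is to
convert the input to a number with hex() or oct() and then format it
with sprintf(). The variable names are illustrative.

```perl
use strict;
use warnings;

my $hex = "ff";

my $bin       = sprintf("%b", hex($hex));        # hex    -> binary
my $oct       = sprintf("%o", hex($hex));        # hex    -> octal
my $hex_again = sprintf("%x", oct("0b$bin"));    # binary -> hex

print "$hex = $bin (binary) = $oct (octal)\n";
```

This route is limited to values that fit in a native Perl integer; for
larger values the Bit::Vector approach above still applies.
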
=head2 Why doesn't & work the way I want it to?

The behavior of the binary bitwise operators (such as C<&> and C<|>)
depends on whether they're used on numbers or strings. When used on
strings, the operators treat each string as a series of bits and work
with that (the string C<"3"> is the bit pattern C<00110011>). When used
on numbers, the operators work with the binary form of the number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. To test whether any bit
is set in the result, you need:

    if ( ("\020\020" & "\101\101") =~ /[^\000]/) {
        # ...
    }

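The difference can be seen side by side in the following sketch. It
assumes the traditional operator behavior described above (that is, the
C<bitwise> feature of newer perls is not enabled), and it forces
numeric context by adding zero, which is one common idiom rather than
the only one.

```perl
use strict;
use warnings;

# Both operands are strings, so & works on the characters' bit patterns.
my $as_strings = "11" & "3";

# Adding 0 yields numbers, so & works on the integers 11 and 3.
my $as_numbers = ("11" + 0) & ("3" + 0);

print "strings: '$as_strings', numbers: $as_numbers\n";
```
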
=head2 How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
or the PDL extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i <= 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,001 integers.

=head2 How can I output Roman numerals?

Get the Roman module from http://www.cpan.org/modules/by-module/Roman .

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once; you make your numbers less random, rather
than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the Math::TrulyRandom module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at http://www.nr.com/ .

=head2 How do I get a random number between X and Y?

C<rand($x)> returns a number such that
C<< 0 <= rand($x) < $x >>. Thus what you want to have perl
figure out is a random number in the range from 0 to the
difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you
want a random number between 0 and 5 that you can then add
to 10.

    my $number = 10 + int rand( 15-10+1 );

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_in(50,120)>.

    sub random_int_in ($$) {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + int rand(1 + $max - $min);
    }

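A quick way to convince yourself that both endpoints are included is to
draw many values and record which ones appear. The following sketch
repeats the function (without the prototype, to keep it self-contained)
and simulates a six-sided die; the C<%seen> hash and the number of
draws are arbitrary choices for the example.

```perl
use strict;
use warnings;

sub random_int_in {
    my ($min, $max) = @_;
    # Assumes that the two arguments are integers themselves!
    ($min, $max) = ($max, $min) if $min > $max;
    return $min + int rand(1 + $max - $min);
}

# Roll a die 1000 times and count how often each face turns up.
my %seen;
$seen{ random_int_in(1, 6) }++ for 1 .. 1000;

my @out_of_range = grep { $_ < 1 || $_ > 6 } keys %seen;
print "faces seen: @{[ sort { $a <=> $b } keys %seen ]}\n";
```
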
=head1 Data: Dates

=head2 How do I find the day or week of the year?

The localtime function returns the day of the year, counting from 0.
Without an argument localtime uses the current time.

    $day_of_year = (localtime)[7];

The POSIX module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day or week of the year for any date, use the Time::Local
module to get a time in epoch seconds for the argument to localtime.

    use POSIX qw/strftime/;
    use Time::Local;
    my $week_of_year = strftime "%W",
        localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) );

The Date::Calc module provides two functions to calculate these.
Date::Calc exports nothing by default, so import them by name.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

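The two core approaches differ by one for the same date, because
localtime() counts days from 0 while the C<%j> format counts from 1.
A small sketch for 18 December 1987 (noon is used only to stay clear of
any daylight-saving edge; the variable names are illustrative):

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timelocal);

# 18 December 1987 at noon local time; months count from 0, so December is 11.
my $epoch = timelocal(0, 0, 12, 18, 11, 1987);

my $yday        = (localtime($epoch))[7];             # zero-based: 351
my $day_of_year = strftime("%j", localtime($epoch));  # one-based: "352"

print "yday=$yday, %j=$day_of_year\n";
```
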
=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the POSIX module's strftime() function has
been extended in a non-standard way to use a C<%C> format,
which they sometimes claim is the "century". It isn't,
because on most such systems, this is only the first two
digits of the four-digit year, and thus cannot be used to
reliably determine the current century or millennium.

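To see what these functions return, here is a sketch that repeats them
and feeds in two fixed epoch values. The timestamps are assumptions
picked for the example: one falls in mid-2000 and one in November 2001,
far enough from a year boundary that the local time zone does not
change the result.

```perl
use strict;
use warnings;

sub get_century {
    return int((((localtime(shift || time))[5] + 1999)) / 100);
}

sub get_millennium {
    return 1 + int((((localtime(shift || time))[5] + 1899)) / 1000);
}

my $mid_2000 = 962_000_000;      # late June 2000
my $nov_2001 = 1_005_613_200;    # 13 November 2001

# The year 2000 still belongs to the 20th century and 2nd millennium.
printf "2000: century %d, millennium %d\n",
    get_century($mid_2000), get_millennium($mid_2000);
printf "2001: century %d, millennium %d\n",
    get_century($nov_2001), get_millennium($nov_2001);
```

Note that because of the C<shift || time> idiom, passing an epoch value
of 0 is treated the same as passing nothing.
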
=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract. Life
isn't always that simple though. If you want to work with formatted
dates, the Date::Manip, Date::Calc, or DateTime modules can help you.

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
Time::Local module. Otherwise, you should look into the Date::Calc
and Date::Manip modules from CPAN.

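For a fixed format, the splitting can be done with a single pattern
match. The sketch below assumes timestamps of the form
C<YYYY-MM-DD HH:MM:SS> in UTC and therefore uses C<timegm>, the UTC
counterpart of C<timelocal> in the same module; use C<timelocal>
instead if your strings are in local time.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $stamp = "2001-11-13 01:00:00";    # assumed to be UTC

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "Unrecognized timestamp: $stamp";

# Months count from 0 in Time::Local, so subtract 1.
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

print "$stamp is $epoch\n";
```
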
=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the Time::JulianDay module available on CPAN. Ensure that
you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the DateTime module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day)

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

=head2 How do I find yesterday's date?

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its Today_and_Now
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.

Long answer: The question belies a true understanding of the issue.
Perl is just as Y2K compliant as your pencil, no more and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines). The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number. It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: "Perl doesn't
break Y2K, people do." See http://www.perl.org/about/y2k.html for
a longer exposition.

=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes;
it simply turns them into the letters C<n> and C<t>.

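If you do want a few special escapes expanded as well, one possible
extension is to look the escaped character up in a table and fall back
to the character itself. This is only a sketch: the C<%special> table
is a hypothetical name and covers just two escapes.

```perl
use strict;
use warnings;

# Escapes to expand; anything else just loses its backslash.
my %special = ( n => "\n", t => "\t" );

# Contains: backslash-t, backslash-space, and an escaped backslash.
my $text = 'one\ttwo\ three\\\\four';

(my $unescaped = $text) =~
    s/\\(.)/exists $special{$1} ? $special{$1} : $1/ge;

print "$unescaped\n";
```
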
=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself:

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

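The two techniques differ once a character repeats more than twice: the
substitution collapses each pair once, while squeezing collapses the
whole run. The sketch below illustrates this on a made-up input; it
uses an explicit C<a-z> search list for the transliteration, and adds a
C<\1+> variant of the substitution that also handles whole runs.

```perl
use strict;
use warnings;

my $word = "aaab";

(my $pairs_only = $word) =~ s/(.)\1/$1/g;     # each pair once:  "aab"
(my $whole_runs = $word) =~ s/(.)\1+/$1/g;    # whole runs:      "ab"
(my $squeezed   = $word) =~ tr/a-z//s;        # squeeze a-z runs: "ab"

print "$pairs_only $whole_runs $squeezed\n";
```
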
=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces.

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime . ".\n";

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program. Starting from perl 5.8, Text::Balanced is
part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

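For the common case of balanced brackets, Text::Balanced does the
bookkeeping for you. As a minimal sketch (the sample text is invented),
C<extract_bracketed> called in list context returns the balanced
substring at the start of the text together with whatever follows it:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(a (b) c) and the rest";

# Ask for a substring balanced with respect to parentheses.
my ($extracted, $remainder) = extract_bracketed($text, "()");

print "extracted: $extracted\n";
print "remainder:$remainder\n";
```

The bracketed text has to begin at the start of the string (leading
whitespace aside); see the module's documentation for handling a
prefix.
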
=head2 How do I reverse a string?

Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

| 674 | =head2 How do I expand tabs in a string?
|
---|
| 675 |
|
---|
| 676 | You can do it yourself:
|
---|
| 677 |
|
---|
| 678 | 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
|
---|
| 679 |
|
---|
| 680 | Or you can just use the Text::Tabs module (part of the standard Perl
|
---|
| 681 | distribution).
|
---|
| 682 |
|
---|
| 683 | use Text::Tabs;
|
---|
| 684 | @expanded_lines = expand(@lines_with_tabs);
|
---|
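Here is a small self-contained sketch of the Text::Tabs approach, assuming the module's default tab stop of 8 columns. The sample lines are invented for the example.

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand() and unexpand()

my @lines_with_tabs = ("a\tb", "\tindented");

# expand() returns copies with each tab replaced by spaces up to the next tab stop.
my @expanded_lines = expand(@lines_with_tabs);

print "[$_]\n" for @expanded_lines;
```

If you need a different tab width, the module documents a C<$tabstop> variable that you can set before calling C<expand>.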
| 685 |
|
---|
| 686 | =head2 How do I reformat a paragraph?
|
---|
| 687 |
|
---|
| 688 | Use Text::Wrap (part of the standard Perl distribution):
|
---|
| 689 |
|
---|
| 690 | use Text::Wrap;
|
---|
| 691 | print wrap("\t", ' ', @paragraphs);
|
---|
| 692 |
|
---|
| 693 | The paragraphs you give to Text::Wrap should not contain embedded
|
---|
| 694 | newlines. Text::Wrap doesn't justify the lines (flush-right).
|
---|
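A minimal sketch of Text::Wrap with an explicit line width follows. The width of 40 and the sample text are assumptions chosen for the example; C<$Text::Wrap::columns> is the module's documented width setting.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Assumed width for this example; wrapped lines come out shorter than this.
$Text::Wrap::columns = 40;

my $paragraph = "The quick brown fox jumps over the lazy dog. " x 4;

# First argument: indent for the first line; second: indent for the rest.
my $wrapped = wrap('', '', $paragraph);

print $wrapped, "\n";
```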
| 695 |
|
---|
| 696 | Or use the CPAN module Text::Autoformat. Formatting files can be easily
|
---|
| 697 | done by making a shell alias, like so:
|
---|
| 698 |
|
---|
| 699 | alias fmt="perl -i -MText::Autoformat -n0777 \
|
---|
| 700 | -e 'print autoformat $_, {all=>1}' $*"
|
---|
| 701 |
|
---|
| 702 | See the documentation for Text::Autoformat to appreciate its many
|
---|
| 703 | capabilities.
|
---|
| 704 |
|
---|
| 705 | =head2 How can I access or change N characters of a string?
|
---|
| 706 |
|
---|
| 707 | You can access the first characters of a string with substr().
|
---|
| 708 | To get the first character, for example, start at position 0
|
---|
| 709 | and grab the string of length 1.
|
---|
| 710 |
|
---|
| 711 |
|
---|
| 712 | $string = "Just another Perl Hacker";
|
---|
| 713 | $first_char = substr( $string, 0, 1 ); # 'J'
|
---|
| 714 |
|
---|
| 715 | To change part of a string, you can use the optional fourth
|
---|
| 716 | argument which is the replacement string.
|
---|
| 717 |
|
---|
| 718 | substr( $string, 13, 4, "Perl 5.8.0" );
|
---|
| 719 |
|
---|
| 720 | You can also use substr() as an lvalue.
|
---|
| 721 |
|
---|
| 722 | substr( $string, 13, 4 ) = "Perl 5.8.0";
|
---|
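To see these forms side by side, here is a short sketch. The replacement strings are arbitrary; the four-argument form also hands back the text it replaced, which is sometimes handy.

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

# Four-argument form: replaces 4 characters at offset 13 and returns the old text.
my $old = substr( $string, 13, 4, "Python" );

# A negative offset counts from the end of the string.
my $tail = substr( $string, -6 );

# Lvalue form: assign to the selected part directly.
my $copy = "Just another Perl Hacker";
substr( $copy, 0, 4 ) = "Yet";

print "$old | $string | $tail | $copy\n";
```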
| 723 |
|
---|
| 724 | =head2 How do I change the Nth occurrence of something?
|
---|
| 725 |
|
---|
| 726 | You have to keep track of N yourself. For example, let's say you want
|
---|
| 727 | to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
|
---|
| 728 | C<"whosoever"> or C<"whomsoever">, case insensitively. These
|
---|
| 729 | all assume that $_ contains the string to be altered.
|
---|
| 730 |
|
---|
| 731 | $count = 0;
|
---|
| 732 | s{((whom?)ever)}{
|
---|
| 733 | ++$count == 5 # is it the 5th?
|
---|
| 734 | ? "${2}soever" # yes, swap
|
---|
| 735 | : $1 # renege and leave it there
|
---|
| 736 | }ige;
|
---|
| 737 |
|
---|
| 738 | In the more general case, you can use the C</g> modifier in a C<while>
|
---|
| 739 | loop, keeping count of matches.
|
---|
| 740 |
|
---|
| 741 | $WANT = 3;
|
---|
| 742 | $count = 0;
|
---|
| 743 | $_ = "One fish two fish red fish blue fish";
|
---|
| 744 | while (/(\w+)\s+fish\b/gi) {
|
---|
| 745 | if (++$count == $WANT) {
|
---|
| 746 | print "The third fish is a $1 one.\n";
|
---|
| 747 | }
|
---|
| 748 | }
|
---|
| 749 |
|
---|
| 750 | That prints out: C<"The third fish is a red one."> You can also use a
|
---|
| 751 | repetition count and repeated pattern like this:
|
---|
| 752 |
|
---|
| 753 | /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
|
---|
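The counting trick generalizes easily. The following sketch wraps it in a helper; C<replace_nth> is a hypothetical name made up for this example, not a built-in or module function.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th match of $pattern in $text.
sub replace_nth {
    my ($text, $pattern, $n, $replacement) = @_;
    my $count = 0;
    # Every match is visited; all but the $n-th are put back unchanged.
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

my $result = replace_nth("One fish two fish red fish blue fish", qr/fish/, 3, "herring");
print "$result\n";
```

The helper works on a copy, so the caller's string is left alone.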
| 754 |
|
---|
| 755 | =head2 How can I count the number of occurrences of a substring within a string?
|
---|
| 756 |
|
---|
| 757 | There are a number of ways, with varying efficiency. If you want a
|
---|
| 758 | count of a certain single character (X) within a string, you can use the
|
---|
| 759 | C<tr///> function like so:
|
---|
| 760 |
|
---|
| 761 | $string = "ThisXlineXhasXsomeXx'sXinXit";
|
---|
| 762 | $count = ($string =~ tr/X//);
|
---|
| 763 | print "There are $count X characters in the string";
|
---|
| 764 |
|
---|
| 765 | This is fine if you are just looking for a single character. However,
|
---|
| 766 | if you are trying to count multiple character substrings within a
|
---|
| 767 | larger string, C<tr///> won't work. What you can do is wrap a while()
|
---|
| 768 | loop around a global pattern match. For example, let's count negative
|
---|
| 769 | integers:
|
---|
| 770 |
|
---|
| 771 | $string = "-9 55 48 -2 23 -76 4 14 -44";
|
---|
| 772 | $count = 0; while ($string =~ /-\d+/g) { $count++ }
|
---|
| 773 | print "There are $count negative numbers in the string";
|
---|
| 774 |
|
---|
| 775 | Another version uses a global match in list context, then assigns the
|
---|
| 776 | result to a scalar, producing a count of the number of matches.
|
---|
| 777 |
|
---|
| 778 | $count = () = $string =~ /-\d+/g;
|
---|
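A short sketch of the list-assignment idiom, wrapped in a helper, might look like this. C<count_matches> is a hypothetical name for the example. Note that it only counts correctly for patterns without capturing groups, since a global match in list context returns the captures instead of the matches.

```perl
use strict;
use warnings;

# Hypothetical helper: number of non-overlapping matches of $pattern in $string.
# Assumes $pattern has no capturing groups.
sub count_matches {
    my ($string, $pattern) = @_;
    my $count = () = $string =~ /$pattern/g;
    return $count;
}

my $negatives   = count_matches("-9 55 48 -2 23 -76 4 14 -44", qr/-\d+/);
my $plain       = count_matches("banana", qr/ana/);       # non-overlapping
my $overlapping = count_matches("banana", qr/(?=ana)/);   # lookahead counts overlaps

print "$negatives $plain $overlapping\n";
```

Use C<\Q...\E> around a literal substring that may contain regex metacharacters.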
| 779 |
|
---|
| 780 | =head2 How do I capitalize all the words on one line?
|
---|
| 781 |
|
---|
| 782 | To make the first letter of each word upper case:
|
---|
| 783 |
|
---|
| 784 | $line =~ s/\b(\w)/\U$1/g;
|
---|
| 785 |
|
---|
| 786 | This has the strange effect of turning "C<don't do it>" into "C<Don'T
|
---|
| 787 | Do It>". Sometimes you might want this. Other times you might need a
|
---|
| 788 | more thorough solution (suggested by brian d foy):
|
---|
| 789 |
|
---|
| 790 | $string =~ s/ (
|
---|
| 791 | (^\w) #at the beginning of the line
|
---|
| 792 | | # or
|
---|
| 793 | (\s\w) #preceded by whitespace
|
---|
| 794 | )
|
---|
| 795 | /\U$1/xg;
|
---|
| 796 | $string =~ s/([\w']+)/\u\L$1/g;
|
---|
| 797 |
|
---|
| 798 | To make the whole line upper case:
|
---|
| 799 |
|
---|
| 800 | $line = uc($line);
|
---|
| 801 |
|
---|
| 802 | To force each word to be lower case, with the first letter upper case:
|
---|
| 803 |
|
---|
| 804 | $line =~ s/(\w+)/\u\L$1/g;
|
---|
| 805 |
|
---|
| 806 | You can (and probably should) enable locale awareness of those
|
---|
| 807 | characters by placing a C<use locale> pragma in your program.
|
---|
| 808 | See L<perllocale> for endless details on locales.
|
---|
| 809 |
|
---|
| 810 | This is sometimes referred to as putting something into "title
|
---|
| 811 | case", but that's not quite accurate. Consider the proper
|
---|
| 812 | capitalization of the movie I<Dr. Strangelove or: How I Learned to
|
---|
| 813 | Stop Worrying and Love the Bomb>, for example.
|
---|
| 814 |
|
---|
| 815 | Damian Conway's L<Text::Autoformat> module provides some smart
|
---|
| 816 | case transformations:
|
---|
| 817 |
|
---|
| 818 | use Text::Autoformat;
|
---|
| 819 | my $x = "Dr. Strangelove or: How I Learned to Stop ".
|
---|
| 820 | "Worrying and Love the Bomb";
|
---|
| 821 |
|
---|
| 822 | print $x, "\n";
|
---|
| 823 | for my $style (qw( sentence title highlight ))
|
---|
| 824 | {
|
---|
| 825 | print autoformat($x, { case => $style }), "\n";
|
---|
| 826 | }
|
---|
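Returning to the simple per-word approach, here is a runnable sketch of the apostrophe-aware substitution. C<capitalize_words> is a hypothetical helper name; it makes no attempt at real title case.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-case each word, then upper-case its first letter.
# Apostrophes count as part of a word, so "don't" stays one word.
sub capitalize_words {
    my ($line) = @_;
    $line =~ s/([\w']+)/\u\L$1/g;
    return $line;
}

my $capitalized = capitalize_words("don't do IT");
print "$capitalized\n";
```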
| 827 |
|
---|
| 828 | =head2 How can I split a [character] delimited string except when inside [character]?
|
---|
| 829 |
|
---|
| 830 | Several modules can handle this sort of parsing: Text::Balanced,
|
---|
| 831 | Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
|
---|
| 832 |
|
---|
| 833 | Take the example case of trying to split a string that is
|
---|
| 834 | comma-separated into its different fields. You can't use C<split(/,/)>
|
---|
| 835 | because you shouldn't split if the comma is inside quotes. For
|
---|
| 836 | example, take a data line like this:
|
---|
| 837 |
|
---|
| 838 | SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
|
---|
| 839 |
|
---|
| 840 | Due to the restriction of the quotes, this is a fairly complex
|
---|
| 841 | problem. Thankfully, we have Jeffrey Friedl, author of
|
---|
| 842 | I<Mastering Regular Expressions>, to handle these for us. He
|
---|
| 843 | suggests (assuming your string is contained in $text):
|
---|
| 844 |
|
---|
| 845 | @new = ();
|
---|
| 846 | push(@new, $+) while $text =~ m{
|
---|
| 847 | "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
|
---|
| 848 | | ([^,]+),?
|
---|
| 849 | | ,
|
---|
| 850 | }gx;
|
---|
| 851 | push(@new, undef) if substr($text,-1,1) eq ',';
|
---|
| 852 |
|
---|
| 853 | If you want to represent quotation marks inside a
|
---|
| 854 | quotation-mark-delimited field, escape them with backslashes (eg,
|
---|
| 855 | C<"like \"this\"">).
|
---|
| 856 |
|
---|
| 857 | Alternatively, the Text::ParseWords module (part of the standard Perl
|
---|
| 858 | distribution) lets you say:
|
---|
| 859 |
|
---|
| 860 | use Text::ParseWords;
|
---|
| 861 | @new = quotewords(",", 0, $text);
|
---|
| 862 |
|
---|
| 863 | There's also a Text::CSV (Comma-Separated Values) module on CPAN.
|
---|
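To make the Text::ParseWords route concrete, here is a sketch that runs C<quotewords> over the sample data line from above. The second argument of 0 asks the module to strip the quotes from the returned fields.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Split on commas, but not on commas inside double quotes.
my @new = quotewords(',', 0, $text);

printf "%d fields; third is '%s', last is '%s'\n", scalar @new, $new[2], $new[-1];
```

Keep in mind that C<quotewords> also treats single quotes as quoting characters, so data with stray apostrophes may need one of the CSV modules instead.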
| 864 |
|
---|
| 865 | =head2 How do I strip blank space from the beginning/end of a string?
|
---|
| 866 |
|
---|
| 867 | (contributed by brian d foy)
|
---|
| 868 |
|
---|
| 869 | A substitution can do this for you. For a single line, you want to
|
---|
| 870 | replace all the leading or trailing whitespace with nothing. You
|
---|
| 871 | can do that with a pair of substitutions.
|
---|
| 872 |
|
---|
| 873 | s/^\s+//;
|
---|
| 874 | s/\s+$//;
|
---|
| 875 |
|
---|
| 876 | You can also write that as a single substitution, although it turns
|
---|
| 877 | out the combined statement is slower than the separate ones. That
|
---|
| 878 | might not matter to you, though.
|
---|
| 879 |
|
---|
| 880 | s/^\s+|\s+$//g;
|
---|
| 881 |
|
---|
| 882 | In this regular expression, the alternation matches either at the
|
---|
| 883 | beginning or the end of the string since the alternation has a lower
|
---|
| 884 | precedence than the anchors. With the C</g> flag, the substitution
|
---|
| 885 | makes all possible matches, so it gets both. Remember, the trailing
|
---|
| 886 | newline matches the C<\s+>, and the C<$> anchor can match to the
|
---|
| 887 | physical end of the string, so the newline disappears too. Just add
|
---|
| 888 | the newline to the output, which has the added benefit of preserving
|
---|
| 889 | "blank" (consisting entirely of whitespace) lines which the C<^\s+>
|
---|
| 890 | would remove all by itself.
|
---|
| 891 |
|
---|
| 892 | while( <> )
|
---|
| 893 | {
|
---|
| 894 | s/^\s+|\s+$//g;
|
---|
| 895 | print "$_\n";
|
---|
| 896 | }
|
---|
| 897 |
|
---|
| 898 | For a multi-line string, you can apply the regular expression
|
---|
| 899 | to each logical line in the string by adding the C</m> flag (for
|
---|
| 900 | "multi-line"). With the C</m> flag, the C<$> matches I<before> an
|
---|
| 901 | embedded newline, so it doesn't remove it. It still removes the
|
---|
| 902 | newline at the end of the string.
|
---|
| 903 |
|
---|
| 904 | $string =~ s/^\s+|\s+$//gm;
|
---|
| 905 |
|
---|
| 906 | Remember that lines consisting entirely of whitespace will disappear,
|
---|
| 907 | since the first part of the alternation can match the entire string
|
---|
| 908 | and replace it with nothing. If you need to keep embedded blank lines,
|
---|
| 909 | you have to do a little more work. Instead of matching any whitespace
|
---|
| 910 | (since that includes a newline), just match the other whitespace.
|
---|
| 911 |
|
---|
| 912 | $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
|
---|
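The substitutions above can be sketched as a small helper plus a multi-line example. C<trim> is a hypothetical name; Perl of this vintage has no such built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace from one string.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

my $single = trim("  hello world \n");

# Multi-line text: trim each line but keep the newlines and the blank line.
my $block = "  a  \n\n  b  \n";
(my $stripped = $block) =~ s/^[\t\f ]+|[\t\f ]+$//mg;

print "[$single]\n$stripped";
```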
| 913 |
|
---|
| 914 | =head2 How do I pad a string with blanks or pad a number with zeroes?
|
---|
| 915 |
|
---|
| 916 | In the following examples, C<$pad_len> is the length to which you wish
|
---|
| 917 | to pad the string, C<$text> or C<$num> contains the string to be padded,
|
---|
| 918 | and C<$pad_char> contains the padding character. You can use a single
|
---|
| 919 | character string constant instead of the C<$pad_char> variable if you
|
---|
| 920 | know what it is in advance. And in the same way you can use an integer in
|
---|
| 921 | place of C<$pad_len> if you know the pad length in advance.
|
---|
| 922 |
|
---|
| 923 | The simplest method uses the C<sprintf> function. It can pad on the left
|
---|
| 924 | or right with blanks and on the left with zeroes and it will not
|
---|
| 925 | truncate the result. The C<pack> function can only pad strings on the
|
---|
| 926 | right with blanks and it will truncate the result to a maximum length of
|
---|
| 927 | C<$pad_len>.
|
---|
| 928 |
|
---|
| 929 | # Left padding a string with blanks (no truncation):
|
---|
| 930 | $padded = sprintf("%${pad_len}s", $text);
|
---|
| 931 | $padded = sprintf("%*s", $pad_len, $text); # same thing
|
---|
| 932 |
|
---|
| 933 | # Right padding a string with blanks (no truncation):
|
---|
| 934 | $padded = sprintf("%-${pad_len}s", $text);
|
---|
| 935 | $padded = sprintf("%-*s", $pad_len, $text); # same thing
|
---|
| 936 |
|
---|
| 937 | # Left padding a number with 0 (no truncation):
|
---|
| 938 | $padded = sprintf("%0${pad_len}d", $num);
|
---|
| 939 | $padded = sprintf("%0*d", $pad_len, $num); # same thing
|
---|
| 940 |
|
---|
| 941 | # Right padding a string with blanks using pack (will truncate):
|
---|
| 942 | $padded = pack("A$pad_len",$text);
|
---|
| 943 |
|
---|
| 944 | If you need to pad with a character other than blank or zero you can use
|
---|
| 945 | one of the following methods. They all generate a pad string with the
|
---|
| 946 | C<x> operator and combine that with C<$text>. These methods do
|
---|
| 947 | not truncate C<$text>.
|
---|
| 948 |
|
---|
| 949 | Left and right padding with any character, creating a new string:
|
---|
| 950 |
|
---|
| 951 | $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
|
---|
| 952 | $padded = $text . $pad_char x ( $pad_len - length( $text ) );
|
---|
| 953 |
|
---|
| 954 | Left and right padding with any character, modifying C<$text> directly:
|
---|
| 955 |
|
---|
| 956 | substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
|
---|
| 957 | $text .= $pad_char x ( $pad_len - length( $text ) );
|
---|
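Here is a short sketch that exercises both styles. C<pad_left> is a hypothetical helper that also guards against text which is already longer than the pad length, a case the one-liners above leave to the C<x> operator.

```perl
use strict;
use warnings;

# Hypothetical helper: left-pad $text to $pad_len characters with $pad_char.
# Text that is already long enough is returned unchanged (no truncation).
sub pad_left {
    my ($text, $pad_len, $pad_char) = @_;
    my $missing = $pad_len - length $text;
    return $missing > 0 ? ($pad_char x $missing) . $text : $text;
}

my $stars  = pad_left("42", 5, "*");
my $long   = pad_left("123456", 5, "*");
my $zeroes = sprintf("%05d", 42);
my $right  = sprintf("%-5s|", "ab");

print "$stars $long $zeroes $right\n";
```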
| 958 |
|
---|
| 959 | =head2 How do I extract selected columns from a string?
|
---|
| 960 |
|
---|
| 961 | Use substr() or unpack(), both documented in L<perlfunc>.
|
---|
| 962 | If you prefer thinking in terms of columns instead of widths,
|
---|
| 963 | you can use this kind of thing:
|
---|
| 964 |
|
---|
| 965 | # determine the unpack format needed to split Linux ps output
|
---|
| 966 | # arguments are cut columns
|
---|
| 967 | my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
|
---|
| 968 |
|
---|
| 969 | sub cut2fmt {
|
---|
| 970 | my(@positions) = @_;
|
---|
| 971 | my $template = '';
|
---|
| 972 | my $lastpos = 1;
|
---|
| 973 | for my $place (@positions) {
|
---|
| 974 | $template .= "A" . ($place - $lastpos) . " ";
|
---|
| 975 | $lastpos = $place;
|
---|
| 976 | }
|
---|
| 977 | $template .= "A*";
|
---|
| 978 | return $template;
|
---|
| 979 | }
|
---|
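To show what the generated template does, here is a self-contained sketch that repeats C<cut2fmt> and feeds its result to C<unpack>. The input line and the cut columns 8 and 14 are invented for the example.

```perl
use strict;
use warnings;

# Same function as above, repeated so this example runs on its own.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}

# Fields start at columns 1, 8 and 14 (1-based).
my $fmt    = cut2fmt(8, 14);
my $line   = "alice  staff /home/alice";
my @fields = unpack($fmt, $line);

print "$fmt => ", join('|', @fields), "\n";
```

The C<A> format strips trailing spaces from each field, which is usually what you want for column-aligned text.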
| 980 |
|
---|
| 981 | =head2 How do I find the soundex value of a string?
|
---|
| 982 |
|
---|
| 983 | (contributed by brian d foy)
|
---|
| 984 |
|
---|
| 985 | You can use the Text::Soundex module. If you want to do fuzzy or close
|
---|
| 986 | matching, you might also try the String::Approx, Text::Metaphone,
|
---|
| 987 | and Text::DoubleMetaphone modules.
|
---|
| 988 |
|
---|
| 989 | =head2 How can I expand variables in text strings?
|
---|
| 990 |
|
---|
| 991 | Let's assume that you have a string that contains placeholder
|
---|
| 992 | variables.
|
---|
| 993 |
|
---|
| 994 | $text = 'this has a $foo in it and a $bar';
|
---|
| 995 |
|
---|
| 996 | You can use a substitution with a double evaluation. The
|
---|
| 997 | first /e turns C<$1> into C<$foo>, and the second /e turns
|
---|
| 998 | C<$foo> into its value. You may want to wrap this in an
|
---|
| 999 | C<eval>: if you try to get the value of an undeclared variable
|
---|
| 1000 | while running under C<use strict>, you get a fatal error.
|
---|
| 1001 |
|
---|
| 1002 | eval { $text =~ s/(\$\w+)/$1/eeg };
|
---|
| 1003 | die if $@;
|
---|
| 1004 |
|
---|
| 1005 | It's probably better in the general case to treat those
|
---|
| 1006 | variables as entries in some special hash. For example:
|
---|
| 1007 |
|
---|
| 1008 | %user_defs = (
|
---|
| 1009 | foo => 23,
|
---|
| 1010 | bar => 19,
|
---|
| 1011 | );
|
---|
| 1012 | $text =~ s/\$(\w+)/$user_defs{$1}/g;
|
---|
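One variation on the hash approach is to leave unknown placeholders untouched instead of replacing them with an empty string. The sketch below assumes that is the behavior you want; the C<$baz> placeholder is made up to show the fallback.

```perl
use strict;
use warnings;

my %user_defs = (
    foo => 23,
    bar => 19,
);

my $text = 'this has a $foo in it and a $bar and a $baz';

# Replace known names with their values; put unknown ones back as they were.
$text =~ s{\$(\w+)}{ exists $user_defs{$1} ? $user_defs{$1} : '$' . $1 }ge;

print "$text\n";
```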
| 1013 |
|
---|
| 1014 | =head2 What's wrong with always quoting "$vars"?
|
---|
| 1015 |
|
---|
| 1016 | The problem is that those double-quotes force stringification--
|
---|
| 1017 | coercing numbers and references into strings--even when you
|
---|
| 1018 | don't want them to be strings. Think of it this way: double-quote
|
---|
| 1019 | expansion is used to produce new strings. If you already
|
---|
| 1020 | have a string, why do you need more?
|
---|
| 1021 |
|
---|
| 1022 | If you get used to writing odd things like these:
|
---|
| 1023 |
|
---|
| 1024 | print "$var"; # BAD
|
---|
| 1025 | $new = "$old"; # BAD
|
---|
| 1026 | somefunc("$var"); # BAD
|
---|
| 1027 |
|
---|
| 1028 | You'll be in trouble. Those should (in 99.8% of the cases) be
|
---|
| 1029 | the simpler and more direct:
|
---|
| 1030 |
|
---|
| 1031 | print $var;
|
---|
| 1032 | $new = $old;
|
---|
| 1033 | somefunc($var);
|
---|
| 1034 |
|
---|
| 1035 | Otherwise, besides slowing you down, you're going to break code when
|
---|
| 1036 | the thing in the scalar is actually neither a string nor a number, but
|
---|
| 1037 | a reference:
|
---|
| 1038 |
|
---|
| 1039 | func(\@array);
|
---|
| 1040 | sub func {
|
---|
| 1041 | my $aref = shift;
|
---|
| 1042 | my $oref = "$aref"; # WRONG
|
---|
| 1043 | }
|
---|
| 1044 |
|
---|
| 1045 | You can also get into subtle problems on those few operations in Perl
|
---|
| 1046 | that actually do care about the difference between a string and a
|
---|
| 1047 | number, such as the magical C<++> autoincrement operator or the
|
---|
| 1048 | syscall() function.
|
---|
| 1049 |
|
---|
| 1050 | Stringification also destroys arrays.
|
---|
| 1051 |
|
---|
| 1052 | @lines = `command`;
|
---|
| 1053 | print "@lines"; # WRONG - extra blanks
|
---|
| 1054 | print @lines; # right
|
---|
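A quick sketch makes both problems visible. The variable names are just for illustration.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;

# Quoting turns the reference into a plain string such as "ARRAY(0x...)".
my $stringified = "$aref";

# Quoting an array joins its elements with a space between them.
my @lines  = ("one\n", "two\n");
my $joined = "@lines";

print ref($aref), " vs '", ref($stringified), "'\n";
print $joined;
```

The stringified copy can no longer be dereferenced, and the joined lines have picked up a leading blank on every line after the first.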
| 1055 |
|
---|
| 1056 | =head2 Why don't my E<lt>E<lt>HERE documents work?
|
---|
| 1057 |
|
---|
| 1058 | Check for these three things:
|
---|
| 1059 |
|
---|
| 1060 | =over 4
|
---|
| 1061 |
|
---|
| 1062 | =item There must be no space after the E<lt>E<lt> part.
|
---|
| 1063 |
|
---|
| 1064 | =item There (probably) should be a semicolon at the end.
|
---|
| 1065 |
|
---|
| 1066 | =item You can't (easily) have any space in front of the tag.
|
---|
| 1067 |
|
---|
| 1068 | =back
|
---|
| 1069 |
|
---|
| 1070 | If you want to indent the text in the here document, you
|
---|
| 1071 | can do this:
|
---|
| 1072 |
|
---|
| 1073 | # all in one
|
---|
| 1074 | ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
|
---|
| 1075 | your text
|
---|
| 1076 | goes here
|
---|
| 1077 | HERE_TARGET
|
---|
| 1078 |
|
---|
| 1079 | But the HERE_TARGET must still be flush against the margin.
|
---|
| 1080 | If you want that indented also, you'll have to quote
|
---|
| 1081 | in the indentation.
|
---|
| 1082 |
|
---|
| 1083 | ($quote = <<' FINIS') =~ s/^\s+//gm;
|
---|
| 1084 | ...we will have peace, when you and all your works have
|
---|
| 1085 | perished--and the works of your dark master to whom you
|
---|
| 1086 | would deliver us. You are a liar, Saruman, and a corrupter
|
---|
| 1087 | of men's hearts. --Theoden in /usr/src/perl/taint.c
|
---|
| 1088 | FINIS
|
---|
| 1089 | $quote =~ s/\s+--/\n--/;
|
---|
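As a runnable illustration of the substitution approach, here is the first form again with the indentation written out. The four-space indent is an assumption for the example; any amount of leading whitespace is removed.

```perl
use strict;
use warnings;

# The body is indented, the terminator is flush left.
(my $var = <<"HERE_TARGET") =~ s/^\s+//gm;
    your text
    goes here
HERE_TARGET

print $var;
```

Because C<\s> also matches newlines, this simple form swallows blank lines inside the text as well, so use it only when that is acceptable.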
| 1090 |
|
---|
| 1091 | A nice general-purpose fixer-upper function for indented here documents
|
---|
| 1092 | follows. It expects to be called with a here document as its argument.
|
---|
| 1093 | It looks to see whether each line begins with a common substring, and
|
---|
| 1094 | if so, strips that substring off. Otherwise, it takes the amount of leading
|
---|
| 1095 | whitespace found on the first line and removes that much off each
|
---|
| 1096 | subsequent line.
|
---|
| 1097 |
|
---|
| 1098 | sub fix {
|
---|
| 1099 | local $_ = shift;
|
---|
| 1100 | my ($white, $leader); # common whitespace and common leading string
|
---|
| 1101 | if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
|
---|
| 1102 | ($white, $leader) = ($2, quotemeta($1));
|
---|
| 1103 | } else {
|
---|
| 1104 | ($white, $leader) = (/^(\s+)/, '');
|
---|
| 1105 | }
|
---|
| 1106 | s/^\s*?$leader(?:$white)?//gm;
|
---|
| 1107 | return $_;
|
---|
| 1108 | }
|
---|
| 1109 |
|
---|
| 1110 | This works with leading special strings, dynamically determined:
|
---|
| 1111 |
|
---|
| 1112 | $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
|
---|
| 1113 | @@@ int
|
---|
| 1114 | @@@ runops() {
|
---|
| 1115 | @@@ SAVEI32(runlevel);
|
---|
| 1116 | @@@ runlevel++;
|
---|
| 1117 | @@@ while ( op = (*op->op_ppaddr)() );
|
---|
| 1118 | @@@ TAINT_NOT;
|
---|
| 1119 | @@@ return 0;
|
---|
| 1120 | @@@ }
|
---|
| 1121 | MAIN_INTERPRETER_LOOP
|
---|
| 1122 |
|
---|
| 1123 | Or with a fixed amount of leading whitespace, with remaining
|
---|
| 1124 | indentation correctly preserved:
|
---|
| 1125 |
|
---|
| 1126 | $poem = fix<<EVER_ON_AND_ON;
|
---|
| 1127 | Now far ahead the Road has gone,
|
---|
| 1128 | And I must follow, if I can,
|
---|
| 1129 | Pursuing it with eager feet,
|
---|
| 1130 | Until it joins some larger way
|
---|
| 1131 | Where many paths and errands meet.
|
---|
| 1132 | And whither then? I cannot say.
|
---|
| 1133 | --Bilbo in /usr/src/perl/pp_ctl.c
|
---|
| 1134 | EVER_ON_AND_ON
|
---|
| 1135 |
|
---|
| 1136 | =head1 Data: Arrays
|
---|
| 1137 |
|
---|
| 1138 | =head2 What is the difference between a list and an array?
|
---|
| 1139 |
|
---|
| 1140 | An array has a changeable length. A list does not. An array is something
|
---|
| 1141 | you can push or pop, while a list is a set of values. Some people make
|
---|
| 1142 | the distinction that a list is a value while an array is a variable.
|
---|
| 1143 | Subroutines are passed and return lists, you put things into list
|
---|
| 1144 | context, you initialize arrays with lists, and you foreach() across
|
---|
| 1145 | a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
|
---|
| 1146 | in scalar context behave like the number of elements in them, subroutines
|
---|
| 1147 | access their arguments through the array C<@_>, and push/pop/shift only work
|
---|
| 1148 | on arrays.
|
---|
| 1149 |
|
---|
| 1150 | As a side note, there's no such thing as a list in scalar context.
|
---|
| 1151 | When you say
|
---|
| 1152 |
|
---|
| 1153 | $scalar = (2, 5, 7, 9);
|
---|
| 1154 |
|
---|
| 1155 | you're using the comma operator in scalar context, so it uses the scalar
|
---|
| 1156 | comma operator. There never was a list there at all! This causes the
|
---|
| 1157 | last value to be returned: 9.
|
---|
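The difference is easy to check for yourself. This sketch compares an array in scalar context, the comma operator in scalar context, and two list assignments.

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);

# An array in scalar context yields its number of elements.
my $count = @array;

# The comma operator in scalar context yields its last operand.
my $last;
{
    no warnings 'void';    # the discarded constants would otherwise warn
    $last = (2, 5, 7, 9);
}

# List assignment to a one-element list takes the first value.
my ($first) = (2, 5, 7, 9);

# A list assignment in scalar context yields the number of values assigned from.
my $n = () = (2, 5, 7, 9);

print "$count $last $first $n\n";
```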
| 1158 |
|
---|
| 1159 | =head2 What is the difference between $array[1] and @array[1]?
|
---|
| 1160 |
|
---|
| 1161 | The former is a scalar value; the latter an array slice, making
|
---|
| 1162 | it a list with one (scalar) value. You should use $ when you want a
|
---|
| 1163 | scalar value (most of the time) and @ when you want a list with one
|
---|
| 1164 | scalar value in it (very, very rarely; nearly never, in fact).
|
---|
| 1165 |
|
---|
| 1166 | Sometimes it doesn't make a difference, but sometimes it does.
|
---|
| 1167 | For example, compare:
|
---|
| 1168 |
|
---|
| 1169 | $good[0] = `some program that outputs several lines`;
|
---|
| 1170 |
|
---|
| 1171 | with
|
---|
| 1172 |
|
---|
| 1173 | @bad[0] = `same program that outputs several lines`;
|
---|
| 1174 |
|
---|
| 1175 | The C<use warnings> pragma and the B<-w> flag will warn you about these
|
---|
| 1176 | matters.
|
---|
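The underlying difference is the context that the right-hand side sees. In this sketch, C<context> is a hypothetical function that reports how it was called.

```perl
use strict;
use warnings;

# Hypothetical function: reports the context it was called in.
sub context { return wantarray ? 'list' : 'scalar' }

my (@good, @bad);

# Assigning to an element gives the right side scalar context.
$good[0] = context();

# Assigning to a one-element slice gives the right side list context.
{
    no warnings 'syntax';    # silence "Scalar value @bad[0] better written as $bad[0]"
    @bad[0] = context();
}

print "$good[0] $bad[0]\n";
```

With backticks, scalar context returns all the output as one string, while list context returns a list of lines, of which the slice keeps only the first.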
| 1177 |
|
---|
| 1178 | =head2 How can I remove duplicate elements from a list or array?
|
---|
| 1179 |
|
---|
| 1180 | (contributed by brian d foy)
|
---|
| 1181 |
|
---|
| 1182 | Use a hash. When you think the words "unique" or "duplicated", think
|
---|
| 1183 | "hash keys".
|
---|
| 1184 |
|
---|
| 1185 | If you don't care about the order of the elements, you could just
|
---|
| 1186 | create the hash then extract the keys. It's not important how you
|
---|
| 1187 | create that hash: just that you use C<keys> to get the unique
|
---|
| 1188 | elements.
|
---|
| 1189 |
|
---|
| 1190 | my %hash = map { $_, 1 } @array;
|
---|
| 1191 | # or a hash slice: @hash{ @array } = ();
|
---|
| 1192 | # or a foreach: $hash{$_} = 1 foreach ( @array );
|
---|
| 1193 |
|
---|
| 1194 | my @unique = keys %hash;
|
---|
| 1195 |
|
---|
| 1196 | You can also go through each element and skip the ones you've seen
|
---|
| 1197 | before. Use a hash to keep track. The first time the loop sees an
|
---|
| 1198 | element, that element has no key in C<%Seen>. The C<next> statement
|
---|
| 1199 | creates the key and immediately uses its value, which is C<undef>, so
|
---|
| 1200 | the loop continues to the C<push> and increments the value for that
|
---|
| 1201 | key. The next time the loop sees that same element, its key exists in
|
---|
| 1202 | the hash I<and> the value for that key is true (since it's not 0 or
|
---|
| 1203 | undef), so the C<next> skips that iteration and the loop goes to the next
|
---|
| 1204 | element.
|
---|
| 1205 |
|
---|
| 1206 | my @unique = ();
|
---|
| 1207 | my %seen = ();
|
---|
| 1208 |
|
---|
| 1209 | foreach my $elem ( @array )
|
---|
| 1210 | {
|
---|
| 1211 | next if $seen{ $elem }++;
|
---|
| 1212 | push @unique, $elem;
|
---|
| 1213 | }
|
---|
| 1214 |
|
---|
| 1215 | You can write this more briefly using a grep, which does the
|
---|
| 1216 | same thing.
|
---|
| 1217 |
|
---|
| 1218 | my %seen = ();
|
---|
| 1219 | my @unique = grep { ! $seen{ $_ }++ } @array;
|
---|
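The same idiom adapts to other notions of "duplicate" by changing the hash key. As a sketch, this version treats elements as equal regardless of case and keeps the first spelling seen; the sample data is made up.

```perl
use strict;
use warnings;

my @array = qw(Apple apple Banana APPLE banana);

my %seen;
# Key on the lower-cased value, but keep the original element.
my @unique = grep { !$seen{ lc $_ }++ } @array;

print "@unique\n";
```

Unlike the C<keys> approach, the C<grep> form preserves the original order of the elements.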
| 1220 |
|
---|
| 1221 | =head2 How can I tell whether a certain element is contained in a list or array?
|
---|
| 1222 |
|
---|
| 1223 | (portions of this answer contributed by Anno Siegel)
|
---|
| 1224 |
|
---|
| 1225 | Hearing the word "in" is an I<in>dication that you probably should have
|
---|
| 1226 | used a hash, not a list or array, to store your data. Hashes are
|
---|
| 1227 | designed to answer this question quickly and efficiently. Arrays aren't.
|
---|
| 1228 |
|
---|
| 1229 | That being said, there are several ways to approach this. If you
|
---|
| 1230 | are going to make this query many times over arbitrary string values,
|
---|
| 1231 | the fastest way is probably to invert the original array and maintain a
|
---|
| 1232 | hash whose keys are the first array's values.
|
---|
| 1233 |
|
---|
| 1234 | @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
|
---|
| 1235 | %is_blue = ();
|
---|
| 1236 | for (@blues) { $is_blue{$_} = 1 }
|
---|
| 1237 |
|
---|
| 1238 | Now you can check whether $is_blue{$some_color}. It might have been a
|
---|
| 1239 | good idea to keep the blues all in a hash in the first place.
|
---|
| 1240 |
|
---|
| 1241 | If the values are all small integers, you could use a simple indexed
|
---|
| 1242 | array. This kind of an array will take up less space:
|
---|
| 1243 |
|
---|
| 1244 | @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
|
---|
| 1245 | @is_tiny_prime = ();
|
---|
| 1246 | for (@primes) { $is_tiny_prime[$_] = 1 }
|
---|
| 1247 | # or simply @is_tiny_prime[@primes] = (1) x @primes;
|
---|
| 1248 |
|
---|
| 1249 | Now you check whether $is_tiny_prime[$some_number].
|
---|
| 1250 |
|
---|
| 1251 | If the values in question are integers instead of strings, you can save
|
---|
| 1252 | quite a lot of space by using bit strings instead:
|
---|
| 1253 |
|
---|
| 1254 | @articles = ( 1..10, 150..2000, 2017 );
|
---|
| 1255 | undef $read;
|
---|
| 1256 | for (@articles) { vec($read,$_,1) = 1 }
|
---|
| 1257 |
|
---|
| 1258 | Now check whether C<vec($read,$n,1)> is true for some C<$n>.
|
---|
| 1259 |
|
---|
| 1260 | These methods guarantee fast individual tests but require a re-organization
|
---|
| 1261 | of the original list or array. They only pay off if you have to test
|
---|
| 1262 | multiple values against the same array.
|
---|
| 1263 |
|
---|
| 1264 | If you are testing only once, the standard module List::Util exports
|
---|
| 1265 | the function C<first> for this purpose. It works by stopping once it
|
---|
| 1266 | finds the element. It's written in C for speed, and its Perl equivalent
|
---|
| 1267 | looks like this subroutine:
|
---|
| 1268 |
|
---|
| 1269 | sub first (&@) {
|
---|
| 1270 | my $code = shift;
|
---|
| 1271 | foreach (@_) {
|
---|
| 1272 | return $_ if &{$code}();
|
---|
| 1273 | }
|
---|
| 1274 | undef;
|
---|
| 1275 | }
|
---|
| 1276 |
|
---|
| 1277 | If speed is of little concern, the common idiom uses grep in scalar context
|
---|
| 1278 | (which returns the number of items that passed its condition) to traverse the
|
---|
| 1279 | entire list. This does have the benefit of telling you how many matches it
|
---|
| 1280 | found, though.
|
---|
| 1281 |
|
---|
| 1282 | my $is_there = grep $_ eq $whatever, @array;
|
---|
| 1283 |
|
---|
| 1284 | If you want to actually extract the matching elements, simply use grep in
|
---|
| 1285 | list context.
|
---|
| 1286 |
|
---|
| 1287 | my @matches = grep $_ eq $whatever, @array;
|
---|
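For comparison, here is a sketch that applies three of these approaches to the same data: a lookup hash, C<first> from List::Util, and C<grep> in scalar context.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues = qw(azure cerulean teal turquoise lapis-lazuli);

# Many lookups: build the hash once, then each test is a single hash access.
my %is_blue = map { $_ => 1 } @blues;

# One lookup: first() stops at the first match and returns that element.
my $hit = first { $_ eq 'teal' } @blues;

# grep in scalar context scans everything and returns the match count.
my $count = grep { $_ eq 'teal' } @blues;

print "hash: ", ($is_blue{teal} ? "yes" : "no"), ", first: $hit, grep: $count\n";
```

Since C<first> returns the element itself, test the result with C<defined> if the list could contain false values such as 0 or the empty string.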
| 1288 |
|
---|
| 1289 | =head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
|
---|
| 1290 |
|
---|
| 1291 | Use a hash. Here's code to do both and more. It assumes that
|
---|
| 1292 | each element is unique in a given array:
|
---|
| 1293 |
|
---|
| 1294 | @union = @intersection = @difference = ();
|
---|
| 1295 | %count = ();
|
---|
| 1296 | foreach $element (@array1, @array2) { $count{$element}++ }
|
---|
| 1297 | foreach $element (keys %count) {
|
---|
| 1298 | push @union, $element;
|
---|
| 1299 | push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
|
---|
| 1300 | }
|
---|
| 1301 |
|
---|
| 1302 | Note that this is the I<symmetric difference>, that is, all elements in
|
---|
| 1303 | either A or in B but not in both. Think of it as an xor operation.
|
---|
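If you want the one-sided difference instead (elements of the first array that are missing from the second), a lookup hash and C<grep> will do. This is a sketch with made-up data, and like the code above it assumes no duplicates within an array.

```perl
use strict;
use warnings;

my @array1 = qw(1 2 3 4);
my @array2 = qw(3 4 5);

# Index the second array so membership tests are cheap.
my %in_array2 = map { $_ => 1 } @array2;

my @only_in_1    = grep { !$in_array2{$_} } @array1;   # one-sided difference
my @intersection = grep {  $in_array2{$_} } @array1;   # in both arrays

print "only in first: @only_in_1; in both: @intersection\n";
```

This form also preserves the order of the first array, which C<keys %count> does not.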
| 1304 |
|
---|
| 1305 | =head2 How do I test whether two arrays or hashes are equal?
|
---|
| 1306 |
|
---|
| 1307 | The following code works for single-level arrays. It uses a stringwise
|
---|
| 1308 | comparison, and does not distinguish defined versus undefined empty
|
---|
| 1309 | strings. Modify if you have other needs.
|
---|
| 1310 |
|
---|
| 1311 | $are_equal = compare_arrays(\@frogs, \@toads);
|
---|
| 1312 |
|
---|
| 1313 | sub compare_arrays {
|
---|
| 1314 | my ($first, $second) = @_;
|
---|
| 1315 | no warnings; # silence spurious -w undef complaints
|
---|
| 1316 | return 0 unless @$first == @$second;
|
---|
| 1317 | for (my $i = 0; $i < @$first; $i++) {
|
---|
| 1318 | return 0 if $first->[$i] ne $second->[$i];
|
---|
| 1319 | }
|
---|
| 1320 | return 1;
|
---|
| 1321 | }
|
---|
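A few sample calls show what the stringwise comparison does and does not consider equal. The function is repeated so the sketch runs on its own; the test data is invented.

```perl
use strict;
use warnings;

# Same function as above, repeated so this example is self-contained.
sub compare_arrays {
    my ($first, $second) = @_;
    no warnings;    # silence undef complaints during comparison
    return 0 unless @$first == @$second;
    for (my $i = 0; $i < @$first; $i++) {
        return 0 if $first->[$i] ne $second->[$i];
    }
    return 1;
}

my $same      = compare_arrays([ 1, 2, 3 ], [ 1, 2, 3 ]);
my $different = compare_arrays([ 1, 2, 3 ], [ 1, 2, 4 ]);
my $undef_eq  = compare_arrays([ undef ], [ '' ]);       # undef equals ''
my $strings   = compare_arrays([ '1.0' ], [ '1' ]);      # stringwise, so not equal

print "$same $different $undef_eq $strings\n";
```

Switch C<ne> to C<!=> if you want numeric comparison.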
| 1322 |
|
---|
| 1323 | For multilevel structures, you may wish to use an approach more
|
---|
| 1324 | like this one. It uses the CPAN module FreezeThaw:
|
---|
| 1325 |
|
---|
| 1326 | use FreezeThaw qw(cmpStr);
|
---|
| 1327 | @a = @b = ( "this", "that", [ "more", "stuff" ] );
|
---|
| 1328 |
|
---|
| 1329 | printf "a and b contain %s arrays\n",
|
---|
| 1330 | cmpStr(\@a, \@b) == 0
|
---|
| 1331 | ? "the same"
|
---|
| 1332 | : "different";
|
---|
| 1333 |
|
---|
| 1334 | This approach also works for comparing hashes. Here
|
---|
| 1335 | we'll demonstrate two different answers:
|
---|
| 1336 |
|
---|
| 1337 | use FreezeThaw qw(cmpStr cmpStrHard);
|
---|
| 1338 |
|
---|
| 1339 | %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
|
---|
| 1340 | $a{EXTRA} = \%b;
|
---|
| 1341 | $b{EXTRA} = \%a;
|
---|
| 1342 |
|
---|
| 1343 | printf "a and b contain %s hashes\n",
|
---|
| 1344 | cmpStr(\%a, \%b) == 0 ? "the same" : "different";
|
---|
| 1345 |
|
---|
| 1346 | printf "a and b contain %s hashes\n",
|
---|
| 1347 | cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
|
---|
| 1348 |
|
---|
| 1349 |
|
---|
| 1350 | The first reports that both hashes contain the same data,
|
---|
| 1351 | while the second reports that they do not. Which you prefer is left as
|
---|
| 1352 | an exercise to the reader.
|
---|
| 1353 |
|
---|
| 1354 | =head2 How do I find the first array element for which a condition is true?
|
---|
| 1355 |
|
---|
| 1356 | To find the first array element which satisfies a condition, you can
|
---|
| 1357 | use the first() function in the List::Util module, which comes with
|
---|
| 1358 | Perl 5.8. This example finds the first element that contains "Perl".
|
---|
| 1359 |
|
---|
| 1360 | use List::Util qw(first);
|
---|
| 1361 |
|
---|
| 1362 | my $element = first { /Perl/ } @array;
|
---|
| 1363 |
|
---|
| 1364 | If you cannot use List::Util, you can make your own loop to do the
|
---|
| 1365 | same thing. Once you find the element, you stop the loop with last.
|
---|
| 1366 |
|
---|
| 1367 | my $found;
|
---|
| 1368 | foreach ( @array )
|
---|
| 1369 | {
|
---|
| 1370 | if( /Perl/ ) { $found = $_; last }
|
---|
| 1371 | }
|
---|
| 1372 |
|
---|
| 1373 | If you want the array index, you can iterate through the indices
|
---|
| 1374 | and check the array element at each index until you find one
|
---|
| 1375 | that satisfies the condition.
|
---|
| 1376 |
|
---|
| 1377 | my( $found, $index ) = ( undef, -1 );
|
---|
| 1378 | for( my $i = 0; $i < @array; $i++ )
|
---|
| 1379 | {
|
---|
| 1380 | if( $array[$i] =~ /Perl/ )
|
---|
| 1381 | {
|
---|
| 1382 | $found = $array[$i];
|
---|
| 1383 | $index = $i;
|
---|
| 1384 | last;
|
---|
| 1385 | }
|
---|
| 1386 | }
|
---|
| 1387 |
|
---|
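The index search can also be written with first() applied to the range of indices rather than to the elements themselves. This is a minimal sketch, assuming List::Util is available; the contents of @array are invented for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Sample data, invented for this example.
my @array = ( "Python", "Ruby", "Perl 5", "Perl 6" );

# first() returns the first index whose element matches,
# or undef if no element matches.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

if ( defined $index ) {
    print "found '$array[$index]' at index $index\n";
}
else {
    print "no match\n";
}
```

Testing the result with defined() matters here, because index 0 is a valid answer but is false in boolean context.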
| 1388 | =head2 How do I handle linked lists?
|
---|
| 1389 |
|
---|
| 1390 | In general, you usually don't need a linked list in Perl, since with
|
---|
| 1391 | regular arrays, you can push and pop or shift and unshift at either end,
|
---|
| 1392 | or you can use splice to add and/or remove an arbitrary number of elements at
|
---|
| 1393 | arbitrary points. Both pop and shift are O(1) operations on Perl's
|
---|
| 1394 | dynamic arrays. In the absence of shifts and pops, push in general
|
---|
| 1395 | needs to reallocate on the order of every log(N) times, and unshift will
|
---|
| 1396 | need to copy pointers each time.
|
---|
| 1397 |
|
---|
| 1398 | If you really, really wanted, you could use structures as described in
|
---|
| 1399 | L<perldsc> or L<perltoot> and do just what the algorithm book tells you
|
---|
| 1400 | to do. For example, imagine a list node like this:
|
---|
| 1401 |
|
---|
| 1402 | $node = {
|
---|
| 1403 | VALUE => 42,
|
---|
| 1404 | LINK => undef,
|
---|
| 1405 | };
|
---|
| 1406 |
|
---|
| 1407 | You could walk the list this way:
|
---|
| 1408 |
|
---|
| 1409 | print "List: ";
|
---|
| 1410 | for ($node = $head; $node; $node = $node->{LINK}) {
|
---|
| 1411 | print $node->{VALUE}, " ";
|
---|
| 1412 | }
|
---|
| 1413 | print "\n";
|
---|
| 1414 |
|
---|
| 1415 | You could add to the list this way:
|
---|
| 1416 |
|
---|
| 1417 | my ($head, $tail);
|
---|
| 1418 | $tail = append($head, 1); # grow a new head
|
---|
| 1419 | for $value ( 2 .. 10 ) {
|
---|
| 1420 | $tail = append($tail, $value);
|
---|
| 1421 | }
|
---|
| 1422 |
|
---|
| 1423 | sub append {
|
---|
| 1424 | my($list, $value) = @_;
|
---|
| 1425 | my $node = { VALUE => $value };
|
---|
| 1426 | if ($list) {
|
---|
| 1427 | $node->{LINK} = $list->{LINK};
|
---|
| 1428 | $list->{LINK} = $node;
|
---|
| 1429 | } else {
|
---|
| 1430 | $_[0] = $node; # replace caller's version
|
---|
| 1431 | }
|
---|
| 1432 | return $node;
|
---|
| 1433 | }
|
---|
| 1434 |
|
---|
| 1435 | But again, Perl's built-in arrays are virtually always good enough.
|
---|
| 1436 |
|
---|
| 1437 | =head2 How do I handle circular lists?
|
---|
| 1438 |
|
---|
| 1439 | Circular lists could be handled in the traditional fashion with linked
|
---|
| 1440 | lists, or you could just do something like this with an array:
|
---|
| 1441 |
|
---|
| 1442 | unshift(@array, pop(@array)); # the last shall be first
|
---|
| 1443 | push(@array, shift(@array)); # and vice versa
|
---|
| 1444 |
|
---|
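The two rotations can be seen step by step in a small sketch. The array name @ring and its contents are made up for the example.

```perl
use strict;
use warnings;

my @ring = qw(a b c d);

push @ring, shift @ring;     # rotate left:  b c d a
push @ring, shift @ring;     # rotate left:  c d a b
unshift @ring, pop @ring;    # rotate right: b c d a

print "@ring\n";
```

Each rotation moves a single element, so the array always holds the same items and only the starting point changes.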
| 1445 | =head2 How do I shuffle an array randomly?
|
---|
| 1446 |
|
---|
| 1447 | If you either have Perl 5.8.0 or later installed, or if you have
|
---|
| 1448 | Scalar-List-Utils 1.03 or later installed, you can say:
|
---|
| 1449 |
|
---|
| 1450 | use List::Util 'shuffle';
|
---|
| 1451 |
|
---|
| 1452 | @shuffled = shuffle(@list);
|
---|
| 1453 |
|
---|
| 1454 | If not, you can use a Fisher-Yates shuffle.
|
---|
| 1455 |
|
---|
| 1456 | sub fisher_yates_shuffle {
|
---|
| 1457 | my $deck = shift; # $deck is a reference to an array
|
---|
| 1458 | my $i = @$deck;
|
---|
| 1459 | while (--$i > 0) { # the > 0 test also stops safely on an empty deck
|
---|
| 1460 | my $j = int rand ($i+1);
|
---|
| 1461 | @$deck[$i,$j] = @$deck[$j,$i];
|
---|
| 1462 | }
|
---|
| 1463 | }
|
---|
| 1464 |
|
---|
| 1465 | # shuffle my mpeg collection
|
---|
| 1466 | #
|
---|
| 1467 | my @mpeg = <audio/*/*.mp3>;
|
---|
| 1468 | fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
|
---|
| 1469 | print @mpeg;
|
---|
| 1470 |
|
---|
| 1471 | Note that the above implementation shuffles an array in place,
|
---|
| 1472 | unlike the List::Util::shuffle() which takes a list and returns
|
---|
| 1473 | a new shuffled list.
|
---|
| 1474 |
|
---|
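The difference between the two interfaces can be checked with a short sketch, assuming List::Util is installed. The variable names are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @list     = 1 .. 10;
my @shuffled = shuffle(@list);

# shuffle() returns a new list; @list keeps its original order.
print "original: @list\n";
print "shuffled: @shuffled\n";

# Sorting the shuffled copy numerically should reproduce the original,
# which confirms that no element was lost or duplicated.
my @check = sort { $a <=> $b } @shuffled;
print "same elements\n" if "@check" eq "@list";
```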
| 1475 | You've probably seen shuffling algorithms that work using splice,
|
---|
| 1476 | randomly picking another element to swap the current element with:
|
---|
| 1477 |
|
---|
| 1478 | srand;
|
---|
| 1479 | @new = ();
|
---|
| 1480 | @old = 1 .. 10; # just a demo
|
---|
| 1481 | while (@old) {
|
---|
| 1482 | push(@new, splice(@old, rand @old, 1));
|
---|
| 1483 | }
|
---|
| 1484 |
|
---|
| 1485 | This is bad because splice is already O(N), and since you do it N times,
|
---|
| 1486 | you just invented a quadratic algorithm; that is, O(N**2). This does
|
---|
| 1487 | not scale, although Perl is so efficient that you probably won't notice
|
---|
| 1488 | this until you have rather largish arrays.
|
---|
| 1489 |
|
---|
| 1490 | =head2 How do I process/modify each element of an array?
|
---|
| 1491 |
|
---|
| 1492 | Use C<for>/C<foreach>:
|
---|
| 1493 |
|
---|
| 1494 | for (@lines) {
|
---|
| 1495 | s/foo/bar/; # change that word
|
---|
| 1496 | tr/XZ/ZX/; # swap those letters
|
---|
| 1497 | }
|
---|
| 1498 |
|
---|
| 1499 | Here's another; let's compute spherical volumes:
|
---|
| 1500 |
|
---|
| 1501 | for (@volumes = @radii) { # @volumes has changed parts
|
---|
| 1502 | $_ **= 3;
|
---|
| 1503 | $_ *= (4/3) * 3.14159; # this will be constant folded
|
---|
| 1504 | }
|
---|
| 1505 |
|
---|
| 1506 | which can also be done with map(), which is made to transform
|
---|
| 1507 | one list into another:
|
---|
| 1508 |
|
---|
| 1509 | @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
|
---|
| 1510 |
|
---|
| 1511 | If you want to do the same thing to modify the values of the
|
---|
| 1512 | hash, you can use the C<values> function. As of Perl 5.6
|
---|
| 1513 | the values are not copied, so if you modify $orbit (in this
|
---|
| 1514 | case), you modify the value.
|
---|
| 1515 |
|
---|
| 1516 | for $orbit ( values %orbits ) {
|
---|
| 1517 | ($orbit **= 3) *= (4/3) * 3.14159;
|
---|
| 1518 | }
|
---|
| 1519 |
|
---|
| 1520 | Prior to perl 5.6 C<values> returned copies of the values,
|
---|
| 1521 | so older perl code often contains constructions such as
|
---|
| 1522 | C<@orbits{keys %orbits}> instead of C<values %orbits> where
|
---|
| 1523 | the hash is to be modified.
|
---|
| 1524 |
|
---|
| 1525 | =head2 How do I select a random element from an array?
|
---|
| 1526 |
|
---|
| 1527 | Use the rand() function (see L<perlfunc/rand>):
|
---|
| 1528 |
|
---|
| 1529 | $index = rand @array;
|
---|
| 1530 | $element = $array[$index];
|
---|
| 1531 |
|
---|
| 1532 | Or, simply:
|
---|
| 1533 | my $element = $array[ rand @array ];
|
---|
| 1534 |
|
---|
| 1535 | =head2 How do I permute N elements of a list?
|
---|
| 1536 |
|
---|
| 1537 | Use the List::Permutor module on CPAN. If the list is
|
---|
| 1538 | actually an array, try the Algorithm::Permute module (also
|
---|
| 1539 | on CPAN). It's written in XS code and is very efficient.
|
---|
| 1540 |
|
---|
| 1541 | use Algorithm::Permute;
|
---|
| 1542 | my @array = 'a'..'d';
|
---|
| 1543 | my $p_iterator = Algorithm::Permute->new ( \@array );
|
---|
| 1544 | while (my @perm = $p_iterator->next) {
|
---|
| 1545 | print "next permutation: (@perm)\n";
|
---|
| 1546 | }
|
---|
| 1547 |
|
---|
| 1548 | For even faster execution, you could do:
|
---|
| 1549 |
|
---|
| 1550 | use Algorithm::Permute;
|
---|
| 1551 | my @array = 'a'..'d';
|
---|
| 1552 | Algorithm::Permute::permute {
|
---|
| 1553 | print "next permutation: (@array)\n";
|
---|
| 1554 | } @array;
|
---|
| 1555 |
|
---|
| 1556 | Here's a little program that generates all permutations of
|
---|
| 1557 | all the words on each line of input. The algorithm embodied
|
---|
| 1558 | in the permute() function is discussed in Volume 4 (still
|
---|
| 1559 | unpublished) of Knuth's I<The Art of Computer Programming>
|
---|
| 1560 | and will work on any list:
|
---|
| 1561 |
|
---|
| 1562 | #!/usr/bin/perl -n
|
---|
| 1563 | # Fischer-Krause ordered permutation generator
|
---|
| 1564 |
|
---|
| 1565 | sub permute (&@) {
|
---|
| 1566 | my $code = shift;
|
---|
| 1567 | my @idx = 0..$#_;
|
---|
| 1568 | while ( $code->(@_[@idx]) ) {
|
---|
| 1569 | my $p = $#idx;
|
---|
| 1570 | --$p while $idx[$p-1] > $idx[$p];
|
---|
| 1571 | my $q = $p or return;
|
---|
| 1572 | push @idx, reverse splice @idx, $p;
|
---|
| 1573 | ++$q while $idx[$p-1] > $idx[$q];
|
---|
| 1574 | @idx[$p-1,$q]=@idx[$q,$p-1];
|
---|
| 1575 | }
|
---|
| 1576 | }
|
---|
| 1577 |
|
---|
| 1578 | permute {print"@_\n"} split;
|
---|
| 1579 |
|
---|
| 1580 | =head2 How do I sort an array by (anything)?
|
---|
| 1581 |
|
---|
| 1582 | Supply a comparison function to sort() (described in L<perlfunc/sort>):
|
---|
| 1583 |
|
---|
| 1584 | @list = sort { $a <=> $b } @list;
|
---|
| 1585 |
|
---|
| 1586 | The default sort function is cmp, string comparison, which would
|
---|
| 1587 | sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
|
---|
| 1588 | the numerical comparison operator.
|
---|
| 1589 |
|
---|
| 1590 | If you have a complicated function needed to pull out the part you
|
---|
| 1591 | want to sort on, then don't do it inside the sort function. Pull it
|
---|
| 1592 | out first, because the sort BLOCK can be called many times for the
|
---|
| 1593 | same element. Here's an example of how to pull out the first word
|
---|
| 1594 | after the first number on each item, and then sort those words
|
---|
| 1595 | case-insensitively.
|
---|
| 1596 |
|
---|
| 1597 | @idx = ();
|
---|
| 1598 | for (@data) {
|
---|
| 1599 | ($item) = /\d+\s*(\S+)/;
|
---|
| 1600 | push @idx, uc($item);
|
---|
| 1601 | }
|
---|
| 1602 | @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
|
---|
| 1603 |
|
---|
| 1604 | which could also be written this way, using a trick
|
---|
| 1605 | that's come to be known as the Schwartzian Transform:
|
---|
| 1606 |
|
---|
| 1607 | @sorted = map { $_->[0] }
|
---|
| 1608 | sort { $a->[1] cmp $b->[1] }
|
---|
| 1609 | map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
|
---|
| 1610 |
|
---|
| 1611 | If you need to sort on several fields, the following paradigm is useful.
|
---|
| 1612 |
|
---|
| 1613 | @sorted = sort { field1($a) <=> field1($b) ||
|
---|
| 1614 | field2($a) cmp field2($b) ||
|
---|
| 1615 | field3($a) cmp field3($b)
|
---|
| 1616 | } @data;
|
---|
| 1617 |
|
---|
| 1618 | This can be conveniently combined with precalculation of keys as given
|
---|
| 1619 | above.
|
---|
| 1620 |
|
---|
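As a rough sketch of combining a multi-field comparison with precalculated keys, the following sorts records by a numeric field first and a string field second. The "name:age" record format and the data are assumptions made for the example.

```perl
use strict;
use warnings;

# Invented records of the form "name:age".
my @data = ( "carol:30", "alice:25", "bob:30", "dave:25" );

my @sorted =
    map  { $_->[0] }                    # 3. recover the original record
    sort { $a->[1] <=> $b->[1]          # 2. compare ages numerically,
               or
           $a->[2] cmp $b->[2] }        #    then names as strings
    map  { my ( $name, $age ) = split /:/; [ $_, $age, $name ] }    # 1. extract keys once
    @data;

print "@sorted\n";    # alice:25 dave:25 bob:30 carol:30
```

The split runs once per record instead of once per comparison, which is the point of precalculating the keys.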
| 1621 | See the F<sort> article in the "Far More Than You Ever Wanted
|
---|
| 1622 | To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
|
---|
| 1623 | more about this approach.
|
---|
| 1624 |
|
---|
| 1625 | See also the question below on sorting hashes.
|
---|
| 1626 |
|
---|
| 1627 | =head2 How do I manipulate arrays of bits?
|
---|
| 1628 |
|
---|
| 1629 | Use pack() and unpack(), or else vec() and the bitwise operations.
|
---|
| 1630 |
|
---|
| 1631 | For example, this sets $vec to have bit N set if $ints[N] was set:
|
---|
| 1632 |
|
---|
| 1633 | $vec = '';
|
---|
| 1634 | foreach(@ints) { vec($vec,$_,1) = 1 }
|
---|
| 1635 |
|
---|
| 1636 | Here's how, given a vector in $vec, you can
|
---|
| 1637 | get those bits into your @ints array:
|
---|
| 1638 |
|
---|
| 1639 | sub bitvec_to_list {
|
---|
| 1640 | my $vec = shift;
|
---|
| 1641 | my @ints;
|
---|
| 1642 | # Find null-byte density then select best algorithm
|
---|
| 1643 | if ($vec =~ tr/\0// / length $vec > 0.95) {
|
---|
| 1644 | use integer;
|
---|
| 1645 | my $i;
|
---|
| 1646 | # This method is faster with mostly null-bytes
|
---|
| 1647 | while($vec =~ /[^\0]/g ) {
|
---|
| 1648 | $i = -9 + 8 * pos $vec;
|
---|
| 1649 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
| 1650 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
| 1651 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
| 1652 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
| 1653 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
| 1654 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
| 1655 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
| 1656 | push @ints, $i if vec($vec, ++$i, 1);
|
---|
| 1657 | }
|
---|
| 1658 | } else {
|
---|
| 1659 | # This method is a fast general algorithm
|
---|
| 1660 | use integer;
|
---|
| 1661 | my $bits = unpack "b*", $vec;
|
---|
| 1662 | push @ints, 0 if $bits =~ s/^(\d)// && $1;
|
---|
| 1663 | push @ints, pos $bits while($bits =~ /1/g);
|
---|
| 1664 | }
|
---|
| 1665 | return \@ints;
|
---|
| 1666 | }
|
---|
| 1667 |
|
---|
| 1668 | This method gets faster the more sparse the bit vector is.
|
---|
| 1669 | (Courtesy of Tim Bunce and Winfried Koenig.)
|
---|
| 1670 |
|
---|
| 1671 | You can make the while loop a lot shorter with this suggestion
|
---|
| 1672 | from Benjamin Goldberg:
|
---|
| 1673 |
|
---|
| 1674 | while($vec =~ /[^\0]+/g ) {
|
---|
| 1675 | push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
|
---|
| 1676 | }
|
---|
| 1677 |
|
---|
| 1678 | Or use the CPAN module Bit::Vector:
|
---|
| 1679 |
|
---|
| 1680 | $vector = Bit::Vector->new($num_of_bits);
|
---|
| 1681 | $vector->Index_List_Store(@ints);
|
---|
| 1682 | @ints = $vector->Index_List_Read();
|
---|
| 1683 |
|
---|
| 1684 | Bit::Vector provides efficient methods for bit vectors, sets of small integers
|
---|
| 1685 | and "big int" math.
|
---|
| 1686 |
|
---|
| 1687 | Here's a more extensive illustration using vec():
|
---|
| 1688 |
|
---|
| 1689 | # vec demo
|
---|
| 1690 | $vector = "\xff\x0f\xef\xfe";
|
---|
| 1691 | print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
|
---|
| 1692 | unpack("N", $vector), "\n";
|
---|
| 1693 | $is_set = vec($vector, 23, 1);
|
---|
| 1694 | print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
|
---|
| 1695 | pvec($vector);
|
---|
| 1696 |
|
---|
| 1697 | set_vec(1,1,1);
|
---|
| 1698 | set_vec(3,1,1);
|
---|
| 1699 | set_vec(23,1,1);
|
---|
| 1700 |
|
---|
| 1701 | set_vec(3,1,3);
|
---|
| 1702 | set_vec(3,2,3);
|
---|
| 1703 | set_vec(3,4,3);
|
---|
| 1704 | set_vec(3,4,7);
|
---|
| 1705 | set_vec(3,8,3);
|
---|
| 1706 | set_vec(3,8,7);
|
---|
| 1707 |
|
---|
| 1708 | set_vec(0,32,17);
|
---|
| 1709 | set_vec(1,32,17);
|
---|
| 1710 |
|
---|
| 1711 | sub set_vec {
|
---|
| 1712 | my ($offset, $width, $value) = @_;
|
---|
| 1713 | my $vector = '';
|
---|
| 1714 | vec($vector, $offset, $width) = $value;
|
---|
| 1715 | print "offset=$offset width=$width value=$value\n";
|
---|
| 1716 | pvec($vector);
|
---|
| 1717 | }
|
---|
| 1718 |
|
---|
| 1719 | sub pvec {
|
---|
| 1720 | my $vector = shift;
|
---|
| 1721 | my $bits = unpack("b*", $vector);
|
---|
| 1722 | my $i = 0;
|
---|
| 1723 | my $BASE = 8;
|
---|
| 1724 |
|
---|
| 1725 | print "vector length in bytes: ", length($vector), "\n";
|
---|
| 1726 | @bytes = unpack("A8" x length($vector), $bits);
|
---|
| 1727 | print "bits are: @bytes\n\n";
|
---|
| 1728 | }
|
---|
| 1729 |
|
---|
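For small vectors, a round trip between a list of integers and a bit vector can be sketched with vec() and unpack() alone. This is a simplified illustration, not a replacement for the tuned bitvec_to_list() function; the names @ints and @back are chosen for the example.

```perl
use strict;
use warnings;

my @ints = ( 1, 3, 8, 23 );

# Set bit N for every N in @ints.
my $vec = '';
vec( $vec, $_, 1 ) = 1 for @ints;

# unpack "b*" yields the bits in ascending order within each byte,
# which matches vec()'s numbering for 1-bit elements.
my $bits = unpack "b*", $vec;
my @back = grep { substr( $bits, $_, 1 ) } 0 .. length($bits) - 1;

print "bits: $bits\n";
print "ints: @back\n";
```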
| 1730 | =head2 Why does defined() return true on empty arrays and hashes?
|
---|
| 1731 |
|
---|
| 1732 | The short story is that you should probably only use defined on scalars or
|
---|
| 1733 | functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
|
---|
| 1734 | in the 5.004 release or later of Perl for more detail.
|
---|
| 1735 |
|
---|
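To ask whether an aggregate has any elements, evaluate it in boolean or scalar context instead of calling defined() on it. A minimal sketch with made-up variable names:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An aggregate in boolean context is true only when it has elements.
print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, "x";
$hash{key} = "value";

printf "array has %d element(s)\n", scalar @array;
printf "hash has %d key(s)\n",      scalar keys %hash;
```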
| 1736 | =head1 Data: Hashes (Associative Arrays)
|
---|
| 1737 |
|
---|
| 1738 | =head2 How do I process an entire hash?
|
---|
| 1739 |
|
---|
| 1740 | Use the each() function (see L<perlfunc/each>) if you don't care
|
---|
| 1741 | whether it's sorted:
|
---|
| 1742 |
|
---|
| 1743 | while ( ($key, $value) = each %hash) {
|
---|
| 1744 | print "$key = $value\n";
|
---|
| 1745 | }
|
---|
| 1746 |
|
---|
| 1747 | If you want it sorted, you'll have to use foreach() on the result of
|
---|
| 1748 | sorting the keys as shown in an earlier question.
|
---|
| 1749 |
|
---|
| 1750 | =head2 What happens if I add or remove keys from a hash while iterating over it?
|
---|
| 1751 |
|
---|
| 1752 | (contributed by brian d foy)
|
---|
| 1753 |
|
---|
| 1754 | The easy answer is "Don't do that!"
|
---|
| 1755 |
|
---|
| 1756 | If you iterate through the hash with each(), you can delete the key
|
---|
| 1757 | most recently returned without worrying about it. If you delete or add
|
---|
| 1758 | other keys, the iterator may skip or double up on them since perl
|
---|
| 1759 | may rearrange the hash table. See the
|
---|
| 1760 | entry for C<each()> in L<perlfunc>.
|
---|
| 1761 |
|
---|
| 1762 | =head2 How do I look up a hash element by value?
|
---|
| 1763 |
|
---|
| 1764 | Create a reverse hash:
|
---|
| 1765 |
|
---|
| 1766 | %by_value = reverse %by_key;
|
---|
| 1767 | $key = $by_value{$value};
|
---|
| 1768 |
|
---|
| 1769 | That's not particularly efficient. It would be more space-efficient
|
---|
| 1770 | to use:
|
---|
| 1771 |
|
---|
| 1772 | while (($key, $value) = each %by_key) {
|
---|
| 1773 | $by_value{$value} = $key;
|
---|
| 1774 | }
|
---|
| 1775 |
|
---|
| 1776 | If your hash could have repeated values, the methods above will only find
|
---|
| 1777 | one of the associated keys. This may or may not worry you. If it does
|
---|
| 1778 | worry you, you can always reverse the hash into a hash of arrays instead:
|
---|
| 1779 |
|
---|
| 1780 | while (($key, $value) = each %by_key) {
|
---|
| 1781 | push @{$key_list_by_value{$value}}, $key;
|
---|
| 1782 | }
|
---|
| 1783 |
|
---|
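A small worked example of the hash-of-arrays inversion, using invented fruit and colour data. The keys are visited in sorted order only to make the output repeatable.

```perl
use strict;
use warnings;

# Invented data: two keys share the value "red".
my %by_key = ( apple => "red", cherry => "red", banana => "yellow" );

my %key_list_by_value;
for my $key ( sort keys %by_key ) {
    push @{ $key_list_by_value{ $by_key{$key} } }, $key;
}

for my $value ( sort keys %key_list_by_value ) {
    print "$value: @{ $key_list_by_value{$value} }\n";
}
```

A plain `reverse %by_key` would keep only one of "apple" and "cherry" under "red"; here both survive.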
| 1784 | =head2 How can I know how many entries are in a hash?
|
---|
| 1785 |
|
---|
| 1786 | If you mean how many keys, then all you have to do is
|
---|
| 1787 | use the keys() function in a scalar context:
|
---|
| 1788 |
|
---|
| 1789 | $num_keys = keys %hash;
|
---|
| 1790 |
|
---|
| 1791 | The keys() function also resets the iterator, which means that you may
|
---|
| 1792 | see strange results if you use this between uses of other hash operators
|
---|
| 1793 | such as each().
|
---|
| 1794 |
|
---|
| 1795 | =head2 How do I sort a hash (optionally by value instead of key)?
|
---|
| 1796 |
|
---|
| 1797 | (contributed by brian d foy)
|
---|
| 1798 |
|
---|
| 1799 | To sort a hash, start with the keys. In this example, we give the list of
|
---|
| 1800 | keys to the sort function which then compares them ASCIIbetically (which
|
---|
| 1801 | might be affected by your locale settings). The output list has the keys
|
---|
| 1802 | in ASCIIbetical order. Once we have the keys, we can go through them to
|
---|
| 1803 | create a report which lists the keys in ASCIIbetical order.
|
---|
| 1804 |
|
---|
| 1805 | my @keys = sort { $a cmp $b } keys %hash;
|
---|
| 1806 |
|
---|
| 1807 | foreach my $key ( @keys )
|
---|
| 1808 | {
|
---|
| 1809 | printf "%-20s %6d\n", $key, $hash{$key};
|
---|
| 1810 | }
|
---|
| 1811 |
|
---|
| 1812 | We could get more fancy in the C<sort()> block though. Instead of
|
---|
| 1813 | comparing the keys, we can compute a value with them and use that
|
---|
| 1814 | value as the comparison.
|
---|
| 1815 |
|
---|
| 1816 | For instance, to make our report order case-insensitive, we use
|
---|
| 1817 | the C<\L> sequence in a double-quoted string to make everything
|
---|
| 1818 | lowercase. The C<sort()> block then compares the lowercased
|
---|
| 1819 | values to determine in which order to put the keys.
|
---|
| 1820 |
|
---|
| 1821 | my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
|
---|
| 1822 |
|
---|
| 1823 | Note: if the computation is expensive or the hash has many elements,
|
---|
| 1824 | you may want to look at the Schwartzian Transform to cache the
|
---|
| 1825 | computation results.
|
---|
| 1826 |
|
---|
| 1827 | If we want to sort by the hash value instead, we use the hash key
|
---|
| 1828 | to look it up. We still get out a list of keys, but this time they
|
---|
| 1829 | are ordered by their value.
|
---|
| 1830 |
|
---|
| 1831 | my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
|
---|
| 1832 |
|
---|
| 1833 | From there we can get more complex. If the hash values are the same,
|
---|
| 1834 | we can provide a secondary sort on the hash key.
|
---|
| 1835 |
|
---|
| 1836 | my @keys = sort {
|
---|
| 1837 | $hash{$a} <=> $hash{$b}
|
---|
| 1838 | or
|
---|
| 1839 | "\L$a" cmp "\L$b"
|
---|
| 1840 | } keys %hash;
|
---|
| 1841 |
|
---|
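Put together with some invented data, the value-then-key sort produces a report like the one sketched here:

```perl
use strict;
use warnings;

# Invented data: two keys tie on the value 2.
my %hash = ( Bob => 2, alice => 2, carol => 1 );

my @keys = sort {
    $hash{$a} <=> $hash{$b}    # primary: by value, numerically
        or
    "\L$a" cmp "\L$b"          # secondary: by key, ignoring case
} keys %hash;

printf "%-10s %3d\n", $_, $hash{$_} for @keys;
```

The tie between "Bob" and "alice" is broken by the case-insensitive key comparison, so "alice" is listed first.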
| 1842 | =head2 How can I always keep my hash sorted?
|
---|
| 1843 |
|
---|
| 1844 | You can look into using the DB_File module and tie() using the
|
---|
| 1845 | $DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
|
---|
| 1846 | The Tie::IxHash module from CPAN might also be instructive.
|
---|
| 1847 |
|
---|
| 1848 | =head2 What's the difference between "delete" and "undef" with hashes?
|
---|
| 1849 |
|
---|
| 1850 | Hashes contain pairs of scalars: the first is the key, the
|
---|
| 1851 | second is the value. The key will be coerced to a string,
|
---|
| 1852 | although the value can be any kind of scalar: string,
|
---|
| 1853 | number, or reference. If a key $key is present in
|
---|
| 1854 | %hash, C<exists($hash{$key})> will return true. The value
|
---|
| 1855 | for a given key can be C<undef>, in which case
|
---|
| 1856 | C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
|
---|
| 1857 | will return true. This corresponds to (C<$key>, C<undef>)
|
---|
| 1858 | being in the hash.
|
---|
| 1859 |
|
---|
| 1860 | Pictures help... here's the %hash table:
|
---|
| 1861 |
|
---|
| 1862 | keys values
|
---|
| 1863 | +------+------+
|
---|
| 1864 | | a | 3 |
|
---|
| 1865 | | x | 7 |
|
---|
| 1866 | | d | 0 |
|
---|
| 1867 | | e | 2 |
|
---|
| 1868 | +------+------+
|
---|
| 1869 |
|
---|
| 1870 | And these conditions hold
|
---|
| 1871 |
|
---|
| 1872 | $hash{'a'} is true
|
---|
| 1873 | $hash{'d'} is false
|
---|
| 1874 | defined $hash{'d'} is true
|
---|
| 1875 | defined $hash{'a'} is true
|
---|
| 1876 | exists $hash{'a'} is true (Perl5 only)
|
---|
| 1877 | grep ($_ eq 'a', keys %hash) is true
|
---|
| 1878 |
|
---|
| 1879 | If you now say
|
---|
| 1880 |
|
---|
| 1881 | undef $hash{'a'}
|
---|
| 1882 |
|
---|
| 1883 | your table now reads:
|
---|
| 1884 |
|
---|
| 1885 |
|
---|
| 1886 | keys values
|
---|
| 1887 | +------+------+
|
---|
| 1888 | | a | undef|
|
---|
| 1889 | | x | 7 |
|
---|
| 1890 | | d | 0 |
|
---|
| 1891 | | e | 2 |
|
---|
| 1892 | +------+------+
|
---|
| 1893 |
|
---|
| 1894 | and these conditions now hold; changes in caps:
|
---|
| 1895 |
|
---|
| 1896 | $hash{'a'} is FALSE
|
---|
| 1897 | $hash{'d'} is false
|
---|
| 1898 | defined $hash{'d'} is true
|
---|
| 1899 | defined $hash{'a'} is FALSE
|
---|
| 1900 | exists $hash{'a'} is true (Perl5 only)
|
---|
| 1901 | grep ($_ eq 'a', keys %hash) is true
|
---|
| 1902 |
|
---|
| 1903 | Notice the last two: you have an undef value, but a defined key!
|
---|
| 1904 |
|
---|
| 1905 | Now, consider this:
|
---|
| 1906 |
|
---|
| 1907 | delete $hash{'a'}
|
---|
| 1908 |
|
---|
| 1909 | your table now reads:
|
---|
| 1910 |
|
---|
| 1911 | keys values
|
---|
| 1912 | +------+------+
|
---|
| 1913 | | x | 7 |
|
---|
| 1914 | | d | 0 |
|
---|
| 1915 | | e | 2 |
|
---|
| 1916 | +------+------+
|
---|
| 1917 |
|
---|
| 1918 | and these conditions now hold; changes in caps:
|
---|
| 1919 |
|
---|
| 1920 | $hash{'a'} is false
|
---|
| 1921 | $hash{'d'} is false
|
---|
| 1922 | defined $hash{'d'} is true
|
---|
| 1923 | defined $hash{'a'} is false
|
---|
| 1924 | exists $hash{'a'} is FALSE (Perl5 only)
|
---|
| 1925 | grep ($_ eq 'a', keys %hash) is FALSE
|
---|
| 1926 |
|
---|
| 1927 | See, the whole entry is gone!
|
---|
| 1928 |
|
---|
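The three states in the tables can be reproduced in a few lines. The helper status() is a hypothetical name introduced only for this sketch.

```perl
use strict;
use warnings;

my %hash = ( a => 3, x => 7, d => 0, e => 2 );

# Report what exists() and defined() currently say about key 'a'.
sub status {
    return sprintf "exists=%d defined=%d",
        exists( $hash{a} )  ? 1 : 0,
        defined( $hash{a} ) ? 1 : 0;
}

my $before = status();          # exists=1 defined=1
undef $hash{a};
my $after_undef = status();     # exists=1 defined=0
delete $hash{a};
my $after_delete = status();    # exists=0 defined=0

print "before:       $before\n";
print "after undef:  $after_undef\n";
print "after delete: $after_delete\n";
```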
| 1929 | =head2 Why don't my tied hashes make the defined/exists distinction?
|
---|
| 1930 |
|
---|
| 1931 | This depends on the tied hash's implementation of EXISTS().
|
---|
| 1932 | For example, there isn't the concept of undef with hashes
|
---|
| 1933 | that are tied to DBM* files. It also means that exists() and
|
---|
| 1934 | defined() do the same thing with a DBM* file, and what they
|
---|
| 1935 | end up doing is not what they do with ordinary hashes.
|
---|
| 1936 |
|
---|
| 1937 | =head2 How do I reset an each() operation part-way through?
|
---|
| 1938 |
|
---|
| 1939 | Using C<keys %hash> in scalar context returns the number of keys in
|
---|
| 1940 | the hash I<and> resets the iterator associated with the hash. You may
|
---|
| 1941 | need to do this if you use C<last> to exit a loop early so that when you
|
---|
| 1942 | re-enter it, the hash iterator has been reset.
|
---|
| 1943 |
|
---|
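The effect of the reset can be observed with a short sketch. The hash contents and variable names are invented for the example.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

# Advance the iterator by one entry, as an interrupted loop would.
my ($first_key) = each %hash;

# keys() in scalar context returns the count and resets the iterator.
my $num_keys = keys %hash;

# This loop therefore starts from the beginning and sees every entry.
my $count = 0;
while ( my ( $key, $value ) = each %hash ) {
    $count++;
}

print "visited $count of $num_keys entries\n";
```

Without the call to keys(), the while loop would pick up where the earlier each() left off and visit only two entries.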
| 1944 | =head2 How can I get the unique keys from two hashes?
|
---|
| 1945 |
|
---|
| 1946 | First you extract the keys from the hashes into lists, then solve
|
---|
| 1947 | the "removing duplicates" problem described above. For example:
|
---|
| 1948 |
|
---|
| 1949 | %seen = ();
|
---|
| 1950 | for $element (keys(%foo), keys(%bar)) {
|
---|
| 1951 | $seen{$element}++;
|
---|
| 1952 | }
|
---|
| 1953 | @uniq = keys %seen;
|
---|
| 1954 |
|
---|
| 1955 | Or more succinctly:
|
---|
| 1956 |
|
---|
| 1957 | @uniq = keys %{{%foo,%bar}};
|
---|
| 1958 |
|
---|
| 1959 | Or if you really want to save space:
|
---|
| 1960 |
|
---|
| 1961 | %seen = ();
|
---|
| 1962 | while (defined ($key = each %foo)) {
|
---|
| 1963 | $seen{$key}++;
|
---|
| 1964 | }
|
---|
| 1965 | while (defined ($key = each %bar)) {
|
---|
| 1966 | $seen{$key}++;
|
---|
| 1967 | }
|
---|
| 1968 | @uniq = keys %seen;
|
---|
| 1969 |
|
---|
| 1970 | =head2 How can I store a multidimensional array in a DBM file?
|
---|
| 1971 |
|
---|
| 1972 | Either stringify the structure yourself (no fun), or else
|
---|
| 1973 | get the MLDBM (which uses Data::Dumper) module from CPAN and layer
|
---|
| 1974 | it on top of either DB_File or GDBM_File.
|
---|
| 1975 |
|
---|
| 1976 | =head2 How can I make my hash remember the order I put elements into it?
|
---|
| 1977 |
|
---|
| 1978 | Use the Tie::IxHash module from CPAN.
|
---|
| 1979 |
|
---|
| 1980 | use Tie::IxHash;
|
---|
| 1981 | tie my %myhash, 'Tie::IxHash';
|
---|
| 1982 | for (my $i=0; $i<20; $i++) {
|
---|
| 1983 | $myhash{$i} = 2*$i;
|
---|
| 1984 | }
|
---|
| 1985 | my @keys = keys %myhash;
|
---|
| 1986 | # @keys = (0,1,2,3,...)
|
---|
| 1987 |
|
---|
| 1988 | =head2 Why does passing a subroutine an undefined element in a hash create it?
|
---|
| 1989 |
|
---|
| 1990 | If you say something like:
|
---|
| 1991 |
|
---|
| 1992 | somefunc($hash{"nonesuch key here"});
|
---|
| 1993 |
|
---|
| 1994 | Then that element "autovivifies"; that is, it springs into existence
|
---|
| 1995 | whether you store something there or not. That's because functions
|
---|
| 1996 | get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
|
---|
| 1997 | it has to be ready to write it back into the caller's version.
|
---|
| 1998 |
|
---|
| 1999 | This has been fixed as of Perl5.004.
|
---|
| 2000 |
|
---|
| 2001 | Normally, merely accessing a key's value for a nonexistent key does
|
---|
| 2002 | I<not> cause that key to be forever there. This is different from
|
---|
| 2003 | awk's behavior.
|
---|
| 2004 |
|
---|
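On a Perl with the fix, the element is created only if the subroutine actually assigns to its argument. The sketch below illustrates this; reader() and writer() are hypothetical names introduced for the example.

```perl
use strict;
use warnings;

my %hash;

# Only reads its argument (through a copy).
sub reader { my ($value) = @_; return defined $value }

# Assigns through the @_ alias, which writes back to the caller's element.
sub writer { $_[0] = "set by writer" }

reader( $hash{looked_at} );
writer( $hash{written_to} );

print "looked_at exists\n"  if exists $hash{looked_at};
print "written_to exists\n" if exists $hash{written_to};
```

Only the key passed to writer() should end up in the hash.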
| 2005 | =head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
|
---|
| 2006 |
|
---|
| 2007 | Usually a hash ref, perhaps like this:
|
---|
| 2008 |
|
---|
| 2009 | $record = {
|
---|
| 2010 | NAME => "Jason",
|
---|
| 2011 | EMPNO => 132,
|
---|
| 2012 | TITLE => "deputy peon",
|
---|
| 2013 | AGE => 23,
|
---|
| 2014 | SALARY => 37_000,
|
---|
| 2015 | PALS => [ "Norbert", "Rhys", "Phineas"],
|
---|
| 2016 | };
|
---|
| 2017 |
|
---|
| 2018 | References are documented in L<perlref> and the upcoming L<perlreftut>.
|
---|
| 2019 | Examples of complex data structures are given in L<perldsc> and
|
---|
| 2020 | L<perllol>. Examples of structures and object-oriented classes are
|
---|
| 2021 | in L<perltoot>.
|
---|
| 2022 |
|
---|
| 2023 | =head2 How can I use a reference as a hash key?
|
---|
| 2024 |
|
---|
| 2025 | (contributed by brian d foy)
|
---|
| 2026 |
|
---|
| 2027 | Hash keys are strings, so you can't really use a reference as the key.
|
---|
| 2028 | When you try to do that, perl turns the reference into its stringified
|
---|
| 2029 | form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get back
|
---|
| 2030 | the reference from the stringified form, at least without doing some
|
---|
| 2031 | extra work on your own. Also remember that hash keys must be unique, but
|
---|
| 2032 | two different variables can store the same reference (and those variables
|
---|
| 2033 | can change later).
|
---|
| 2034 |
|
---|
| 2035 | The Tie::RefHash module, which is distributed with perl, might be what
|
---|
| 2036 | you want. It handles that extra work.
|
---|
| 2037 |
|
---|
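A minimal sketch of Tie::RefHash in use follows. The hash name %colour_of and the sample data are assumptions made for illustration.

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %colour_of, 'Tie::RefHash';

my $apple  = { name => "apple" };
my $banana = { name => "banana" };

$colour_of{$apple}  = "red";
$colour_of{$banana} = "yellow";

# The keys come back as real references, so they can be dereferenced.
my @names = sort { $a cmp $b } map { $_->{name} } keys %colour_of;
print "@names\n";

print "apple is $colour_of{$apple}\n";
```

With an ordinary hash the keys would be strings such as C<HASH(0x...)>, and the dereference inside map would fail under strict.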
| 2038 | =head1 Data: Misc
|
---|
| 2039 |
|
---|
| 2040 | =head2 How do I handle binary data correctly?
|
---|
| 2041 |
|
---|
| 2042 | Perl is binary clean, so this shouldn't be a problem. For example,
|
---|
| 2043 | this works fine (assuming the files are found):
|
---|
| 2044 |
|
---|
| 2045 | if (`cat /vmunix` =~ /gzip/) {
|
---|
| 2046 | print "Your kernel is GNU-zip enabled!\n";
|
---|
| 2047 | }
|
---|
| 2048 |
|
---|
| 2049 | On less elegant (read: Byzantine) systems, however, you have
|
---|
| 2050 | to play tedious games with "text" versus "binary" files. See
|
---|
| 2051 | L<perlfunc/"binmode"> or L<perlopentut>.
|
---|
| 2052 |
|
---|
| 2053 | If you're concerned about 8-bit ASCII data, then see L<perllocale>.
|
---|
| 2054 |
|
---|
| 2055 | If you want to deal with multibyte characters, however, there are
|
---|
| 2056 | some gotchas. See the section on Regular Expressions.
|
---|
| 2057 |
|
---|
| 2058 | =head2 How do I determine whether a scalar is a number/whole/integer/float?
|
---|
| 2059 |
|
---|
| 2060 | Assuming that you don't care about IEEE notations like "NaN" or
|
---|
| 2061 | "Infinity", you probably just want to use a regular expression.
|
---|
| 2062 |
|
---|
| 2063 | if (/\D/) { print "has nondigits\n" }
|
---|
| 2064 | if (/^\d+$/) { print "is a whole number\n" }
|
---|
| 2065 | if (/^-?\d+$/) { print "is an integer\n" }
|
---|
| 2066 | if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
|
---|
| 2067 | if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
|
---|
| 2068 | if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
|
---|
| 2069 | if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
|
---|
| 2070 | { print "a C float\n" }
|
---|
| 2071 |
|
---|
| 2072 | There are also some commonly used modules for the task.
|
---|
| 2073 | L<Scalar::Util> (distributed with 5.8) provides access to perl's
|
---|
| 2074 | internal function C<looks_like_number> for determining
|
---|
| 2075 | whether a variable looks like a number. L<Data::Types>
|
---|
| 2076 | exports functions that validate data types using both the
|
---|
| 2077 | above and other regular expressions. Thirdly, there is
|
---|
| 2078 | C<Regexp::Common> which has regular expressions to match
|
---|
| 2079 | various types of numbers. Those three modules are available
|
---|
| 2080 | from the CPAN.
|
---|
| 2081 |
|
---|
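A quick way to see how C<looks_like_number> classifies input is to run it over a few candidate strings. The candidates below are invented for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $candidate ( "42", "-3.14", "1e10", "abc", "12abc" ) {
    printf "%-8s %s\n", "'$candidate'",
        looks_like_number($candidate)
        ? "looks like a number"
        : "does not look like a number";
}
```

Unlike the simpler regular expressions above, this accepts exponent notation such as "1e10" and rejects strings with trailing non-numeric characters such as "12abc".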
| 2082 | If you're on a POSIX system, Perl supports the C<POSIX::strtod>
|
---|
| 2083 | function. Its semantics are somewhat cumbersome, so here's a C<getnum>
|
---|
| 2084 | wrapper function for more convenient access. This function takes
|
---|
| 2085 | a string and returns the number it found, or C<undef> for input that
|
---|
| 2086 | isn't a C float. The C<is_numeric> function is a front end to C<getnum>
|
---|
| 2087 | if you just want to say, "Is this a float?"
|
---|
| 2088 |
|
---|
| 2089 | sub getnum {
|
---|
| 2090 | use POSIX qw(strtod);
|
---|
| 2091 | my $str = shift;
|
---|
| 2092 | $str =~ s/^\s+//;
|
---|
| 2093 | $str =~ s/\s+$//;
|
---|
| 2094 | $! = 0;
|
---|
| 2095 | my($num, $unparsed) = strtod($str);
|
---|
| 2096 | if (($str eq '') || ($unparsed != 0) || $!) {
|
---|
| 2097 | return undef;
|
---|
| 2098 | } else {
|
---|
| 2099 | return $num;
|
---|
| 2100 | }
|
---|
| 2101 | }
|
---|
| 2102 |
|
---|
| 2103 | sub is_numeric { defined getnum($_[0]) }
|
---|
| 2104 |
|
---|
| 2105 | Or you could check out the L<String::Scanf> module on the CPAN
|
---|
| 2106 | instead. The POSIX module (part of the standard Perl distribution) provides
|
---|
| 2107 | the C<strtod> and C<strtol> functions for converting strings to doubles and longs,
|
---|
| 2108 | respectively.
|
---|
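A short sketch of calling the two POSIX functions directly may help. In list context each returns the parsed value and the number of characters it could not parse. The input strings are illustrative only, and the examples avoid a decimal point because C<strtod> can be affected by the numeric locale.

```perl
use strict;
use warnings;
use POSIX qw(strtod strtol);

# strtod: string to double, plus the count of unparsed trailing characters.
my ($d1, $d1_left) = strtod('1e3');
print "value $d1, unparsed $d1_left\n";      # value 1000, unparsed 0

my ($d2, $d2_left) = strtod('42abc');
print "value $d2, unparsed $d2_left\n";      # value 42, unparsed 3

# strtol: string to long; the second argument is the numeric base.
my ($l1, $l1_left) = strtol('ff', 16);
print "value $l1, unparsed $l1_left\n";      # value 255, unparsed 0
```

A nonzero unparsed count is the signal that the whole string was not a number, which is the check a wrapper like C<getnum> relies on.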
| 2109 |
|
---|
| 2110 | =head2 How do I keep persistent data across program calls?
|
---|
| 2111 |
|
---|
| 2112 | For some specific applications, you can use one of the DBM modules.
|
---|
| 2113 | See L<AnyDBM_File>. More generically, you should consult the FreezeThaw
|
---|
| 2114 | or Storable modules from CPAN. Starting from Perl 5.8, Storable is part
|
---|
| 2115 | of the standard distribution. Here's one example using Storable's C<store>
|
---|
| 2116 | and C<retrieve> functions:
|
---|
| 2117 |
|
---|
| 2118 | use Storable;
|
---|
| 2119 | store(\%hash, "filename");
|
---|
| 2120 |
|
---|
| 2121 | # later on...
|
---|
| 2122 | $href = retrieve("filename"); # by ref
|
---|
| 2123 | %hash = %{ retrieve("filename") }; # direct to hash
|
---|
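The fragment above can be fleshed out into a complete round trip. This is only a sketch: the hash contents are invented, and a temporary file from C<File::Temp> stands in for whatever path your program would really use.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# A scratch file stands in for the real storage location.
my ($fh, $filename) = tempfile(UNLINK => 1);
close $fh;

# Illustrative data, including a nested array reference.
my %hash = (name => 'camel', humps => [1, 2]);
store(\%hash, $filename) or die "can't store to $filename: $!";

# In a later run of the program you would start here.
my $href = retrieve($filename);
print "$href->{name} has ", scalar @{ $href->{humps} }, " humps\n";
```

If the file has to be read on a machine with a different architecture, Storable's C<nstore> writes the data in network byte order instead.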
| 2124 |
|
---|
| 2125 | =head2 How do I print out or copy a recursive data structure?
|
---|
| 2126 |
|
---|
| 2127 | The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
|
---|
| 2128 | for printing out data structures. The Storable module on CPAN (or the
|
---|
| 2129 | 5.8 release of Perl) provides a function called C<dclone> that recursively
|
---|
| 2130 | copies its argument.
|
---|
| 2131 |
|
---|
| 2132 | use Storable qw(dclone);
|
---|
| 2133 | $r2 = dclone($r1);
|
---|
| 2134 |
|
---|
| 2135 | Here C<$r1> can be a reference to any kind of data structure you'd like.
|
---|
| 2136 | It will be deeply copied. Because C<dclone> takes and returns references,
|
---|
| 2137 | you'd have to add extra punctuation if you had a hash of arrays that
|
---|
| 2138 | you wanted to copy.
|
---|
| 2139 |
|
---|
| 2140 | %newhash = %{ dclone(\%oldhash) };
|
---|
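To see both modules handle a structure that really is recursive, here is a small sketch. The self-referencing hash is invented for the example.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(dclone);

# A structure that refers to itself.
my $r1 = { name => 'loop' };
$r1->{self} = $r1;

# Data::Dumper prints the cycle as a back-reference instead of looping forever.
$Data::Dumper::Indent = 1;
print Dumper($r1);

# dclone copies the whole structure, cycle included.
my $r2 = dclone($r1);

# The copy is a different hash, and its cycle points at the copy itself.
print "independent copy\n" if $r2 != $r1 && $r2->{self} == $r2;
```

Changing C<< $r2->{name} >> afterwards leaves C<< $r1->{name} >> untouched, which is the point of a deep copy.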
| 2141 |
|
---|
| 2142 | =head2 How do I define methods for every class/object?
|
---|
| 2143 |
|
---|
| 2144 | Use the UNIVERSAL class (see L<UNIVERSAL>).
|
---|
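As a rough illustration, a subroutine defined in the UNIVERSAL package becomes callable on every class and object. The method name C<class_name> and the class C<My::Widget> are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical method: anything defined in UNIVERSAL is inherited by every class.
sub UNIVERSAL::class_name {
    my $self = shift;
    return ref($self) || $self;   # works for objects and for class names
}

package My::Widget;
sub new { return bless {}, shift }

package main;

my $obj = My::Widget->new;
print $obj->class_name, "\n";          # My::Widget
print My::Widget->class_name, "\n";    # My::Widget
```

Because the method shows up everywhere, including in classes you did not write, pick names that are unlikely to collide with anything else.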
| 2145 |
|
---|
| 2146 | =head2 How do I verify a credit card checksum?
|
---|
| 2147 |
|
---|
| 2148 | Get the Business::CreditCard module from CPAN.
|
---|
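If you only want to understand the check itself, card numbers use the Luhn checksum, which can be sketched in a few lines. The C<luhn_ok> function below is a hypothetical helper written for this illustration; it is not the Business::CreditCard interface, and that module does more than this.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Business::CreditCard.
sub luhn_ok {
    my $number = shift;
    $number =~ s/[\s-]//g;                  # drop spaces and dashes
    return 0 unless $number =~ /^\d+$/;     # digits only

    my ($sum, $double) = (0, 0);
    # Walk the digits right to left, doubling every second one.
    for my $digit (reverse split //, $number) {
        if ($double) {
            $digit *= 2;
            $digit -= 9 if $digit > 9;      # same as adding the two digits
        }
        $sum += $digit;
        $double = !$double;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";
```

Passing the checksum only means the number is well formed; it says nothing about whether the card exists or can be charged.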
| 2149 |
|
---|
| 2150 | =head2 How do I pack arrays of doubles or floats for XS code?
|
---|
| 2151 |
|
---|
| 2152 | The kgbpack.c code in the PGPLOT module on CPAN does just this.
|
---|
| 2153 | If you're doing a lot of float or double processing, consider using
|
---|
| 2154 | the PDL module from CPAN instead--it makes number-crunching easy.
|
---|
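On the Perl side, one way to build such a buffer is C<pack> with the C<d> (native double) or C<f> (native float) template. This is a sketch with invented values; how the XS code reads the resulting string is up to you.

```perl
use strict;
use warnings;

my @values = (1.5, -2.25, 3.0);

# "d*" packs every element as a double in the machine's native format.
my $packed = pack 'd*', @values;
print length($packed), " bytes\n";    # typically 8 bytes per double

# unpack reverses the operation, which is handy for checking the buffer.
my @back = unpack 'd*', $packed;
print "@back\n";                      # 1.5 -2.25 3
```

The packed scalar is an ordinary string, so XS code can usually obtain a pointer to its bytes and treat them as a C array of doubles.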
| 2155 |
|
---|
| 2156 | =head1 AUTHOR AND COPYRIGHT
|
---|
| 2157 |
|
---|
| 2158 | Copyright (c) 1997-2006 Tom Christiansen, Nathan Torkington, and
|
---|
| 2159 | other authors as noted. All rights reserved.
|
---|
| 2160 |
|
---|
| 2161 | This documentation is free; you can redistribute it and/or modify it
|
---|
| 2162 | under the same terms as Perl itself.
|
---|
| 2163 |
|
---|
| 2164 | Irrespective of its distribution, all code examples in this file
|
---|
| 2165 | are hereby placed into the public domain. You are permitted and
|
---|
| 2166 | encouraged to use this code in your own programs for fun
|
---|
| 2167 | or for profit as you see fit. A simple comment in the code giving
|
---|
| 2168 | credit would be courteous but is not required.
|
---|