=head1 NAME
X<data structure> X<complex data structure> X<struct>

perldsc - Perl Data Structures Cookbook

=head1 DESCRIPTION

The single feature most sorely lacking in the Perl programming language
prior to its 5.0 release was complex data structures. Even without direct
language support, some valiant programmers did manage to emulate them, but
it was hard work and not for the faint of heart. You could occasionally
get away with the C<$m{$AoA,$b}> notation borrowed from B<awk> in which the
keys are actually more like a single concatenated string C<"$AoA$b">, but
traversal and sorting were difficult. More desperate programmers even
hacked Perl's internal symbol table directly, a strategy that proved hard
to develop and maintain--to put it mildly.

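As a rough illustration of the awk-style emulation just described (a sketch
added for clarity, with made-up names C<%m> and C<@pairs>): Perl joins the
listed subscripts with the C<$;> variable, so the "two-dimensional" table is
still one flat hash keyed by a single string.

```perl
use strict;
use warnings;

# Emulated two-dimensional table: still a single flat hash.
my %m;
for my $x (1 .. 2) {
    for my $y (1 .. 3) {
        $m{$x, $y} = $x * $y;    # same as $m{ join($;, $x, $y) }
    }
}

# The "two" subscripts are really one string key.
my $flat_key_exists = exists $m{ join($;, 2, 3) } ? 1 : 0;

# Traversal means splitting every key apart again, which is the awkward part.
my $sep   = $;;                  # subscript separator, "\034" by default
my @pairs = sort map { join "-", split /\Q$sep\E/, $_ } keys %m;

print "@pairs\n";
```

Sorting by the second "dimension" or pulling out one row requires this kind
of key surgery every time, which is why the approach scaled poorly.
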
The 5.0 release of Perl let us have complex data structures. You
may now write something like this and all of a sudden, you have an array
with three dimensions!

    for $x (1 .. 10) {
        for $y (1 .. 10) {
            for $z (1 .. 10) {
                $AoA[$x][$y][$z] =
                    $x ** $y + $z;
            }
        }
    }

Alas, however simple this may appear, underneath it's a much more
elaborate construct than meets the eye!

How do you print it out? Why can't you say just C<print @AoA>? How do
you sort it? How can you pass it to a function or get one of these back
from a function? Is it an object? Can you save it to disk to read
back later? How do you access whole rows or columns of that matrix? Do
all the values have to be numeric?

As you see, it's quite easy to become confused. While some small portion
of the blame for this can be attributed to the reference-based
implementation, it's really more due to a lack of existing documentation with
examples designed for the beginner.

This document is meant to be a detailed but understandable treatment of the
many different sorts of data structures you might want to develop. It
should also serve as a cookbook of examples. That way, when you need to
create one of these complex data structures, you can just pinch, pilfer, or
purloin a drop-in example from here.

Let's look at each of these possible constructs in detail. There are separate
sections on each of the following:

=over 5

=item * arrays of arrays

=item * hashes of arrays

=item * arrays of hashes

=item * hashes of hashes

=item * more elaborate constructs

=back

But for now, let's look at general issues common to all
these types of data structures.

=head1 REFERENCES
X<reference> X<dereference> X<dereferencing> X<pointer>

The most important thing to understand about all data structures in Perl
--including multidimensional arrays--is that even though they might
appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
one-dimensional. They can hold only scalar values (meaning a string,
number, or a reference). They cannot directly contain other arrays or
hashes, but instead contain I<references> to other arrays or hashes.
X<multidimensional array> X<array, multidimensional>

You can't use a reference to an array or hash in quite the same way that you
would a real array or hash. For C or C++ programmers unused to
distinguishing between arrays and pointers to the same, this can be
confusing. If so, just think of it as the difference between a structure
and a pointer to a structure.

You can (and should) read more about references in the perlref(1) man
page. Briefly, references are rather like pointers that know what they
point to. (Objects are also a kind of reference, but we won't be needing
them right away--if ever.) This means that when you have something which
looks to you like an access to a two-or-more-dimensional array and/or hash,
what's really going on is that the base type is
merely a one-dimensional entity that contains references to the next
level. It's just that you can I<use> it as though it were a
two-dimensional one. This is actually the way almost all C
multidimensional arrays work as well.

    $array[7][12]                       # array of arrays
    $array[7]{string}                   # array of hashes
    $hash{string}[7]                    # hash of arrays
    $hash{string}{'another string'}     # hash of hashes

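The following small sketch (sample data invented for illustration) shows
what those access forms rest on: each top-level slot holds a reference, and
the arrow between adjacent subscripts is merely optional.

```perl
use strict;
use warnings;

my @array = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );
my %hash  = ( string => [ 7, 8, 9 ] );

# Each top-level slot holds a scalar: a reference to another array.
my $kind = ref $array[1];           # "ARRAY"

# Between adjacent subscripts the arrow is optional, so these agree.
my $short = $array[1][2];
my $long  = $array[1]->[2];
my $block = ${ $array[1] }[2];

print "$kind $short $long $block $hash{string}[0]\n";
```
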
Now, because the top level contains only references, if you try to print
out your array with a simple print() function, you'll get something
that doesn't look very nice, like this:

    @AoA = ( [2, 3], [4, 5, 7], [0] );
    print $AoA[1][2];
  7
    print @AoA;
  ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)

That's because Perl doesn't (ever) implicitly dereference your variables.
If you want to get at the thing a reference is referring to, then you have
to do this yourself using either prefix typing indicators, like
C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows,
like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>.

=head1 COMMON MISTAKES

The two most common mistakes made in constructing something like
an array of arrays are either accidentally counting the number of
elements or else taking a reference to the same memory location
repeatedly. Here's the case where you just get the count instead
of a nested array:

    for $i (1..10) {
        @array = somefunc($i);
        $AoA[$i] = @array;      # WRONG!
    }

That's just the simple case of assigning an array to a scalar and getting
its element count. If that's what you really and truly want, then you
might do well to consider being a tad more explicit about it, like this:

    for $i (1..10) {
        @array = somefunc($i);
        $counts[$i] = scalar @array;
    }

Here's the case of taking a reference to the same memory location
again and again:

    for $i (1..10) {
        @array = somefunc($i);
        $AoA[$i] = \@array;     # WRONG!
    }

So, what's the big problem with that? It looks right, doesn't it?
After all, I just told you that you need an array of references, so by
golly, you've made me one!

Unfortunately, while this is true, it's still broken. All the references
in @AoA refer to the I<very same place>, and they will therefore all hold
whatever was last in @array! It's similar to the problem demonstrated in
the following C program:

    #include <pwd.h>
    #include <stdio.h>

    int main(void) {
        struct passwd *rp, *dp;
        rp = getpwnam("root");
        dp = getpwnam("daemon");

        printf("daemon name is %s\nroot name is %s\n",
                dp->pw_name, rp->pw_name);
        return 0;
    }

Which will typically print

    daemon name is daemon
    root name is daemon

The problem is that both C<rp> and C<dp> are pointers to the same location
in memory! In C, you'd have to remember to malloc() yourself some new
memory. In Perl, you'll want to use the array constructor C<[]> or the
hash constructor C<{}> instead. Here's the right way to do the preceding
broken code fragments:
X<[]> X<{}>

    for $i (1..10) {
        @array = somefunc($i);
        $AoA[$i] = [ @array ];
    }

The square brackets make a reference to a new array with a I<copy>
of what's in @array at the time of the assignment. This is what
you want.

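To see the difference side by side, here is a small self-contained sketch.
The C<somefunc()> below is a hypothetical stand-in (the text never defines
it), and C<@shared> and C<@copied> are names invented for the comparison.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the somefunc() used in the text.
sub somefunc { my ($i) = @_; return ($i, $i * 2) }

our @array;                      # one shared package variable, as in the broken loop
my (@shared, @copied);

for my $i (1 .. 3) {
    @array      = somefunc($i);
    $shared[$i] = \@array;       # every slot points at the same @array
    $copied[$i] = [ @array ];    # every slot gets its own fresh copy
}

print "shared: @{ $shared[1] }\n";   # shows the last values written to @array
print "copied: @{ $copied[1] }\n";   # shows the values from the first pass
```

Every element of C<@shared> is the same reference, so row 1 reports whatever
the final pass stored; only C<@copied> keeps one row per pass.
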
Note that this will produce something similar, but it's
much harder to read:

    for $i (1..10) {
        @array = 0 .. $i;
        @{$AoA[$i]} = @array;
    }

Is it the same? Well, maybe so--and maybe not. The subtle difference
is that when you assign something in square brackets, you know for sure
it's always a brand new reference with a new I<copy> of the data.
Something else could be going on in this new case with the C<@{$AoA[$i]}>
dereference on the left-hand-side of the assignment. It all depends on
whether C<$AoA[$i]> had been undefined to start with, or whether it
already contained a reference. If you had already populated @AoA with
references, as in

    $AoA[3] = \@another_array;

then the assignment with the indirection on the left-hand-side would
use the existing reference that was already there:

    @{$AoA[3]} = @array;

Of course, this I<would> have the "interesting" effect of clobbering
@another_array. (Have you ever noticed how when a programmer says
something is "interesting", that rather than meaning "intriguing",
they're disturbingly more apt to mean that it's "annoying",
"difficult", or both? :-)

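The clobbering is easy to reproduce. In this sketch the contents of
C<@another_array> and C<@array> are invented sample data; slot 3 already
holds a reference while slot 4 starts out undefined.

```perl
use strict;
use warnings;

my @another_array = ("keep", "me");
my @array         = (1, 2, 3);
my @AoA;

$AoA[3] = \@another_array;      # slot 3 already holds a reference
@{ $AoA[3] } = @array;          # assigns *through* that reference

# Slot 4 was undefined, so Perl quietly creates a new anonymous array
# (autovivification) and nothing else is affected.
@{ $AoA[4] } = @array;

print "another_array is now: @another_array\n";
```
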
So just remember always to use the array or hash constructors with C<[]>
or C<{}>, and you'll be fine, although it's not always optimally
efficient.

Surprisingly, the following dangerous-looking construct will
actually work out fine:

    for $i (1..10) {
        my @array = somefunc($i);
        $AoA[$i] = \@array;
    }

That's because my() is more of a run-time statement than it is a
compile-time declaration I<per se>. This means that the my() variable is
remade afresh each time through the loop. So even though it I<looks> as
though you stored the same variable reference each time, you actually did
not! This is a subtle distinction that can produce more efficient code at
the risk of misleading all but the most experienced of programmers. So I
usually advise against teaching it to beginners. In fact, except for
passing arguments to functions, I seldom like to see the gimme-a-reference
operator (backslash) used much at all in code. Instead, I advise
beginners that they (and most of the rest of us) should try to use the
much more easily understood constructors C<[]> and C<{}> instead of
relying upon lexical (or dynamic) scoping and hidden reference-counting to
do the right thing behind the scenes.

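A quick way to convince yourself that the my() version really stores
different arrays is to count the distinct references afterwards. This is
only a sketch; C<somefunc()> is again a hypothetical stand-in.

```perl
use strict;
use warnings;

# Hypothetical stand-in for somefunc().
sub somefunc { my ($i) = @_; return ($i) x $i }

my @AoA;
for my $i (1 .. 3) {
    my @array = somefunc($i);    # a new @array each time through the loop
    $AoA[$i] = \@array;          # so each reference is to a different array
}

# References stringify to their address, so equal keys mean the same array.
my %seen     = map { $_ => 1 } @AoA[1 .. 3];
my $distinct = keys %seen;

print "$distinct distinct arrays; row 2 is @{ $AoA[2] }\n";
```
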
In summary:

    $AoA[$i] = [ @array ];     # usually best
    $AoA[$i] = \@array;        # perilous; just how my() was that array?
    @{ $AoA[$i] } = @array;    # way too tricky for most programmers

=head1 CAVEAT ON PRECEDENCE
X<dereference, precedence> X<dereferencing, precedence>

Speaking of things like C<@{$AoA[$i]}>, the following are actually the
same thing:
X<< -> >>

    $aref->[2][2]       # clear
    $$aref[2][2]        # confusing

That's because Perl's precedence rules on its five prefix dereferencers
(which look like someone swearing: C<$ @ * % &>) make them bind more
tightly than the postfix subscripting brackets or braces! This will no
doubt come as a great shock to the C or C++ programmer, who is quite
accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th>
element of C<a>. That is, they first take the subscript, and only then
dereference the thing at that subscript. That's fine in C, but this isn't C.

The seemingly equivalent construct in Perl, C<$$aref[$i]>, first does
the deref of $aref, treating $aref as a reference to an
array, and only then subscripts, giving you the I<i'th> value
of the array pointed to by $aref. If you wanted the C notion, you'd have to
write C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first
before the leading C<$> dereferencer.

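The two readings can be checked directly. In the sketch below the data is
invented: C<$aref> is a reference to an array of arrays, while C<@AoA> is a
real array whose elements are scalar references, so that C<${$AoA[$i]}> has
something to dereference.

```perl
use strict;
use warnings;

my $aref = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ];
my @AoA  = ( \"first", \"second" );    # an array of scalar references

# The leading $ binds to $aref before the subscripts are applied.
my $arrow  = $aref->[2][2];
my $prefix = $$aref[2][2];

# Braces force the subscript first, then the dereference (the C reading).
my $c_style = ${ $AoA[1] };

print "$arrow $prefix $c_style\n";
```
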
=head1 WHY YOU SHOULD ALWAYS C<use strict>

If this is starting to sound scarier than it's worth, relax. Perl has
some features to help you avoid its most common pitfalls. The best
way to avoid getting confused is to start every program like this:

    #!/usr/bin/perl -w
    use strict;

This way, you'll be forced to declare all your variables with my(), and
accidental "symbolic dereferencing" will be disallowed. Therefore if you'd done
this:

    my $aref = [
        [ "fred",   "barney", "pebbles", "bambam", "dino", ],
        [ "homer",  "bart",   "marge",   "maggie", ],
        [ "george", "jane",   "elroy",   "judy", ],
    ];

    print $aref[2][2];

the compiler would immediately flag that as an error I<at compile time>,
because you were accidentally accessing C<@aref>, an undeclared
variable, and it would thereby remind you to write instead:

    print $aref->[2][2]

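To watch the compiler reject the mistake without aborting your own script,
you can compile the faulty snippet inside a string eval. This is just a
demonstration harness, not something the original text prescribes.

```perl
use strict;
use warnings;

# Compile the mistaken snippet in a string eval so the failure can be inspected.
my $ok = eval q{
    use strict;
    my $aref = [ [ "fred", "barney" ], [ "homer", "bart" ] ];
    my $name = $aref[1][1];     # oops: this is an element of @aref
    1;
};
my $error = $@;

print $ok ? "compiled\n" : "rejected: $error";
```

The message names C<@aref>, the array you never declared, which is the hint
that an arrow is missing.
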
=head1 DEBUGGING
X<data structure, debugging> X<complex data structure, debugging>
X<AoA, debugging> X<HoA, debugging> X<AoH, debugging> X<HoH, debugging>
X<array of arrays, debugging> X<hash of arrays, debugging>
X<array of hashes, debugging> X<hash of hashes, debugging>

Before version 5.002, the standard Perl debugger didn't do a very nice job of
printing out complex data structures. With 5.002 or above, the
debugger includes several new features, including command line editing as
well as the C<x> command to dump out complex data structures. For
example, given the array-of-arrays reference assigned above (held here
in $AoA), here's the debugger output:

    DB<1> x $AoA
    $AoA = ARRAY(0x13b5a0)
       0  ARRAY(0x1f0a24)
          0  'fred'
          1  'barney'
          2  'pebbles'
          3  'bambam'
          4  'dino'
       1  ARRAY(0x13b558)
          0  'homer'
          1  'bart'
          2  'marge'
          3  'maggie'
       2  ARRAY(0x13b540)
          0  'george'
          1  'jane'
          2  'elroy'
          3  'judy'

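Outside the debugger, one commonly used option (an addition here, not part
of the original text) is the core Data::Dumper module, which turns a nested
structure into readable Perl source. A minimal sketch with invented data:

```perl
use strict;
use warnings;
use Data::Dumper;

my $aref = [
    [ "fred",  "barney" ],
    [ "homer", "bart"   ],
];

local $Data::Dumper::Indent   = 1;   # compact, fixed indentation
local $Data::Dumper::Sortkeys = 1;   # stable key order if hashes are present

my $dump = Dumper($aref);
print $dump;
```
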
=head1 CODE EXAMPLES

Presented with little comment (these will get their own manpages someday)
here are short code examples illustrating access of various
types of data structures.

=head1 ARRAYS OF ARRAYS
X<array of arrays> X<AoA>

=head2 Declaration of an ARRAY OF ARRAYS

    @AoA = (
        [ "fred", "barney" ],
        [ "george", "jane", "elroy" ],
        [ "homer", "marge", "bart" ],
    );

=head2 Generation of an ARRAY OF ARRAYS

    # reading from file
    while ( <> ) {
        push @AoA, [ split ];
    }

    # calling a function
    for $i ( 1 .. 10 ) {
        $AoA[$i] = [ somefunc($i) ];
    }

    # using temp vars
    for $i ( 1 .. 10 ) {
        @tmp = somefunc($i);
        $AoA[$i] = [ @tmp ];
    }

    # add to an existing row
    push @{ $AoA[0] }, "wilma", "betty";

=head2 Access and Printing of an ARRAY OF ARRAYS

    # one element
    $AoA[0][0] = "Fred";

    # another element
    $AoA[1][1] =~ s/(\w)/\u$1/;

    # print the whole thing with refs
    for $aref ( @AoA ) {
        print "\t [ @$aref ],\n";
    }

    # print the whole thing with indices
    for $i ( 0 .. $#AoA ) {
        print "\t [ @{$AoA[$i]} ],\n";
    }

    # print the whole thing one at a time
    for $i ( 0 .. $#AoA ) {
        for $j ( 0 .. $#{ $AoA[$i] } ) {
            print "elt $i $j is $AoA[$i][$j]\n";
        }
    }

=head1 HASHES OF ARRAYS
X<hash of arrays> X<HoA>

=head2 Declaration of a HASH OF ARRAYS

    %HoA = (
        flintstones => [ "fred", "barney" ],
        jetsons     => [ "george", "jane", "elroy" ],
        simpsons    => [ "homer", "marge", "bart" ],
    );

=head2 Generation of a HASH OF ARRAYS

    # reading from file
    # flintstones: fred barney wilma dino
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $HoA{$1} = [ split ];
    }

    # reading from file; more temps
    # flintstones: fred barney wilma dino
    while ( $line = <> ) {
        ($who, $rest) = split /:\s*/, $line, 2;
        @fields = split ' ', $rest;
        $HoA{$who} = [ @fields ];
    }

    # calling a function that returns a list
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        $HoA{$group} = [ get_family($group) ];
    }

    # likewise, but using temps
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        @members = get_family($group);
        $HoA{$group} = [ @members ];
    }

    # append new members to an existing family
    push @{ $HoA{"flintstones"} }, "wilma", "betty";

=head2 Access and Printing of a HASH OF ARRAYS

    # one element
    $HoA{flintstones}[0] = "Fred";

    # another element
    $HoA{simpsons}[1] =~ s/(\w)/\u$1/;

    # print the whole thing
    foreach $family ( keys %HoA ) {
        print "$family: @{ $HoA{$family} }\n"
    }

    # print the whole thing with indices
    foreach $family ( keys %HoA ) {
        print "$family: ";
        foreach $i ( 0 .. $#{ $HoA{$family} } ) {
            print " $i = $HoA{$family}[$i]";
        }
        print "\n";
    }

    # print the whole thing sorted by number of members
    foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
        print "$family: @{ $HoA{$family} }\n"
    }

    # print the whole thing sorted by number of members and name
    foreach $family ( sort {
                            @{$HoA{$b}} <=> @{$HoA{$a}}
                                        ||
                                    $a cmp $b
                } keys %HoA )
    {
        print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
    }

=head1 ARRAYS OF HASHES
X<array of hashes> X<AoH>

=head2 Declaration of an ARRAY OF HASHES

    @AoH = (
        {
            Lead     => "fred",
            Friend   => "barney",
        },
        {
            Lead     => "george",
            Wife     => "jane",
            Son      => "elroy",
        },
        {
            Lead     => "homer",
            Wife     => "marge",
            Son      => "bart",
        }
    );

=head2 Generation of an ARRAY OF HASHES

    # reading from file
    # format: LEAD=fred FRIEND=barney
    while ( <> ) {
        $rec = {};
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $rec->{$key} = $value;
        }
        push @AoH, $rec;
    }

    # reading from file
    # format: LEAD=fred FRIEND=barney
    # no temp
    while ( <> ) {
        push @AoH, { split /[\s=]+/ };
    }

    # calling a function that returns a key/value pair list, like
    # "lead","fred","daughter","pebbles"
    while ( %fields = getnextpairset() ) {
        push @AoH, { %fields };
    }

    # likewise, but using no temp vars
    while (<>) {
        push @AoH, { parsepairs($_) };
    }

    # add key/value to an element
    $AoH[0]{pet} = "dino";
    $AoH[2]{pet} = "santa's little helper";

=head2 Access and Printing of an ARRAY OF HASHES

    # one element (keys are case-sensitive; the declaration used "Lead")
    $AoH[0]{Lead} = "fred";

    # another element
    $AoH[1]{Lead} =~ s/(\w)/\u$1/;

    # print the whole thing with refs
    for $href ( @AoH ) {
        print "{ ";
        for $role ( keys %$href ) {
            print "$role=$href->{$role} ";
        }
        print "}\n";
    }

    # print the whole thing with indices
    for $i ( 0 .. $#AoH ) {
        print "$i is { ";
        for $role ( keys %{ $AoH[$i] } ) {
            print "$role=$AoH[$i]{$role} ";
        }
        print "}\n";
    }

    # print the whole thing one at a time
    for $i ( 0 .. $#AoH ) {
        for $role ( keys %{ $AoH[$i] } ) {
            print "elt $i $role is $AoH[$i]{$role}\n";
        }
    }

=head1 HASHES OF HASHES
X<hash of hashes> X<HoH>

=head2 Declaration of a HASH OF HASHES

    %HoH = (
        flintstones => {
            lead      => "fred",
            pal       => "barney",
        },
        jetsons     => {
            lead      => "george",
            wife      => "jane",
            "his boy" => "elroy",
        },
        simpsons    => {
            lead      => "homer",
            wife      => "marge",
            kid       => "bart",
        },
    );

=head2 Generation of a HASH OF HASHES

    # reading from file
    # flintstones: lead=fred pal=barney wife=wilma pet=dino
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $HoH{$who}{$key} = $value;
        }
    }

    # reading from file; more temps
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        $rec = {};
        $HoH{$who} = $rec;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $rec->{$key} = $value;
        }
    }

    # calling a function that returns a key,value hash
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        $HoH{$group} = { get_family($group) };
    }

    # likewise, but using temps
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        %members = get_family($group);
        $HoH{$group} = { %members };
    }

    # append new members to an existing family
    %new_folks = (
        wife => "wilma",
        pet  => "dino",
    );

    for $what (keys %new_folks) {
        $HoH{flintstones}{$what} = $new_folks{$what};
    }

=head2 Access and Printing of a HASH OF HASHES

    # one element
    $HoH{flintstones}{wife} = "wilma";

    # another element
    $HoH{simpsons}{lead} =~ s/(\w)/\u$1/;

    # print the whole thing
    foreach $family ( keys %HoH ) {
        print "$family: { ";
        for $role ( keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

    # print the whole thing somewhat sorted
    foreach $family ( sort keys %HoH ) {
        print "$family: { ";
        for $role ( sort keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

    # print the whole thing sorted by number of members
    foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {
        print "$family: { ";
        for $role ( sort keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

    # establish a sort order (rank) for each role
    $i = 0;
    for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }

    # now print the whole thing sorted by number of members
    foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {
        print "$family: { ";
        # and print these according to rank order
        for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

=head1 MORE ELABORATE RECORDS
X<record> X<structure> X<struct>

=head2 Declaration of MORE ELABORATE RECORDS

Here's a sample showing how to create and use a record whose fields are of
many different sorts:

    $rec = {
        TEXT      => $string,
        SEQUENCE  => [ @old_values ],
        LOOKUP    => { %some_table },
        THATCODE  => \&some_function,
        THISCODE  => sub { $_[0] ** $_[1] },
        HANDLE    => \*STDOUT,
    };

    print $rec->{TEXT};

    print $rec->{SEQUENCE}[0];
    $last = pop @{ $rec->{SEQUENCE} };

    print $rec->{LOOKUP}{"key"};
    ($first_k, $first_v) = each %{ $rec->{LOOKUP} };

    $answer = $rec->{THATCODE}->($arg);
    $answer = $rec->{THISCODE}->($arg1, $arg2);

    # careful of extra block braces on fh ref
    print { $rec->{HANDLE} } "a string\n";

    use FileHandle;
    $rec->{HANDLE}->autoflush(1);
    $rec->{HANDLE}->print(" a string\n");

=head2 Declaration of a HASH OF COMPLEX RECORDS

    %TV = (
        flintstones => {
            series   => "flintstones",
            nights   => [ qw(monday thursday friday) ],
            members  => [
                { name => "fred",    role => "lead", age  => 36, },
                { name => "wilma",   role => "wife", age  => 31, },
                { name => "pebbles", role => "kid",  age  =>  4, },
            ],
        },

        jetsons     => {
            series   => "jetsons",
            nights   => [ qw(wednesday saturday) ],
            members  => [
                { name => "george",  role => "lead", age  => 41, },
                { name => "jane",    role => "wife", age  => 39, },
                { name => "elroy",   role => "kid",  age  =>  9, },
            ],
        },

        simpsons    => {
            series   => "simpsons",
            nights   => [ qw(monday) ],
            members  => [
                { name => "homer", role => "lead", age  => 34, },
                { name => "marge", role => "wife", age => 37, },
                { name => "bart",  role => "kid",  age  =>  11, },
            ],
        },
    );

=head2 Generation of a HASH OF COMPLEX RECORDS

    # reading from file
    # this is most easily done by having the file itself be
    # in the raw data format as shown above.  perl is happy
    # to parse complex data structures if declared as data, so
    # sometimes it's easiest to do that

    # here's a piece by piece build up
    $rec = {};
    $rec->{series} = "flintstones";
    $rec->{nights} = [ find_days() ];

    @members = ();
    # assume this file in field=value syntax
    while (<>) {
        %fields = split /[\s=]+/;
        push @members, { %fields };
    }
    $rec->{members} = [ @members ];

    # now remember the whole thing
    $TV{ $rec->{series} } = $rec;

    ###########################################################
    # now, you might want to make interesting extra fields that
    # include pointers back into the same data structure so if
    # you change one piece, it changes everywhere, like for
    # example if you wanted a {kids} field that was a reference
    # to an array of the kids' records without having duplicate
    # records and thus update problems.
    ###########################################################
    foreach $family (keys %TV) {
        $rec = $TV{$family}; # temp pointer
        @kids = ();
        for $person ( @{ $rec->{members} } ) {
            if ($person->{role} =~ /kid|son|daughter/) {
                push @kids, $person;
            }
        }
        # REMEMBER: $rec and $TV{$family} point to same data!!
        $rec->{kids} = [ @kids ];
    }

    # you copied the array, but the array itself contains pointers
    # to uncopied objects. this means that if you make bart get
    # older via

    $TV{simpsons}{kids}[0]{age}++;

    # then this would also change in
    print $TV{simpsons}{members}[2]{age};

    # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
    # both point to the same underlying anonymous hash table

    # print the whole thing
    foreach $family ( keys %TV ) {
        print "the $family";
        print " is on during @{ $TV{$family}{nights} }\n";
        print "its members are:\n";
        for $who ( @{ $TV{$family}{members} } ) {
            print " $who->{name} ($who->{role}), age $who->{age}\n";
        }
        # the records have no {lead} field, so find the member whose role is "lead"
        ($lead) = grep { $_->{role} eq "lead" } @{ $TV{$family}{members} };
        print "it turns out that $lead->{name} has ";
        print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
        print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
        print "\n";
    }

=head1 Database Ties

You cannot easily tie a multilevel data structure (such as a hash of
hashes) to a dbm file. The first problem is that all but GDBM and
Berkeley DB have size limitations, but beyond that, you also have problems
with how references are to be represented on disk. One experimental
module that does partially attempt to address this need is the MLDBM
module. Check your nearest CPAN site as described in L<perlmodlib> for
source code to MLDBM.

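If all you need is to save a nested structure and read it back later,
rather than a tied dbm file, one option (an aside added here, and not the
MLDBM approach mentioned above) is the core Storable module. The sketch
below round-trips an invented hash of hashes through a string; Storable's
store() and retrieve() do the same to and from a file.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %HoH = (
    flintstones => { lead => "fred",   pal  => "barney" },
    jetsons     => { lead => "george", wife => "jane"   },
);

# freeze() serializes the whole nested structure into one string.
my $frozen = freeze(\%HoH);
my $copy   = thaw($frozen);

$copy->{jetsons}{lead} = "George";     # the thawed copy is independent

print "$HoH{jetsons}{lead} / $copy->{jetsons}{lead}\n";
```
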
=head1 SEE ALSO

perlref(1), perllol(1), perldata(1), perlobj(1)

=head1 AUTHOR

Tom Christiansen <F<[email protected]>>

Last update:
Wed Oct 23 04:57:50 MET DST 1996