=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

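The claim that an IV is wide enough to hold a pointer can be illustrated
in plain C. The sketch below is not Perl source: C<toy_IV>, C<ptr_to_iv>
and C<iv_to_ptr> are hypothetical names, and C<intptr_t> from F<stdint.h>
is assumed here as a stand-in for Perl's own IV typedef.

```c
#include <stdint.h>

/* Assumption: intptr_t stands in for Perl's IV; it is not the real typedef. */
typedef intptr_t toy_IV;

/* Store a pointer in the integer type. */
toy_IV ptr_to_iv(void *p)
{
    return (toy_IV)p;
}

/* Recover the pointer; the round trip yields the original address. */
void *iv_to_ptr(toy_IV iv)
{
    return (void *)iv;
}
```

This round-trip property is what lets extension code keep a C pointer in
the integer slot of a scalar and retrieve it later. In real extension code
the conversion macros supplied by Perl's headers (such as C<PTR2IV> and
C<INT2PTR>) are preferable to raw casts.
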
=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (Size_t, usually defined as size_t in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

In the unlikely case of an SV requiring more complex initialisation, you
can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the NUL) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has value undef.

    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated  */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated, there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char * ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);

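The underlying reason is that C<SvPV> assigns C<len> as a side effect, and
C does not define the order in which function arguments are evaluated, so
the second argument may be read before the macro has set it. The following
plain-C sketch mimics the situation. C<toy_sv>, C<TOY_SvPV>, C<toy_foo> and
C<toy_safe_call> are hypothetical stand-ins invented for illustration, not
part of the Perl API.

```c
#include <stddef.h>

/* Toy stand-in for an SV holding a string; not Perl's real layout. */
typedef struct {
    const char *pv;   /* string data */
    size_t      cur;  /* string length */
} toy_sv;

/* Like SvPV: evaluates to the string pointer and assigns len on the way. */
#define TOY_SvPV(sv, len) ((len) = (sv)->cur, (sv)->pv)

/* Hypothetical consumer: returns the length it was handed. */
size_t toy_foo(const char *ptr, size_t len)
{
    return ptr ? len : 0;
}

/* Safe pattern: finish the macro (and its assignment) first, then call. */
size_t toy_safe_call(const toy_sv *sv)
{
    size_t len = 0;
    const char *ptr = TOY_SvPV(sv, len);

    return toy_foo(ptr, len);
}
```

Writing C<toy_foo(TOY_SvPV(sv, len), len)> instead would leave it to the
compiler whether C<len> is read before or after the assignment.
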
If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for a trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

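The two properties just described (growing never shrinks, and no byte is
added for the NUL) can be modelled in plain C. This is only a sketch under
assumptions: C<toy_sv> and C<toy_grow> are hypothetical names, and
C<realloc> stands in for whatever allocator C<sv_grow> really uses.

```c
#include <stdlib.h>

/* Toy string buffer; not Perl's real SV layout. */
typedef struct {
    char  *pv;   /* buffer */
    size_t cur;  /* bytes in use */
    size_t len;  /* bytes allocated */
} toy_sv;

/* Ensure at least newlen bytes are allocated; never shrink.
 * Returns the buffer, or NULL if allocation failed. */
char *toy_grow(toy_sv *sv, size_t newlen)
{
    if (newlen > sv->len) {
        char *p = realloc(sv->pv, newlen);

        if (p == NULL)
            return NULL;
        sv->pv  = p;
        sv->len = newlen;
    }
    return sv->pv;
}
```

A caller that wants to store a 5-byte string plus its terminator would ask
for C<toy_grow(sv, 5 + 1)>, mirroring the C<SvGROW(sv, len + 1)> idiom.
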
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example,
when interfacing with Perl code, such a comparison will work correctly for:

    foo(undef);

but won't work when called as:

    $x = undef;
    foo($x);

So, to repeat: always use SvOK() to check whether an sv is defined.

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L<AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

  % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
  SV = PVIV(0x8128450) at 0x81340f0
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    IV = 1  (OFFSET)
    PV = 0x8135781 ( "1" . ) "2345"\0
    CUR = 4
    LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.

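The bookkeeping behind the offset hack can be sketched in a few lines of
plain C. This is a toy model, not the code in F<sv.c>: the C<toy_sv>
struct, its field names, and the functions C<toy_chop> and
C<toy_buffer_start> are all hypothetical, chosen only to mirror the roles
of C<SvPVX>, C<SvCUR>, C<SvLEN>, the IV field and the C<OOK> flag.

```c
#include <stddef.h>

typedef struct {
    char  *pv;      /* visible start of the string (role of SvPVX) */
    size_t cur;     /* visible string length (role of SvCUR) */
    size_t len;     /* visible allocation size (role of SvLEN) */
    size_t offset;  /* bytes chopped so far (role of the IV field) */
    int    ook;     /* nonzero once the offset hack is in effect */
} toy_sv;

/* Discard everything before ptr, which must point inside the string.
 * No bytes are moved; only the bookkeeping changes. */
void toy_chop(toy_sv *sv, char *ptr)
{
    size_t delta = (size_t)(ptr - sv->pv);

    sv->pv     += delta;
    sv->cur    -= delta;
    sv->len    -= delta;
    sv->offset += delta;
    sv->ook     = 1;
}

/* Real start of the allocated buffer, which is what must be freed. */
char *toy_buffer_start(const toy_sv *sv)
{
    return sv->pv - sv->offset;
}
```

Starting from a 6-byte buffer holding "12345" and chopping one byte leaves
C<cur> at 4, C<len> at 5 and C<offset> at 1, the same figures as in the
dump above.
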
Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvLEN>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

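The array variant of the trick can be modelled the same way. The sketch
below is an illustration under assumptions, not an excerpt from F<av.c>:
C<toy_av>, its fields and C<toy_shift> are hypothetical, and the elements
are plain C<int>s rather than C<SV*>s to keep the example short.

```c
typedef struct {
    int *alloc;  /* real start of the C array (role of AvALLOC) */
    int *array;  /* first element visible from Perl (role of AvARRAY) */
    long fill;   /* highest visible index, -1 when empty (role of AvFILL) */
    long len;    /* visible capacity (role of AvLEN in the text) */
} toy_av;

/* Remove and return the first visible element.
 * The caller must ensure the array is not empty (fill >= 0). */
int toy_shift(toy_av *av)
{
    int first = av->array[0];

    av->array++;  /* skip the element instead of moving the rest down */
    av->fill--;
    av->len--;
    return first;
}
```

After the shift, C<alloc> still records where the allocation began, which
is the address that has to be handed back when the array is freed.
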
=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
routine rather than directly, so SvIOK is false. Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.

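The lossy-conversion case described in this section can be made concrete
with a small flag model in plain C. Everything here is hypothetical: the
C<TOY_*> flag values, the C<toy_sv> struct and C<toy_cache_iv> are
invented for illustration and do not match Perl's real flag bits or
conversion code. The sketch also assumes the double fits in a C<long>.

```c
enum {
    TOY_IOK  = 1 << 0,  /* public: integer value may be used directly */
    TOY_NOK  = 1 << 1,  /* public: double value may be used directly */
    TOY_pIOK = 1 << 2,  /* private: an integer is stored in the slot */
    TOY_pNOK = 1 << 3   /* private: a double is stored in the slot */
};

typedef struct {
    double   nv;
    long     iv;
    unsigned flags;
} toy_sv;

/* Cache an integer version of nv. The private flag is always set;
 * the public flag is set only when no precision was lost. */
void toy_cache_iv(toy_sv *sv)
{
    sv->iv = (long)sv->nv;  /* truncates toward zero */
    sv->flags |= TOY_pIOK;
    if ((double)sv->iv == sv->nv)
        sv->flags |= TOY_IOK;
}
```

Converting 3.7 therefore leaves the public integer flag clear while the
private one is set, whereas converting 4.0 sets both.
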
=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(I32 num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on AVs:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    I32   av_len(AV*);
    SV**  av_fetch(AV*, I32 key, I32 lval);
    SV**  av_store(AV*, I32 key, SV* val);

The C<av_len> function returns the highest index value in the array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, I32 key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on HVs:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

The following two functions check whether a hash table entry exists and
delete it, respectively.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. The C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval. The key and retlen
               arguments are return values for the key and its
               length. The value is returned as the function's
               SV* result */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:

    hash = 0;
    while (klen--)
        hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5);			/* after 5.6 */

The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.

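The algorithm above is simple enough to try out as a standalone C
function. The sketch below restates the macro body as a function with the
hypothetical name C<toy_perl_hash>. Two details are assumptions rather
than statements about Perl's source: the result type is taken to be a
32-bit unsigned integer, and each key byte is read as an C<unsigned char>
so that the result does not depend on whether plain C<char> is signed.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-by-33 hash followed by the post-5.6 mixing step. */
uint32_t toy_perl_hash(const char *key, size_t klen)
{
    uint32_t hash = 0;

    while (klen--)
        hash = (hash * 33) + (unsigned char)*key++;
    hash = hash + (hash >> 5);  /* spreads high bits into the low bits */
    return hash;
}
```

For the one-byte key "a" (byte value 97 in ASCII) the loop yields 97 and
the final step adds 97 >> 5 = 3, giving 100.
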
See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. AVs use C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code.

Other problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

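Why a shared sentinel and a freshly created undefined value behave
differently can be shown with a toy model in plain C. Nothing below is
Perl source: C<toy_sv>, C<toy_sv_undef> and C<toy_exists> are hypothetical
names, and the existence test is an assumed simplification of what an
array implementation that uses a sentinel might do.

```c
#include <stddef.h>

typedef struct {
    int defined;  /* 0 means the value is undef */
    int value;
} toy_sv;

/* One shared sentinel, playing the role of PL_sv_undef. */
toy_sv toy_sv_undef = { 0, 0 };

/* A slot "exists" only if it holds something other than the sentinel,
 * regardless of whether that something is defined. */
int toy_exists(toy_sv *const *slots, size_t count, size_t i)
{
    return i < count && slots[i] != NULL && slots[i] != &toy_sv_undef;
}
```

A slot that points at the sentinel is reported as nonexistent, while a
slot that points at a separate, equally undefined value is reported as
existing. This is the same distinction as storing C<&PL_sv_undef> versus
storing C<newSV(0)>.
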
=head2 References

References are a special type of scalar that point to other data types
(including references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

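The difference between the two constructors comes down to one increment,
which a toy reference-count model in plain C makes visible. The names
C<toy_sv>, C<toy_newRV_inc> and C<toy_newRV_noinc> are hypothetical, and
unlike the real functions these fill in a caller-supplied struct instead
of allocating and returning a new SV.

```c
#include <stddef.h>

typedef struct toy_sv {
    int            refcnt;  /* number of owners */
    struct toy_sv *rv;      /* referent, or NULL for a non-reference */
} toy_sv;

/* Make ref point at thing and count ref as an additional owner. */
void toy_newRV_inc(toy_sv *ref, toy_sv *thing)
{
    ref->refcnt = 1;
    ref->rv     = thing;
    thing->refcnt++;
}

/* Make ref point at thing and take over an existing count. */
void toy_newRV_noinc(toy_sv *ref, toy_sv *thing)
{
    ref->refcnt = 1;
    ref->rv     = thing;
}
```

The non-incrementing form suits the case where the referent was just
created and the new reference should become its only owner. The
incrementing form suits a referent that something else already owns.
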
Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar

See the sv.h header file for more details.

605 | =head2 Blessed References and Class Objects
|
---|
606 |
|
---|
607 | References are also used to support object-oriented programming. In perl's
|
---|
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once it has been blessed, the programmer may use the
reference to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L<Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

Upgrades C<rv> to a reference if it is not already one, and creates a new
SV for C<rv> to point to. If C<classname> is non-null, the SV is blessed
into the specified class. The new SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

Copies an integer, unsigned integer or double into an SV whose reference is
C<rv>. The SV is blessed if C<classname> is non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

Copies the pointer value (I<the address, not the string!>) into an SV whose
reference is C<rv>. The SV is blessed if C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

Copies a string into an SV whose reference is C<rv>. Set length to 0 to let
Perl calculate the string length. The SV is blessed if C<classname> is
non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, const char* pv, STRLEN length);

Tests whether the SV is blessed into the specified class. It does not
check inheritance relationships.

    int sv_isa(SV* sv, const char* name);

Tests whether the SV is a reference to a blessed object.

    int sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. The SV can be
either a reference to a blessed object or a string containing a class name.
This is the function implementing the C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

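The difference between an exact class test (as C<sv_isa> performs) and an
inheritance-aware test (as C<sv_derived_from> performs) can be illustrated
with a small stand-alone model. This is only a sketch in plain C and does
not use the Perl API: C<toy_class>, C<toy_isa> and C<toy_derived_from> are
hypothetical names, and a single parent per class is assumed for brevity
(Perl's C<@ISA> may list several).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a class: a name plus at most one parent. */
typedef struct toy_class {
    const char *name;
    const struct toy_class *parent;   /* NULL for a root class */
} toy_class;

/* Exact match only, in the spirit of sv_isa. */
static int toy_isa(const toy_class *cls, const char *name)
{
    assert(name != NULL);
    return cls != NULL && strcmp(cls->name, name) == 0;
}

/* Match the class or any ancestor, in the spirit of sv_derived_from. */
static int toy_derived_from(const toy_class *cls, const char *name)
{
    assert(name != NULL);
    for (; cls != NULL; cls = cls->parent)
        if (strcmp(cls->name, name) == 0)
            return 1;
    return 0;
}
```

In this model an object of class C<Dog> whose parent is C<Animal> passes
C<toy_derived_from> for both names, but passes C<toy_isa> only for C<Dog>.
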
=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV* get_sv("package::varname", TRUE);
    AV* get_av("package::varname", TRUE);
    HV* get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<TRUE> argument to enable certain extra features. Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

    Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

    Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten. At the internal level, however, reference counts can be
manipulated with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument. The C<newRV_inc> function, you will recall,
creates a reference to the specified argument. As a side effect,
it increments the argument's reference count. If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one. Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two. Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't! When the returned reference is eventually destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates. This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>. Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
preventing any memory leak.

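The leak described above can be reproduced in a few lines of ordinary C.
The following is a simplified model, not the Perl API: C<toy_sv>,
C<toy_new>, C<toy_dec>, C<toy_ref_inc> and C<toy_ref_noinc> are
hypothetical stand-ins for an SV, C<newSV>, C<SvREFCNT_dec>, C<newRV_inc>
and C<newRV_noinc>, and only the counting behaviour is modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SV: a count and an optional referent. */
typedef struct toy_sv {
    int refcnt;
    struct toy_sv *target;   /* non-NULL when this value is a reference */
} toy_sv;

static int toy_live = 0;     /* number of values not yet destroyed */

static toy_sv *toy_new(void)
{
    toy_sv *sv = malloc(sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;          /* new values start life with a count of 1 */
    sv->target = NULL;
    toy_live++;
    return sv;
}

static void toy_dec(toy_sv *sv)
{
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        if (sv->target != NULL)
            toy_dec(sv->target);   /* a dying reference releases its target */
        free(sv);
        toy_live--;
    }
}

/* The reference takes over the caller's count on the target. */
static toy_sv *toy_ref_noinc(toy_sv *target)
{
    toy_sv *rv = toy_new();
    rv->target = target;
    return rv;
}

/* The target's count is incremented as a side effect. */
static toy_sv *toy_ref_inc(toy_sv *target)
{
    target->refcnt++;
    return toy_ref_noinc(target);
}
```

With C<toy_ref_inc>, destroying the reference leaves the target with a
count of one and no owner; with C<toy_ref_noinc>, destroying the reference
destroys the target as well.
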
There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later". Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function. The actual point at which mortal xVs have their
reference count decremented depends on two macros, SAVETMPS and FREETMPS.
See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
For example an SV which is created just to pass a number to a called sub
is made mortal to have it cleaned up automatically when it's popped off
the stack. Similarly, results returned by XSUBs (which are pushed on the
stack) are often made mortal.

To create a mortal variable, use the functions:

    SV* sv_newmortal()
    SV* sv_2mortal(SV*)
    SV* sv_mortalcopy(SV*)

The first call creates a mortal SV (with no value), the second converts an
existing SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and
the third creates a mortal copy of an existing SV.
Because C<sv_newmortal> gives the new SV no value, it must normally be given
one via C<sv_setpv>, C<sv_setiv>, etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that takes multiple C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

You should be careful about creating mortal variables. Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times. Thinking of "Mortalization"
as deferred C<SvREFCNT_dec> should help to minimize such problems.
For example if you are passing an SV which you I<know> has high enough REFCNT
to survive its use on the stack you need not do any mortalization.
If you are not sure then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or
making a C<sv_mortalcopy> is safer.

The mortal routines are not just for SVs -- AVs and HVs can be
made mortal by passing their address (cast to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package. To get at the items in other packages, append the
string "::" to the package name. The items in the C<Foo> package are in
the stash C<Foo::> in C<PL_defstash>. The items in the C<Bar::Baz> package are
in the stash C<Baz::> in C<Bar::>'s stash.

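The naming rule for nested stashes (each package component, with "::"
appended, is a key in the enclosing stash) can be sketched as a small
string routine. This is an illustration in plain C only; C<toy_stash_key>
is a hypothetical helper, not part of the Perl API, and it performs no
hash lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the n-th (0-based) stash key of `package` into `out`.  For
   "Bar::Baz", key 0 is "Bar::" (looked up in the main stash) and key 1
   is "Baz::" (looked up in the stash found under "Bar::").  Returns 1 on
   success, 0 if there is no such component or `out` is too small. */
static int toy_stash_key(const char *package, int n, char *out, size_t outlen)
{
    const char *start = package;
    assert(package != NULL && out != NULL);

    for (;;) {
        const char *sep = strstr(start, "::");
        size_t len = sep ? (size_t)(sep - start) : strlen(start);

        if (len == 0)
            return 0;                   /* empty component */
        if (n == 0) {
            if (len + 3 > outlen)       /* name + "::" + NUL */
                return 0;
            memcpy(out, start, len);
            memcpy(out + len, "::", 3); /* copies the terminating NUL too */
            return 1;
        }
        if (sep == NULL)
            return 0;                   /* ran out of components */
        start = sep + 2;
        n--;
    }
}
```
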
To get the stash pointer for a particular package, use one of the functions:

    HV* gv_stashpv(const char* name, I32 create);
    HV* gv_stashsv(SV*, I32 create);

The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. If the C<create> flag is set, the package is created if it does not
already exist.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternatively, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV* SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char* HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV* sv_bless(SV*, HV* stash);

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int dberror;
    extern char *dberror_list[];    /* array of message strings */

    SV* sv = get_sv("dberror", TRUE);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Magic Variables

[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        SV*         mg_obj;
        char*       mg_ptr;
        I32         mg_len;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the C<sv_magic> function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>. Perl then continues by adding new magic
to the beginning of the linked list of magical features. Any prior entry
of the same type of magic is deleted. Note that this can be overridden
(see C<sv_magicext> below), and multiple instances of the same type of
magic can be associated with an SV.

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
whether C<namlen> is greater than zero or equal to zero respectively. As a
special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
to contain an C<SV*> and is stored as-is with its REFCNT incremented.

The C<sv_magic> function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L<Magic Virtual Tables> section below. The C<how> argument is also
stored in the C<mg_type> field. The value of C<how> should be chosen
from the set of macros C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented. If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer,
then C<obj> is merely stored, without the reference count being incremented.

See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
to an SV.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function C<sv_unmagic>:

    void sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL> ("Magic Virtual Table"), a structure of function pointers that
handle the various operations that might be applied to that variable.

The C<MGVTBL> has five pointers to the following routine types:

    int  (*svt_get)(SV* sv, MAGIC* mg);
    int  (*svt_set)(SV* sv, MAGIC* mg);
    U32  (*svt_len)(SV* sv, MAGIC* mg);
    int  (*svt_clear)(SV* sv, MAGIC* mg);
    int  (*svt_free)(SV* sv, MAGIC* mg);

This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 19 types (or 21 with overloading turned on). These different
structures contain pointers to various routines that perform additional
actions depending on which function is being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called. All the various routines for the various magical types begin
with C<magic_>. NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.

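The dispatch through a table of function pointers described above can be
sketched in plain C. This is a reduced model, not the Perl API: C<toy_vtbl>
has only two of the five slots, and all names (C<toy_fetch>, C<toy_store>,
C<toy_count_get>, C<toy_clamp_set>) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_sv toy_sv;
typedef struct toy_magic toy_magic;

/* Hypothetical two-slot analogue of MGVTBL. */
typedef struct {
    int (*get)(toy_sv *sv, toy_magic *mg);   /* run before a read  */
    int (*set)(toy_sv *sv, toy_magic *mg);   /* run after a write  */
} toy_vtbl;

struct toy_magic { const toy_vtbl *vtbl; int calls; };
struct toy_sv    { int value; toy_magic *magic; };

/* Read a value, giving 'get' magic the chance to act first. */
static int toy_fetch(toy_sv *sv)
{
    assert(sv != NULL);
    if (sv->magic && sv->magic->vtbl && sv->magic->vtbl->get)
        sv->magic->vtbl->get(sv, sv->magic);
    return sv->value;
}

/* Write a value, then let 'set' magic react to the new contents. */
static void toy_store(toy_sv *sv, int value)
{
    assert(sv != NULL);
    sv->value = value;
    if (sv->magic && sv->magic->vtbl && sv->magic->vtbl->set)
        sv->magic->vtbl->set(sv, sv->magic);
}

/* Example callbacks: count reads, and clamp writes to be non-negative. */
static int toy_count_get(toy_sv *sv, toy_magic *mg)
{
    (void)sv;
    mg->calls++;
    return 0;
}

static int toy_clamp_set(toy_sv *sv, toy_magic *mg)
{
    (void)mg;
    if (sv->value < 0)
        sv->value = 0;
    return 0;
}

static const toy_vtbl toy_example_vtbl = { toy_count_get, toy_clamp_set };
```

A value with no magic attached is read and written directly; a value
carrying C<toy_example_vtbl> has its reads counted and its negative writes
clamped, without the caller of C<toy_fetch> or C<toy_store> knowing which
case applies.
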
The current kinds of Magic Virtual Tables are:

    mg_type
    (old-style char and macro)   MGVTBL          Type of magic
    --------------------------   ------          ----------------------------
    \0 PERL_MAGIC_sv             vtbl_sv         Special scalar variable
    A  PERL_MAGIC_overload       vtbl_amagic     %OVERLOAD hash
    a  PERL_MAGIC_overload_elem  vtbl_amagicelem %OVERLOAD hash element
    c  PERL_MAGIC_overload_table (none)          Holds overload table (AMT)
                                                 on stash
    B  PERL_MAGIC_bm             vtbl_bm         Boyer-Moore (fast string search)
    D  PERL_MAGIC_regdata        vtbl_regdata    Regex match position data
                                                 (@+ and @- vars)
    d  PERL_MAGIC_regdatum       vtbl_regdatum   Regex match position data
                                                 element
    E  PERL_MAGIC_env            vtbl_env        %ENV hash
    e  PERL_MAGIC_envelem        vtbl_envelem    %ENV hash element
    f  PERL_MAGIC_fm             vtbl_fm         Formline ('compiled' format)
    g  PERL_MAGIC_regex_global   vtbl_mglob      m//g target / study()ed string
    I  PERL_MAGIC_isa            vtbl_isa        @ISA array
    i  PERL_MAGIC_isaelem        vtbl_isaelem    @ISA array element
    k  PERL_MAGIC_nkeys          vtbl_nkeys      scalar(keys()) lvalue
    L  PERL_MAGIC_dbfile         (none)          Debugger %_<filename
    l  PERL_MAGIC_dbline         vtbl_dbline     Debugger %_<filename element
    m  PERL_MAGIC_mutex          vtbl_mutex      ???
    o  PERL_MAGIC_collxfrm       vtbl_collxfrm   Locale collate transformation
    P  PERL_MAGIC_tied           vtbl_pack       Tied array or hash
    p  PERL_MAGIC_tiedelem       vtbl_packelem   Tied array or hash element
    q  PERL_MAGIC_tiedscalar     vtbl_packelem   Tied scalar or handle
    r  PERL_MAGIC_qr             vtbl_qr         precompiled qr// regex
    S  PERL_MAGIC_sig            vtbl_sig        %SIG hash
    s  PERL_MAGIC_sigelem        vtbl_sigelem    %SIG hash element
    t  PERL_MAGIC_taint          vtbl_taint      Taintedness
    U  PERL_MAGIC_uvar           vtbl_uvar       Available for use by extensions
    v  PERL_MAGIC_vec            vtbl_vec        vec() lvalue
    V  PERL_MAGIC_vstring        (none)          v-string scalars
    w  PERL_MAGIC_utf8           vtbl_utf8       UTF-8 length+offset cache
    x  PERL_MAGIC_substr         vtbl_substr     substr() lvalue
    y  PERL_MAGIC_defelem        vtbl_defelem    Shadow "foreach" iterator
                                                 variable / smart parameter
                                                 vivification
    *  PERL_MAGIC_glob           vtbl_glob       GV (typeglob)
    #  PERL_MAGIC_arylen         vtbl_arylen     Array length ($#ary)
    .  PERL_MAGIC_pos            vtbl_pos        pos() lvalue
    <  PERL_MAGIC_backref        vtbl_backref    ???
    ~  PERL_MAGIC_ext            (none)          Available for use by extensions

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type. Some internals code makes use of this case
relationship. However, 'v' and 'V' (vec and v-string) are in no way related.

The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
specifically for use by extensions and will not be used by perl itself.
Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information
to variables (typically objects). This is especially useful because
there is no way for normal perl code to corrupt this private information
(unlike using extra elements of a hash object).

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed. The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(pTHX_ IV, SV*);
        I32 (*uf_set)(pTHX_ IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below. Note that the C<ufuncs> structure is copied by
C<sv_magic>, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict. Typically, using the magic only on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it may also be appropriate to add an I32
'signature' at the top of the private data area and check that.

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

    MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */

This routine returns a pointer to the C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned. Also,
if the SV is not of type C<SVt_PVMG>, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has. If the C<mg_type>
field is an uppercase letter, then the C<mg_obj> is copied to C<nsv>, but
the C<mg_type> field is changed to be the lowercase letter.

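Both ideas in this section (searching the chain of magic entries by type,
and mapping a container's uppercase type to the lowercase element type)
are simple enough to model directly. The following is a plain C sketch,
not the Perl API; C<toy_magic>, C<toy_mg_prepend>, C<toy_mg_find> and
C<toy_element_type> are hypothetical names, and nothing here copies
C<mg_obj> as the real C<mg_copy> does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct toy_magic {
    struct toy_magic *more;   /* next entry, in the role of mg_moremagic */
    char type;                /* in the role of mg_type */
} toy_magic;

/* New magic goes at the beginning of the list; returns the new head. */
static toy_magic *toy_mg_prepend(toy_magic *head, toy_magic *mg)
{
    assert(mg != NULL);
    mg->more = head;
    return mg;
}

/* Walk the chain and return the first entry of `type`, or NULL if the
   value carries no such magic. */
static toy_magic *toy_mg_find(toy_magic *head, char type)
{
    toy_magic *mg;
    for (mg = head; mg != NULL; mg = mg->more)
        if (mg->type == type)
            return mg;
    return NULL;
}

/* An uppercase container type maps to its lowercase element type;
   any other type character is returned unchanged. */
static char toy_element_type(char container_type)
{
    unsigned char c = (unsigned char)container_type;
    return isupper(c) ? (char)tolower(c) : container_type;
}
```
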
=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET, etc. methods. To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps: first it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class;
see L<Calling Perl Routines from within C Programs> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", TRUE);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>. It may also return NULL, indicating that the value did not
actually need to be stored in the array. [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object. If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>. Note the value so returned does not
need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object. Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes. They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects. Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants. The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax. The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

1212 | =head2 Localizing changes
|
---|
1213 |
|
---|
1214 | Perl has a very handy construction
|
---|
1215 |
|
---|
1216 | {
|
---|
1217 | local $var = 2;
|
---|
1218 | ...
|
---|
1219 | }
|
---|
1220 |
|
---|
1221 | This construction is I<approximately> equivalent to
|
---|
1222 |
|
---|
1223 | {
|
---|
1224 | my $oldvar = $var;
|
---|
1225 | $var = 2;
|
---|
1226 | ...
|
---|
1227 | $var = $oldvar;
|
---|
1228 | }
|
---|
1229 |
|
---|
1230 | The biggest difference is that the first construction would
|
---|
1231 | reinstate the initial value of $var, irrespective of how control exits
|
---|
1232 | the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
|
---|
1233 | more efficient as well.
|
---|
1234 |
|
---|
1235 | There is a way to achieve a similar task from C via Perl API: create a
|
---|
1236 | I<pseudo-block>, and arrange for some changes to be automatically
|
---|
1237 | undone at the end of it, either explicit, or via a non-local exit (via
|
---|
1238 | die()). A I<block>-like construct is created by a pair of
|
---|
1239 | C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
|
---|
1240 | Such a construct may be created specially for some important localized
|
---|
1241 | task, or an existing one (like boundaries of enclosing Perl
|
---|
1242 | subroutine/block, or an existing pair for freeing TMPs) may be
|
---|
1243 | used. (In the second case the overhead of additional localization should
|
---|
1244 | be almost negligible.) Note that any XSUB is automatically enclosed in
|
---|
1245 | an C<ENTER>/C<LEAVE> pair.
|
---|
1246 |
|
---|
1247 | Inside such a I<pseudo-block> the following service is available:
|
---|
1248 |
|
---|
1249 | =over 4
|
---|
1250 |
|
---|
1251 | =item C<SAVEINT(int i)>
|
---|
1252 |
|
---|
1253 | =item C<SAVEIV(IV i)>
|
---|
1254 |
|
---|
1255 | =item C<SAVEI32(I32 i)>
|
---|
1256 |
|
---|
1257 | =item C<SAVELONG(long i)>
|
---|
1258 |
|
---|
1259 | These macros arrange things to restore the value of the integer variable
|
---|
1260 | C<i> at the end of the enclosing I<pseudo-block>.
|
---|
1261 |
|
---|
1262 | =item C<SAVESPTR(s)>
|
---|
1263 |
|
---|
1264 | =item C<SAVEPPTR(p)>
|
---|
1265 |
|
---|
1266 | These macros arrange things to restore the value of pointers C<s> and
|
---|
1267 | C<p>. C<s> must be a pointer of a type which survives conversion to
|
---|
1268 | C<SV*> and back, C<p> should be able to survive conversion to C<char*>
|
---|
1269 | and back.
|
---|
1270 |
|
---|
1271 | =item C<SAVEFREESV(SV *sv)>
|
---|
1272 |
|
---|
1273 | The refcount of C<sv> will be decremented at the end of
|
---|
1274 | I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
|
---|
1275 | mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
|
---|
1276 | extends the lifetime of C<sv> until the beginning of the next statement,
|
---|
1277 | C<SAVEFREESV> extends it until the end of the enclosing scope. These
|
---|
1278 | lifetimes can be wildly different.
|
---|
1279 |
|
---|
1280 | Also compare C<SAVEMORTALIZESV>.
|
---|
1281 |
|
---|
1282 | =item C<SAVEMORTALIZESV(SV *sv)>
|
---|
1283 |
|
---|
1284 | Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
|
---|
1285 | scope instead of decrementing its reference count. This usually has the
|
---|
1286 | effect of keeping C<sv> alive until the statement that called the currently
|
---|
1287 | live scope has finished executing.
|
---|
1288 |
|
---|
1289 | =item C<SAVEFREEOP(OP *op)>
|
---|
1290 |
|
---|
1291 | The C<OP *> is op_free()ed at the end of I<pseudo-block>.
|
---|
1292 |
|
---|
1293 | =item C<SAVEFREEPV(p)>
|
---|
1294 |
|
---|
1295 | The chunk of memory which is pointed to by C<p> is Safefree()ed at the
|
---|
1296 | end of I<pseudo-block>.
|
---|
1297 |
|
---|
1298 | =item C<SAVECLEARSV(SV *sv)>
|
---|
1299 |
|
---|
1300 | Clears a slot in the current scratchpad which corresponds to C<sv> at
|
---|
1301 | the end of I<pseudo-block>.
|
---|
1302 |
|
---|
1303 | =item C<SAVEDELETE(HV *hv, char *key, I32 length)>
|
---|
1304 |
|
---|
1305 | The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
|
---|
1306 | string pointed to by C<key> is Safefree()ed. If one has a I<key> in
|
---|
1307 | short-lived storage, the corresponding string may be reallocated like
|
---|
1308 | this:
|
---|
1309 |
|
---|
1310 | SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
|
---|
1311 |
|
---|
1312 | =item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>
|
---|
1313 |
|
---|
1314 | At the end of I<pseudo-block> the function C<f> is called with the
|
---|
1315 | only argument C<p>.
|
---|
1316 |
|
---|
1317 | =item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>
|
---|
1318 |
|
---|
1319 | At the end of I<pseudo-block> the function C<f> is called with the
|
---|
1320 | implicit context argument (if any), and C<p>.
|
---|
1321 |
|
---|
1322 | =item C<SAVESTACK_POS()>
|
---|
1323 |
|
---|
1324 | The current offset on the Perl internal stack (cf. C<SP>) is restored
|
---|
1325 | at the end of I<pseudo-block>.
|
---|
1326 |
|
---|
1327 | =back
|
---|
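The save mechanism behind the macros above can be pictured with a small stand-alone C model. This is only a sketch under stated assumptions: it does not use or reproduce Perl's real save stack, every name in it (C<ss_enter>, C<ss_save_int>, C<ss_leave>, C<ss_demo>) is hypothetical, it models the C<SAVEINT> case only, and it does no bounds checking.

```c
#include <stddef.h>

/* Toy model of a save stack. NOT Perl's implementation. */
#define SS_MAX 64

typedef struct {
    int *addr;   /* variable to restore */
    int  old;    /* value it had when it was saved */
} ss_entry;

static ss_entry ss_stack[SS_MAX];
static size_t   ss_ix = 0;           /* next free slot on the save stack */
static size_t   ss_floor[SS_MAX];    /* save-stack height at each open block */
static size_t   ss_floor_ix = 0;

/* Like ENTER: remember where this pseudo-block's entries start. */
static void ss_enter(void)
{
    ss_floor[ss_floor_ix++] = ss_ix;
}

/* Like SAVEINT(i): record the current value so ss_leave() can restore it. */
static void ss_save_int(int *p)
{
    ss_stack[ss_ix].addr = p;
    ss_stack[ss_ix].old  = *p;
    ss_ix++;
}

/* Like LEAVE: undo, newest first, everything saved since the matching
 * ss_enter(). */
static void ss_leave(void)
{
    size_t floor = ss_floor[--ss_floor_ix];
    while (ss_ix > floor) {
        ss_ix--;
        *ss_stack[ss_ix].addr = ss_stack[ss_ix].old;
    }
}

static int ss_var = 1;

/* Roughly "{ local $var = 2; ... }" written out by hand. */
static int ss_demo(void)
{
    int seen;
    ss_enter();
    ss_save_int(&ss_var);
    ss_var = 2;
    seen = ss_var;       /* the localized value, visible inside the block */
    ss_leave();          /* ss_var gets its saved value back */
    return seen;
}
```

Restoring newest-first matters when the same variable is saved more than once in a block: the oldest saved value is written last, so it is the one that survives. The model does not cover non-local exits such as die().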
1328 |
|
---|
1329 | The following API list contains functions, thus one needs to
|
---|
1330 | provide pointers to the modifiable data explicitly (either C pointers,
|
---|
1331 | or Perlish C<GV *>s). Where the above macros take C<int>, a similar
|
---|
1332 | function takes C<int *>.
|
---|
1333 |
|
---|
1334 | =over 4
|
---|
1335 |
|
---|
1336 | =item C<SV* save_scalar(GV *gv)>
|
---|
1337 |
|
---|
1338 | Equivalent to Perl code C<local $gv>.
|
---|
1339 |
|
---|
1340 | =item C<AV* save_ary(GV *gv)>
|
---|
1341 |
|
---|
1342 | =item C<HV* save_hash(GV *gv)>
|
---|
1343 |
|
---|
1344 | Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.
|
---|
1345 |
|
---|
1346 | =item C<void save_item(SV *item)>
|
---|
1347 |
|
---|
1348 | Duplicates the current value of C<SV>. On exit from the current
|
---|
1349 | C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<SV> is restored
|
---|
1350 | using the stored value.
|
---|
1351 |
|
---|
1352 | =item C<void save_list(SV **sarg, I32 maxsarg)>
|
---|
1353 |
|
---|
1354 | A variant of C<save_item> which takes multiple arguments via an array
|
---|
1355 | C<sarg> of C<SV*> of length C<maxsarg>.
|
---|
1356 |
|
---|
1357 | =item C<SV* save_svref(SV **sptr)>
|
---|
1358 |
|
---|
1359 | Similar to C<save_scalar>, but will reinstate an C<SV *>.
|
---|
1360 |
|
---|
1361 | =item C<void save_aptr(AV **aptr)>
|
---|
1362 |
|
---|
1363 | =item C<void save_hptr(HV **hptr)>
|
---|
1364 |
|
---|
1365 | Similar to C<save_svref>, but localize C<AV *> and C<HV *>.
|
---|
1366 |
|
---|
1367 | =back
|
---|
1368 |
|
---|
1369 | The C<Alias> module implements localization of the basic types within the
|
---|
1370 | I<caller's scope>. People who are interested in how to localize things in
|
---|
1371 | the containing scope should take a look there too.
|
---|
1372 |
|
---|
1373 | =head1 Subroutines
|
---|
1374 |
|
---|
1375 | =head2 XSUBs and the Argument Stack
|
---|
1376 |
|
---|
1377 | The XSUB mechanism is a simple way for Perl programs to access C subroutines.
|
---|
1378 | An XSUB routine will have a stack that contains the arguments from the Perl
|
---|
1379 | program, and a way to map from the Perl data structures to a C equivalent.
|
---|
1380 |
|
---|
1381 | The stack arguments are accessible through the C<ST(n)> macro, which returns
|
---|
1382 | the C<n>'th stack argument. Argument 0 is the first argument passed in the
|
---|
1383 | Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
|
---|
1384 | an C<SV*> is used.
|
---|
1385 |
|
---|
1386 | Most of the time, output from the C routine can be handled through use of
|
---|
1387 | the RETVAL and OUTPUT directives. However, there are some cases where the
|
---|
1388 | argument stack is not already long enough to handle all the return values.
|
---|
1389 | An example is the POSIX tzname() call, which takes no arguments, but returns
|
---|
1390 | two, the local time zone's standard and summer time abbreviations.
|
---|
1391 |
|
---|
1392 | To handle this situation, the PPCODE directive is used and the stack is
|
---|
1393 | extended using the macro:
|
---|
1394 |
|
---|
1395 | EXTEND(SP, num);
|
---|
1396 |
|
---|
1397 | where C<SP> is the macro that represents the local copy of the stack pointer,
|
---|
1398 | and C<num> is the number of elements the stack should be extended by.
|
---|
1399 |
|
---|
1400 | Now that there is room on the stack, values can be pushed on it using the
|
---|
1401 | C<PUSHs> macro. The pushed values will often need to be "mortal" (see
|
---|
1402 | L</Reference Counts and Mortality>):
|
---|
1403 |
|
---|
1404 | PUSHs(sv_2mortal(newSViv(an_integer)));
|
---|
1405 | PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)));
|
---|
1406 | PUSHs(sv_2mortal(newSVnv(a_double)));
|
---|
1407 | PUSHs(sv_2mortal(newSVpv("Some String",0)));
|
---|
1408 |
|
---|
1409 | Now, in the Perl program calling C<tzname>, the two values will be assigned
|
---|
1410 | as in:
|
---|
1411 |
|
---|
1412 | ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
|
---|
1413 |
|
---|
1414 | An alternate (and possibly simpler) method of pushing values on the stack is
|
---|
1415 | to use the macro:
|
---|
1416 |
|
---|
1417 | XPUSHs(SV*)
|
---|
1418 |
|
---|
1419 | This macro automatically adjusts the stack for you, if needed. Thus, you
|
---|
1420 | do not need to call C<EXTEND> to extend the stack.
|
---|
1421 |
|
---|
1422 | Despite suggestions in earlier versions of this document, the macros
|
---|
1423 | C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
|
---|
1424 | For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
|
---|
1425 | C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.
|
---|
1426 |
|
---|
1427 | For more information, consult L<perlxs> and L<perlxstut>.
|
---|
1428 |
|
---|
1429 | =head2 Calling Perl Routines from within C Programs
|
---|
1430 |
|
---|
1431 | There are four routines that can be used to call a Perl subroutine from
|
---|
1432 | within a C program. These four are:
|
---|
1433 |
|
---|
1434 | I32 call_sv(SV*, I32);
|
---|
1435 | I32 call_pv(const char*, I32);
|
---|
1436 | I32 call_method(const char*, I32);
|
---|
1437 | I32 call_argv(const char*, I32, register char**);
|
---|
1438 |
|
---|
1439 | The routine most often used is C<call_sv>. The C<SV*> argument
|
---|
1440 | contains either the name of the Perl subroutine to be called, or a
|
---|
1441 | reference to the subroutine. The second argument consists of flags
|
---|
1442 | that control the context in which the subroutine is called, whether
|
---|
1443 | or not the subroutine is being passed arguments, how errors should be
|
---|
1444 | trapped, and how to treat return values.
|
---|
1445 |
|
---|
1446 | All four routines return the number of arguments that the subroutine returned
|
---|
1447 | on the Perl stack.
|
---|
1448 |
|
---|
1449 | These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
|
---|
1450 | but those names are now deprecated; macros of the same name are provided for
|
---|
1451 | compatibility.
|
---|
1452 |
|
---|
1453 | When using any of these routines (except C<call_argv>), the programmer
|
---|
1454 | must manipulate the Perl stack. The macros and functions used for this
|
---|
1455 | include the following:
|
---|
1456 |
|
---|
1457 | dSP
|
---|
1458 | SP
|
---|
1459 | PUSHMARK()
|
---|
1460 | PUTBACK
|
---|
1461 | SPAGAIN
|
---|
1462 | ENTER
|
---|
1463 | SAVETMPS
|
---|
1464 | FREETMPS
|
---|
1465 | LEAVE
|
---|
1466 | XPUSH*()
|
---|
1467 | POP*()
|
---|
1468 |
|
---|
1469 | For a detailed description of calling conventions from C to Perl,
|
---|
1470 | consult L<perlcall>.
|
---|
1471 |
|
---|
1472 | =head2 Memory Allocation
|
---|
1473 |
|
---|
1474 | =head3 Allocation
|
---|
1475 |
|
---|
1476 | All memory meant to be used with the Perl API functions should be manipulated
|
---|
1477 | using the macros described in this section. The macros provide the necessary
|
---|
1478 | transparency between differences in the actual malloc implementation that is
|
---|
1479 | used within perl.
|
---|
1480 |
|
---|
1481 | It is suggested that you enable the version of malloc that is distributed
|
---|
1482 | with Perl. It keeps pools of various sizes of unallocated memory in
|
---|
1483 | order to satisfy allocation requests more quickly. However, on some
|
---|
1484 | platforms, it may cause spurious malloc or free errors.
|
---|
1485 |
|
---|
1486 | The following three macros are used to initially allocate memory:
|
---|
1487 |
|
---|
1488 | Newx(pointer, number, type);
|
---|
1489 | Newxc(pointer, number, type, cast);
|
---|
1490 | Newxz(pointer, number, type);
|
---|
1491 |
|
---|
1492 | The first argument C<pointer> should be the name of a variable that will
|
---|
1493 | point to the newly allocated memory.
|
---|
1494 |
|
---|
1495 | The second and third arguments C<number> and C<type> specify how many of
|
---|
1496 | the specified type of data structure should be allocated. The argument
|
---|
1497 | C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>,
|
---|
1498 | should be used if the C<pointer> argument is different from the C<type>
|
---|
1499 | argument.
|
---|
1500 |
|
---|
1501 | Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
|
---|
1502 | to zero out all the newly allocated memory.
|
---|
1503 |
|
---|
1504 | =head3 Reallocation
|
---|
1505 |
|
---|
1506 | Renew(pointer, number, type);
|
---|
1507 | Renewc(pointer, number, type, cast);
|
---|
1508 | Safefree(pointer)
|
---|
1509 |
|
---|
1510 | These three macros are used to change a memory buffer size or to free a
|
---|
1511 | piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
|
---|
1512 | match those of C<Newx> and C<Newxc>. (The older C<New> and C<Newc> macros
|
---|
1513 | took an additional "magic cookie" argument, which is not needed here.)
|
---|
1514 |
|
---|
1515 | =head3 Moving
|
---|
1516 |
|
---|
1517 | Move(source, dest, number, type);
|
---|
1518 | Copy(source, dest, number, type);
|
---|
1519 | Zero(dest, number, type);
|
---|
1520 |
|
---|
1521 | These three macros are used to move, copy, or zero out previously allocated
|
---|
1522 | memory. The C<source> and C<dest> arguments point to the source and
|
---|
1523 | destination starting points. Perl will move, copy, or zero out C<number>
|
---|
1524 | instances of the size of the C<type> data structure (using the C<sizeof>
|
---|
1525 | operator).
|
---|
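The argument conventions of the allocation, reallocation and moving macros can be tried out without a perl build by using stand-ins built on the C library. The C<TOY_*> macros below are hypothetical and are not the definitions from perl's headers; they only mimic the argument order described above (the pointer first, then a count, then a type that is handed to C<sizeof>). Treating the move as C<memmove> and the copy as C<memcpy> is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins. NOT the definitions from perl's headers. */
#define TOY_NEWX(p, n, t)     ((p) = (t *)malloc((n) * sizeof(t)))
#define TOY_NEWXZ(p, n, t)    ((p) = (t *)calloc((n), sizeof(t)))
#define TOY_RENEW(p, n, t)    ((p) = (t *)realloc((p), (n) * sizeof(t)))
#define TOY_SAFEFREE(p)       free(p)
#define TOY_MOVE(s, d, n, t)  memmove((d), (s), (n) * sizeof(t)) /* overlap allowed */
#define TOY_COPY(s, d, n, t)  memcpy((d), (s), (n) * sizeof(t))  /* no overlap */
#define TOY_ZERO(d, n, t)     memset((d), 0, (n) * sizeof(t))

/* Allocate, grow, shuffle and free a small int buffer.
 * Returns 1 if the buffer ends up as {0, 0, 1, 2}, 0 if it does not,
 * and -1 if an allocation fails. */
static int toy_alloc_demo(void)
{
    int *buf;
    int  ok;

    TOY_NEWXZ(buf, 2, int);          /* two zeroed ints: {0, 0} */
    if (!buf)
        return -1;
    buf[0] = 1;
    buf[1] = 2;                      /* {1, 2} */

    TOY_RENEW(buf, 4, int);          /* room for four ints, first two kept */
    if (!buf)
        return -1;                   /* a real program would keep the old pointer */

    TOY_MOVE(buf, buf + 2, 2, int);  /* source first, then destination: {1, 2, 1, 2} */
    TOY_ZERO(buf, 2, int);           /* {0, 0, 1, 2} */

    ok = buf[0] == 0 && buf[1] == 0 && buf[2] == 1 && buf[3] == 2;
    TOY_SAFEFREE(buf);
    return ok;
}
```

Note that the source comes before the destination in the move and copy macros, which is the reverse of the C<memmove> and C<memcpy> argument order.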
1526 |
|
---|
1527 | =head2 PerlIO
|
---|
1528 |
|
---|
1529 | The most recent development releases of Perl have been experimenting with
|
---|
1530 | removing Perl's dependency on the "normal" standard I/O suite and allowing
|
---|
1531 | other stdio implementations to be used. This involves creating a new
|
---|
1532 | abstraction layer that then calls whichever implementation of stdio Perl
|
---|
1533 | was compiled with. All XSUBs should now use the functions in the PerlIO
|
---|
1534 | abstraction layer and not make any assumptions about what kind of stdio
|
---|
1535 | is being used.
|
---|
1536 |
|
---|
1537 | For a complete description of the PerlIO abstraction, consult L<perlapio>.
|
---|
1538 |
|
---|
1539 | =head2 Putting a C value on Perl stack
|
---|
1540 |
|
---|
1541 | A lot of opcodes (an opcode is an elementary operation in the internal perl
|
---|
1542 | stack machine) put an SV* on the stack. However, as an optimization
|
---|
1543 | the corresponding SV is (usually) not recreated each time. The opcodes
|
---|
1544 | reuse specially assigned SVs (I<target>s) which are (as a corollary)
|
---|
1545 | not constantly freed/created.
|
---|
1546 |
|
---|
1547 | Each of the targets is created only once (but see
|
---|
1548 | L</Scratchpads and recursion> below), and when an opcode needs to put
|
---|
1549 | an integer, a double, or a string on the stack, it just sets the
|
---|
1550 | corresponding parts of its I<target> and puts the I<target> on the stack.
|
---|
1551 |
|
---|
1552 | The macro to put this target on stack is C<PUSHTARG>, and it is
|
---|
1553 | directly used in some opcodes, as well as indirectly in zillions of
|
---|
1554 | others, which use it via C<(X)PUSH[iunp]>.
|
---|
1555 |
|
---|
1556 | Because the target is reused, you must be careful when pushing multiple
|
---|
1557 | values on the stack. The following code will not do what you think:
|
---|
1558 |
|
---|
1559 | XPUSHi(10);
|
---|
1560 | XPUSHi(20);
|
---|
1561 |
|
---|
1562 | This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
|
---|
1563 | the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
|
---|
1564 | At the end of the operation, the stack does not contain the values 10
|
---|
1565 | and 20, but actually contains two pointers to C<TARG>, which we have set
|
---|
1566 | to 20.
|
---|
1567 |
|
---|
1568 | If you need to push multiple different values then you should either use
|
---|
1569 | the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
|
---|
1570 | none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an
|
---|
1571 | SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
|
---|
1572 | will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make
|
---|
1573 | this a little easier to achieve by creating a new mortal for you (via
|
---|
1574 | C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
|
---|
1575 | in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
|
---|
1576 | Thus, instead of writing this to "fix" the example above:
|
---|
1577 |
|
---|
1578 | XPUSHs(sv_2mortal(newSViv(10)));
|
---|
1579 | XPUSHs(sv_2mortal(newSViv(20)));
|
---|
1580 |
|
---|
1581 | you can simply write:
|
---|
1582 |
|
---|
1583 | mXPUSHi(10);
|
---|
1584 | mXPUSHi(20);
|
---|
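The difference between pushing a shared target twice and pushing two fresh values can be reproduced in a few lines of plain C. The following is a toy model under stated assumptions, not Perl's stack or its SVs: C<tg_sv>, C<tg_push_targ> and C<tg_push_fresh> are hypothetical names, and a small static pool stands in for newly created mortals.

```c
/* Toy model: a "stack" of pointers to integer cells. NOT Perl's stack. */
typedef struct { int value; } tg_sv;

static tg_sv *tg_stack[8];
static int    tg_sp = 0;        /* number of entries on the toy stack */

static tg_sv  tg_targ;          /* the single reusable target */
static tg_sv  tg_pool[8];       /* stand-in for freshly created mortals */
static int    tg_pool_ix = 0;

/* In the spirit of XPUSHi: set the shared target, then push a pointer
 * to it. Every call overwrites the same cell. */
static void tg_push_targ(int i)
{
    tg_targ.value = i;
    tg_stack[tg_sp++] = &tg_targ;
}

/* In the spirit of mXPUSHi: take a fresh cell for every value pushed. */
static void tg_push_fresh(int i)
{
    tg_sv *sv = &tg_pool[tg_pool_ix++];
    sv->value = i;
    tg_stack[tg_sp++] = sv;
}
```

After C<tg_push_targ(10); tg_push_targ(20);> both stack entries point at the same cell, which holds 20. After C<tg_push_fresh(10); tg_push_fresh(20);> on an empty stack the two entries are distinct cells holding 10 and 20.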
1585 |
|
---|
1586 | On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
|
---|
1587 | need a C<dTARG> in your variable declarations so that the C<*PUSH*>
|
---|
1588 | macros can make use of the local variable C<TARG>. See also C<dTARGET>
|
---|
1589 | and C<dXSTARG>.
|
---|
1590 |
|
---|
1591 | =head2 Scratchpads
|
---|
1592 |
|
---|
1593 | The question remains of when the SVs which are I<target>s for opcodes
|
---|
1594 | are created. The answer is that they are created when the current unit --
|
---|
1595 | a subroutine or a file (for opcodes for statements outside of
|
---|
1596 | subroutines) -- is compiled. During this time a special anonymous Perl
|
---|
1597 | array is created, which is called a scratchpad for the current
|
---|
1598 | unit.
|
---|
1599 |
|
---|
1600 | A scratchpad keeps SVs which are lexicals for the current unit and are
|
---|
1601 | targets for opcodes. One can deduce that an SV lives on a scratchpad
|
---|
1602 | by looking at its flags: lexicals have C<SVs_PADMY> set, and
|
---|
1603 | I<target>s have C<SVs_PADTMP> set.
|
---|
1604 |
|
---|
1605 | The correspondence between OPs and I<target>s is not 1-to-1. Different
|
---|
1606 | OPs in the compile tree of the unit can use the same target, if this
|
---|
1607 | would not conflict with the expected life of the temporary.
|
---|
1608 |
|
---|
1609 | =head2 Scratchpads and recursion
|
---|
1610 |
|
---|
1611 | In fact it is not 100% true that a compiled unit contains a pointer to
|
---|
1612 | the scratchpad AV. Rather, it contains a pointer to an AV of
|
---|
1613 | (initially) one element, and this element is the scratchpad AV. Why do
|
---|
1614 | we need an extra level of indirection?
|
---|
1615 |
|
---|
1616 | The answer is B<recursion>, and maybe B<threads>. Both of
|
---|
1617 | these can create several execution pointers going into the same
|
---|
1618 | subroutine. For the subroutine-child not to write over the temporaries
|
---|
1619 | for the subroutine-parent (lifespan of which covers the call to the
|
---|
1620 | child), the parent and the child should have different
|
---|
1621 | scratchpads. (I<And> the lexicals should be separate anyway!)
|
---|
1622 |
|
---|
1623 | So each subroutine is born with an array of scratchpads (of length 1).
|
---|
1624 | On each entry to the subroutine it is checked that the current
|
---|
1625 | depth of the recursion is not more than the length of this array, and
|
---|
1626 | if it is, a new scratchpad is created and pushed into the array.
|
---|
1627 |
|
---|
1628 | The I<target>s on this scratchpad are C<undef>s, but they are already
|
---|
1629 | marked with the correct flags.
|
---|
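The extra level of indirection can be illustrated with a stand-alone C model in which each recursion depth gets its own pad. This is a sketch under stated assumptions and not Perl's pad code: the names (C<pd_sub>, C<pd_enter>, C<pd_leave>, C<pd_sum_to>) are hypothetical, pads hold plain ints, and for brevity even the first pad is created lazily rather than when the subroutine is "born".

```c
#include <stdlib.h>

#define PD_SLOTS 4

typedef struct { int slot[PD_SLOTS]; } pd_pad;

typedef struct {
    pd_pad **pads;    /* pads[d - 1] is the scratchpad for recursion depth d */
    int      npads;   /* how many pads have been created so far */
    int      depth;   /* current recursion depth, 0 when not running */
} pd_sub;

/* On entry: make sure a pad exists for the new depth, and return it. */
static pd_pad *pd_enter(pd_sub *sub)
{
    sub->depth++;
    if (sub->depth > sub->npads) {
        pd_pad **grown = (pd_pad **)realloc(sub->pads,
                                            (size_t)sub->depth * sizeof(pd_pad *));
        if (!grown)
            abort();
        sub->pads = grown;
        sub->pads[sub->npads] = (pd_pad *)calloc(1, sizeof(pd_pad));
        if (!sub->pads[sub->npads])
            abort();
        sub->npads++;
    }
    return sub->pads[sub->depth - 1];
}

/* On exit: the pad is kept for reuse, only the depth goes down. */
static void pd_leave(pd_sub *sub)
{
    sub->depth--;
}

/* A recursive "subroutine" computing 0 + 1 + ... + n. Each activation keeps
 * n in slot 0 of its own pad, so the value is still intact after the nested
 * call has returned. */
static int pd_sum_to(pd_sub *sub, int n)
{
    pd_pad *pad = pd_enter(sub);
    int result;

    pad->slot[0] = n;
    result = (n == 0) ? 0 : pd_sum_to(sub, n - 1);
    result += pad->slot[0];
    pd_leave(sub);
    return result;
}
```

With a single shared pad, the nested call would overwrite slot 0 before the outer activation reads it back. Pads created for deep recursion stay in the array, so a later, shallower call reuses them without allocating.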
1630 |
|
---|
1631 | =head1 Compiled code
|
---|
1632 |
|
---|
1633 | =head2 Code tree
|
---|
1634 |
|
---|
1635 | Here we describe the internal form your code is converted to by
|
---|
1636 | Perl. Start with a simple example:
|
---|
1637 |
|
---|
1638 | $a = $b + $c;
|
---|
1639 |
|
---|
1640 | This is converted to a tree similar to this one:
|
---|
1641 |
|
---|
1642 | assign-to
|
---|
1643 | / \
|
---|
1644 | + $a
|
---|
1645 | / \
|
---|
1646 | $b $c
|
---|
1647 |
|
---|
1648 | (but slightly more complicated). This tree reflects the way Perl
|
---|
1649 | parsed your code, but has nothing to do with the execution order.
|
---|
1650 | There is an additional "thread" going through the nodes of the tree
|
---|
1651 | which shows the order of execution of the nodes. In our simplified
|
---|
1652 | example above it looks like:
|
---|
1653 |
|
---|
1654 | $b ---> $c ---> + ---> $a ---> assign-to
|
---|
1655 |
|
---|
1656 | But with the actual compile tree for C<$a = $b + $c> it is different:
|
---|
1657 | some nodes are I<optimized away>. As a corollary, though the actual tree
|
---|
1658 | contains more nodes than our simplified example, the execution order
|
---|
1659 | is the same as in our example.
|
---|
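The distinction between the tree links and the execution-order thread can be made concrete with a toy stack machine in C. This is a sketch under stated assumptions, far simpler than Perl's real ops: all names (C<th_op>, C<th_run>, C<th_demo>) are hypothetical, variables are plain ints, and the stack has a fixed size with no overflow checks.

```c
typedef enum { TH_VAR, TH_ADD, TH_ASSIGN } th_type;

typedef struct th_op {
    th_type       type;
    struct th_op *first, *last;   /* tree links: how the code was parsed */
    struct th_op *next;           /* thread: which node to execute next */
    int          *var;            /* for TH_VAR: the variable referred to */
} th_op;

/* Execute nodes by following the thread, ignoring the tree links. */
static void th_run(th_op *op)
{
    int *stack[8];
    int  sp = 0;
    int  targ = 0;                /* where TH_ADD leaves its result */

    for (; op; op = op->next) {
        switch (op->type) {
        case TH_VAR:
            stack[sp++] = op->var;
            break;
        case TH_ADD: {
            int *right = stack[--sp];
            int *left  = stack[--sp];
            targ = *left + *right;
            stack[sp++] = &targ;
            break;
        }
        case TH_ASSIGN: {
            int *dst = stack[--sp];   /* pushed last: the left-hand side */
            int *src = stack[--sp];
            *dst = *src;
            break;
        }
        }
    }
}

static int th_a, th_b = 2, th_c = 3;

/* Build the tree for "$a = $b + $c", thread it, run it, return $a. */
static int th_demo(void)
{
    th_op b      = { TH_VAR,    0,    0,  0, &th_b };
    th_op c      = { TH_VAR,    0,    0,  0, &th_c };
    th_op a      = { TH_VAR,    0,    0,  0, &th_a };
    th_op add    = { TH_ADD,    &b,   &c, 0, 0 };
    th_op assign = { TH_ASSIGN, &add, &a, 0, 0 };

    /* The thread: $b ---> $c ---> + ---> $a ---> assign-to */
    b.next   = &c;
    c.next   = &add;
    add.next = &a;
    a.next   = &assign;

    th_run(&b);
    return th_a;
}
```

The tree root is C<assign>, but execution starts at C<b>: the root is the last node reached by the thread, not the first.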
1660 |
|
---|
1661 | =head2 Examining the tree
|
---|
1662 |
|
---|
1663 | If you have your perl compiled for debugging (usually done with
|
---|
1664 | C<-DDEBUGGING> on the C<Configure> command line), you may examine the
|
---|
1665 | compiled tree by specifying C<-Dx> on the Perl command line. The
|
---|
1666 | output takes several lines per node, and for C<$b+$c> it looks like
|
---|
1667 | this:
|
---|
1668 |
|
---|
1669 | 5 TYPE = add ===> 6
|
---|
1670 | TARG = 1
|
---|
1671 | FLAGS = (SCALAR,KIDS)
|
---|
1672 | {
|
---|
1673 | TYPE = null ===> (4)
|
---|
1674 | (was rv2sv)
|
---|
1675 | FLAGS = (SCALAR,KIDS)
|
---|
1676 | {
|
---|
1677 | 3 TYPE = gvsv ===> 4
|
---|
1678 | FLAGS = (SCALAR)
|
---|
1679 | GV = main::b
|
---|
1680 | }
|
---|
1681 | }
|
---|
1682 | {
|
---|
1683 | TYPE = null ===> (5)
|
---|
1684 | (was rv2sv)
|
---|
1685 | FLAGS = (SCALAR,KIDS)
|
---|
1686 | {
|
---|
1687 | 4 TYPE = gvsv ===> 5
|
---|
1688 | FLAGS = (SCALAR)
|
---|
1689 | GV = main::c
|
---|
1690 | }
|
---|
1691 | }
|
---|
1692 |
|
---|
1693 | This tree has 5 nodes (one per C<TYPE> specifier); only 3 of them are
|
---|
1694 | not optimized away (one per number in the left column). The immediate
|
---|
1695 | children of the given node correspond to C<{}> pairs on the same level
|
---|
1696 | of indentation, thus this listing corresponds to the tree:
|
---|
1697 |
|
---|
1698 | add
|
---|
1699 | / \
|
---|
1700 | null null
|
---|
1701 | | |
|
---|
1702 | gvsv gvsv
|
---|
1703 |
|
---|
1704 | The execution order is indicated by C<===E<gt>> marks, thus it is C<3
|
---|
1705 | 4 5 6> (node C<6> is not included in the above listing), i.e.,
|
---|
1706 | C<gvsv gvsv add whatever>.
|
---|
1707 |
|
---|
1708 | Each of these nodes represents an op, a fundamental operation inside the
|
---|
1709 | Perl core. The code which implements each operation can be found in the
|
---|
1710 | F<pp*.c> files; the function which implements the op with type C<gvsv>
|
---|
1711 | is C<pp_gvsv>, and so on. As the tree above shows, different ops have
|
---|
1712 | different numbers of children: C<add> is a binary operator, as one would
|
---|
1713 | expect, and so has two children. To accommodate the various different
|
---|
1714 | numbers of children, there are various types of op data structure, and
|
---|
1715 | they link together in different ways.
|
---|
1716 |
|
---|
1717 | The simplest type of op structure is C<OP>: this has no children. Unary
|
---|
1718 | operators, C<UNOP>s, have one child, and this is pointed to by the
|
---|
1719 | C<op_first> field. Binary operators (C<BINOP>s) have not only an
|
---|
1720 | C<op_first> field but also an C<op_last> field. The most complex type of
|
---|
1721 | op is a C<LISTOP>, which has any number of children. In this case, the
|
---|
1722 | first child is pointed to by C<op_first> and the last child by
|
---|
1723 | C<op_last>. The children in between can be found by iteratively
|
---|
1724 | following the C<op_sibling> pointer from the first child to the last.
|
---|
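The first/last/sibling layout can be sketched with an ordinary C struct. The type and function below (C<ls_op>, C<ls_count_kids>) are hypothetical and merely borrow the field names from the description above; it is an assumption of this sketch that the last child's sibling pointer is C<NULL>.

```c
#include <stddef.h>

typedef struct ls_op {
    const char   *name;
    struct ls_op *first;     /* first child, or NULL for a childless op */
    struct ls_op *last;      /* last child, or NULL */
    struct ls_op *sibling;   /* next child of the same parent, or NULL */
} ls_op;

/* Count the children of `parent` by starting at its first child and
 * following the sibling pointers until they run out. */
static int ls_count_kids(const ls_op *parent)
{
    const ls_op *kid;
    int n = 0;

    for (kid = parent->first; kid != NULL; kid = kid->sibling)
        n++;
    return n;
}
```

The same loop serves for unary, binary and list-like nodes in this model, since a node with one or two children is just a short sibling chain.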
1725 |
|
---|
1726 | There are also two other op types: a C<PMOP> holds a regular expression,
|
---|
1727 | and has no children, and a C<LOOP> may or may not have children. If the
|
---|
1728 | C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
|
---|
1729 | complicate matters, if a C<UNOP> is actually a C<null> op after
|
---|
1730 | optimization (see L</Compile pass 2: context propagation>) it will still
|
---|
1731 | have children in accordance with its former type.
|
---|
1732 |
|
---|
1733 | Another way to examine the tree is to use a compiler back-end module, such
|
---|
1734 | as L<B::Concise>.
|
---|
1735 |
|
---|
1736 | =head2 Compile pass 1: check routines
|
---|
1737 |
|
---|
1738 | The tree is created by the compiler while I<yacc> code feeds it
|
---|
1739 | the constructions it recognizes. Since I<yacc> works bottom-up, so does
|
---|
1740 | the first pass of perl compilation.
|
---|
1741 |
|
---|
1742 | What makes this pass interesting for perl developers is that some
|
---|
1743 | optimization may be performed on this pass. This is optimization by
|
---|
1744 | so-called "check routines". The correspondence between node names
|
---|
1745 | and corresponding check routines is described in F<opcode.pl> (do not
|
---|
1746 | forget to run C<make regen_headers> if you modify this file).
|
---|
1747 |
|
---|
1748 | A check routine is called when the node is fully constructed except
|
---|
1749 | for the execution-order thread. Since at this time there are no
|
---|
1750 | back-links to the currently constructed node, one can do almost any
|
---|
1751 | operation to the top-level node, including freeing it and/or creating
|
---|
1752 | new nodes above/below it.
|
---|
1753 |
|
---|
1754 | The check routine returns the node which should be inserted into the
|
---|
1755 | tree (if the top-level node was not modified, the check routine returns
|
---|
1756 | its argument).
|
---|
1757 |
|
---|
1758 | By convention, check routines have names C<ck_*>. They are usually
|
---|
1759 | called from C<new*OP> subroutines (or C<convert>) (which in turn are
|
---|
1760 | called from F<perly.y>).
|
---|
1761 |
|
---|
1762 | =head2 Compile pass 1a: constant folding
|
---|
1763 |
|
---|
1764 | Immediately after the check routine is called the returned node is
|
---|
1765 | checked for being compile-time executable. If it is (the value is
|
---|
1766 | judged to be constant) it is immediately executed, and a I<constant>
|
---|
1767 | node with the "return value" of the corresponding subtree is
|
---|
1768 | substituted instead. The subtree is deleted.
|
---|
1769 |
|
---|
1770 | If constant folding was not performed, the execution-order thread is
|
---|
1771 | created.
|
---|
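The folding step can be illustrated with a toy tree of integer constants and additions. The code below is a sketch under stated assumptions, not Perl's folding code: the names (C<cf_op>, C<cf_new>, C<cf_check>) are hypothetical, only addition of two constant children is folded, and "executing" the subtree is reduced to adding two ints.

```c
#include <stdlib.h>

typedef enum { CF_CONST, CF_VAR, CF_ADD } cf_type;

typedef struct cf_op {
    cf_type       type;
    int           value;          /* for CF_CONST */
    struct cf_op *first, *last;   /* for CF_ADD */
} cf_op;

static cf_op *cf_new(cf_type type, int value, cf_op *first, cf_op *last)
{
    cf_op *o = (cf_op *)malloc(sizeof(cf_op));
    if (!o)
        abort();
    o->type  = type;
    o->value = value;
    o->first = first;
    o->last  = last;
    return o;
}

/* Called on a freshly built node whose children are already final
 * (construction is bottom-up). Returns the node that should go into the
 * tree: either a new constant replacing the subtree, or the node itself. */
static cf_op *cf_check(cf_op *o)
{
    if (o->type == CF_ADD
        && o->first->type == CF_CONST
        && o->last->type == CF_CONST) {
        cf_op *k = cf_new(CF_CONST, o->first->value + o->last->value,
                          NULL, NULL);
        free(o->first);           /* the folded subtree is deleted */
        free(o->last);
        free(o);
        return k;
    }
    return o;                     /* not constant: keep the node unchanged */
}
```

Because each node is checked as soon as it is built, a nested constant expression collapses from the leaves upward: by the time the outer addition is checked, its children have already been replaced by constants.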
1772 |
|
---|
1773 | =head2 Compile pass 2: context propagation
|
---|
1774 |
|
---|
1775 | When a context for a part of compile tree is known, it is propagated
|
---|
1776 | down through the tree. At this time the context can have 5 values
|
---|
1777 | (instead of 2 for runtime context): void, boolean, scalar, list, and
|
---|
1778 | lvalue. In contrast with pass 1, this pass is processed from top
|
---|
1779 | to bottom: a node's context determines the context for its children.
|
---|
1780 |
|
---|
1781 | Additional context-dependent optimizations are performed at this time.
|
---|
1782 | Since at this moment the compile tree contains back-references (via
|
---|
1783 | "thread" pointers), nodes cannot be free()d now. To allow
|
---|
1784 | optimized-away nodes at this stage, such nodes are null()ified instead
|
---|
1785 | of free()ing (i.e. their type is changed to OP_NULL).
|
---|
1786 |
|
---|
1787 | =head2 Compile pass 3: peephole optimization
|
---|
1788 |
|
---|
1789 | After the compile tree for a subroutine (or for an C<eval> or a file)
|
---|
1790 | is created, an additional pass over the code is performed. This pass
|
---|
1791 | is neither top-down nor bottom-up, but in the execution order (with
|
---|
1792 | additional complications for conditionals). These optimizations are
|
---|
1793 | done in the subroutine peep(). Optimizations performed at this stage
|
---|
1794 | are subject to the same restrictions as in pass 2.
|
---|
1795 |
|
---|
1796 | =head2 Pluggable runops
|
---|
1797 |
|
---|
1798 | The compile tree is executed in a runops function. There are two runops
|
---|
1799 | functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
|
---|
1800 | with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
|
---|
1801 | control over the execution of the compile tree it is possible to provide
|
---|
1802 | your own runops function.
|
---|
1803 |
|
---|
1804 | It's probably best to copy one of the existing runops functions and
|
---|
1805 | change it to suit your needs. Then, in the BOOT section of your XS
|
---|
1806 | file, add the line:
|
---|
1807 |
|
---|
1808 | PL_runops = my_runops;
|
---|
1809 |
|
---|
1810 | This function should be as efficient as possible to keep your programs
|
---|
1811 | running as fast as possible.
|
---|
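The idea of a replaceable run loop can be shown with a stand-alone C model. This is a sketch under stated assumptions: Perl's real runops functions do not have this signature and do more work, and every name here (C<ro_op>, C<ro_standard>, C<ro_tracing>, C<ro_current>) is hypothetical. C<ro_current> plays the part that C<PL_runops> plays in the text above.

```c
#include <stddef.h>

typedef struct ro_op ro_op;
struct ro_op {
    ro_op *(*ppaddr)(ro_op *self);   /* does the work, returns the next op */
    ro_op *next;
};

static int ro_counter = 0;   /* work done by the ops themselves */
static int ro_traced  = 0;   /* ops seen by the tracing loop */

/* A trivial op body: bump a counter and hand back the next op. */
static ro_op *ro_pp_incr(ro_op *self)
{
    ro_counter++;
    return self->next;
}

/* The plain loop: call each op until one of them returns NULL. */
static int ro_standard(ro_op *op)
{
    while (op)
        op = op->ppaddr(op);
    return 0;
}

/* A customized loop: same control flow, plus bookkeeping per op. */
static int ro_tracing(ro_op *op)
{
    while (op) {
        ro_traced++;
        op = op->ppaddr(op);
    }
    return 0;
}

/* The pluggable hook: whoever runs code calls through this pointer. */
static int (*ro_current)(ro_op *) = ro_standard;
```

Anything added inside the loop body runs once per op executed, which is why the replacement loop should stay as lean as possible.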
1812 |
|
---|
1813 | =head1 Examining internal data structures with the C<dump> functions
|
---|
1814 |
|
---|
1815 | To aid debugging, the source file F<dump.c> contains a number of
|
---|
1816 | functions which produce formatted output of internal data structures.
|
---|
1817 |
|
---|
1818 | The most commonly used of these functions is C<Perl_sv_dump>; it's used
|
---|
1819 | for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
|
---|
1820 | C<sv_dump> to produce debugging output from Perl-space, so users of that
|
---|
1821 | module should already be familiar with its format.
|
---|
1822 |
|
---|
1823 | C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
|
---|
1824 | derivatives, and produces output similar to C<perl -Dx>; in fact,
|
---|
1825 | C<Perl_dump_eval> will dump the main root of the code being evaluated,
|
---|
1826 | exactly like C<-Dx>.
|
---|
1827 |
|
---|
1828 | Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
|
---|
1829 | op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the
|
---|
1830 | subroutines in a package like so: (Thankfully, these are all xsubs, so
|
---|
1831 | there is no op tree)
|
---|
1832 |
|
---|
1833 | (gdb) print Perl_dump_packsubs(PL_defstash)
|
---|
1834 |
|
---|
1835 | SUB attributes::bootstrap = (xsub 0x811fedc 0)
|
---|
1836 |
|
---|
1837 | SUB UNIVERSAL::can = (xsub 0x811f50c 0)
|
---|
1838 |
|
---|
1839 | SUB UNIVERSAL::isa = (xsub 0x811f304 0)
|
---|
1840 |
|
---|
1841 | SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
|
---|
1842 |
|
---|
1843 | SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
|
---|
1844 |
|
---|
1845 | and C<Perl_dump_all>, which dumps all the subroutines in the stash and
|
---|
1846 | the op tree of the main root.
|
---|
1847 |
|
---|
1848 | =head1 How multiple interpreters and concurrency are supported
|
---|
1849 |
|
---|
1850 | =head2 Background and PERL_IMPLICIT_CONTEXT
|
---|
1851 |
|
---|
1852 | The Perl interpreter can be regarded as a closed box: it has an API
|
---|
1853 | for feeding it code or otherwise making it do things, but it also has
|
---|
1854 | functions for its own use. This smells a lot like an object, and
|
---|
1855 | there are ways for you to build Perl so that you can have multiple
|
---|
1856 | interpreters, with one interpreter represented either as a C structure,
|
---|
1857 | or inside a thread-specific structure. These structures contain all
|
---|
1858 | the context, the state of that interpreter.
|
---|
1859 |
|
---|
1860 | Two macros control the major Perl build flavors: MULTIPLICITY and
|
---|
1861 | USE_5005THREADS. The MULTIPLICITY build has a C structure
|
---|
1862 | that packages all the interpreter state, and there is a similar thread-specific
|
---|
1863 | data structure under USE_5005THREADS. In both cases,
|
---|
1864 | PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
|
---|
1865 | support for passing in a "hidden" first argument that represents whichever
|
---|
1866 | of these data structures is in use.
|
---|
1867 |
|
---|
All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin C<S_> are private
(think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a
function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), send mail via
L<perlbug> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing. To solve this, the subroutines are named and
declared in a particular way. Here's a typical start of a static
function used within the Perl guts:

    STATIC void
    S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

    void
    Perl_sv_setiv(pTHX_ SV* dsv, IV num)

C<pTHX_> is one of a number of macros (in perl.h) that hide the
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
first argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.

When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setiv>. It expands into
something like this:

    #ifdef PERL_IMPLICIT_CONTEXT
      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setiv           Perl_sv_setiv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setiv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

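To make the macro trick concrete, here is a minimal standalone sketch of
the same idea outside of Perl. Every name in it (C<my_interp>,
C<MY_IMPLICIT_CONTEXT>, C<my_pTHX_>, C<my_aTHX_>, C<My_counter_add>,
C<counter_add>) is made up for illustration and is not part of the Perl
API; the real definitions in perl.h are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the interpreter structure. */
typedef struct my_interp { long counter; } my_interp;

#define MY_IMPLICIT_CONTEXT   /* remove to build the global-state flavor */

#ifdef MY_IMPLICIT_CONTEXT
#  define my_pTHX   my_interp *my_perl      /* prototype, no other args   */
#  define my_pTHX_  my_pTHX,                /* prototype, more args follow */
#  define my_aTHX   my_perl                 /* argument, no other args    */
#  define my_aTHX_  my_aTHX,                /* argument, more args follow */
#  define MY_STATE  (*my_perl)
#  define MY_ASSERT_CONTEXT assert(my_perl != NULL)
#else
#  define my_pTHX   void
#  define my_pTHX_
#  define my_aTHX
#  define my_aTHX_
static my_interp my_global_state;
#  define MY_STATE  my_global_state
#  define MY_ASSERT_CONTEXT ((void)0)
#endif

/* "Public" function: the context, if any, is the hidden first argument. */
long My_counter_add(my_pTHX_ long n)
{
    MY_ASSERT_CONTEXT;
    MY_STATE.counter += n;
    return MY_STATE.counter;
}

/* Short name that callers write; the macro supplies the context. */
#define counter_add(n) My_counter_add(my_aTHX_ n)

/* A caller that already has the context in scope uses the short name. */
long add_twice(my_pTHX_ long n)
{
    counter_add(n);
    return counter_add(n);
}
```

With C<MY_IMPLICIT_CONTEXT> defined, C<add_twice(&interp, 5)> threads
C<&interp> through both calls; without it, the same source compiles to
plain functions working on a global. The short-name macro is only possible
because C<counter_add> takes a fixed number of arguments.
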
This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance. Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead
it does dTHX; to get the context from thread-local storage. We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance. (Passing an arg is
cheaper than grabbing it from thread-local storage.)

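The two choices for a varargs function can be sketched in plain C as
follows. This is only an illustration of the pattern: C<my_interp>,
C<My_logf>, C<My_logf_nocontext> and C<my_current_interp> are hypothetical
names, and a plain static variable stands in for the thread-local storage
that a real threaded build would use.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical interpreter state: just a buffer for the last message. */
typedef struct my_interp { char last_msg[64]; } my_interp;

/* Stand-in for the per-thread context slot (assumption: one thread). */
static my_interp *my_current_interp;

/* Shared worker taking a va_list, so both entry points can forward to it. */
static int my_vlogf(my_interp *interp, const char *pat, va_list args)
{
    assert(interp != NULL);
    return vsnprintf(interp->last_msg, sizeof interp->last_msg, pat, args);
}

/* Explicit-context version: the caller spells out the context argument. */
int My_logf(my_interp *interp, const char *pat, ...)
{
    va_list args;
    int n;

    va_start(args, pat);
    n = my_vlogf(interp, pat, args);
    va_end(args);
    return n;
}

/* Context-free version: it looks the context up itself on every call. */
int My_logf_nocontext(const char *pat, ...)
{
    my_interp *interp = my_current_interp;   /* plays the role of dTHX */
    va_list args;
    int n;

    va_start(args, pat);
    n = my_vlogf(interp, pat, args);
    va_end(args);
    return n;
}
```

A caller holding the context writes C<My_logf(interp, "%d", n)>; code that
does not have it at hand falls back on C<My_logf_nocontext("%d", n)> and
pays for the lookup on each call.
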
You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow. The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with PERL_IMPLICIT_CONTEXT enabled.

There are three ways to do this. First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever XSUB.h is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

    sv_setiv(sv, num);

in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:

    Perl_sv_setiv(Perl_get_context(), sv, num);

or to this otherwise:

    Perl_sv_setiv(sv, num);

You have to do nothing new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your Foo.xs:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    static SV *my_private_function(int arg1, int arg2);

    static SV *
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        ... call many Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API. (You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.

The third, even more efficient way is to ape how it is done within
the Perl guts:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    static SV *my_private_function(pTHX_ int arg1, int arg2);

    static SV *
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself; always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards. If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

    /* do this before doing anything else with some_perl */
    PERL_SET_CONTEXT(some_perl);

    ... other Perl API calls on some_perl go here ...

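The reason the slot has to be reset can be seen in a toy model. The
sketch below assumes a C11 compiler (for C<_Thread_local>) and uses
invented names throughout (C<my_interp>, C<my_set_context>,
C<my_get_context>, C<my_interp_init>, C<my_current_id>); it is not how
Perl implements its TLS slot, only an imitation of the behavior described
above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct my_interp { int id; } my_interp;   /* hypothetical */

/* One slot per thread, like the TLS slot the text describes. */
static _Thread_local my_interp *my_tls_slot = NULL;

void my_set_context(my_interp *interp) { my_tls_slot = interp; }
my_interp *my_get_context(void)        { return my_tls_slot; }

/* A constructor that also claims the slot for the new interpreter. */
void my_interp_init(my_interp *interp, int id)
{
    assert(interp != NULL);
    interp->id = id;
    my_set_context(interp);
}

/* An API call that relies on the slot rather than an explicit argument. */
int my_current_id(void)
{
    my_interp *interp = my_get_context();
    return interp ? interp->id : -1;
}
```

After creating two interpreters in one thread, the slot points at the
second one. Calls meant for the first give wrong answers until
C<my_set_context> is called with it again, which is the situation
C<PERL_SET_CONTEXT> exists to handle.
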
=head2 Future Plans and PERL_IMPLICIT_SYS

Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS
and USE_5005THREADS on Windows (see inside iperlsys.h).

This makes it possible to provide an extra pointer (called the "host"
environment) for all the system calls, so that the system-level code
can maintain its own state, broken down into
seven C structures. These are thin wrappers around the usual system
calls (see win32/perllib.c) for the default perl executable, but for a
more ambitious host (like the one that would do fork() emulation) all
the extra work needed to pretend that different interpreters are
actually different "processes" would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core, you can get at the functions either with or
without the C<Perl_> prefix, thanks to a bunch of defines that live in
F<embed.h>. This header file is generated automatically from
F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces. It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well. Here's a sample entry from
that table:

    Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval

The second column is the return type, the third column the name. Columns
after that are the arguments. The first column is a set of flags:

=over 3

=item A

This function is a part of the public API. All such functions should also
have 'd'; very few do not.

=item p

This function has a C<Perl_> prefix; i.e. it is defined as
C<Perl_av_fetch>.

=item d

This function has documentation using the C<apidoc> feature which we'll
look at in a second. Some functions have 'd' but not 'A'; docs are good.

=back

Other available flags are:

=over 3

=item s

This is a static function and is defined as C<STATIC S_whatever>, and
usually called within the sources as C<whatever(...)>.

=item n

This does not need an interpreter context, so the definition has no
C<pTHX>, and it follows that callers don't use C<aTHX>. (See
L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)

=item r

This function never returns; C<croak>, C<exit> and friends.

=item f

This function takes a variable number of arguments, C<printf> style.
The argument list should end with C<...>, like this:

    Afprd |void |croak |const char* pat|...

=item M

This function is part of the experimental development API, and may change
or disappear without notice.

=item o

This function should not have a compatibility macro to define, say,
C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.

=item x

This function isn't exported out of the Perl core.

=item m

This is implemented as a macro.

=item X

This function is explicitly exported.

=item E

This function is visible to extensions included in the Perl core.

=item b

Binary backward compatibility; this function is a macro but also has
a C<Perl_> implementation (which is exported).

=item others

See the comments at the top of C<embed.fnc> for others.

=back

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.

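To get a feel for the table layout, the following sketch splits one
F<embed.fnc>-style entry into its columns and tests for a flag letter.
It is not how F<embed.pl> does the job (that is a Perl script); the
helper names C<split_entry> and C<has_flag> are invented here, and the
sketch ignores comments, continuation lines and preprocessor lines.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_FIELDS 16

/* Split an entry in place on '|' and trim blanks around each field.
 * Returns the number of fields stored in fields[]. */
size_t split_entry(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL && fields != NULL);
    while (n < max_fields) {
        char *bar = strchr(p, '|');
        char *end;

        if (bar)
            *bar = '\0';
        while (isspace((unsigned char)*p))            /* leading blanks */
            p++;
        end = p + strlen(p);
        while (end > p && isspace((unsigned char)end[-1]))
            *--end = '\0';                            /* trailing blanks */
        fields[n++] = p;
        if (!bar)
            break;
        p = bar + 1;
    }
    return n;
}

/* True if the flags column contains the given flag letter. */
int has_flag(const char *flags, char flag)
{
    return flag != '\0' && strchr(flags, flag) != NULL;
}
```

For the C<av_fetch> entry shown earlier this yields six fields: the flags
C<Apd>, the return type C<SV**>, the name, and three arguments.
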
=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

    IVdf    IV in decimal
    UVuf    UV in decimal
    UVof    UV in octal
    UVxf    UV in hexadecimal
    NVef    NV %e-like
    NVff    NV %f-like
    NVgf    NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

    printf("IV is %"IVdf"\n", iv);

The IVdf will expand to whatever is the correct format for the IVs.

If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.

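The mechanism is the same one that C99's F<inttypes.h> uses: the macro
expands to a string literal that the compiler glues onto its neighbours.
The sketch below shows this in standalone C. The names C<my_IV>,
C<my_UV>, C<MY_IVdf> and C<MY_UVxf> are stand-ins invented for this
example; Perl's real macros are chosen at Configure time and need not be
defined this way.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: pretend our "IV" and "UV" are the widest types. */
typedef intmax_t  my_IV;
typedef uintmax_t my_UV;
#define MY_IVdf PRIdMAX     /* my_IV in decimal */
#define MY_UVxf PRIxMAX     /* my_UV in hexadecimal */

/* Format an IV and a UV into buf; returns what snprintf returns. */
int format_pair(char *buf, size_t size, my_IV iv, my_UV uv)
{
    assert(buf != NULL);
    /* Adjacent string literals are concatenated by the compiler. */
    return snprintf(buf, size, "iv=%" MY_IVdf " uv=0x%" MY_UVxf, iv, uv);
}
```

Because the width is baked into the macro, the call site stays the same
whether the underlying integer is 32 or 64 bits wide.
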
=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

    PTR2UV(pointer)
    PTR2IV(pointer)
    PTR2NV(pointer)
    INT2PTR(pointertotype, integer)

For example:

    IV  iv = ...;
    SV *sv = INT2PTR(SV*, iv);

and

    AV *av = ...;
    UV  uv = PTR2UV(av);

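Outside of Perl, the closest standard equivalent is a round trip through
C<uintptr_t>. The sketch below imitates the two directions with
hypothetical macros C<MY_PTR2UV> and C<MY_INT2PTR>; it assumes a platform
that provides C<uintptr_t> and makes no claim about how Perl defines the
real ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogues of PTR2UV and INT2PTR built on <stdint.h>. */
typedef uintptr_t my_UV;
#define MY_PTR2UV(p)          ((my_UV)(uintptr_t)(p))
#define MY_INT2PTR(type, i)   ((type)(uintptr_t)(i))

/* Stash a pointer in an integer slot. */
my_UV stash_pointer(int *p)
{
    /* The integer type must be wide enough to hold any object pointer. */
    assert(sizeof(my_UV) >= sizeof(void *));
    return MY_PTR2UV(p);
}

/* Recover the pointer that stash_pointer() stored. */
int *recover_pointer(my_UV stashed)
{
    return MY_INT2PTR(int *, stashed);
}
```

Going through an integer type sized for pointers, rather than a plain
C<int> or C<long>, is what keeps the round trip lossless on platforms
where those sizes differ.
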
=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them - L<perlapi> is one
such manual which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

    /*
    =for apidoc sv_setiv

    Copies an integer into the given SV. Does not handle 'set' magic. See
    C<sv_setiv_mg>.

    =cut
    */

Please try to supply some documentation if you add functions to the
Perl core.

=head2 Backwards compatibility

The Perl API changes over time. New functions are added or the interfaces
of existing functions are changed. The C<Devel::PPPort> module tries to
provide compatibility code for some of these changes, so XS writers don't
have to code it themselves when supporting multiple versions of Perl.

C<Devel::PPPort> generates a C header file F<ppport.h> that can also
be run as a Perl script. To generate F<ppport.h>, run:

    perl -MDevel::PPPort -eDevel::PPPort::WriteFile

Besides checking existing XS code, the script can also be used to retrieve
compatibility information for various API calls using the C<--api-info>
command line switch. For example:

    % perl ppport.h --api-info=sv_magicext

For details, see C<perldoc ppport.h>.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
a variable number of bytes to represent a character, instead of just
one. You can learn more about Unicode at http://www.unicode.org/

=head2 How can I recognise a UTF-8 string?

You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking - this
is what makes Unicode input an interesting problem.

The API function C<is_utf8_string> can help; it'll tell you if a string
contains only valid UTF-8 characters. However, it can't do the work for
you. On a character-by-character basis, C<is_utf8_char> will tell you
whether the current character in a string is valid UTF-8.

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>; this
continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.

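The byte values follow mechanically from the bit layout. Here is a small
standalone encoder for code points below 0x10000, written from the
standard UTF-8 rules rather than taken from the Perl source; the name
C<encode_utf8> is made up, and the sketch does no validation beyond an
assertion on its range.

```c
#include <assert.h>
#include <stddef.h>

/* Encode code point cp (below 0x10000 in this sketch) as UTF-8 into out,
 * which must have room for 3 bytes.  Returns the number of bytes written. */
size_t encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL && cp < 0x10000UL);
    if (cp < 0x80) {                        /* one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Feeding it 129 gives the bytes 194 and 129, and 192 gives 195 and 128:
the low six bits go into the second byte and whatever is left over goes
into the first.
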
Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.

All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (C<UTF8_IS_INVARIANT()> is a macro that tests
whether the byte can be encoded as a single byte even in UTF-8):

    U8 *utf;
    UV uv;      /* Note: a UV, not a U8, not a char */

    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uv(utf);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uv> to get the
value of the character; the inverse function C<uv_to_utf8> is available
for putting a UV into UTF-8:

    if (!UTF8_IS_INVARIANT(uv))
        /* Must treat this as UTF-8 */
        utf8 = uv_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!

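What the decoding side has to do can be sketched without any Perl
headers. The functions below, C<utf8_skip> and C<decode_utf8>, are
hypothetical stand-ins written for this example; they handle only
characters of up to three bytes and assume the input is well-formed,
which the real API functions do not.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the UTF-8 character that starts with this byte (1 to 3 in
 * this sketch, which ignores longer forms and malformed lead bytes). */
size_t utf8_skip(unsigned char first)
{
    if (first < 0x80) return 1;        /* invariant: high bit clear */
    if (first < 0xE0) return 2;        /* 110xxxxx */
    return 3;                          /* 1110xxxx */
}

/* Decode the character at s into *cp and return a pointer just past it.
 * Assumes s points at well-formed UTF-8 of at most three bytes. */
const unsigned char *decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t len;

    assert(s != NULL && cp != NULL);
    len = utf8_skip(*s);
    if (len == 1)
        *cp = s[0];                    /* OK to treat as a plain byte */
    else if (len == 2)
        *cp = ((unsigned long)(s[0] & 0x1F) << 6)
            |  (unsigned long)(s[1] & 0x3F);
    else
        *cp = ((unsigned long)(s[0] & 0x0F) << 12)
            | ((unsigned long)(s[1] & 0x3F) << 6)
            |  (unsigned long)(s[2] & 0x3F);
    return s + len;
}
```

Walking a buffer with C<decode_utf8> yields one full character value per
step, which is what you need when comparing against characters taken from
a non-UTF-8 string.
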
=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with Unicode strings and non-Unicode strings
slightly differently. If a string has been identified as being UTF-8
encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
manipulate this flag with the following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable results.

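A toy model shows why the flag matters for something as simple as
C<length>. The structure C<my_str> and the function C<my_length> below
are invented for this illustration and have nothing to do with the layout
of a real SV; the only point is that the same bytes give different
answers depending on the flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a string scalar: bytes plus a UTF-8 flag. */
typedef struct my_str {
    const unsigned char *bytes;
    size_t nbytes;
    int is_utf8;          /* plays the role of the SVf_UTF8 flag */
} my_str;

/* Length in characters: the byte count if unflagged, otherwise the number
 * of bytes that start a character (anything but a 10xxxxxx continuation). */
size_t my_length(const my_str *s)
{
    size_t i, chars = 0;

    assert(s != NULL);
    if (!s->is_utf8)
        return s->nbytes;
    for (i = 0; i < s->nbytes; i++)
        if ((s->bytes[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}
```

The five bytes C<"\305\233\340\240\201"> count as five characters with
the flag off and as two with it on.
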
The problem comes when you have, for instance, a string that isn't
flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
especially when combining non-UTF-8 and UTF-8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF-8 flag set, and act accordingly:

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
    if (SvUTF8(sv))
        SvUTF8_on(nsv);

In fact, your C<frobnicate> function should be made aware of whether or
not it's dealing with UTF-8 data, so that it can handle the string
appropriately.

Since just passing an SV to an XS function and copying the data of
the SV is not enough to copy the UTF-8 flag, just passing a C<char *>
to an XS function is even less adequate.

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary
to upgrade one of the strings to UTF-8. If you've got an SV, the easiest
way to do this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

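The idea of upgrading into a copy, rather than in place, can be sketched
as follows. The function C<upgrade_copy> is a hypothetical illustration
using C<malloc>; it is not the implementation of C<bytes_to_utf8>, which
has its own signature and uses Perl's memory allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Return a newly malloc'd UTF-8 copy of the len bytes at src, treating
 * each byte as one character (0..255).  The new length goes in *newlen.
 * The caller frees the result; NULL means the allocation failed. */
unsigned char *upgrade_copy(const unsigned char *src, size_t len,
                            size_t *newlen)
{
    unsigned char *dst, *d;
    size_t i;

    assert(src != NULL && newlen != NULL);
    dst = malloc(2 * len + 1);            /* worst case: every byte doubles */
    if (dst == NULL)
        return NULL;
    d = dst;
    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            *d++ = src[i];                /* invariant byte, copied as is */
        } else {
            *d++ = (unsigned char)(0xC0 | (src[i] >> 6));
            *d++ = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    *d = '\0';
    *newlen = (size_t)(d - dst);
    return dst;
}
```

The source buffer is never written to, so whoever handed you the
original string sees no change.
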
=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a string is UTF-8 or not. You can tell if an SV
is UTF-8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
something should be UTF-8. Treat the flag as part of the PV, even though
it's not - if you pass on the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uv> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character C<uv> to a UTF-8 string, B<always> use
C<uv_to_utf8>, unless C<UTF8_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded. There are tricks you can use to
delay deciding whether you need to use a UTF-8 string until you get to a
high character - C<HALF_UPGRADE> is one of those.

=back

2517 | =head1 Custom Operators
|
---|
2518 |
|
---|
2519 | Custom operator support is a new experimental feature that allows you to
|
---|
2520 | define your own ops. This is primarily to allow the building of
|
---|
2521 | interpreters for other languages in the Perl core, but it also allows
|
---|
2522 | optimizations through the creation of "macro-ops" (ops which perform the
|
---|
2523 | functions of multiple ops which are usually executed together, such as
|
---|
2524 | C<gvsv, gvsv, add>.)
|
---|
2525 |
|
---|
2526 | This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
|
---|
2527 | core does not "know" anything special about this op type, and so it will
|
---|
2528 | not be involved in any optimizations. This also means that you can
|
---|
2529 | define your custom ops to be any op structure - unary, binary, list and
|
---|
2530 | so on - you like.
|
---|
2531 |
|
---|
2532 | It's important to know what custom operators won't do for you. They
|
---|
2533 | won't let you add new syntax to Perl, directly. They won't even let you
|
---|
2534 | add new keywords, directly. In fact, they won't change the way Perl
|
---|
2535 | compiles a program at all. You have to do those changes yourself, after
|
---|
2536 | Perl has compiled the program. You do this either by manipulating the op
|
---|
2537 | tree using a C<CHECK> block and the C<B::Generate> module, or by adding
|
---|
2538 | a custom peephole optimizer with the C<optimize> module.
|
---|
2539 |
|
---|
2540 | When you do this, you replace ordinary Perl ops with custom ops by
|
---|
2541 | creating ops with the type C<OP_CUSTOM> and an C<op_ppaddr> pointing at
|
---|
2542 | your own PP function. Define that function in XS code, modelled on
|
---|
2543 | the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
|
---|
2544 | takes the appropriate number of values from the stack, and you are
|
---|
2545 | responsible for adding stack marks if necessary.
|
---|
2546 |
|
---|
2547 | You should also "register" your op with the Perl interpreter so that it
|
---|
2548 | can produce sensible error and warning messages. Since it is possible to
|
---|
2549 | have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
|
---|
2550 | Perl uses the value of C<< o->op_ppaddr >> as a key into the
|
---|
2551 | C<PL_custom_op_descs> and C<PL_custom_op_names> hashes. This means you
|
---|
2552 | need to enter a name and description for your op, keyed on its
|
---|
2553 | C<op_ppaddr>, in those two hashes.
|
---|
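The idea of several custom ops sharing one op type and being told apart by their PP function can be modelled in plain C. This is a toy sketch, not Perl API code: C<struct toy_op>, C<pp_add2>, C<toy_register>, C<toy_op_name> and C<toy_op_desc> are all hypothetical names, and a small array stands in for the C<PL_custom_op_names> and C<PL_custom_op_descs> hashes. A real PP function would use the interpreter's stack macros rather than the toy stack used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy value stack standing in for Perl's argument stack. */
static double stack[16];
static int sp = 0;

struct toy_op;
typedef struct toy_op *(*toy_ppaddr_t)(struct toy_op *);

enum { TOY_OP_CUSTOM = 1 };

/* Toy op: records its type, the function that executes it, and the
 * op to run next. */
struct toy_op {
    int            op_type;     /* TOY_OP_CUSTOM for every custom op */
    toy_ppaddr_t   op_ppaddr;   /* distinguishes one custom op from another */
    struct toy_op *op_next;
};

/* Hypothetical "macro-op": takes exactly two values from the stack
 * and leaves their sum in their place. */
static struct toy_op *pp_add2(struct toy_op *o)
{
    double right = stack[--sp];
    double left  = stack[--sp];
    stack[sp++] = left + right;
    return o->op_next;
}

/* Toy registry keyed on the PP function pointer. */
struct toy_entry { toy_ppaddr_t key; const char *name; const char *desc; };
static struct toy_entry registry[8];
static size_t n_registered = 0;

/* Returns 1 on success, 0 if the registry is full. */
static int toy_register(toy_ppaddr_t pp, const char *name, const char *desc)
{
    if (n_registered == sizeof registry / sizeof registry[0])
        return 0;
    registry[n_registered].key  = pp;
    registry[n_registered].name = name;
    registry[n_registered].desc = desc;
    n_registered++;
    return 1;
}

static const struct toy_entry *toy_lookup(const struct toy_op *o)
{
    size_t i;
    for (i = 0; i < n_registered; i++)
        if (registry[i].key == o->op_ppaddr)
            return &registry[i];
    return NULL;
}

/* Name and description used when reporting errors about an op. */
static const char *toy_op_name(const struct toy_op *o)
{
    const struct toy_entry *e = toy_lookup(o);
    return e ? e->name : "unknown custom operator";
}

static const char *toy_op_desc(const struct toy_op *o)
{
    const struct toy_entry *e = toy_lookup(o);
    return e ? e->desc : "unknown custom operator";
}
```

Because the lookup uses only C<op_ppaddr>, an op that was never registered can still run, but any message about it falls back to a generic label. That is the practical reason to register a name and description.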
2554 |
|
---|
2555 | Forthcoming versions of C<B::Generate> (version 1.0 and above) should
|
---|
2556 | directly support the creation of custom ops by name.
|
---|
2557 |
|
---|
2558 | =head1 AUTHORS
|
---|
2559 |
|
---|
2560 | Until May 1997, this document was maintained by Jeff Okamoto
|
---|
2561 | E<lt>[email protected]E<gt>. It is now maintained as part of Perl
|
---|
2562 | itself by the Perl 5 Porters E<lt>[email protected]E<gt>.
|
---|
2563 |
|
---|
2564 | With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
|
---|
2565 | Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
|
---|
2566 | Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
|
---|
2567 | Stephen McCamant, and Gurusamy Sarathy.
|
---|
2568 |
|
---|
2569 | =head1 SEE ALSO
|
---|
2570 |
|
---|
2571 | perlapi(1), perlintern(1), perlxs(1), perlembed(1)
|
---|