=head1 NAME

perlrequick - Perl regular expressions quick start

=head1 DESCRIPTION

This page covers the very basics of understanding, creating and
using regular expressions ('regexes') in Perl.

=head1 The Guide

=head2 Simple word matching

The simplest regex is simply a word, or more generally, a string of
characters. A regex consisting of a word matches any string that
contains that word:

    "Hello World" =~ /World/;  # matches

In this statement, C<World> is a regex and the C<//> enclosing
C</World/> tells perl to search a string for a match. The operator
C<=~> associates the string with the regex match and produces a true
value if the regex matched, or false if the regex did not match. In
our case, C<World> matches the second word in C<"Hello World">, so the
expression is true. This idea has several variations.

Expressions like this are useful in conditionals:

    print "It matches\n" if "Hello World" =~ /World/;

The sense of the match can be reversed by using the C<!~> operator:

    print "It doesn't match\n" if "Hello World" !~ /World/;

The literal string in the regex can be replaced by a variable:

    $greeting = "World";
    print "It matches\n" if "Hello World" =~ /$greeting/;

If you're matching against C<$_>, the C<$_ =~> part can be omitted:

    $_ = "Hello World";
    print "It matches\n" if /World/;

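A bare C</World/> tests C<$_>, so it works well with loops that set C<$_> for you. Here is a small sketch. The sample lines and the C<@hits> array are made up for illustration. C<for> aliases each element to C<$_> in turn:

```perl

    use strict;
    use warnings;

    # Sample input lines (hypothetical data for illustration).
    my @lines = ("Hello World", "Goodbye Moon", "World peace");

    my @hits;
    for (@lines) {                  # each element is aliased to $_
        push @hits, $_ if /World/;  # same as: $_ =~ /World/
    }
    print "$_\n" for @hits;         # prints "Hello World" and "World peace"

```
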
Finally, the C<//> default delimiters for a match can be changed to
arbitrary delimiters by putting an C<'m'> out front:

    "Hello World" =~ m!World!;   # matches, delimited by '!'
    "Hello World" =~ m{World};   # matches, note the matching '{}'
    "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
                                 # '/' becomes an ordinary char

Regexes must match a part of the string I<exactly> in order for the
statement to be true:

    "Hello World" =~ /world/;  # doesn't match, case sensitive
    "Hello World" =~ /o W/;    # matches, ' ' is an ordinary char
    "Hello World" =~ /World /; # doesn't match, no ' ' at end

perl will always match at the earliest possible point in the string:

    "Hello World" =~ /o/;       # matches 'o' in 'Hello'
    "That hat is red" =~ /hat/; # matches 'hat' in 'That'

Not all characters can be used 'as is' in a match. Some characters,
called B<metacharacters>, are reserved for use in regex notation.
The metacharacters are

    {}[]()^$.|*+?\

A metacharacter can be matched by putting a backslash before it:

    "2+2=4" =~ /2+2/;    # doesn't match, + is a metacharacter
    "2+2=4" =~ /2\+2/;   # matches, \+ is treated like an ordinary +
    'C:\WIN32' =~ /C:\\WIN/;                # matches
    "/usr/bin/perl" =~ /\/usr\/bin\/perl/;  # matches

In the last regex, the forward slash C<'/'> is also backslashed,
because it is used to delimit the regex.

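The sketch below checks these escaping rules by recording each match as 1 or 0. The strings come from the examples above, and the variable names are only for illustration. It also shows that switching to a different delimiter such as C<m!...!> removes the need to escape the slashes:

```perl

    use strict;
    use warnings;

    my $sum  = "2+2=4";
    my $path = "/usr/bin/perl";

    # Unescaped, '+' is a quantifier: /2+2/ means "one or more 2s,
    # then another 2", so it needs "22" somewhere in the string.
    my $plain   = $sum =~ /2+2/  ? 1 : 0;    # 0
    my $escaped = $sum =~ /2\+2/ ? 1 : 0;    # 1, '\+' is a literal plus

    # '/' must be escaped when it is also the delimiter ...
    my $slashes = $path =~ /\/usr\/bin\/perl/ ? 1 : 0;  # 1
    # ... or sidestepped by choosing another delimiter with m.
    my $bang    = $path =~ m!/usr/bin/perl!   ? 1 : 0;  # 1

    print "$plain $escaped $slashes $bang\n";   # prints "0 1 1 1"

```
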
Non-printable ASCII characters are represented by B<escape sequences>.
Common examples are C<\t> for a tab, C<\n> for a newline, and C<\r>
for a carriage return. Arbitrary bytes are represented by octal
escape sequences, e.g., C<\033>, or hexadecimal escape sequences,
e.g., C<\x1B>:

    "1000\t2000" =~ m(0\t2);   # matches
    "cat" =~ /\143\x61\x74/;   # matches, but a weird way to spell cat

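You can check the escape sequences directly. This sketch uses illustrative variable names. It confirms that C<\033> and C<\x1B> are the same byte, and it decodes the 'weird' spelling of C<cat>:

```perl

    use strict;
    use warnings;

    # "\033" (octal) and "\x1B" (hex) both denote the ESC byte, 27.
    my $same_byte = ("\033" eq "\x1B") ? 1 : 0;     # 1
    my $esc_code  = ord "\x1B";                     # 27

    # Octal 143 is 99, the code for 'c'; hex 61 is 'a', hex 74 is 't'.
    my $cat = ("cat" =~ /^\143\x61\x74$/) ? 1 : 0;  # 1

    # \t in the pattern matches the tab character in the string.
    my $tab = ("1000\t2000" =~ m(0\t2)) ? 1 : 0;    # 1

    print "$same_byte $esc_code $cat $tab\n";       # prints "1 27 1 1"

```
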
Regexes are treated mostly as double-quoted strings, so variable
substitution works:

    $foo = 'house';
    'cathouse' =~ /cat$foo/;   # matches
    'housecat' =~ /${foo}cat/; # matches

The braces in C<${foo}cat> separate the variable name from the
following text; without them, perl would look for a variable named
C<$foocat>.

With all of the regexes above, if the regex matched anywhere in the
string, it was considered a match. To specify I<where> it should
match, we would use the B<anchor> metacharacters C<^> and C<$>. The
anchor C<^> means match at the beginning of the string and the anchor
C<$> means match at the end of the string, or before a newline at the
end of the string. Some examples:

    "housekeeper" =~ /keeper/;         # matches
    "housekeeper" =~ /^keeper/;        # doesn't match
    "housekeeper" =~ /keeper$/;        # matches
    "housekeeper\n" =~ /keeper$/;      # matches
    "housekeeper" =~ /^housekeeper$/;  # matches

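Running the five anchor examples side by side shows the trailing-newline rule for C<$> clearly. Here is a minimal sketch that stores 1 for a match and 0 otherwise:

```perl

    use strict;
    use warnings;

    # Record each anchored match as 1 (matched) or 0 (no match).
    my @results = (
        ("housekeeper"   =~ /keeper/)        ? 1 : 0,  # unanchored
        ("housekeeper"   =~ /^keeper/)       ? 1 : 0,  # 'keeper' is not at the start
        ("housekeeper"   =~ /keeper$/)       ? 1 : 0,  # at the end
        ("housekeeper\n" =~ /keeper$/)       ? 1 : 0,  # $ also matches before a final "\n"
        ("housekeeper"   =~ /^housekeeper$/) ? 1 : 0,  # whole string
    );
    print "@results\n";   # prints "1 0 1 1 1"

```
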
=head2 Using character classes

A B<character class> allows a set of possible characters, rather than
just a single character, to match at a particular point in a regex.
Character classes are denoted by brackets C<[...]>, with the set of
characters to be possibly matched inside. Here are some examples:

    /cat/;            # matches 'cat'
    /[bcr]at/;        # matches 'bat', 'cat', or 'rat'
    "abc" =~ /[cab]/; # matches 'a'

In the last statement, even though C<'c'> is the first character in
the class, the earliest point at which the regex can match is C<'a'>.

    /[yY][eE][sS]/;   # match 'yes' in a case-insensitive way
                      # 'yes', 'Yes', 'YES', etc.
    /yes/i;           # also match 'yes' in a case-insensitive way

The last example shows a match with an C<'i'> B<modifier>, which makes
the match case-insensitive.

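Character classes work well with C<grep>, which sets C<$_> to each list element in turn. In this sketch the word lists are made up. C<$-[0]> comes from Perl's built-in C<@-> array and gives the offset where the last successful match began:

```perl

    use strict;
    use warnings;

    # grep sets $_ to each element, so a bare /.../ tests that element.
    my @animals = grep { /^[bcr]at$/ } qw(bat cat rat hat);  # bat cat rat
    my @yeses   = grep { /^yes$/i }    qw(yes Yes YES no);   # yes Yes YES

    # [cab] matches wherever one of its characters appears earliest;
    # $-[0] is the start offset of that match.
    my $offset = ("abc" =~ /[cab]/) ? $-[0] : -1;            # 0, i.e. the 'a'

    print "@animals | @yeses | $offset\n";

```
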
Character classes also have ordinary and special characters, but the
sets of ordinary and special characters inside a character class are
different from those outside a character class. The special
characters for a character class are C<-]\^$> and are matched using an
escape:

    /[\]c]def/; # matches ']def' or 'cdef'
    $x = 'bcr';
    /[$x]at/;   # matches 'bat', 'cat', or 'rat'
    /[\$x]at/;  # matches '$at' or 'xat'
    /[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'

The special character C<'-'> acts as a range operator within character
classes, so that the unwieldy C<[0123456789]> and C<[abc...xyz]>
become the svelte C<[0-9]> and C<[a-z]>:

    /item[0-9]/;    # matches 'item0' or ... or 'item9'
    /[0-9a-fA-F]/;  # matches a hexadecimal digit

If C<'-'> is the first or last character in a character class, it is
treated as an ordinary character.

The special character C<^> in the first position of a character class
denotes a B<negated character class>, which matches any character but
those in the brackets. Both C<[...]> and C<[^...]> must match a
character, or the match fails. Then

    /[^a]at/;  # doesn't match 'aat' or 'at', but matches
               # all other 'bat', 'cat', '0at', '%at', etc.
    /[^0-9]/;  # matches a non-numeric character
    /[a^]at/;  # matches 'aat' or '^at'; here '^' is ordinary

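Here is a small sketch that checks the negated-class rules and shows a variable being interpolated inside a class. The word list is made up for illustration:

```perl

    use strict;
    use warnings;

    my @words = qw(aat at bat cat 0at %at);

    # [^a] must consume one character that is not 'a', so 'at' (too
    # short) and 'aat' (only 'a's before 'at') are rejected.
    my @negated = grep { /[^a]at/ } @words;      # bat cat 0at %at

    # A variable inside a class is interpolated first: [$x] is [bcr].
    my $x = 'bcr';
    my @interp = grep { /^[$x]at$/ } @words;     # bat cat

    print "@negated | @interp\n";

```
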
Perl has several abbreviations for common character classes:

=over 4

=item *

C<\d> is a digit and represents

    [0-9]

=item *

C<\s> is a whitespace character and represents

    [\ \t\r\n\f]

=item *

C<\w> is a word character (alphanumeric or C<_>) and represents

    [0-9a-zA-Z_]

=item *

C<\D> is a negated C<\d>; it represents any character but a digit

    [^0-9]

=item *

C<\S> is a negated C<\s>; it represents any non-whitespace character

    [^\s]

=item *

C<\W> is a negated C<\w>; it represents any non-word character

    [^\w]

=item *

The period C<'.'> matches any character but C<"\n">

=back

The C<\d\s\w\D\S\W> abbreviations can be used both inside and outside
of character classes. Here are some in use:

    /\d\d:\d\d:\d\d/; # matches a hh:mm:ss time format
    /[\d\s]/;         # matches any digit or whitespace character
    /\w\W\w/;         # matches a word char, followed by a
                      # non-word char, followed by a word char
    /..rt/;           # matches any two chars, followed by 'rt'
    /end\./;          # matches 'end.'
    /end[.]/;         # same thing, matches 'end.'

The B<word anchor> C<\b> matches a boundary between a word
character and a non-word character C<\w\W> or C<\W\w>:

    $x = "Housecat catenates house and cat";
    $x =~ /\bcat/;    # matches cat in 'catenates'
    $x =~ /cat\b/;    # matches cat in 'Housecat'
    $x =~ /\bcat\b/;  # matches 'cat' at end of string

In the last example, the end of the string is considered a word
boundary, because the final C<'t'> is a word character with nothing
after it.

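To see which C<cat> each regex finds, we can print the offset where each match starts. This uses Perl's built-in C<@-> array, where C<$-[0]> is the start of the last successful match:

```perl

    use strict;
    use warnings;

    my $x = "Housecat catenates house and cat";

    # After a successful match, $-[0] is the offset where it started.
    my $start = $x =~ /\bcat/   ? $-[0] : -1;  # 9:  'cat' in 'catenates'
    my $end   = $x =~ /cat\b/   ? $-[0] : -1;  # 5:  'cat' in 'Housecat'
    my $whole = $x =~ /\bcat\b/ ? $-[0] : -1;  # 29: the final word 'cat'

    print "$start $end $whole\n";              # prints "9 5 29"

```
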
=head2 Matching this or that

We can match different character strings with the B<alternation>
metacharacter C<'|'>. To match C<dog> or C<cat>, we form the regex
C<dog|cat>. As before, perl will try to match the regex at the
earliest possible point in the string. At each character position,
perl will first try to match the first alternative, C<dog>. If
C<dog> doesn't match, perl will then try the next alternative, C<cat>.
If C<cat> doesn't match either, then the match fails at that position
and perl moves to the next position in the string. Some examples:

    "cats and dogs" =~ /cat|dog|bird/;  # matches "cat"
    "cats and dogs" =~ /dog|cat|bird/;  # matches "cat"

Even though C<dog> is the first alternative in the second regex,
C<cat> is able to match earlier in the string.

    "cats" =~ /c|ca|cat|cats/; # matches "c"
    "cats" =~ /cats|cat|ca|c/; # matches "cats"

At a given character position, the first alternative that allows the
regex match to succeed will be the one that matches. Here, all the
alternatives match at the first string position, so the first
alternative is the one that matches.

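You can check the 'earliest position first, then leftmost alternative' rule with a small helper. C<matched_text> is a hypothetical name used only for this sketch. It relies on the real C<@-> and C<@+> arrays, which hold match start and end offsets, and on C<qr//>, which stores a regex in a variable:

```perl

    use strict;
    use warnings;

    # Hypothetical helper: return the substring a regex matched, or undef.
    # @- and @+ hold the start and end offsets of the last successful match.
    sub matched_text {
        my ($string, $regex) = @_;
        return undef unless $string =~ $regex;
        return substr($string, $-[0], $+[0] - $-[0]);
    }

    my @got = (
        matched_text("cats and dogs", qr/cat|dog|bird/),  # "cat"
        matched_text("cats and dogs", qr/dog|cat|bird/),  # "cat": earlier position wins
        matched_text("cats", qr/c|ca|cat|cats/),          # "c": first alternative wins
        matched_text("cats", qr/cats|cat|ca|c/),          # "cats"
    );
    print join(", ", @got), "\n";   # prints "cat, cat, c, cats"

```
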
=head2 Grouping things and hierarchical matching

The B<grouping> metacharacters C<()> allow a part of a regex to be
treated as a single unit. Parts of a regex are grouped by enclosing
them in parentheses. The regex C<house(cat|keeper)> means match
C<house> followed by either C<cat> or C<keeper>. Some more examples
are

    /(a|b)b/;    # matches 'ab' or 'bb'
    /(^a|b)c/;   # matches 'ac' at start of string or 'bc' anywhere

    /house(cat|)/;      # matches either 'housecat' or 'house'
    /house(cat(s|)|)/;  # matches either 'housecats' or 'housecat' or
                        # 'house'. Note groups can be nested.

    "20" =~ /(19|20|)\d\d/;  # matches the null alternative '()\d\d',
                             # because '20\d\d' can't match

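This sketch runs the grouping examples against a made-up word list. The regexes are anchored with C<^> and C<$>, so each one must account for the whole word:

```perl

    use strict;
    use warnings;

    my @words = qw(house housecat housekeeper houseboat housecats housecatss);

    my @either = grep { /^house(cat|keeper)$/ } @words;  # housecat housekeeper
    my @nested = grep { /^house(cat(s|)|)$/ }   @words;  # house housecat housecats

    # The empty third alternative lets (19|20|) match nothing at all,
    # leaving both digits of "20" for \d\d.
    my $digits = ("20" =~ /^(19|20|)\d\d$/) ? "matched, \$1 is '$1'" : "no match";

    print "@either\n@nested\n$digits\n";

```
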
=head2 Extracting matches

The grouping metacharacters C<()> also allow the extraction of the
parts of a string that matched. For each grouping, the part that
matched inside goes into the special variables C<$1>, C<$2>, etc.
They can be used just like ordinary variables:

    # extract hours, minutes, seconds
    $time =~ /(\d\d):(\d\d):(\d\d)/;  # match hh:mm:ss format
    $hours = $1;
    $minutes = $2;
    $seconds = $3;

In list context, a match C</regex/> with groupings will return the
list of matched values C<($1,$2,...)>. So we could rewrite it as

    ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);

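If a match fails, C<$1>, C<$2>, ... keep their values from the previous successful match, so it is worth checking the result. One way to do that is to test the list assignment itself, as sketched here with a made-up C<$time> string:

```perl

    use strict;
    use warnings;

    my $time = "Started at 09:15:42";    # sample input (hypothetical)

    # A list assignment in scalar context yields the number of values on
    # its right side, so this is false (0) when the regex does not match.
    my ($hours, $minutes, $seconds) = $time =~ /(\d\d):(\d\d):(\d\d)/
        or die "no hh:mm:ss time found in '$time'\n";

    print "$hours h, $minutes m, $seconds s\n";   # prints "09 h, 15 m, 42 s"

```
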
If the groupings in a regex are nested, C<$1> gets the group with the
leftmost opening parenthesis, C<$2> the next opening parenthesis,
etc. For example, here is a complex regex and the matching variables
indicated below it:

    /(ab(cd|ef)((gi)|j))/;
     1  2      34

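Here is a short sketch that runs the nested regex above against two made-up strings and reports C<$1> through C<$4>. When the C<(gi)> branch is not used, C<$4> is undefined:

```perl

    use strict;
    use warnings;

    my %groups;
    for my $s ("abefgi", "abcdj") {
        next unless $s =~ /(ab(cd|ef)((gi)|j))/;
        # $4 is undef when the (gi) branch did not take part in the match.
        $groups{$s} = join ' ', map { defined $_ ? $_ : 'undef' } $1, $2, $3, $4;
        print "$s: $groups{$s}\n";
    }
    # abefgi: abefgi ef gi gi
    # abcdj: abcdj cd j undef

```
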
Associated with the matching variables C<$1>, C<$2>, ... are
the B<backreferences> C<\1>, C<\2>, ... Backreferences are
matching variables that can be used I<inside> a regex:

    /(\w\w\w)\s\1/; # find sequences like 'the the' in string

C<$1>, C<$2>, ... should only be used outside of a regex, and C<\1>,
C<\2>, ... only inside a regex.

=head2 Matching repetitions

The B<quantifier> metacharacters C<?>, C<*>, C<+>, and C<{}> allow us
to specify how many repeats of a portion of a regex we consider to be
a match. Quantifiers are put immediately after the character,
character class, or grouping that we want to specify. They have the
following meanings:

=over 4

=item *

C<a?> = match 'a' 1 or 0 times

=item *

C<a*> = match 'a' 0 or more times, i.e., any number of times

=item *

C<a+> = match 'a' 1 or more times, i.e., at least once

=item *

C<a{n,m}> = match at least C<n> times, but not more than C<m>
times.

=item *

C<a{n,}> = match C<n> or more times

=item *

C<a{n}> = match exactly C<n> times

=back

Here are some examples:

    /[a-z]+\s+\d*/;   # match a lowercase word, at least some space, and
                      # any number of digits
    /(\w+)\s+\1/;     # match doubled words of arbitrary length
    $year =~ /^\d{2,4}$/;        # make sure year is at least 2 but not
                                 # more than 4 digits
    $year =~ /^\d{4}$|^\d{2}$/;  # better match; throw out 3-digit dates

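Without anchors, a regex only has to match I<somewhere> in the string, so a length check like C<\d{2,4}> needs C<^> and C<$> to be useful. This sketch compares the two forms on made-up sample years and also tries the doubled-word regex:

```perl

    use strict;
    use warnings;

    my @years = qw(99 199 1999 19999);

    # Unanchored, \d{2,4} only needs 2-4 digits somewhere: all pass.
    my @loose  = grep { /\d{2,4}/ } @years;          # 99 199 1999 19999

    # Anchored alternatives accept exactly 2 or exactly 4 digits.
    my @strict = grep { /^\d{4}$|^\d{2}$/ } @years;  # 99 1999

    # Doubled words of any length, via a backreference.
    my ($dup) = "it was the the best" =~ /(\w+)\s+\1/;   # 'the'

    print "@loose\n@strict\n$dup\n";

```
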
These quantifiers will try to match as much of the string as possible,
while still allowing the regex to match. So we have

    $x = 'the cat in the hat';
    $x =~ /^(.*)(at)(.*)$/; # matches,
                            # $1 = 'the cat in the h'
                            # $2 = 'at'
                            # $3 = ''   (0 matches)

The first quantifier C<.*> grabs as much of the string as possible
while still having the regex match. The second quantifier C<.*> has
no string left to it, so it matches 0 times.

=head2 More matching

There are a few more things you might want to know about matching
operators. In the code

    $pattern = 'Seuss';
    while (<>) {
        print if /$pattern/;
    }

perl has to re-evaluate C<$pattern> each time through the loop. If
C<$pattern> won't be changing, use the C<//o> modifier to perform
variable substitutions only once. If you don't want any
substitutions at all, use the special delimiter C<m''>:

    @pattern = ('Seuss');
    m/@pattern/; # matches 'Seuss'
    m'@pattern'; # matches the literal string '@pattern'

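Here is a quick sketch of the difference between C<m//> and C<m''>. The strings are made up, and the array is declared with C<my> so that it works under C<use strict>:

```perl

    use strict;
    use warnings;

    my @pattern = ('Seuss');
    my $book    = "Green Eggs and Ham, by Dr. Seuss";
    my $note    = 'see @pattern in the docs';   # single quotes: literal '@'

    my $interpolated = $book =~ m/@pattern/ ? 1 : 0;   # 1: pattern is 'Seuss'
    my $literal_note = $note =~ m'@pattern' ? 1 : 0;   # 1: pattern is '@pattern'
    my $literal_book = $book =~ m'@pattern' ? 1 : 0;   # 0: no '@pattern' in $book

    print "$interpolated $literal_note $literal_book\n";   # prints "1 1 0"

```
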
The global modifier C<//g> allows the matching operator to match
within a string as many times as possible. In scalar context,
successive matches against a string will have C<//g> jump from match
to match, keeping track of position in the string as it goes along.
You can get or set the position with the C<pos()> function.
For example,

    $x = "cat dog house"; # 3 words
    while ($x =~ /(\w+)/g) {
        print "Word is $1, ends at position ", pos $x, "\n";
    }

prints

    Word is cat, ends at position 3
    Word is dog, ends at position 7
    Word is house, ends at position 13

A failed match or changing the target string resets the position. If
you don't want the position reset after failure to match, add the
C<//c> modifier, as in C</regex/gc>.

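This sketch collects the C<pos> values from the loop above and then shows what C<//c> changes after a failed match. The second string, C<$s>, is made up for illustration:

```perl

    use strict;
    use warnings;

    # Collect the end position of each word, as in the loop above.
    my $x = "cat dog house";
    my @ends;
    while ($x =~ /(\w+)/g) {
        push @ends, pos $x;
    }
    # The loop stops on a failed match, which resets pos($x) to undef.
    my $after_loop = defined pos($x) ? pos($x) : 'undef';

    my $s = "aaa bbb";
    my $found   = $s =~ /a+/g;    # succeeds; pos($s) becomes 3
    my $miss_gc = $s =~ /z/gc;    # fails, but /c keeps pos($s) at 3
    my $pos_c   = pos $s;         # 3
    my $miss_g  = $s =~ /z/g;     # fails without /c, so pos($s) is reset
    my $pos_g   = defined pos($s) ? pos($s) : 'undef';

    print "@ends | $after_loop | $pos_c | $pos_g\n";   # prints "3 7 13 | undef | 3 | undef"

```
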
In list context, C<//g> returns a list of matched groupings, or if
there are no groupings, a list of matches to the whole regex. So

    @words = ($x =~ /(\w+)/g);  # matches,
                                # $words[0] = 'cat'
                                # $words[1] = 'dog'
                                # $words[2] = 'house'

=head2 Search and replace

Search and replace is performed using C<s/regex/replacement/modifiers>.
The C<replacement> is a Perl double-quoted string that replaces
whatever the C<regex> matched in the string. The operator C<=~> is
also used here to associate a string with C<s///>. If matching
against C<$_>, the C<$_ =~> part can be dropped. If there is a match,
C<s///> returns the number of substitutions made, otherwise it returns
false. Here are a few examples:

    $x = "Time to feed the cat!";
    $x =~ s/cat/hacker/;   # $x contains "Time to feed the hacker!"
    $y = "'quoted words'";
    $y =~ s/^'(.*)'$/$1/;  # strip single quotes,
                           # $y contains "quoted words"

With the C<s///> operator, the matched variables C<$1>, C<$2>, etc.
are immediately available for use in the replacement expression. With
the global modifier, C<s///g> will search and replace all occurrences
of the regex in the string:

    $x = "I batted 4 for 4";
    $x =~ s/4/four/;   # $x contains "I batted four for 4"
    $x = "I batted 4 for 4";
    $x =~ s/4/four/g;  # $x contains "I batted four for four"

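Because C<s///> returns the number of substitutions, you can use its result as a count or as a test. Here is a minimal sketch using the string above:

```perl

    use strict;
    use warnings;

    my $x = "I batted 4 for 4";

    # s///g returns how many replacements it made.
    my $count = ($x =~ s/4/four/g);        # 2

    # With no match, s/// returns false and leaves the string alone.
    my $none = ($x =~ s/5/five/) ? 1 : 0;  # 0

    print "$count: $x\n";                  # prints "2: I batted four for four"

```
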
The evaluation modifier C<s///e> treats the replacement as a Perl
expression: perl evaluates it and substitutes the result for the
matched substring. Some examples:

    # reverse all the words in a string
    $x = "the cat in the hat";
    $x =~ s/(\w+)/reverse $1/ge;   # $x contains "eht tac ni eht tah"

    # convert percentage to decimal
    $x = "A 39% hit rate";
    $x =~ s!(\d+)%!$1/100!e;       # $x contains "A 0.39 hit rate"

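To see what C</e> changes, this sketch applies the same substitution to a made-up string, once with C</e> and once without. The C<(my $copy = $text) =~ s///> idiom copies the string first, so the original stays untouched:

```perl

    use strict;
    use warnings;

    my $text = "A 39% hit rate and a 5% miss rate";

    # Without /e the replacement is just a string: "39/100".
    (my $plain = $text) =~ s!(\d+)%!$1/100!g;

    # With /e it is run as Perl code and the result is substituted.
    (my $evaled = $text) =~ s!(\d+)%!$1/100!ge;

    print "$plain\n$evaled\n";
    # A 39/100 hit rate and a 5/100 miss rate
    # A 0.39 hit rate and a 0.05 miss rate

```
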
The last example shows that C<s///> can use other delimiters, such as
C<s!!!> and C<s{}{}>, and even C<s{}//>. If single quotes are used,
as in C<s'''>, then the regex and replacement are treated as
single-quoted strings.

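Here is a short sketch of the alternative delimiters. The strings and C<$amount> are made up for illustration. Note that the single-quote form leaves C<$amount> uninterpolated in the replacement:

```perl

    use strict;
    use warnings;

    my $amount = 10;

    # Braces as delimiters avoid backslashing every '/' in a path.
    (my $path = "/usr/local/bin") =~ s{/usr/local}{/opt};   # "/opt/bin"

    # Default delimiters: $amount is interpolated in the replacement.
    (my $double = "total: X") =~ s/X/$amount/;             # "total: 10"

    # Single-quote delimiters: the replacement is taken literally.
    (my $single = "total: X") =~ s'X'$amount';             # "total: $amount"

    print "$path\n$double\n$single\n";

```
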
=head2 The split operator

C<split /regex/, string> splits C<string> into a list of substrings
and returns that list. The regex describes the separator: C<string>
is split wherever the regex matches. For example, to split a
string into words, use

    $x = "Calvin and Hobbes";
    @word = split /\s+/, $x;  # $word[0] = 'Calvin'
                              # $word[1] = 'and'
                              # $word[2] = 'Hobbes'

To extract a comma-delimited list of numbers, use

    $x = "1.618,2.718, 3.142";
    @const = split /,\s*/, $x;  # $const[0] = '1.618'
                                # $const[1] = '2.718'
                                # $const[2] = '3.142'

If the empty regex C<//> is used, the string is split into individual
characters. If the regex has groupings, then the list produced contains
the matched substrings from the groupings as well:

    $x = "/usr/bin";
    @parts = split m!(/)!, $x;  # $parts[0] = ''
                                # $parts[1] = '/'
                                # $parts[2] = 'usr'
                                # $parts[3] = '/'
                                # $parts[4] = 'bin'

Since the first character of C<$x> matched the regex, C<split>
prepended an empty initial element to the list.

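The C<split> examples above, plus the empty-regex case, can be checked together in one sketch:

```perl

    use strict;
    use warnings;

    my @words = split /\s+/, "Calvin and Hobbes";    # Calvin and Hobbes
    my @nums  = split /,\s*/, "1.618,2.718, 3.142";  # 1.618 2.718 3.142
    my @chars = split //, "cat";                     # c a t

    # A grouping keeps the separators; the leading '/' yields an empty
    # first field.
    my @parts = split m!(/)!, "/usr/bin";            # '' / usr / bin

    print join("|", @parts), "\n";                   # prints "|/|usr|/|bin"

```
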
=head1 BUGS

None.

=head1 SEE ALSO

This is just a quick start guide. For a more in-depth tutorial on
regexes, see L<perlretut> and for the reference page, see L<perlre>.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 2000 Mark Kvale
All rights reserved.

This document may be distributed under the same terms as Perl itself.

=head2 Acknowledgments

The author would like to thank Mark-Jason Dominus, Tom Christiansen,
Ilya Zakharevich, Brad Hughes, and Mike Giroux for all their helpful
comments.

=cut