source: extensions/gsdl-video/trunk/installed/cmdline/lib/ruby/1.8/scanf.rb@ 18425

Last change on this file since 18425 was 18425, checked in by davidb, 15 years ago

Video extension to Greenstone

# scanf for Ruby
#
# $Revision: 11708 $
# $Id: scanf.rb 11708 2007-02-12 23:01:19Z shyouhei $
# $Author: shyouhei $
# $Date: 2007-02-13 08:01:19 +0900 (Tue, 13 Feb 2007) $
#
# A product of the Austin Ruby Codefest (Austin, Texas, August 2002)

=begin

=scanf for Ruby

==Description

scanf for Ruby is an implementation of the C function scanf(3),
modified as necessary for Ruby compatibility.

The methods provided are String#scanf, IO#scanf, and
Kernel#scanf. Kernel#scanf is a wrapper around STDIN.scanf. IO#scanf
can be used on any IO stream, including file handles and sockets.
scanf can be called either with or without a block.

scanf for Ruby scans an input string or stream according to a
<b>format</b>, as described below ("Conversions"), and returns an
array of matches between the format and the input. The format is
defined in a string, and is similar (though not identical) to the
formats used in Kernel#printf and Kernel#sprintf.

The format may contain <b>conversion specifiers</b>, which tell scanf
what form (type) each particular matched substring should be converted
to (e.g., decimal integer, floating point number, literal string,
etc.). The matches and conversions take place from left to right, and
the conversions themselves are returned as an array.

The format string may also contain characters other than those in the
conversion specifiers. White space (blanks, tabs, or newlines) in the
format string matches any amount of white space, including none, in
the input. Everything else matches only itself.

Scanning stops, and scanf returns, when any input character fails to
match the specifications in the format string, or when input is
exhausted, or when everything in the format string has been
matched. All matches found up to the stopping point are returned in
the return array (or yielded to the block, if a block was given).


==Basic usage

  require 'scanf.rb'

  # String#scanf and IO#scanf take a single argument (a format string)
  array = aString.scanf("%d%s")
  array = anIO.scanf("%d%s")

  # Kernel#scanf reads from STDIN
  array = scanf("%d%s")

==Block usage

When called with a block, scanf keeps scanning the input, cycling back
to the beginning of the format string, and yields a new array of
conversions to the block every time the format string is matched
(including partial matches, but not including complete failures). The
actual return value of scanf when called with a block is an array
containing the results of all the executions of the block.

  str = "123 abc 456 def 789 ghi"
  str.scanf("%d%s") { |num,str| [ num * 2, str.upcase ] }
  # => [[246, "ABC"], [912, "DEF"], [1578, "GHI"]]
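
For comparison, here is a rough plain-Ruby approximation of the block
example above, written without scanf. It is only a sketch: the regular
expression and the variable names (input, pairs) are illustrative
assumptions, not part of this library. It shows the explicit string-to-integer
conversion that the block form of scanf makes unnecessary.

```ruby
input = "123 abc 456 def 789 ghi"

# String#scan hands back string captures, so the conversion to an
# integer has to be written out by hand with to_i.
pairs = input.scan(/([-+]?\d+)\s*(\S+)/).map do |num, word|
  [num.to_i * 2, word.upcase]
end

pairs  # => [[246, "ABC"], [912, "DEF"], [1578, "GHI"]]
```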

==Conversions

The single argument to scanf is a format string, which generally
includes one or more conversion specifiers. Conversion specifiers
begin with the percent character ('%') and include information about
what scanf should next scan for (string, decimal number, single
character, etc.).

There may be an optional maximum field width, expressed as a decimal
integer, between the % and the conversion. If no width is given, a
default of `infinity' is used (with the exception of the %c specifier;
see below). Otherwise, given a field width of <em>n</em> for a given
conversion, at most <em>n</em> characters are scanned in processing
that conversion. Before conversion begins, most conversions skip
white space in the input string; this white space is not counted
against the field width.
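
The effect of a field width can be sketched with an ordinary regular
expression. The helper below, width_limited_decimal, is a hypothetical
name introduced for illustration; the pattern it builds follows the
shape of the width-limited %d pattern in this file, in which a leading
sign uses up one column of the width.

```ruby
# Hypothetical helper: builds a pattern for a decimal integer that is
# at most n characters wide, sign included.
def width_limited_decimal(n)
  alternatives = []
  alternatives << "[-+]\\d{1,#{n - 1}}" if n > 1  # sign plus up to n-1 digits
  alternatives << "\\d{1,#{n}}"                   # or up to n digits, unsigned
  Regexp.new("\\A(" + alternatives.join("|") + ")")
end

unsigned = width_limited_decimal(3).match("12345")
unsigned[1]          # => "123"
unsigned.post_match  # => "45"

signed = width_limited_decimal(3).match("-12345")
signed[1]            # => "-12"
```

Whatever the width leaves unscanned ("45" in the first case) stays in
the input for the next conversion.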

The following conversions are available. (See the files EXAMPLES
and <tt>tests/scanftests.rb</tt> for examples.)

[%]
  Matches a literal `%'. That is, `%%' in the format string matches a
  single input `%' character. No conversion is done, and the resulting
  '%' is not included in the return array.

[d]
  Matches an optionally signed decimal integer.

[u]
  Same as d.

[i]
  Matches an optionally signed integer. The integer is read in base
  16 if it begins with `0x' or `0X', in base 8 if it begins with `0',
  and in base 10 otherwise. Only characters that correspond to the
  base are recognized.

[o]
  Matches an optionally signed octal integer.

[x,X]
  Matches an optionally signed hexadecimal integer.

[f,g,e,E]
  Matches an optionally signed floating-point number.

[s]
  Matches a sequence of non-white-space characters. The input string stops at
  white space or at the maximum field width, whichever occurs first.

[c]
  Matches a single character, or a sequence of <em>n</em> characters if a
  field width of <em>n</em> is specified. The usual skip of leading white
  space is suppressed. To skip white space first, use an explicit space in
  the format.

[<tt>[</tt>]
  Matches a nonempty sequence of characters from the specified set
  of accepted characters. The usual skip of leading white space is
  suppressed. This bracketed sub-expression is interpreted exactly like a
  character class in a Ruby regular expression. (In fact, it is placed as-is
  in a regular expression.) The matching against the input string ends with
  the appearance of a character not in (or, with a circumflex, in) the set,
  or when the field width runs out, whichever comes first.
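
The base rules given for the i conversion are the same ones that
Kernel#Integer applies to a string, and the handler for that
conversion in this file calls Integer on the matched text. A small
sketch of the prefix rules (the names samples and values are
illustrative only):

```ruby
samples = ["0x1F", "017", "42", "-0x10"]

# Integer() picks the base from the prefix: 0x or 0X means base 16,
# a leading 0 means base 8, anything else is base 10.
values = samples.map { |text| Integer(text) }

values  # => [31, 15, 42, -16]
```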

===Assignment suppression

To require that a particular match occur, but without including the result
in the return array, place the <b>assignment suppression flag</b>, which is
the star character ('*'), immediately after the leading '%' of a format
specifier (just before the field width, if any).

==Examples

See the files <tt>EXAMPLES</tt> and <tt>tests/scanftests.rb</tt>.

==scanf for Ruby compared with scanf in C

scanf for Ruby is based on the C function scanf(3), but with modifications,
dictated mainly by the underlying differences between the languages.

===Unimplemented flags and specifiers

* The only flag implemented in scanf for Ruby is '<tt>*</tt>' (ignore
  upcoming conversion). Many of the flags available in C versions of scanf(3)
  have to do with the type of upcoming pointer arguments, and are literally
  meaningless in Ruby.

* The <tt>n</tt> specifier (store number of characters consumed so far in
  next pointer) is not implemented.

* The <tt>p</tt> specifier (match a pointer value) is not implemented.

===Altered specifiers

[o,u,x,X]
  In scanf for Ruby, all of these specifiers scan for an optionally signed
  integer, rather than for an unsigned integer like their C counterparts.
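
The signed behaviour follows from the conversions used by the
handlers in this file: String#hex and String#oct both accept an
optional leading sign. A brief sketch (the variable names are
illustrative only):

```ruby
negative_hex = "-ff".hex    # => -255
prefixed_hex = "+0x1f".hex  # => 31
negative_oct = "-17".oct    # => -15
```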

===Return values

scanf for Ruby returns an array of successful conversions, whereas
scanf(3) returns the number of conversions successfully
completed. (See below for more details on scanf for Ruby's return
values.)

==Return values

Without a block, scanf returns an array containing all the conversions
it has found. If none are found, scanf will return an empty array. An
unsuccessful match is never ignored, but rather always signals the end
of the scanning operation. If the first unsuccessful match takes place
after one or more successful matches have already taken place, the
returned array will contain the results of those successful matches.

With a block, scanf returns a 'map'-like array of transformations from
the block -- that is, an array reflecting what the block did with each
yielded result from the iterative scanf operation. (See "Block
usage", above.)

==Test suite

scanf for Ruby includes a suite of unit tests (requiring the
<tt>TestUnit</tt> package), which can be run with the command <tt>ruby
tests/scanftests.rb</tt> or the command <tt>make test</tt>.

==Current limitations and bugs

When using IO#scanf under Windows, make sure you open your files in
binary mode:

  File.open("filename", "rb")

so that scanf can keep track of characters correctly.
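
A minimal sketch of why binary mode matters, assuming the concern is
that IO#scanf relies on pos and seek to track its place in the stream
(as the IO code in this file does). In binary mode the reported
position matches the bytes actually read, line terminators included.
The temporary file and the variable names are illustrative only.

```ruby
require 'tempfile'

line = nil
consumed = nil

Tempfile.open("scanf_demo") do |tmp|
  tmp.binmode
  tmp.write("12\r\n34\r\n")   # Windows-style line endings
  tmp.flush

  File.open(tmp.path, "rb") do |io|
    line = io.gets            # the first line, "\r\n" included
    consumed = io.pos         # bytes consumed so far
  end
end

line      # => "12\r\n"
consumed  # => 4
```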

Support for character classes is reasonably complete (since it
essentially piggy-backs on Ruby's regular expression handling of
character classes), but users are advised that character class testing
has not been exhaustive, and that they should exercise some caution
in using any of the more complex and/or arcane character class
idioms.
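
To make the piggy-backing concrete, here is a sketch of how a
bracketed set can be spliced into a regular expression. The helper
name char_class_regex is hypothetical; the pattern it builds follows
the shape of the width-less character class pattern in this file.

```ruby
# Hypothetical helper: set is the text between the brackets,
# for example "a-c" or "^,".
def char_class_regex(set)
  # The lookahead needs the opposite set: remove a leading circumflex,
  # or add one if the set has none.
  opposite = set.start_with?("^") ? set[1..-1] : "^" + set
  Regexp.new("\\A([#{set}]+)(?=[#{opposite}]|\\z)")
end

letters = char_class_regex("a-c").match("abcxyz")[1]   # => "abc"
field   = char_class_regex("^,").match("foo,bar")[1]   # => "foo"
```

Because the set is inserted unchanged, anything Ruby treats specially
inside a character class keeps that meaning here, which is why the
more arcane idioms deserve a quick test before use.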


==Technical notes

===Rationale behind scanf for Ruby

The impetus for a scanf implementation in Ruby comes chiefly from the fact
that existing pattern matching operations, such as Regexp#match and
String#scan, return all results as strings, which have to be converted to
integers or floats explicitly in cases where what's ultimately wanted are
integer or float values.
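
The point can be seen with the core methods alone. In this short
sketch (the variable names are illustrative only), every capture comes
back as a String and has to be converted by hand:

```ruby
m = /(\d+)\s+(\d+\.\d+)/.match("42 3.5")

raw_count = m[1]       # => "42"  (a String, not an Integer)
count     = m[1].to_i  # => 42
ratio     = m[2].to_f  # => 3.5

raw_numbers = "10 20 30".scan(/\d+/)            # => ["10", "20", "30"]
numbers     = raw_numbers.map { |s| s.to_i }    # => [10, 20, 30]
```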

===Design of scanf for Ruby

scanf for Ruby is essentially a <format string>-to-<regular
expression> converter.

When scanf is called, a FormatString object is generated from the
format string ("%d%s...") argument. The FormatString object breaks the
format string down into atoms ("%d", "%5f", "blah", etc.), and from
each atom it creates a FormatSpecifier object, which it
saves.

Each FormatSpecifier has a regular expression fragment and a "handler"
associated with it. For example, the regular expression fragment
associated with the format "%d" is "([-+]?\d+)", and the handler
associated with it is a wrapper around String#to_i. scanf itself calls
FormatString#match, passing in the input string. FormatString#match
iterates through its FormatSpecifiers; for each one, it matches the
corresponding regular expression fragment against the string. If
there's a match, it sends the matched string to the handler associated
with the FormatSpecifier.

Thus, to follow up the "%d" example: if "123" occurs in the input
string when a FormatSpecifier consisting of "%d" is reached, the "123"
will be matched against "([-+]?\d+)", and the matched string will be
rendered into an integer by a call to to_i.

The rendered match is then saved to an accumulator array, and the
input string is reduced to the post-match substring. Thus the string
is "eaten" from the left as the FormatSpecifiers are applied in
sequence. (This is done to a duplicate string; the original string is
not altered.)

As soon as a regular expression fragment fails to match the string, or
when the FormatString object runs out of FormatSpecifiers, scanning
stops and results accumulated so far are returned in an array.
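
The design just described can be sketched in miniature. MiniScanf
below is a hypothetical, heavily simplified stand-in written for this
note: it knows only %d, %f and %s plus literal text, it always skips
leading white space, and it is not the code used by this library. It
shows the fragment-plus-handler pairing, the left-to-right "eating" of
a copy of the input, and the early stop on the first failed match.

```ruby
module MiniScanf
  # Each atom maps to a regular expression fragment and a handler.
  ATOMS = {
    "%d" => [/\A([-+]?\d+)/,            lambda { |s| s.to_i }],
    "%f" => [/\A([-+]?\d+(?:\.\d+)?)/,  lambda { |s| s.to_f }],
    "%s" => [/\A(\S+)/,                 lambda { |s| s }]
  }

  def self.scan(input, format)
    rest = input.dup                        # the original is not altered
    results = []
    format.scan(/%[dfs]|[^%\s]+/).each do |atom|
      rest = rest.sub(/\A\s+/, "")          # skip leading white space
      # Unknown atoms are treated as literal text with no handler.
      fragment, handler = ATOMS[atom] || [/\A(#{Regexp.escape(atom)})/, nil]
      m = fragment.match(rest)
      break unless m                        # first failure ends the scan
      results << handler.call(m[1]) if handler
      rest = m.post_match                   # eat the matched text
    end
    results
  end
end

full    = MiniScanf.scan("123 abc 4.5", "%d %s %f")  # => [123, "abc", 4.5]
partial = MiniScanf.scan("12 x", "%d %d")            # => [12]
literal = MiniScanf.scan("id: 7", "id: %d")          # => [7]
```

The second call illustrates the return-value rule: the failed second
conversion ends the scan, and the one successful conversion is still
returned.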

==License and copyright

Copyright:: (c) 2002-2003 David Alan Black
License:: Distributed on the same licensing terms as Ruby itself

==Warranty disclaimer

This software is provided "as is" and without any express or implied
warranties, including, without limitation, the implied warranties of
merchantability and fitness for a particular purpose.

==Credits and acknowledgements

scanf for Ruby was developed as the major activity of the Austin
Ruby Codefest (Austin, Texas, August 2002).

Principal author:: David Alan Black (mailto:[email protected])
Co-author:: Hal Fulton (mailto:[email protected])
Project contributors:: Nolan Darilek, Jason Johnston

Thanks to Hal Fulton for hosting the Codefest.

Thanks to Matz for suggestions about the class design.

Thanks to Gavin Sinclair for some feedback on the documentation.

The text for parts of this document, especially the Description and
Conversions sections, above, was adapted from the Linux Programmer's
Manual manpage for scanf(3), dated 1995-11-01.

==Bugs and bug reports

scanf for Ruby is based on something of an amalgam of C scanf
implementations and documentation, rather than on a single canonical
description. Suggestions for features and behaviors which appear in
other scanfs, and would be meaningful in Ruby, are welcome, as are
reports of suspicious behaviors and/or bugs. (Please see "Credits and
acknowledgements", above, for email addresses.)

=end

module Scanf

  class FormatSpecifier

    attr_reader :re_string, :matched_string, :conversion, :matched

    private

    def skip; /^\s*%\*/.match(@spec_string); end

    def extract_float(s); s.to_f if s && !skip; end
    def extract_decimal(s); s.to_i if s && !skip; end
    def extract_hex(s); s.hex if s && !skip; end
    def extract_octal(s); s.oct if s && !skip; end
    def extract_integer(s); Integer(s) if s && !skip; end
    def extract_plain(s); s unless skip; end

    def nil_proc(s); nil; end

    public

    def to_s
      @spec_string
    end

    def count_space?
      /(?:\A|\S)%\*?\d*c|\[/.match(@spec_string)
    end

    def initialize(str)
      @spec_string = str
      h = '[A-Fa-f0-9]'

      @re_string, @handler =
        case @spec_string

        # %[[:...:]]
        when /%\*?(\[\[:[a-z]+:\]\])/
          [ "(#{$1}+)", :extract_plain ]

        # %5[[:...:]]
        when /%\*?(\d+)(\[\[:[a-z]+:\]\])/
          [ "(#{$2}{1,#{$1}})", :extract_plain ]

        # %[...]
        when /%\*?\[([^\]]*)\]/
          yes = $1
          if /^\^/.match(yes) then no = yes[1..-1] else no = '^' + yes end
          [ "([#{yes}]+)(?=[#{no}]|\\z)", :extract_plain ]

        # %5[...]
        when /%\*?(\d+)\[([^\]]*)\]/
          yes = $2
          w = $1
          [ "([#{yes}]{1,#{w}})", :extract_plain ]

        # %i
        when /%\*?i/
          [ "([-+]?(?:(?:0[0-7]+)|(?:0[Xx]#{h}+)|(?:[1-9]\\d+)))", :extract_integer ]

        # %5i
        when /%\*?(\d+)i/
          n = $1.to_i
          s = "("
          if n > 1 then s += "[1-9]\\d{1,#{n-1}}|" end
          if n > 1 then s += "0[0-7]{1,#{n-1}}|" end
          if n > 2 then s += "[-+]0[0-7]{1,#{n-2}}|" end
          if n > 2 then s += "[-+][1-9]\\d{1,#{n-2}}|" end
          if n > 2 then s += "0[Xx]#{h}{1,#{n-2}}|" end
          if n > 3 then s += "[-+]0[Xx]#{h}{1,#{n-3}}|" end
          s += "\\d"
          s += ")"
          [ s, :extract_integer ]

        # %d, %u
        when /%\*?[du]/
          [ '([-+]?\d+)', :extract_decimal ]

        # %5d, %5u
        when /%\*?(\d+)[du]/
          n = $1.to_i
          s = "("
          if n > 1 then s += "[-+]\\d{1,#{n-1}}|" end
          s += "\\d{1,#{$1}})"
          [ s, :extract_decimal ]

        # %x
        when /%\*?[Xx]/
          [ "([-+]?(?:0[Xx])?#{h}+)", :extract_hex ]

        # %5x
        when /%\*?(\d+)[Xx]/
          n = $1.to_i
          s = "("
          if n > 3 then s += "[-+]0[Xx]#{h}{1,#{n-3}}|" end
          if n > 2 then s += "0[Xx]#{h}{1,#{n-2}}|" end
          if n > 1 then s += "[-+]#{h}{1,#{n-1}}|" end
          s += "#{h}{1,#{n}}"
          s += ")"
          [ s, :extract_hex ]

        # %o
        when /%\*?o/
          [ '([-+]?[0-7]+)', :extract_octal ]

        # %5o
        when /%\*?(\d+)o/
          [ "([-+][0-7]{1,#{$1.to_i-1}}|[0-7]{1,#{$1}})", :extract_octal ]

        # %f
        when /%\*?f/
          [ '([-+]?((\d+(?>(?=[^\d.]|$)))|(\d*(\.(\d*([eE][-+]?\d+)?)))))', :extract_float ]

        # %5f
        when /%\*?(\d+)f/
          [ "(\\S{1,#{$1}})", :extract_float ]

        # %5s
        when /%\*?(\d+)s/
          [ "(\\S{1,#{$1}})", :extract_plain ]

        # %s
        when /%\*?s/
          [ '(\S+)', :extract_plain ]

        # %c
        when /\s%\*?c/
          [ "\\s*(.)", :extract_plain ]

        # %c
        when /%\*?c/
          [ "(.)", :extract_plain ]

        # %5c (whitespace issues are handled by the count_*_space? methods)
        when /%\*?(\d+)c/
          [ "(.{1,#{$1}})", :extract_plain ]

        # %%
        when /%%/
          [ '(\s*%)', :nil_proc ]

        # literal characters
        else
          [ "(#{Regexp.escape(@spec_string)})", :nil_proc ]
        end

      @re_string = '\A' + @re_string
    end

    def to_re
      Regexp.new(@re_string,Regexp::MULTILINE)
    end

    def match(str)
      @matched = false
      s = str.dup
      s.sub!(/\A\s+/,'') unless count_space?
      res = to_re.match(s)
      if res
        @conversion = send(@handler, res[1])
        @matched_string = @conversion.to_s
        @matched = true
      end
      res
    end

    def letter
      /%\*?\d*([a-z\[])/.match(@spec_string).to_a[1]
    end

    def width
      w = /%\*?(\d+)/.match(@spec_string).to_a[1]
      w && w.to_i
    end

    def mid_match?
      return false unless @matched
      cc_no_width = letter == '[' && !width
      c_or_cc_width = (letter == 'c' || letter == '[') && width
      width_left = c_or_cc_width && (matched_string.size < width)

      return width_left || cc_no_width
    end

  end

  class FormatString

    attr_reader :string_left, :last_spec_tried,
                :last_match_tried, :matched_count, :space

    SPECIFIERS = 'diuXxofeEgsc'
    REGEX = /
        # possible space, followed by...
          (?:\s*
          # percent sign, followed by...
            %
            # another percent sign, or...
              (?:%|
                 # optional assignment suppression flag
                 \*?
                 # optional maximum field width
                 \d*
                   # named character class, ...
                   (?:\[\[:\w+:\]\]|
                   # traditional character class, or...
                      \[[^\]]*\]|
                   # specifier letter.
                      [#{SPECIFIERS}])))|
            # or miscellaneous characters
              [^%\s]+/ix

    def initialize(str)
      @specs = []
      @i = 1
      s = str.to_s
      return unless /\S/.match(s)
      @space = true if /\s\z/.match(s)
      @specs.replace s.scan(REGEX).map {|spec| FormatSpecifier.new(spec) }
    end

    def to_s
      @specs.join('')
    end

    def prune(n=matched_count)
      n.times { @specs.shift }
    end

    def spec_count
      @specs.size
    end

    def last_spec
      @i == spec_count - 1
    end

    def match(str)
      accum = []
      @string_left = str
      @matched_count = 0

      @specs.each_with_index do |spec,@i|
        @last_spec_tried = spec
        @last_match_tried = spec.match(@string_left)
        break unless @last_match_tried
        @matched_count += 1

        accum << spec.conversion

        @string_left = @last_match_tried.post_match
        break if @string_left.empty?
      end
      return accum.compact
    end
  end
end

class IO

# The trick here is doing a match where you grab one *line*
# of input at a time. The linebreak may or may not occur
# at the boundary where the string matches a format specifier.
# And if it does, some rule about whitespace may or may not
# be in effect...
#
# That's why this is much more elaborate than the string
# version.
#
# For each line:
#   Match succeeds (non-emptily)
#   and the last attempted spec/string sub-match succeeded:
#
#     could the last spec keep matching?
#       yes: save interim results and continue (next line)
#
#   The last attempted spec/string did not match:
#
#     are we on the next-to-last spec in the string?
#       yes:
#         is fmt_string.string_left all spaces?
#           yes: does current spec care about input space?
#             yes: fatal failure
#             no: save interim results and continue
#       no: continue [this state could be analyzed further]
#
#

  def scanf(str,&b)
    return block_scanf(str,&b) if b
    return [] unless str.size > 0

    start_position = pos rescue 0
    matched_so_far = 0
    source_buffer = ""
    result_buffer = []
    final_result = []

    fstr = Scanf::FormatString.new(str)

    loop do
      if eof || (tty? && !fstr.match(source_buffer))
        final_result.concat(result_buffer)
        break
      end

      source_buffer << gets

      current_match = fstr.match(source_buffer)

      spec = fstr.last_spec_tried

      if spec.matched
        if spec.mid_match?
          result_buffer.replace(current_match)
          next
        end

      elsif (fstr.matched_count == fstr.spec_count - 1)
        if /\A\s*\z/.match(fstr.string_left)
          break if spec.count_space?
          result_buffer.replace(current_match)
          next
        end
      end

      final_result.concat(current_match)

      matched_so_far += source_buffer.size
      source_buffer.replace(fstr.string_left)
      matched_so_far -= source_buffer.size
      break if fstr.last_spec
      fstr.prune
    end
    seek(start_position + matched_so_far, IO::SEEK_SET) rescue Errno::ESPIPE
    soak_up_spaces if fstr.last_spec && fstr.space

    return final_result
  end

  private

  def soak_up_spaces
    c = getc
    ungetc(c) if c
    until eof || !c || /\S/.match(c.chr)
      c = getc
    end
    ungetc(c) if (c && /\S/.match(c.chr))
  end

  def block_scanf(str)
    final = []
# Sub-ideal, since another FS gets created in scanf.
# But used here to determine the number of specifiers.
    fstr = Scanf::FormatString.new(str)
    last_spec = fstr.last_spec
    begin
      current = scanf(str)
      break if current.empty?
      final.push(yield(current))
    end until eof || fstr.last_spec_tried == last_spec
    return final
  end
end

class String

  def scanf(fstr,&b)
    if b
      block_scanf(fstr,&b)
    else
      fs =
        if fstr.is_a? Scanf::FormatString
          fstr
        else
          Scanf::FormatString.new(fstr)
        end
      fs.match(self)
    end
  end

  def block_scanf(fstr,&b)
    fs = Scanf::FormatString.new(fstr)
    str = self.dup
    final = []
    begin
      current = str.scanf(fs)
      final.push(yield(current)) unless current.empty?
      str = fs.string_left
    end until current.empty? || str.empty?
    return final
  end
end

module Kernel
  private
  def scanf(fs,&b)
    STDIN.scanf(fs,&b)
  end
end