I am looking for the best way to break an input string into individual
tokens (I do not want to use a lexer library); I found some Ruby
programs that do it by “nibbling” at the string, like this (for
simplicity, the tokens are simply printed):
    str = "20 * sin(x) + …"
    until str.empty?
      if    str.sub!(/\A\s*(\d+)/) { puts "nr: #{$1}"; '' }
      elsif str.sub!(/\A\s*(\w+)/) { puts "func: #{$1}"; '' }
      # …
      else  raise "unexpected input: #{str}"
      end
    end
This works, but it is very inefficient, because the string is modified
after every token, which can mean copying the rest of it each time (a
variation is to use str.match and then set str = post_match, which is
probably even worse).
I was looking for the equivalent of what Perl calls “walking the string”
(if $str =~ /\G …/gcxms): picking up one token at a time, starting right
after the point where the previous one was retrieved.
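For comparison, here is a minimal sketch of that kind of walk using StringScanner from Ruby's standard library (require 'strscan'). This swaps in a class the post does not mention: StringScanner#scan tries its pattern only at the current scan position, advances past the match on success, and returns nil without moving on failure, which is close to Perl's /\G…/gc. The token categories and the tokenize name are illustrative assumptions.

```ruby
require 'strscan'

# Walks the input once, much like Perl's /\G.../gc: StringScanner#scan only
# tries its pattern at the current position and advances past the match on
# success; on failure it returns nil and leaves the position where it was.
# The token categories below are illustrative assumptions.
def tokenize(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next # whitespace: nothing to emit
    elsif (t = ss.scan(/\d+/))
      tokens << [:number, t]
    elsif (t = ss.scan(/\w+/))
      tokens << [:ident, t] # names such as sin or x
    elsif (t = ss.scan(%r{[-+*/^()]}))
      tokens << [:op, t]
    else
      raise ArgumentError, "unexpected input at #{ss.pos}: #{ss.rest.inspect}"
    end
  end
  tokens
end

tokenize("20 * sin(x) + 3").each { |type, text| puts "#{type}: #{text}" }
```

The string itself is never modified; only the scanner's position moves, so each token costs one anchored match attempt per branch.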
I saw \G mentioned together with scan in the Pickaxe, but I could not make
scan work ‘one token at a time’: I had to list all the tokens in a single
regexp argument, and then find out which token had matched, i.e.:
    str.scan(/\G\s* (\d+ | [\^] | [+] | [(] | …)/xm) do |m|
      if    m[0].match(/\A\d+\z/) then puts "number: #{m[0]}"
      elsif m[0].match(/\A[\^]\z/) then puts "power: #{m[0]}"
      # …
      end
    end
It worked perfectly (almost to my surprise!), but it seems funny (un-Ruby-like)
to have to repeat the tokens: even though my real code uses regexp
variables to avoid hardcoding them twice, it is still a repetition.
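One way to avoid writing each token twice, sketched here under the assumption that every token type fits in a single regexp: keep the types in one table, build the \G alternation from it with one named group per type, and ask the match which group captured. The table contents, TOKEN_TYPES, TOKEN_RE and tokenize_with_table are hypothetical names; the sketch relies on String#scan setting $~ (Regexp.last_match) for each match in its block form.

```ruby
# Hypothetical token table: the only place each pattern is written.
# Order matters: earlier entries are tried first in the alternation.
TOKEN_TYPES = {
  number: /\d+/,
  ident:  /[A-Za-z_]\w*/,
  power:  /\^/,
  op:     %r{[-+*/()]}
}.freeze

# One named group per type, joined into a single alternation. \G anchors each
# match at the end of the previous one, so scan stops at unrecognized input.
TOKEN_RE = Regexp.new(
  '\G\s*(?:' +
  TOKEN_TYPES.map { |name, re| "(?<#{name}>#{re.source})" }.join('|') +
  ')'
)

def tokenize_with_table(input)
  tokens = []
  pos = 0
  input.scan(TOKEN_RE) do
    md = Regexp.last_match
    # Exactly one named group took part in the match; find which one.
    type = TOKEN_TYPES.keys.find { |name| md[name] }
    tokens << [type, md[type]]
    pos = md.end(0)
  end
  # Anything left over (other than whitespace) was not a known token.
  rest = input[pos..-1]
  raise ArgumentError, "unexpected input: #{rest.inspect}" unless rest.strip.empty?
  tokens
end

p tokenize_with_table("20 * sin(x) + 2^3")
```

Adding a token type then means adding one line to the table; the same table could just as well drive a StringScanner loop.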
I looked at four Ruby books and found only platitudes on the subject (or
references to libraries). I would love to hear of an elegant way to solve
this.
Thanks!
Raul