I was glad to see this problem submitted. I worked on it a ways back
when I was
translating examples from Higher-Order Perl. It shows up in the book
during the
discussion of infinite streams.
Watching others solve it was great fun because I realized that Ruby Q.
fans
are crazy. A couple of you implemented an almost scary amount of the
regular
expression syntax.
Probably the craziest solution comes from Jesús who even shoehorned in
backreferences. Let’s take a look at parts of that lengthy solution
below.
First, some limitations:
Number of times to repeat for Star and Plus repeaters
TIMES = 2
Set of chars for Dot and negated [^] char groups
#CHARS = [(“a”…“z”).to_a, (“A”…“Z”).to_a, “.”, “,”, “;”].flatten
CHARS = %w{a b c d e}
…
Many regular expressions can match an infinite number of Strings.
Consider the
trivial expression /a+/. It matches “a”, “aa”, “aaa”, etc. There are
two ways
to handle this practically: limit the repetition or provide an infinite
series
of matches user code can explore and stop as needed. Jesús did the
former.
These variables control how much repetition is done and what character
set is
used for constructs that match any character or any character not
included in a
listed subset.
If your are curious to see the infinite streams approach, I wrote about
it as
part of this blog post:
That solution doesn’t include the regular expression parsing code
though.
Getting back to Jesús’s code, the next section defines several classes
in the
categories of repeaters and groups. Groups are pieces of a matched
String that
can be generated and repeaters control how many times those groups
appear.
Let’s begin with a trivial repeater that doesn’t really repeat:
…
class OneTimeRepeater
def initialize(group)
@group = group
end
def result
@group.result
end
end
…
As, you can see, this repeater just wraps some group. The result() of
this
repeater is the result of that group, one time. Obviously this is used
for
things that don’t repeat.
The next repeater should be plenty familiar to regular expression fans
though:
…
class StarRepeater
def initialize(group)
@group = group
end
def result
r = []
group_res = @group.result
group_res.unshift("")
TIMES.times do
r << group_res
end
combine(r).uniq
end
end
…
This would be the repeater used to handle expressions like /a*/. The
result of
this repeater is zero or more occurrences of the group it was passed.
Practically speaking here that means between one and TIMES copies of the
group
or the empty String to represent zero. We see this collected into an
Array here
and passed on to combine() for generation.
The PlusRepeater, QuestionMarkRepeater and RangeRepeater classes are
constructed
similarly. In the interests of space and time, I’m not going to show
those
here.
Now we are ready for the groups. Again we begin with the trivial cases:
…
class SingleChar
def initialize(c)
@c = c
end
def result
[@c]
end
end
…
class Dot
def result
CHARS
end
end
…
Here we see groups that echo a character and represent a special range
of
characters as their results. Obviously the first is used for literal
chunks of
an expression, like /a/, and the second is used for the special regular
expression character /./.
Let’s examine the group used to represent alternations like /a|b/:
…
class OrGroup
def initialize(first_groupset, second_groupset)
@first = first_groupset
@second = second_groupset
end
def result
strings = @first.map {|x| x.result}
s = combine(strings)
strings = @second.map {|x| x.result}
s.concat(combine(strings))
end
end
…
Since the result()s of a group are just the possible generations of that
group,
alternation can be represented as the left choices plus the right
choices.
That’s what we see here.
Jesús’s code includes other groups: CharGroup for character classes,
MultiGroup for nesting groups and managing reference counts, and
BackReference
for supporting the regular expression feature of the same name. I’m
going to
skip over these too, to keep the size of this summary reasonable.
Now let’s have a look at the combine() method I’ve glossed over twice
now:
…
Combines arrays, concatenating each string
merging the possible groups they have
Starts combining the first two arrays, then goes on
combining each other array to the result of the
previous combination
def combine(arrays)
string = arrays.inject do |r, rep|
temp = []
r.each {|aa| rep.each {|bb| temp <<
(aa.concat_and_merge_groups(bb))}}
temp
end
string
end
class String
attr_accessor :filled_groups
def add_filled_group(num, group)
@filled_groups ||= {}
@filled_groups[num] = group
end
def concat_and_merge_groups(other)
temp = self + other
groups = {}
groups.merge!(self.filled_groups) if self.filled_groups
groups.merge!(other.filled_groups) if other.filled_groups
temp.filled_groups = groups
temp
end
end
…
The combine() method just turns an Array of group result()s into the
actual
combinations of Strings. So given [%w[a b], %w[c d]] it outputs %w[ac
ad bc
bd].
The String additions here that are tracking groups were added to support
backreferences. This was a challenging feature to support and it
required
hooking into the code at many levels. I don’t want to focus on it too
much
though since it wasn’t a critical part of the quiz so much as an
impressive
extra Jesús managed to include.
The next method we need to look at is the parser. I’m going to trim
some of it
because it’s quite long, but you should still get to see how it’s put
together:
…
class Regexp
attr_reader :num_groups
def parse(s, i = 0)
repeaters = []
group = nil
while i < s.length
char = s[i].chr
case char
# …
when ‘.’
group = Dot.new
when ‘|’
groups, i = parse(s, i + 1)
group = OrGroup.new(repeaters, groups)
return [group], i
# …
else
group = SingleChar.new(char)
end
repeater = nil
i += 1
if i < s.length
case s[i].chr
when '*'
repeater = StarRepeater.new(group)
# ...
else
repeater = OneTimeRepeater.new(group)
i -= 1
end
i += 1
else
repeater = OneTimeRepeater.new(group)
end
repeaters << repeater
end
return repeaters, i
end
# ...
As you can see, this is a recursive character by character parser. It
reads
through the expression finding groups and wrapping those in repeaters,
with
whatever level of nesting is required. This builds up an Abstract
Syntax Tree
for the expression.
Let’s see how this gets put to use to create the quiz solution method:
# ...
def generate
@num_groups = 0
r = self.inspect[1..-2]
repeaters, _ = self.parse(r)
strings = repeaters.map {|x| x.result}
s = combine(strings)
# Makes a pass for the backreferences
s.each do |string|
string.gsub!(/__(\d+)__/) do |match|
string.filled_groups[$1.to_i]
end
end
s
end
end
The process here is very simple: pull the source of the expression,
parse it
into the AST, generate the result() of the AST, and combine() those
Strings into
the needed Array of matches. Again, the rest of the code here is used
to
substitute backreferences back into the end results.
Jesús also included another method that pretty-printed and verified
results,
but I won’t go into that here.
As usual I don’t want you to miss out on the other solutions. James
Koppel used
currying to build up an AST of functions. Vasil V. sent in a
slow but
unique approach that works on all regular expressions without even
parsing them.
Do take the time to inspect the solutions. It’s worth it.
My thanks to all who poured so much effort into this quiz. You are all
fearless.
Tomorrow we will take a stab at slicing up time…