I don’t know if it was the metaprogramming that scared people away
this week, or perhaps folks are away on summer vacations. In any case,
I’m going to summarize this week’s quiz by looking at the submission
from Matthias R. The solution is, as Matthias indicates,
unexpectedly concise. “I guess that’s just the way Ruby works.”
Matthias’ code implements the Statistician module in three parts,
each a class. Here is the first class, Rule:
class Rule
  def initialize(pattern)
    @fields = []
    pattern = Regexp.escape(pattern).gsub(/\\\[(.+?)\\\]/,
                                          '(?:\1)?').
      gsub(/<(.+?)>/) { @fields << $1; '(.+?)' }
    @regexp = Regexp.new('^' + pattern + '$')
  end

  def match(line)
    @result = if md = @regexp.match(line)
      Hash[*@fields.zip(md.captures).flatten]
    end
  end

  def result
    @result
  end
end
Rule makes use of regular expressions built up as discussed in the
previous quiz, so I’m not going to discuss that here. I will point
out, though, the initialization of the @fields member in the
initializer. Note the last gsub call: it uses the block form of gsub.
gsub(/<(.+?)>/) { @fields << $1; '(.+?)' }
As the '(.+?)' string is the last expression evaluated in the block,
it provides the replacement text. Before that, however, the block
uses the just-matched group ($1) to record the field name. This avoids
making a second pass over the source string to get those field names,
and is arguably simpler.
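The block form of gsub can be tried in isolation. This is a minimal
sketch, not part of Matthias’ solution; the sample pattern string and
the variable names are made up for illustration.

```ruby
# Illustrative only: the pattern string and variable names are made up.
fields = []
pattern = "<name> hits you for <amount> damage"

# The block runs once per match; $1 holds the text captured by (.+?).
# The block's last expression becomes the replacement text.
regexp_source = pattern.gsub(/<(.+?)>/) { fields << $1; '(.+?)' }

p fields         # → ["name", "amount"]
p regexp_source  # → "(.+?) hits you for (.+?) damage"
```

One pass yields both the field names and the regular expression source.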
The match method matches input lines against the regular expression,
returning nil if the input didn’t match, or a hash if it did. Field
names (@fields) are first paired (zip) with the matched values
(md.captures), then flattened (flatten) into a single array, and
finally expanded (*) and passed to Hash[], which treats alternate
items as keys and values. The end result of Rule#match, when the input
matches, is a hash that looks like this:
{ 'amount' => '108', 'name' => 'Tempest Warg' }
That hash is returned, but also stored internally in the member
@result for future reference, accessed by the last method, result.
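The zip, flatten and Hash[] steps described above can be followed one
at a time. This is a small sketch with hard-coded sample values
standing in for @fields and md.captures; the Array#to_h comparison at
the end is an addition of mine, available on Ruby 2.1 and later.

```ruby
# Sample values standing in for @fields and md.captures.
fields   = ['amount', 'name']
captures = ['108', 'Tempest Warg']

pairs = fields.zip(captures)  # [["amount", "108"], ["name", "Tempest Warg"]]
flat  = pairs.flatten         # ["amount", "108", "name", "Tempest Warg"]
hash  = Hash[*flat]           # alternate items become keys and values

p hash['amount']  # → "108"

# On Ruby 2.1 and later, Array#to_h builds the same hash from the pairs.
same = pairs.to_h
p hash == same    # → true
```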
The next class is Reportable:
require 'ostruct'  # OpenStruct lives in the standard library

class Reportable < OpenStruct
  class << self
    attr_reader :records

    def inherited(klass)
      klass.class_eval do
        @rules, @records = [], []
      end
      super
    end

    def rule(pattern)
      @rules << Rule.new(pattern)
    end

    def match(line)
      if rule = @rules.find { |rule| rule.match(line) }
        @records << self.new(rule.result)
      end
    end
  end
end
This small class is the extent of the metaprogramming going on in the
solution, and it’s not much, though perhaps unfamiliar to some. Let’s
get into some of it. We’ll ignore the OpenStruct inheritance for the
moment, coming back to it later.
Everything inside the Reportable class is surrounded by a block that
opens with class << self. There is a good summary on the Ruby T.
mailing list, but its use here can be summed up in two words:
class methods. The class << self mechanism is not strictly about
class methods, but in this context it has the same effect: methods
defined inside it are called on Reportable itself (or a subclass)
rather than on an instance.
Alternatively, these methods could have been defined in this manner:
class Reportable < OpenStruct
  def Reportable.rule(pattern)
    # etc.
  end

  def Reportable.match(line)
    # etc.
  end

  # etc.
end
In the end, the class << self mechanism is cleaner looking, and also
allows for use of attr_reader in a natural way.
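A small sketch of that point, using a hypothetical Counter class that
is not part of the solution: inside class << self, attr_reader creates
a reader on the class itself, backed by a class-level instance
variable.

```ruby
class Counter   # hypothetical class, for illustration only
  @count = 0    # class-level instance variable

  class << self
    attr_reader :count   # defines Counter.count

    def increment        # defines Counter.increment
      @count += 1
    end
  end
end

Counter.increment
Counter.increment
p Counter.count  # → 2
```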
The next interesting bit is the inherited method. This is a class
method, here implemented on Reportable, that is called whenever
Reportable is subclassed (which happens repeatedly in the client
code). It’s a convenient hook that allows the other bit of
metaprogramming to happen.
klass.class_eval do
  @rules, @records = [], []
end
klass is the class derived from Reportable (i.e. one of our client’s
classes for future statistical analysis). Here, Matthias initializes
two members (class-level instance variables), both to empty arrays, in
the scope of class klass. This ensures that every class derived from
Reportable gets its own, separate members, not shared with other
Reportable subclasses. This could be done without metaprogramming, but
would require effort from the user.
class Reportable
  # class methods here
end

class Offense < Reportable
  @rules, @records = [], []
  # rules, etc.
end

class Defense < Reportable
  @rules, @records = [], []
  # rules, etc.
end
If the client forgot to initialize those two members, or got the names
wrong, the class wouldn’t work, exceptions would be thrown, cats and
dogs living together… you get the idea.
You might consider defining those data members in the Reportable
class itself, like so:
class Reportable
  @rules, @records = [], []
  # class methods, without inherited
end
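The problem with this is that class-level instance variables are not
inherited: @rules and @records would exist only on Reportable itself,
so every subclass would find nil where it expects an array. (Switching
to class variables, @@rules and @@records, swings the other way: every
Reportable subclass would then share the same rules and records
arrays.) Neither is the desired outcome.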
In the end, the class_eval used here, called from inherited, is
the right way to do things. It provides a way for the superclass to
inject functionality into the subclass.
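The effect of the hook can be checked with a stripped-down stand-in.
In this sketch, Base and its add method are hypothetical and only
mirror the shape of Reportable; the subclass names are borrowed from
the example above.

```ruby
class Base   # hypothetical stand-in for Reportable
  class << self
    attr_reader :records

    def inherited(klass)
      # Runs once per subclass; inside class_eval, self is the new subclass,
      # so @records becomes that subclass's own instance variable.
      klass.class_eval { @records = [] }
      super
    end

    def add(item)   # hypothetical helper, plays the role of match
      @records << item
    end
  end
end

class Offense < Base; end
class Defense < Base; end

Offense.add(:hit)
p Offense.records  # → [:hit]
p Defense.records  # → []
```

Each subclass ends up with its own array without declaring anything
itself.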
Getting back to functionality, Reportable.match is straightforward,
but let me highlight one line:
@records << self.new(rule.result)
If you recall, result returns a hash of field names to values. And
Reportable is attempting to pass that hash to its own initializer,
of which none is defined. This is where OpenStruct comes in.
OpenStruct “allows you to create data objects and set arbitrary
attributes.” And OpenStruct provides an initializer that takes the
hash Matthias provides, and does the expected.
data = OpenStruct.new({ 'amount' => '108', 'name' => 'Tempest Warg' })
p data.amount # → "108"
p data.name   # → "Tempest Warg"
By subclassing Reportable from OpenStruct, all of the client’s
classes will inherit the same behavior, which fulfills many of the
requirements provided in the class specification.
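A short sketch of what that inheritance buys a client class. Wound and
its heavy? method are hypothetical names of mine, not from the quiz;
the point is only that a subclass can add its own methods on top of
the attribute readers OpenStruct supplies.

```ruby
require 'ostruct'

class Wound < OpenStruct   # hypothetical client class
  def heavy?
    # amount is a reader supplied by OpenStruct; values arrive as strings.
    amount.to_i > 100
  end
end

w = Wound.new({ 'amount' => '108', 'name' => 'Tempest Warg' })
p w.name    # → "Tempest Warg"
p w.heavy?  # → true
```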
The final class, Reporter, is pretty trivial. It reads through a
data source a line at a time, either finding a matching rule (and
creating the appropriate record in the process) or adding the input
line to @unmatched, which the client can query later.
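Matthias’ Reporter is not reproduced here, so what follows is only a
rough sketch of the behavior just described, not his code. The
constructor arguments, the parse method, and the Hits stand-in class
are all assumptions of mine; the sketch assumes each reportable
responds to match(line) and returns a truthy value on success.

```ruby
# Hypothetical sketch; not Matthias' Reporter.
class Reporter
  attr_reader :unmatched

  def initialize(*reportables)
    @reportables = reportables
    @unmatched = []
  end

  def parse(text)
    text.each_line do |line|
      line = line.chomp
      # any? stops at the first reportable whose match succeeds.
      @unmatched << line unless @reportables.any? { |r| r.match(line) }
    end
  end
end

# Stand-in for a Reportable subclass, just enough to exercise the sketch.
class Hits
  @records = []

  class << self
    attr_reader :records

    def match(line)
      # Returns the records array (truthy) on a match, nil otherwise.
      @records << $1 if line =~ /^You hit (.+)$/
    end
  end
end

reporter = Reporter.new(Hits)
reporter.parse("You hit Tempest Warg\nThe sun rises\n")
p Hits.records        # → ["Tempest Warg"]
p reporter.unmatched  # → ["The sun rises"]
```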
Next week we’ll take a short break from the Statistician for some
simple stuff. (Part III of Statistician will return in the not-distant
future.)