I have spent the last few months writing an alternative Ruby grammar, now
registered at rubyforge.org under the name “Ruby top down grammar”.
The grammar is host-language neutral. It must be interpreted or
translated before it can be run, i.e. before it can parse anything.
Currently there are two translators, one to Emacs Lisp, the other to C.
Both produce recursive descent parsers.
The whole grammar consists of 270 rules in 1500 lines. It
unifies lexical and syntactic analysis.
A popular name for this kind of grammar is Parsing Expression
Grammar (PEG). I extended it with some forms I found absolutely
necessary. The result is a neutral and minimal but still practical
grammar definition language. The variant used here contains 31
different forms. The svn repo contains a description of it.
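To give a feel for what unifying lexical and syntactic analysis means in a recursive descent parser, here is a minimal sketch in Ruby. It is not taken from the project: the toy grammar, the class name ToyParser and all method names are made up for illustration, and the real grammar definition language has many more forms than the ordered choice and repetition shown here.

```ruby
# Hypothetical toy PEG (NOT the project's rules):
#   expr   <- call / ident / number
#   call   <- ident "(" (expr ("," expr)*)? ")"
#   ident  <- [a-z_]+ space*
#   number <- [0-9]+ space*
# There is no separate lexer: every rule works directly on characters.
class ToyParser
  def initialize(src)
    @src = src
  end

  # Returns a syntax tree, or nil unless the whole input matches.
  def parse
    node, pos = expr(0)
    pos == @src.length ? node : nil
  end

  private

  # Match re exactly at pos and skip trailing blanks.
  # Returns [matched_text, new_pos] or nil on failure.
  def token(re, pos)
    m = /\G(#{re.source})[ \t]*/.match(@src, pos)
    m && [m[1], m.end(0)]
  end

  # Ordered choice: the first alternative that succeeds wins.
  def expr(pos)
    call(pos) || ident(pos) || number(pos)
  end

  def ident(pos)
    text, nxt = token(/[a-z_]+/, pos)
    text && [[:ident, text], nxt]
  end

  def number(pos)
    text, nxt = token(/[0-9]+/, pos)
    text && [[:number, text.to_i], nxt]
  end

  def call(pos)
    name, pos = ident(pos)
    return nil unless name
    _, pos = token(/\(/, pos)
    return nil unless pos
    args = []
    if (first = expr(pos))
      arg, pos = first
      args << arg
      loop do
        _, after_comma = token(/,/, pos)
        break unless after_comma
        nxt = expr(after_comma)
        break unless nxt       # a failed iteration consumes nothing
        arg, pos = nxt
        args << arg
      end
    end
    _, pos = token(/\)/, pos)
    pos && [[:call, name[1], args], pos]
  end
end

p ToyParser.new("foo(bar, 6)").parse
# prints [:call, "foo", [[:ident, "bar"], [:number, 6]]]
```

Each rule is a method that takes an input position and returns either a result plus the new position, or nil. Backtracking falls out naturally, because a failed alternative simply leaves the caller's position untouched.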
Without being sure, I’d like to claim the grammar comes close to
covering 100% of the Ruby language. It does, for example, parse the
Ruby stdlib completely.
If you get to play around with this grammar and discover valid Ruby
code that parses incorrectly or not at all, please contact
me. Warnings are not yet implemented.
Compared directly with the MRI parser, there are some pros and cons.
The advantages are less complexity and more flexibility. I got default
values for block formals working (believed impossible with the MRI
parser). Let me emphasize, though, that this grammar doesn’t primarily
aim to extend Ruby; I was just exploring what’s possible.
Also, this grammar is host-language neutral, meaning one could
write translators to arbitrary languages and then parse Ruby code from
those languages. For example, when translated to (or interpreted by)
JavaScript, you could syntax-color the Ruby snippets on your blog on
the client side. (This grammar project does not (yet) support JavaScript,
and people have found another working solution for coloring Ruby code.)
On the downside, parsers built from this grammar are slower. Even the
faster of the two implementations is many times slower than the MRI
parser.
Also, the abstract syntax trees produced by this grammar have a
different structure from those produced by the MRI parser.
As mentioned, there are currently two implementations: a translator to
Emacs Lisp and a translator to C. Both handle not only
parsing but also the construction of abstract syntax trees and
pretty printing.
Both memoize intermediate parsing results. This is more
complicated than in ordinary Packrat parsers, since I’ve introduced
something I call “modes” in the grammar definition language, which
memoization has to take into account. There is also a feature
to “mark” arbitrary parsed words and to check whether a word is
marked. (If you have read this far you might have guessed it: it’s used
for local variables.) That, too, is something memoization has to take
into account.
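To illustrate why modes and marks complicate memoization, here is a small Ruby sketch. It is only one conceivable approach and an assumption on my part here, not a description of how the Emacs Lisp or C implementations actually do it: the memo key is widened from the usual (rule, position) pair to also include the current mode and a snapshot of the marked words. MemoTable, classify and the :default mode are hypothetical names.

```ruby
require "set"

# Hypothetical memo table: an ordinary Packrat parser would key results
# on [rule, pos] only. Here the key also carries the mode and the marks.
class MemoTable
  attr_reader :hits

  def initialize
    @table = {}
    @hits = 0
  end

  # Returns the cached result for this parsing context, or runs the
  # block once and caches what it returns.
  def fetch(rule, pos, mode, marks)
    key = [rule, pos, mode, marks.sort]  # sorted snapshot of marked words
    if @table.key?(key)
      @hits += 1
      return @table[key]
    end
    @table[key] = yield
  end
end

# Toy rule (an assumption): a word is a local variable if it has been
# marked, otherwise it is treated as a method call.
def classify(memo, word, pos, mode, marks)
  memo.fetch(:identifier, pos, mode, marks) do
    marks.include?(word) ? :local_variable : :method_call
  end
end

memo  = MemoTable.new
marks = Set.new
p classify(memo, "x", 0, :default, marks)  # prints :method_call
marks << "x"                               # e.g. after parsing "x = 1"
p classify(memo, "x", 0, :default, marks)  # prints :local_variable
p classify(memo, "x", 0, :default, marks)  # prints :local_variable (cached)
p memo.hits                                # prints 1
```

With a plain [rule, pos] key, the second call would have returned the stale :method_call result. The price of the wider key is fewer cache hits, so a real implementation would probably want something cheaper than copying the whole mark set for every lookup.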
If you are interested, take a look at
http://rubyforge.org/projects/ruby-tp-dw-gram/
You can get the Subversion repository with
svn checkout svn://rubyforge.org/var/svn/ruby-tp-dw-gram
To get you started fast: the command “make gram” in the checkout
directory should compile the C parser. Then you can pipe in Ruby code
like this:
echo 'foo(bar 6,7,8)' | ./gram
Markus L.