Ruby_parser 2.0.0 Released

aguynamedryan · October 23, 2008, 6:41am

ruby_parser version 2.0.0 has been released!

http://rubyforge.org/projects/parsetree/

ruby_parser (RP) is a ruby parser written in pure ruby (utilizing
racc–which does by default use a C extension). RP’s output is
the same as ParseTree’s output: s-expressions using ruby’s arrays and
base types.
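
To make "s-expressions using ruby's arrays and base types" concrete, here is a minimal sketch. It does not load ruby_parser; the Sexp class and the s() helper below are stand-ins written for this example, and the node layout shown for "1 + 2" is an assumption about the general shape rather than captured parser output.

```ruby
# Stand-in for the Sexp class from the sexp_processor gem (assumed shape):
# an Array subclass, so every node is just a nested Ruby array.
class Sexp < Array
  def inspect
    "s(#{map(&:inspect).join(', ')})"
  end
end

# Hypothetical helper mirroring the s(...) shorthand used in the changelog.
def s(*args)
  Sexp.new(args)
end

# Assumed tree for the source text "1 + 2": node type first, then children.
tree = s(:call, s(:lit, 1), :+, s(:arglist, s(:lit, 2)))

p tree        # prints the tree in s(...) notation
p tree.first  # the node type
p tree[1]     # the receiver node
```

Because each node is an ordinary array, the usual Array methods (indexing, map, ==) work on it directly.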

Changes:

=== 2.0.0 / 2008-10-22

1 major enhancement:
- Brought on the AWESOME! 4x faster! no known lexing/parsing bugs!
71 minor enhancements:
- 1.9: Added Fixnum#ord.
- 1.9: Added missing Regexp constants and did it so it’d work on 1.9.
- Added #store_comment and #comments
- Added StringScanner #begin_of_line?
- Added a bunch of tests for regexp escape chars, #parse_string,
  #read_escape, ? numbers, ? whitespace.
- Added a hack for rubinius’ r2l eval bug.
- Added a new token type tSTRING that bypasses tSTRING_BEG/END
  entirely. Only does non-interpolated strings and then falls back to
  the old way. MUCH cleaner tho.
- Added bin/ruby_parse
- Added compare rule to Rakefile.
- Added coverage files/dirs to clean rule.
- Added file and line numbers to all sexp nodes. Column/ranges to
  come.
- Added lex_state change for lvars at the end of yylex.
- Added lexed comments to defn/defs/class/module nodes.
- Added stats gathering for yylex. Reordered yylex for avg data
- Added tSYMBOL token type and parser rule to speed up symbol lexing.
- Added tally output for getch, unread, and unread_many.
- Added tests for ambiguous uminus/uplus, backtick in cmdarg, square
  and curly brackets, numeric gvars, eos edge cases, string quoting %<>
  and %%%.
- All cases throughout yylex now return directly if they match, no
  passthroughs.
- All lexer cases now slurp entire token in one swoop.
- All zarrays are now just empty arrays.
- Changed s(:block_arg, :blah) to :"&blah" in args sexp.
- Cleaned up lexer error handling. Now just raises all over.
- Cleaned up read_escape and regx_options
- Cleaned up tokadd_string (for some definition of cleaned).
- Converted single quoted strings to new tSTRING token type.
- Coverage is currently 94.4% on lexer.
- Done what I can to clean up heredoc lexing… still sucks.
- Flattened resbodies in rescue node. Fixed .autotest file.
- Folded lex_keywords back in now that it screams.
- Found very last instanceof ILiteralNode in the code. haha!
- Got the tests subclassing PTTC and cleaned up a lot. YAY
- Handle yield(*ary) properly
- MASSIVELY cleaned out =begin/=end comment processor.
- Massive overhaul on Keyword class. All hail the mighty Hash!
- Massively cleaned up ident= edge cases and fixed a stupid bug
  from jruby.
- Merged @/@@ scanner together, going to try to do the same
  everywhere.
- Refactored fix_arg_lex_state, common across the lexer.
- Refactored new_fcall into new_call.
- Refactored some code to get better profile numbers.
- Refactored some more #fix_arg_lex_state.
- Refactored tail of yylex into its own method.
- Removed Module#kill
- Removed Token, replaced with Sexp.
- Removed all parse_number and parse_quote tests.
- Removed argspush, argscat. YAY!
- Removed as many token_buffer.split(//)'s as possible. 1 to go.
- Removed begins from compstmts
- Removed buffer arg for tokadd_string.
- Removed crufty (?) solo ‘@’ token… wtf was that anyhow?
- Removed most jruby/stringio cruft from StringScanner.
- Removed one unread_many… 2 to go. They’re harder.
- Removed store_comment, now done directly.
- Removed token_buffer. Now I just use token ivar.
- Removed use of s() from lexer. Changed the way line numbers are
  gathered.
- Renamed *qwords to *awords.
- Renamed StringScanner to RPStringScanner (a subclass) to fix
  namespace trashing.
- Renamed parse to process and aliased to parse.
- Renamed token_buffer to string_buffer since that arcane shit
  still needs it.
- Resolved the rest of the lexing issues I brought up w/ ruby-core.
- Revamped tokadd_escape.
- Rewrote Keyword and KWtable.
- Rewrote RubyLexer using StringScanner.
- Rewrote tokadd_escape. 79 lines down to 21.
- Split out lib/ruby_parser_extras.rb so lexer is standalone.
- Started to clean up the parser and make it as skinny as possible
- Stripped out as much code as possible.
- Stripped yylex of some dead code.
- Switched from StringIO to StringScanner.
- Updated rakefile for new hoe.
- Uses pure ruby racc if ENV['PURE_RUBY'], otherwise use c.
- Wrote a ton of lexer tests. Coverage is as close to 100% as
  possible.
- Wrote args to clean up the big nasty args processing grammar
  section.
- lex_strterm is now a plain array, removed RubyLexer#s(…).
- yield and super now flatten args.
21+ bug fixes:
- I’m sure this list is missing a lot:
- Fixed 2 bugs both involving attrasgn (and ilk) esp when lhs is an
  array.
- Fixed a bug in the lexer for strings with single digit hex escapes.
- Fixed a bug parsing: a (args) { expr }… the space caused a
  different route to be followed and all hell broke loose.
- Fixed a bug with x\n=beginvar not putting begin back.
- Fixed attrasgn to have arglists, not arrays.
- Fixed bug in defn/defs with block fixing.
- Fixed class/module’s name slot if colon2/3.
- Fixed dstr with empty interpolation body.
- Fixed for 1.9 string/char changes.
- Fixed lexer BS wrt determining token type of words.
- Fixed lexer BS wrt pass through values and lexing words. SO STUPID.
- Fixed lexing of floats.
- Fixed lexing of identifiers followed by equals. I hope.
- Fixed masgn with splat on lhs
- Fixed new_super to deal with block_pass correctly.
- Fixed parser’s treatment of :colon2 and :colon3.
- Fixed regexp scanning of escaped numbers, ANY number is valid,
  not just octs.
- Fixed string scanning of escaped octs, allowing 1-3 chars.
- Fixed unescape for \n
- Fixed: omg this is stupid. ‘()’ was returning bare nil
- Fixed: remove_begin now goes to the end, not sure why it didn’t
  before.
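
Several of the items above describe the lexer style: built on StringScanner, each case slurping a whole token in one go and returning directly. The following is only a toy sketch of that style using the standard strscan library. It is not ruby_parser's lexer; the token names and regexps are chosen for illustration.

```ruby
require 'strscan'

# Toy lexer: each branch consumes a whole token with a single regexp and
# returns immediately, with no fall-through to later branches.
def next_token(ss)
  ss.skip(/[ \t]+/)
  return nil if ss.eos?

  if (text = ss.scan(/\d+\.\d+/))
    return [:tFLOAT, text.to_f]
  end
  if (text = ss.scan(/\d+/))
    return [:tINTEGER, text.to_i]
  end
  if (text = ss.scan(/'(?:[^'\\]|\\.)*'/))
    return [:tSTRING, text[1..-2]]          # non-interpolated string
  end
  if (text = ss.scan(/:[a-zA-Z_]\w*/))
    return [:tSYMBOL, text[1..-1].to_sym]
  end
  if (text = ss.scan(/[a-zA-Z_]\w*/))
    return [:tIDENTIFIER, text]
  end
  if (text = ss.scan(/[-+*\/=()]/))
    return [text.to_sym, text]
  end
  raise "unexpected input at offset #{ss.pos}: #{ss.rest.inspect}"
end

# Collect tokens until the scanner reaches the end of the input.
def lex(source)
  ss = StringScanner.new(source)
  tokens = []
  while (tok = next_token(ss))
    tokens << tok
  end
  tokens
end

p lex("x = 3.14 + :foo 'bar' 42")
```

The float pattern is tried before the integer pattern so that "3.14" is consumed as one token rather than as an integer followed by stray input.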

aguynamedryan · October 23, 2008, 8:35am

This is amazing work. The Ruby collective superorganism thanks you.

Since ruby_parser is pure ruby, and is therefore platform-independent and
ruby-implementation-independent, I suppose my default choice is
ruby_parser over ParseTree. Is there any reason to use ParseTree
instead of ruby_parser, other than perhaps speed considerations?

I’ve not used this stuff before, so pardon my ignorance. I can make
sexprs from ruby, and I see ruby2ruby which will convert back to ruby.
However is there a way to execute sexprs directly? Or is that something
rubinius will do?

The possibilities boggle the mind. In principle I can write a Lisp
program which executes the sexpr output of parse_tree. Now I have
another Ruby interpreter. And if it’s SBCL, I can generate an
executable out of that. Now I have a stand-alone ruby interpreter
executable. For the platform-specific stuff, a single Lisp
implementation would need to be chosen (probably SBCL).

And then there is the possibility of ruby macros. I do see a
newly-created defmacro project on rubyforge, however it’s currently just
30 lines of code. Do you have any suggestions on what places to look
for this kind of thing? I want to be sure I’m not reinventing or
co-inventing the wheel.

As a practical example, suppose I want to take an .rb file and
surgically remove all calls to Kernel#log. That shouldn’t be hard,
right? Now I need to set up a separate “compilation” phase, assuming I
want to keep my Kernel#log calls in the source.

Once that’s done, here comes defmacro. The compilation phase looks out
for defmacro, and does an inline substitution according to whatever
rules we decide. The next step is to make a post to comp.lang.lisp
saying, “suck it!”.

–Mike G.

P.S. Don’t let the nattering numbskulls of nitwittery bother you.

aguynamedryan · October 27, 2008, 9:31am

On Oct 22, 2008, at 23:33 , Mike G. wrote:

This is amazing work. The Ruby collective superorganism thanks you.

thanks!

Since ruby_parser is pure ruby, and is therefore platform-independent and
ruby-implementation-independent, I suppose my default choice is
ruby_parser over ParseTree. Is there any reason to use ParseTree
instead of ruby_parser, other than perhaps speed considerations?

That’s certainly the direction I’m going in for my tools. The only
reason to use PT over RP besides speed is getting sexps from procs and
other instances (methods, modules, classes). But most of the projects
that used to do that (Ambition, etc) are weaning off of it for various
reasons. We don’t have a migration solution for 1.9 at all, and I
don’t see anyone coming up with one anytime soon.

I’ve not used this stuff before, so pardon my ignorance. I can make
sexprs from ruby, and I see ruby2ruby which will convert back to ruby.
However is there a way to execute sexprs directly? Or is that something
rubinius will do?

rubinius doesn’t execute sexps directly. It compiles them into
bytecode and then the VM runs that. It is certainly possible. That’s
what ruby (<1.9) does with the internal ASTs.
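
As a rough illustration of what executing sexps directly could look like, here is a tiny tree-walking evaluator over plain nested arrays. It is a sketch only: it handles a handful of node types, the node layout is assumed, and it is not how rubinius or MRI are implemented.

```ruby
# Tiny tree-walking evaluator over array-based s-expressions.
# Only a few node types are supported; anything else raises.
def eval_sexp(node, env = {})
  type, *rest = node
  case type
  when :lit, :str
    rest.first
  when :nil
    nil
  when :lvar
    env.fetch(rest.first)
  when :lasgn
    name, value = rest
    env[name] = eval_sexp(value, env)
  when :arglist
    rest.map { |arg| eval_sexp(arg, env) }
  when :call
    # Assumed layout: [:call, receiver, message, arglist]
    receiver, message, arglist = rest
    args = arglist ? eval_sexp(arglist, env) : []
    eval_sexp(receiver, env).public_send(message, *args)
  when :block
    rest.map { |stmt| eval_sexp(stmt, env) }.last
  else
    raise ArgumentError, "unsupported node type: #{type.inspect}"
  end
end

# Roughly "x = 6; x * 7" written out by hand as nested arrays.
program = [:block,
  [:lasgn, :x, [:lit, 6]],
  [:call, [:lvar, :x], :*, [:arglist, [:lit, 7]]]]

p eval_sexp(program)
```

Calls without an explicit receiver, blocks, method definitions and everything else that makes Ruby Ruby are left out, which is where the real work would be.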

The possibilities boggle the mind. In principle I can write a Lisp
program which executes the sexpr output of parse_tree. Now I have
another Ruby interpreter. And if it’s SBCL, I can generate an
executable out of that. Now I have a stand-alone ruby interpreter
executable. For the platform-specific stuff, a single Lisp
implementation would need to be chosen (probably SBCL).

Absolutely. See what Luis L. has done with nekovm:

http://blog.mmediasys.com/2008/10/19/experiment-ruby-to-neko-possible/

Absolutely rad stuff, and probably only took a couple hours of work
for the PoC.

And then there is the possibility of ruby macros. I do see a
newly-created defmacro project on rubyforge, however it’s currently just
30 lines of code. Do you have any suggestions on what places to look
for this kind of thing? I want to be sure I’m not reinventing or
co-inventing the wheel.

There have been a few. Caleb just announced one for his parser
red-parse, but I couldn’t get it to work. In that thread you can find
these as well:

http://blog.drewolson.org/2008/06/ruby-and-macros-experiment.html
http://weblog.raganwald.com/2008/06/macros-hygiene-and-call-by-name-in-ruby.html

As a practical example, suppose I want to take an .rb file and
surgically remove all calls to Kernel#log. That shouldn’t be hard,
right? Now I need to set up a separate “compilation” phase, assuming I
want to keep my Kernel#log calls in the source.

No, it isn’t hard at all… and doesn’t even require anything as
“scary” as macros. A simple SexpProcessor + ruby2ruby + eval will
suffice to preprocess (or post… you can do all of this stuff on the
fly) the code.

  class StripLogging < SexpProcessor
    …

    def process_call exp
      return super unless exp[2] == :log
      s(:nil)
    end
  end

that would prolly do it… with some extra stuff.
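
For readers without the gems at hand, the same idea can be sketched on plain nested arrays. This is not the SexpProcessor API; strip_logging is a hypothetical helper, and the call layout [:call, receiver, message, arglist] is assumed.

```ruby
# Recursively rebuild a nested-array sexp, replacing every call whose
# message is :log with a [:nil] node. Non-array leaves are returned as-is.
def strip_logging(node)
  return node unless node.is_a?(Array)
  return [:nil] if node[0] == :call && node[2] == :log

  node.map { |child| strip_logging(child) }
end

# Roughly: log("starting"); puts("hello")
before = [:block,
  [:call, nil, :log,  [:arglist, [:str, "starting"]]],
  [:call, nil, :puts, [:arglist, [:str, "hello"]]]]

after = strip_logging(before)
p after
```

Note that this matches on the message name alone, so a call such as Math.log would be removed too. Checking the receiver is part of the "extra stuff".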

Once that’s done, here comes defmacro. The compilation phase looks out
for defmacro, and does an inline substitution according to whatever
rules we decide. The next step is to make a post to comp.lang.lisp
saying, “suck it!”.

um. the lispers (and smalltalkers) still have a few (actually, more
than a few) things on us.

aguynamedryan · October 28, 2008, 9:53am

Very, very cool.

Just a minor point: something odd went on with gem dependencies. After
“gem install ruby_parser” my first test of ruby_parse gave:

/usr/local/lib/site_ruby/1.8/rubygems.rb:578:in `report_activate_error':
RubyGem version error: hoe(1.7.0 not >= 1.8.0) (Gem::LoadError)

A “gem update hoe” did the trick, but I would have thought that would
happen at install time.

This could just be a bug in rubygems, as the dependencies seem to be
declared properly:

$ gem dependency ruby_parser
Gem ruby_parser-2.0.0
ParseTree (>= 0, runtime)
sexp_processor (>= 3.0.0, runtime)
hoe (>= 1.8.0, runtime)

I have rubygems 1.2.0
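
The version check behind that error can be reproduced in isolation with the Gem::Requirement and Gem::Version classes. This sketch uses them as they exist in current RubyGems; whether 1.2.0 behaves identically is an assumption.

```ruby
require 'rubygems'

# The declared requirement on hoe, as listed by "gem dependency".
requirement = Gem::Requirement.new('>= 1.8.0')

installed = Gem::Version.new('1.7.0')  # the version in the error message
updated   = Gem::Version.new('1.8.0')  # smallest version that qualifies

puts requirement.satisfied_by?(installed)
puts requirement.satisfied_by?(updated)
```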

Regards,

Brian.

aguynamedryan · October 28, 2008, 2:19am

Ryan D. wrote:

ruby_parser version 2.0.0 has been released!

Rock on. I can finally migrate my projects to 1.9, since they depend on
ruby_parser.
Maybe I can forget about 1.8 altogether now…oh wait except there’s
still no ruby debug for 1.9–that’s all that lacks now.

Thanks!
-=R

aguynamedryan · October 29, 2008, 3:39am

$ gem dependency ruby_parser
Gem ruby_parser-2.0.0
ParseTree (>= 0, runtime)
sexp_processor (>= 3.0.0, runtime)
hoe (>= 1.8.0, runtime)

I have rubygems 1.2.0

Hoe should be a developer dependency rather than runtime, right?

–
http://jeremymcanally.com/
http://entp.com/

My books:

http://humblelittlerubybook.com/ (FREE!)

aguynamedryan · October 29, 2008, 3:54am

On Oct 28, 2008, at 8:38 PM, Jeremy McAnally wrote:

Hoe should be a developer dependency rather than runtime, right?

That works correctly with rubygems >= 1.3.0

Hoe is a runtime dependency for rubygems < 1.3.0
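
For reference, the runtime/development split looks like this in a hand-written gemspec. This is a sketch with a made-up gem name; ruby_parser's own spec is generated by Hoe, so this is not its actual specification.

```ruby
require 'rubygems'

spec = Gem::Specification.new do |s|
  s.name    = 'example_parser'      # hypothetical gem, for illustration
  s.version = '0.0.1'
  s.summary = 'Illustration only'
  s.authors = ['Example Author']
  s.files   = []

  # Needed by anyone who installs and uses the gem.
  s.add_runtime_dependency     'sexp_processor', '>= 3.0.0'
  # Needed only to build and test the gem itself.
  s.add_development_dependency 'hoe', '>= 1.8.0'
end

p spec.runtime_dependencies.map(&:name)
p spec.development_dependencies.map(&:name)
```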

Blessings,
TwP

aguynamedryan · October 29, 2008, 7:47am

On Tue, Oct 28, 2008 at 8:17 PM, Jeremy McAnally wrote:

Ahh! For some reason I thought the runtime/dev stuff was in 1.2.

It was, it just didn’t work.

~ j.

aguynamedryan · October 29, 2008, 4:18am

Ahh! For some reason I thought the runtime/dev stuff was in 1.2.

–Jeremy

On Tue, Oct 28, 2008 at 9:53 PM, Tim P. wrote:
