Bytecode Compiler (#100)

On Wed, 2006-11-08 at 08:47 +0900, Justin B. wrote:

Some people started doing the Ruby quiz problems using Haskell, and
this was a perfect opportunity for me to learn some Haskell. So here’s
my solution below, in Haskell. It’s hard to test the byte code
interpretation but all the expression do evaluate to the correct
values.

If anyone has questions about the Haskell code, please let me know.
I’m just learning it and its really cool!

Aww man, now I’m gonna have to look into Haskell a bit more :slight_smile: I’ve been
putting it off for a while, but this is too cool to not have a play
with!

Because I’m currently Haskell-ignorant, however, I don’t know how to run
your solution :frowning: . What interpreter (or compiler?) should be used? I
have Hugs installed, but that complains about a “Syntax error in input
(unexpected backslash (lambda))” at line 1.

#!/usr/bin/env ruby

compiler.rb - Byte-code compiler for simple arithmetic expressions

Lou S. [email protected]

Wed Nov 8 20:33 EST 2006

Here’s my solution for Rubyquiz 100. Nothing too fancy on this one: I

went

for trying to make the shunting algorith as readable as possile.

As for the parsing, StringScanner is very nice, although it might have

been

overkill for this problem.

Thanks again to Ross and James for another fun quiz.

require ‘enumerator’
require ‘interp’
require ‘optparse’
require ‘strscan’

class Token
attr_reader :type, :value

def initialize(value)
@value = value
end

%w|number lparen rparen op|.each do |a|
module_eval %{ def #{a}?; false end }
end
end

class Paren < Token
def initialize(value, type)
super(value)
@type = type
end
def lparen?; @type == :lparen end
def rparen?; @type == :rparen end
end

class Number < Token
def initialize(value)
super(value.to_i)
end

def to_bc
code, fmt = ((-32768…32767).include? value) ? [0x01, ‘n’] : [0x02,
‘N’]

[code, *[value].pack(fmt).to_enum(:each_byte).to_a]
end
def number?; true end
end

class Op < Token
attr_reader :precedence

CodeTable = [:+, :-, :*, :**, :/, :%].to_enum(:each_with_index).
inject({}) {|h, (op,i)| h[op] = i + 0x0a; h}

def initialize(value,assoc,prec)
super(value.to_sym)
@assoc, @precedence = assoc, prec
end

%w|assoc lassoc rassoc|.each do |a|
module_eval %{
def #{a}?
@assoc == :#{a}
end
}
end

def op?; true end

def to_bc
CodeTable[value]
end
end

class Compiler
class << self

def compile(exp)
shunting_yard(exp).collect {|t| t.to_bc }.flatten
end

def tokens(i)
input = StringScanner.new(i)
until input.eos?
case
when t = input.scan(/\d+/) : yield Number.new(t)
when t = input.scan(/[(]/) : yield Paren.new(t, :lparen)
when t = input.scan(/[)]/) : yield Paren.new(t, :rparen)
when t = input.scan(/**/) : yield Op.new(t, :rassoc, 3)
when t = input.scan(%r<[%/]>) : yield Op.new(t, :lassoc, 2)
when t = input.scan(%r<[*]>) : yield Op.new(t, :assoc, 2)
when t = input.scan(%r<[-]>) : yield Op.new(t, :lassoc, 1)
when t = input.scan(%r<[+]>) : yield Op.new(t, :assoc, 1)
when input.scan(/\s+/) : # skip ws
else
raise RuntimeError, “Parse Error: near ‘#{input.peek(8)}’”
end
end
end

def shunting_yard(s)
stack, queue = [] , []
last_tok, negate = nil, false # detect unary minus
tokens(s) do |token|
case
when token.number?
queue << (negate ? Number.new(-token.value) : token)
negate = false
when token.op?
if !last_tok || (last_tok.op? || last_tok.lparen?) &&
(token.value == :slight_smile:
negate = true
else
while stack.size > 0 and stack.last.op?
other_op = stack.last
if ( token.assoc? || token.lassoc? and
token.precedence <= other_op.precedence) ||
(token.rassoc? and token.precedence <
other_op.precedence)
queue << stack.pop
else
break
end
end
stack << token
end
when token.lparen?
stack << token
when token.rparen?
while stack.size != 0 and op = stack.pop
break if op.lparen?
queue << op
end
end
last_tok = token
end
stack.reverse.each do |op|
queue << op
end
queue
end

def to_rpn(exp)
shunting_yard(exp).collect{|t| t.value}.join(’ ')
end

DCBin = ‘/usr/bin/dc’

def dc_eval(exp)
if File.executable?(DCBin)
exp = to_rpn(exp)
IO.popen(DCBin, “w+”) do |f|
f.write(exp.gsub(/**/, ‘^’) + ’ p’)
f.close_write
f.read
end
end
end

end
end

if $0 == FILE
opt = OptionParser.new do |opt|
opt.banner = “Usage: #$0 compile_method”
opt.separator ‘’

opt.on(‘-c’, ‘–compile [expression]’,
‘prints bytecode sequence for [expression]’) do |exp|
p Compiler.compile(exp)
end

opt.on(‘-d’, ‘–dc-eval [expression]’,
‘trys to evaluate [expression] using dc(1)’) do |exp|
puts Compiler.dc_eval(exp)
end

opt.on(‘-i’, ‘–interpret [expression]’,
‘uses the byte-code interpreter to process [expression]’) do
|exp|
puts Interpreter.new(Compiler.compile(exp)).run
end

opt.on(‘-r’, ‘–show-rpn [expression]’,
‘prints out an RPN translated version of [expression]’) do |exp|
puts Compiler.to_rpn(exp)
end

opt.on(‘-h’, ‘–help’) { puts opt }
end
if ARGV.empty?
puts opt
else
opt.parse(ARGV)
end
end

On 11/9/06, Ross B. [email protected] wrote:

Aww man, now I’m gonna have to look into Haskell a bit more :slight_smile: I’ve been
putting it off for a while, but this is too cool to not have a play
with!

You won’t be disappointed. Haskell is an amazingly cool language.

Because I’m currently Haskell-ignorant, however, I don’t know how to run
your solution :frowning: . What interpreter (or compiler?) should be used? I
have Hugs installed, but that complains about a “Syntax error in input
(unexpected backslash (lambda))” at line 1.

It looks like its supposed to be embedded in LaTeX. Just remove the
first and last lines since the TeX directives look like lambdas to
Haskell.

\x → x * x
<==>
lamba {|x| x*x}

Also, Haskell is very picky about formatting. Your going to get a
bunch of errors because of line wraps.

On Thu, 09 Nov 2006 12:00:38 -0000, Louis J Scoras
[email protected] wrote:

On 11/9/06, Ross B. [email protected] wrote:

Aww man, now I’m gonna have to look into Haskell a bit more :slight_smile: I’ve been
putting it off for a while, but this is too cool to not have a play
with!

You won’t be disappointed. Haskell is an amazingly cool language.

It certainly looks it, too. I’ve glanced a few times, and put it on my
list of things to do, but now I’ve got a reason to bump it up the list a
bit :slight_smile:

Because I’m currently Haskell-ignorant, however, I don’t know how to run
your solution :frowning: . What interpreter (or compiler?) should be used? I
have Hugs installed, but that complains about a “Syntax error in input
(unexpected backslash (lambda))” at line 1.

It looks like its supposed to be embedded in LaTeX. Just remove the
first and last lines since the TeX directives look like lambdas to
Haskell.

Ahh, gotcha. I did wonder, but being totally Haskell-ignorant I didn’t
want to just start deleting stuff :slight_smile:

\x → x * x
<==>
lamba {|x| x*x}

Also, Haskell is very picky about formatting. Your going to get a
bunch of errors because of line wraps.

Yeah, I think I’m finding that now. Never mind, at least I’ll get the
chance to play with Haskell a bit today as I try to get it going :slight_smile:

Cheers,

On 11/9/06, Ross B. [email protected] wrote:

Aww man, now I’m gonna have to look into Haskell a bit more :slight_smile: I’ve been
putting it off for a while, but this is too cool to not have a play
with!

Awesome!

Because I’m currently Haskell-ignorant, however, I don’t know how to run
your solution :frowning: . What interpreter (or compiler?) should be used? I
have Hugs installed, but that complains about a “Syntax error in input
(unexpected backslash (lambda))” at line 1.

Like someone else pointed out, it’s meant to be in a LateX document.
It’s called “literate coding” and Haskell is the first place I
encountered it. The basic idea is you flip the normal source/comments
order and make comments the number 1 element in the document. Code is
then embedded in \begin{code} & \end{code} directives. (throwaway
comment: enabling ruby to run “literate code” would make a cool quiz).

Anyways, to run a literate haskell file in hugs, just save it with a
“lhs” extension. At least, that works for me in WinHugs.

To get a properly formatted version, its available on the web at:

http://www.haskell.org/haskellwiki/Haskell_Quiz/Bytecode_Compiler/Solution_Justin_Bailey

That version actually works (I’ll be resubmitting in a minute). To run
all the tests there, type “interpret_tests” after you load the file.

Justin

p.s. One nice thing about literate haskell is you can copy that entire
page (i.e. don’t worry about trying to grab only the code), paste it
into an lhs file, and run it. Works great on blogs too!

My solution in Haskell - and this time it actually works. The previous
implementation didn’t work well with negative numbers and CONST/LCONST
weren’t generating correctly.

The code is “literate” haskell, which means it must be saved in a file
with “lhs” extension to run under WinHugs. To test the generated byte
codes, run “interpret_tests” after loading the file. Other functions
which demonstrate what is generated are:

compile_tests - Spits out byte codes for all test expressions
generate_tests - Spits out symbolic byte codes for all test
expressions
evaluate_tests - Evaluates ASTs generated (not bytecode) for all
test expressions.

This solution is also posted at

http://www.haskell.org/haskellwiki/Haskell_Quiz/Bytecode_Compiler/Solution_Justin_Bailey

Thanks again for a great quiz!

Justin

\begin{code}
import Text.ParserCombinators.Parsec hiding (parse)
import qualified Text.ParserCombinators.Parsec as P (parse)
import Text.ParserCombinators.Parsec.Expr
import Data.Bits
import Data.Int

– Represents various operations that can be applied
– to expressions.
data Op = Plus | Minus | Mult | Div | Pow | Mod | Neg
deriving (Show, Eq)

– Represents expression we can build - either numbers or expressions
– connected by operators. This structure is the basis of the AST built
– when parsing
data Expression = Statement Op Expression Expression
| Val Integer
| Empty
deriving (Show)

– Define the byte codes that can be generated.
data Bytecode = NOOP | CONST Integer | LCONST Integer
| ADD
| SUB
| MUL
| POW
| DIV
| MOD
| SWAP
deriving (Show)

– Using imported Parsec.Expr library, build a parser for expressions.
expr :: Parser Expression
expr =
buildExpressionParser table factor

<?> "expression"

where
– Recognizes a factor in an expression
factor =
do{ char ‘(’
; x ← expr
; char ‘)’
; return x
}
<|> number
<?> "simple expression" -- Recognizes a number number :: Parser Expression number = do{ ds <- many1 digit ; return (Val (read ds)) } <?> “number”
– Specifies operator, associativity, precendence, and constructor to
execute
– and built AST with.
table =
[[prefix “-” (Statement Mult (Val (-1)))],
[binary “^” (Statement Pow) AssocRight],
[binary “*” (Statement Mult) AssocLeft, binary “/” (Statement
Div) AssocLeft, binary “%” (Statement Mod) AssocLeft],
[binary “+” (Statement Plus) AssocLeft, binary “-” (Statement
Minus) AssocLeft]
]
where
binary s f assoc
= Infix (do{ string s; return f}) assoc
prefix s f
= Prefix (do{ string s; return f})

– Parses a string into an AST, using the parser defined above
parse s = case P.parse expr “” s of
Right ast → ast
Left e → error $ show e

– Take AST and evaluate (mostly for testing)
eval (Val n) = n
eval (Statement op left right)
| op == Mult = eval left * eval right
| op == Minus = eval left - eval right
| op == Plus = eval left + eval right
| op == Div = eval left div eval right
| op == Pow = eval left ^ eval right
| op == Mod = eval left mod eval right

– Takes an AST and turns it into a byte code list
generate stmt = generate’ stmt []
where
generate’ (Statement op left right) instr =
let
li = generate’ left instr
ri = generate’ right instr
lri = li ++ ri
in case op of
Plus → lri ++ [ADD]
Minus → lri ++ [SUB]
Mult → lri ++ [MUL]
Div → lri ++ [DIV]
Mod → lri ++ [MOD]
Pow → lri ++ [POW]
generate’ (Val n) instr =
if abs(n) > 32768
then LCONST n : instr
else CONST n : instr

– Takes a statement and converts it into a list of actual bytes to
– be interpreted
compile s = toBytes (generate $ parse s)

– Convert a list of byte codes to a list of integer codes. If LCONST or
CONST
– instruction are seen, correct byte representantion is produced
toBytes ((NOOP):xs) = 0 : toBytes xs
toBytes ((CONST n):xs) = 1 : (toConstBytes (fromInteger n)) ++ toBytes
xs
toBytes ((LCONST n):xs) = 2 : (toLConstBytes (fromInteger n)) ++ toBytes
xs
toBytes ((ADD):xs) = 0x0a : toBytes xs
toBytes ((SUB):xs) = 0x0b : toBytes xs
toBytes ((MUL):xs) = 0x0c : toBytes xs
toBytes ((POW):xs) = 0x0d : toBytes xs
toBytes ((DIV):xs) = 0x0e : toBytes xs
toBytes ((MOD):xs) = 0x0f : toBytes xs
toBytes ((SWAP):xs) = 0x0a : toBytes xs
toBytes [] = []

– Convert number to CONST representation (2 element list)
toConstBytes n = toByteList 2 n
toLConstBytes n = toByteList 4 n

– Convert a number into a list of 8-bit bytes (big-endian/network byte
order).
– Make sure final list is size elements long
toByteList :: Bits Int => Int → Int → [Int]
toByteList size n = reverse $ take size (toByteList’ n)
where
toByteList’ a = (a .&. 255) : toByteList’ (a shiftR 8)

– All tests defined by the quiz, with the associated values they
should evaluate to.
test1 = [(2+2, “2+2”), (2-2, “2-2”), (22, "22"), (2^2, “2^2”), (2
div 2, “2/2”),
(2 mod 2, “2%2”), (3 mod 2, “3%2”)]

test2 = [(2+2+2, “2+2+2”), (2-2-2, “2-2-2”), (222, “222”), (2^2^2,
“2^2^2”), (4 div 2 div 2, “4/2/2”),
(7mod2mod1, “7%2%1”)]

test3 = [(2+2-2, “2+2-2”), (2-2+2, “2-2+2”), (22+2, "22+2"), (2^2+2,
“2^2+2”),
(4 div 2+2, “4/2+2”), (7mod2+1, “7%2+1”)]

test4 = [(2+(2-2), “2+(2-2)”), (2-(2+2), “2-(2+2)”), (2+(22),
"2+(2
2)"), (2*(2+2), “2*(2+2)”),
(2^(2+2), “2^(2+2)”), (4 div (2+2), “4/(2+2)”), (7mod(2+1),
“7%(2+1)”)]

test5 = [(-2+(2-2), “-2+(2-2)”), (2-(-2+2), “2-(-2+2)”), (2+(2 * -2),
“2+(2*-2)”)]

test6 = [((3 div 3)+(8-2), “(3/3)+(8-2)”), ((1+3) div (2 div
2)(10-8), "(1+3)/(2/2)(10-8)“),
((13)4(56), “(13)4(56)”), ((10mod3)(2+2),
"(10%3)
(2+2)”), (2^(2+(3 div 2)^2), “2^(2+(3/2)^2)”),
((10 div (2+3)4), "(10/(2+3)4)"), (5+((54)mod(2+1)),
"5+((5
4)%(2+1))")]

– Evaluates the tests and makes sure the expressions match the expected
values
eval_tests = concat $ map eval_tests [test1, test2, test3, test4, test5,
test6]
where
eval_tests ((val, stmt):ts) =
let eval_val = eval $ parse stmt
in
if val == eval_val
then ("Passed: " ++ stmt) : eval_tests ts
else ("Failed: " ++ stmt ++ “(” ++ show eval_val ++ “)”) :
eval_tests ts
eval_tests [] = []

– Takes all the tests and displays symbolic bytes codes for each
generate_tests = concat $ map generate_all
[test1,test2,test3,test4,test5,test6]
where generate_all ((val, stmt):ts) = (stmt, generate (parse stmt))
: generate_all ts
generate_all [] = []

– Takes all tests and generates a list of bytes representing them
compile_tests = concat $ map compile_all
[test1,test2,test3,test4,test5,test6]
where compile_all ((val, stmt):ts) = (stmt, compile stmt) :
compile_all ts
compile_all [] = []

interpret_tests = concat $ map f’ [test1, test2, test3, test4, test5,
test6]
where
f’ tests = map f’’ tests
f’’ (expected, stmt) =
let value = fromIntegral $ interpret [] $ compile stmt
in
if value == expected
then "Passed: " ++ stmt
else "Failed: " ++ stmt ++ “(” ++ (show value) ++ “)”

fromBytes n xs =
let int16 = (fromIntegral ((fromIntegral int32) :: Int16)) :: Int
int32 = byte xs
byte xs = foldl (\accum byte → (accum shiftL 8) .|. (byte))
(head xs) (take (n - 1) (tail xs))
in
if n == 2
then int16
else int32

interpret [] [] = error “no result produced”
interpret (s1:s) [] = s1
interpret s (o:xs) | o < 10 = interpret ((fromBytes (o2) xs):s) (drop
(o
2) xs)
interpret (s1:s2:s) (o:xs)
| o == 16 = interpret (s2:s1:s) xs
| otherwise = interpret (((case o of 10 → (+); 11 → (-); 12 →
(*); 13 → (^); 14 → div; 15 → mod) s2 s1):s) xs

\end{code}

On Nov 9, 2006, at 12:58 PM, Justin B. wrote:

(throwaway comment: enabling ruby to run “literate code” would make
a cool quiz).

Great, write it up for us so we can all learn how it works!

[email protected]

James Edward G. II