For performance, write it in C - Part 3, Source code now ava

tekhne · August 3, 2006, 12:12pm

Ola B. wrote:

The point Charles made with saying “but it sets off any Java devs
warning flags” is that your Java coding conventions differ so much
from regular conventions that your Java coding capacity is put into
doubt. In plain speak; are you a good enough Java programmer to write
an honest benchmark version for Java?

The question is the code itself, the names can always be renamed. Does
the code suddenly become better if I had used the regular conventions?

Does the naming convention affect performance? NO!

So having brought up such a lame point we are now trying to deflect
attention from the code by rubbishing the author. Focus on the code, not
me or whatever fantasy you have about me.

tekhne · August 3, 2006, 12:13pm

On 8/2/06, Peter H. [email protected] wrote:

The question is the code itself, the names can always be renamed. Does
the code suddenly become better if I had used the regular conventions?

Does the naming convention affect performance? NO!

So having brought up such a lame point we are now trying to deflect
attention from the code by rubbishing the author. Focus on the code, not
me or whatever fantasy you have about me.

I already said it wasn’t a benchmark thing. Besides that, I think I’ve
proven my point. I don’t need to deflect anything. You’re just wrong.

tekhne · August 3, 2006, 12:36pm

Christian N. wrote:

… Which is extremely funny, since Common Lisp have had wicked fast
virtual machines for the last 15 years (on par with C in performance).

They should catch up with the 20th century first of all. =)

Sorry, but which CL has a “wicked fast” virtual machine and is on
par with C? AFAICS, they either are one or the other…

I don’t recall which of the “big four” open source CLs have virtual
machines, if any. IIRC the “standard Lisp benchmarks” are pretty much a
dead heat on GCL vs. CMUCL, with Clisp coming in third. When SBCL “grows
up”, it will probably be as fast as GCL or CMUCL. My recollection is
that the CMUCL compiler is on a par with C in terms of generating x86
machine code.

“scheme48” has a virtual machine, IIRC, and it’s regarded highly by the
folks who did the “vmgen” and “gforth” virtual machines, which are bred
for speed using some non-ANSI features of GCC. But that’s not Common
Lisp; Scheme is in some ways easier to make “wicked fast”.

tekhne · August 3, 2006, 12:48pm

On Tue, 01 Aug 2006 17:48:41 +0900, Peter H. wrote:

The source code is available from
http://peterhi.dyndns.org/write_it_in_c/index.html

Here is another version in Ruby. It uses a more clever algorithm.
It makes use of the fact that permuting the rows or the columns of
a latin square still gives a latin square. It also passes a constraint
on the columns that the given number can be in, to reduce the
backtracking
even more.

It runs in 15 seconds on my machine, which I find quite acceptable.

I am sure that this algorithm coded in a compiled functional or logic
language would be about as fast, or faster than your C code, but with
the
advantages of a higher level language.

I still think you benchmark is flawed, because

the C program works only for squares of size 5
It shows totally no relevance to the kind of problems that you
would use Ruby for. If you want raw speed for numerical
applications,
then I would suggest to use a functional language (i.e. ocaml).
If your application runs slowly, it is probably better to think
first
about why it is slow, if you can improve the code with faster
algorithms or removing bottlenecks.

Kristof B.

---------------- latin2.rb --------------------
require ‘permutation’

$size = (ARGV.shift || 5).to_i
MaxVal = $size-1

RowPermutations = Permutation.new($size).map{|p| p.value}
BaseStr = (2…$size).to_a.join
StrPermutations = Permutation.new($size-1).map{|p| p.project(BaseStr)}

StartColumns = (1…MaxVal).to_a
def init_columns(el)
a = StartColumns.dup
a.delete_at(el-1)
return a
end

def insert(sqr, num, row, columns)
insert(sqr, num, row+1, columns) if (row == num)
columns.each do |col|
next if sqr[row][col] != ?.
sqr[row][col] = num + ?1
if (row == MaxVal)
insert(sqr, num+1, 1, init_columns(num+1))
elsif (num == MaxVal && row == MaxVal - 1)
print_solution(sqr)
else
insert(sqr, num, row+1, columns - [col])
end
sqr[row][col] = ?.
end
end

def print_solution(sqr)
RowPermutations.each do |rp|
StrPermutations.each do |sp|
0.upto(MaxVal) do |r|
print sqr[rp[r]].tr(BaseStr, sp)
print “:”
end
puts
end
end
end

$square = [(“1” + BaseStr)] +
Array.new($size-1) { |i| (i+2).to_s + “.” * ($size - 1) }

insert($square, 0, 1, StartColumns)

tekhne · August 3, 2006, 12:15pm

Ola B. [email protected] writes:

… Which is extremely funny, since Common Lisp have had wicked fast
virtual machines for the last 15 years (on par with C in performance).

They should catch up with the 20th century first of all. =)

Sorry, but which CL has a “wicked fast” virtual machine and is on
par with C? AFAICS, they either are one or the other…

tekhne · August 3, 2006, 12:50pm

Here’s how it goes on my box:

[Latin]$ time ruby latin2.rb > r5

real 0m16.283s
user 0m15.380s
sys 0m0.498s

Which means that your computer is about as crap as mine I will add it
to the pages and make it a download. I think I need a table summarising
the timings.

tekhne · August 3, 2006, 1:05pm

Peter H. wrote:

Charles O Nutter wrote:

Ok, so there’s a bunch of problems with the Java version.

In addition to the addRow run and the Java startup time you’re also
benchmarking over 5200 array modifications to set Compared values to true

That was simple because I couldn’t define the array when I declared it
as I did in C.

I made some changes to the jen.pl script you used to generate the
Java program, it now splits apart the array initialization into many
different methods - and that will let you generate a Java program for
6x6. I emailed it to you.
By “benchmarking without analysis is bogus” I understand that
without analysis the programs might not be doing what we think they are
doing - what we think is an apples to apples comparison might not be an
apples to apples comparison.

In this case, I understand Charles O Nutter to be saying that in C all
those boolean arrays are initialized at compile time, but in Java all
those boolean arrays are initialized at runtime - so although we’re
writing a similar thing in the code, we aren’t doing the same thing
when the code is run.

In the same way, we have similar print statements in the C and Java
code, but they don’t do the same thing when the code is run - one deals
with bytes the other deals with Unicode double bytes.

So it depends which part of the computation we’re interested in - if
we’re interested in the addARow recursive calls and loops, that part of
the computation might be swamped by the time taken to to initialize the
arrays at runtime and naively print strings.

There seems to be a bigger problem with using this latin squares
algorithm for benchmarking - the computation explodes!
```
 0.496s gcc 5x5 (without print statements)
```

30055.098s gcc 6x6 (without print statements)

tekhne · August 3, 2006, 1:07pm

On 8/2/06, Isaac G. [email protected] wrote:

I made some changes to the jen.pl script you used to generate the
Java program, it now splits apart the array initialization into many
different methods - and that will let you generate a Java program for
6x6. I emailed it to you.

That’s pretty helpful…the old version was gigantic even for 5x5. A
better
way might be to write the Perl script in Java and generate the array at
runtime, rather than generating some pretty weird Java code. It would be
pretty fast for all sizes that are computable on modern machines with
this
algorithm.

By “benchmarking without analysis is bogus” I understand that

without analysis the programs might not be doing what we think they are
doing - what we think is an apples to apples comparison might not be an
apples to apples comparison.

Exactly my point. The Java version was automatically crippled because it
had
to do far more work than the C version, owing both to always-on Java
features (unicode support) and to the author’s attempts to “work around”
various restrictions in creative ways (initializing the array at
runtime).
The IO bit turned out to be the worst part, however, and when it was
removed
the Java version beat the C version handily (on my opteron box, anyway).

On a side note: holy crap is Java 6 fast. Those numbers surprised even
me.

There seems to be a bigger problem with using this latin squares

algorithm for benchmarking - the computation explodes!
    0.496s gcc 5x5 (without print statements)
30055.098s gcc 6x6 (without print statements)

You have more patience than I. I killed it after the first couple hours
and
after a visit to Wolfram told me it would take a long time.

tekhne · August 3, 2006, 1:08pm

BTW, I managed to resolve my speed issues for the C version. My machine
seems to default into “on demand” speed mode, so my runs were starting
off
at 1.0GHz before ramping up. With speed scaling off the C version does
much
better…though still not as fast as Java:

With IO:
headius@opteron:~/latin_in_c$ time ./latin > /dev/null 2>&1

real 0m0.707s
user 0m0.700s
sys 0m0.004s

Without IO:
headius@opteron:~/latin_in_c$ time ./latin

real 0m0.586s
user 0m0.580s
sys 0m0.004s

tekhne · August 3, 2006, 1:13pm

William J. wrote:

even more.
would use Ruby for. If you want raw speed for numerical applications,
then I would suggest to use a functional language (i.e. ocaml).

Here’s an OCaml version that runs in about 1.5 seconds when
output is redirected to a file on my faster computer (3.2 GHz).
It uses the same algorithm as the C program.
The C version takes 1.961 seconds when writing to /dev/null on
the o.p.'s computer. Since this is my first OCaml program, I’m
certain an expert could improve it.

An expert did suggest replacing a couple of lists with arrays.
This should be a bit faster.

(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe *)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
| (hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
| x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ] ;;

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list

let perms = Array.of_list (permute list)
let n = Array.length perms
let incompat = Array.make_matrix n n true ;;

for x = 0 to n - 1 do
for y = 0 to n - 1 do
incompat.(x).(y) <-
List.exists2 ( = ) perms.(x) perms.(y)
done
done;;

let join list = String.concat “” (List.map string_of_int list)
let output_strings = Array.map join perms ;;
let board = ( Array.make size 0 ) ;;

(* recursive function *)
let rec add_a_row row =
if row = size then
print_endline
( String.concat “:”
(List.map (fun i -> output_strings.(i) )
(Array.to_list board)) )
else
for latest = 0 to n - 1 do
let prev_row = ref 0 in
while (!prev_row < row) &&
(not incompat.(latest).(board.(!prev_row))) do
incr prev_row
done;
if !prev_row = row then
( board.(row) <- latest ; add_a_row (row + 1) )
done
;;

add_a_row 0 ;;

tekhne · August 3, 2006, 12:51pm

Kristof B. wrote:

 then I would suggest to use a functional language (i.e. ocaml).

Here’s an OCaml version that runs in about 1.5 seconds when
output is redirected to a file on my faster computer (3.2 GHz).
It uses the same algorithm as the C program.
The C version takes 1.961 seconds when writing to /dev/null on
the o.p.'s computer. Since this is my first OCaml program, I’m
certain an expert could improve it.

(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe
*)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
| (hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
| x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ] ;;

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list
let perms = permute list

let n = List.length perms
let incompat = Array.make_matrix n n true ;;

for x = 0 to n - 1 do
for y = 0 to n - 1 do
incompat.(x).(y) <-
List.exists2 ( = ) (List.nth perms x) (List.nth perms y)
done
done;;

let join list = String.concat “” (List.map string_of_int list)
let output_strings = List.map join perms ;;
let board = ( Array.make size 0 ) ;;

let rec add_a_row row =
if row = size then
print_endline
( String.concat “:”
(List.map (fun i -> List.nth output_strings i)
(Array.to_list board)) )
else
for latest = 0 to n - 1 do
let prev_row = ref 0 in
while (!prev_row < row) &&
(not incompat.(latest).(board.(!prev_row))) do
incr prev_row
done;
if !prev_row = row then
( board.(row) <- latest ; add_a_row (row + 1) )
done
;;

add_a_row 0 ;;

tekhne · August 3, 2006, 1:24pm

M. Edward (Ed) Borasky wrote:

I don’t recall which of the “big four” open source CLs have virtual
machines, if any. IIRC the “standard Lisp benchmarks” are pretty much a
dead heat on GCL vs. CMUCL, with Clisp coming in third. When SBCL “grows
up”, it will probably be as fast as GCL or CMUCL. My recollection is
that the CMUCL compiler is on a par with C in terms of generating x86
machine code.

Actually SBCL is a fork from CMUCL (with a focus on maintanability) and
as such uses the same compiler.
Compiler improvements are often ported between the two implementations
and when it comes to actual performance there is not much between
the two.

tekhne · August 3, 2006, 2:23pm

M. Edward (Ed) Borasky wrote:

This expert suggests comparing performance between the C version and
the Ocaml version on the same machine!

On my machine (Athlon X2 4400+, Debian Linux, gcc 4.0.4, OCaml 3.09.2,
Sun
JDK 1.5):

    Compile Run

C 0.137s 0.386s gcc -O3 -Wall
OCaml 0.171s 0.668s ocamlopt
OCaml 0.161s 0.626s ocamlopt -inline 100 -unsafe
OCaml2 0.165s 0.415s ocamlopt
OCaml2 0.165s 0.401s ocamlopt -unsafe
Java 1.565s 1.688s javac

Some niggles:

A lot of precalculation has been done in the C and Java, to the
extent
that we’re breaking Java compilers (always a bad sign).
Java is nothing like as slow as the OP claimed.
OCaml is only 4% slower than C and only 10% slower with bounds
checking.
Whether Ruby is fast enough or not, you should be programming in
OCaml
and not in Ruby or C.

Here’s my OCaml2 (based upon William’s):

let rec fact n = if n=0 then 1 else n*fact(n-1)

let size = 5
let n = fact size

(* Permutation generator *)
let p = Array.make size 0
let xx = Array.init size (fun i -> if i<size-1 then 0 else -1)
let gen_perm() =
let i = ref (size - 1) in
while !i > 0 && xx.(!i) = !i do
xx.(!i) <- 0;
decr i
done;
if !i = 0 then 1 else begin
xx.(!i) <- xx.(!i) + 1;
p.(0) <- 1;
for i=0 to size - 1 do
p.(i) <- p.(i - xx.(i));
p.(i - xx.(i)) <- i+1
done;
0
end

(* Permutations *)
let perms = Array.init n (fun _ -> ignore(gen_perm()); Array.copy p)

let incompat =
let aux x y =
Array.iteri (fun i px -> if px = perms.(y).(i) then raise Exit)
perms
(x);
false in
Array.init n (fun x -> Array.init n (fun y -> try aux x y with Exit ->
true))

let join list = String.concat “” (Array.to_list (Array.map string_of_int
list))
let output_strings = Array.map join perms
let board = Array.make size 0

let op = String.make (size*(size+1)-1) ‘:’

let rec add_a_row row =
if row = size then begin
for i=0 to size-1 do
String.blit output_strings.(board.(i)) 0 op (i*(size+1)) size
done;
print_string op;
print_string “\n”
end else
for latest = 0 to n - 1 do
let prev_row = ref 0 in
let incompat = incompat.(latest) in
while !prev_row < row && not incompat.(board.(!prev_row)) do
incr prev_row
done;
if !prev_row = row then begin
board.(row) <- latest;
add_a_row (row + 1)
end
done

let () = add_a_row 0

tekhne · August 3, 2006, 2:23pm

On 8/2/06, Jon H. [email protected] wrote:

    Compile Run
C 0.137s 0.386s gcc -O3 -Wall
OCaml 0.171s 0.668s ocamlopt
OCaml 0.161s 0.626s ocamlopt -inline 100 -unsafe
OCaml2 0.165s 0.415s ocamlopt
OCaml2 0.165s 0.401s ocamlopt -unsafe
Java 1.565s 1.688s javac

I assume this is the unmodified Java version…don’t make me come over
there
and smack you!

I’d be interested in seeing all of these run without IO as well…that
was
the big bottleneck for Java.

Oh, and maybe Java 6 too? It’s like 20-30% faster than Java 5…I don’t
know
how Sun does it.

tekhne · August 3, 2006, 2:23pm

William J. wrote:

This expert suggests comparing performance between the C version and
the Ocaml version on the same machine!

tekhne · August 3, 2006, 2:23pm

On Thu, Aug 03, 2006 at 01:30:04PM +0900, Jon H. wrote:

Whether Ruby is fast enough or not, you should be programming in OCaml
and not in Ruby or C.

Actually, OCaml is one of the languages in which I’ve been meaning to
get competent at some point in the hopefully-near future (for some
definition of “near”). I find that I pick up a language faster if
there’s a good community for it whose means of communication suits the
way I think, though. What sort of online OCaml community resources are
available? Perl’s biggest strength, in terms of interactive community,
seems (for my tastes) to be the PerlMonks site, and this list fills the
same need for Ruby. What comparable community interaction mechanism
could I use for OCaml?

I’m often disappointed by the options for a given language. For
instance, I recently (earlier this year) added another language to my
list of loves: the UCBLogo implementation of the Logo language (more a
“Lisp without parentheses” than any other Logo of which I’m aware, as it
provides macro support). It has no online community at all, except for
a bunch of half-baked educators who want to use Turtle Graphics to teach
elementary and middle school kids that computers aren’t scary, which is
definitely not the community I want for my own purposes. My issue with
the Python community is far less drastic – it actually has one of the
better programming language online communities, all things considered,
of course – but the fact that the cultural rigidity of the Python
community seems so pervasive drives me away (as does that
fork-in-the-eye sensation I get when reading Python source). Et cetera.
Both Ruby and Perl are stand-out examples of excellence for online
communities, and while I don’t need the perfection of either
necessarily, I hope OCaml provides something close.

So. Does it?

Wow, that was a lot of rambling for that question. Just to keep it
on-topic:

Thanks for being a good community, guys.

tekhne · August 3, 2006, 2:23pm

On 8/3/06, Chad P. [email protected] wrote:

there’s a good community for it whose means of communication suits the
provides macro support). It has no online community at all, except for
necessarily, I hope OCaml provides something close.
“A script is what you give the actors. A program
is what you give the audience.” - Larry Wall

This is a very important point and I would like to add I have
difficulties to grasp OCaml, maybe 26 years of Procedural Style and OO
programming are too difficult to set appart, maybe I am just too old now
and
maybe I am just plain stupid ( do not comment on this one ;).
I will accept that kind of affirmation happily but I really do have the
feeling that OCaml is not the kind of language which makes things easy
for
you.
I admire its conecptual beauty, I dislike it’s syntax and I hate its
semantics, probably I have not yet the slightest idea of its semantic.
But all that said, how can one invest time into learning the thing when
one
lacks the time to rite (I will not use any other spelling any more!!!)
Ruby
?
In order to come back to Chad’s POV, I am afraid without an excellent
community teaching the approach to OCaml most of us are simply lost and
give
up, as sad as that might be.

Just my nanomoney worth.

Robert

–
Deux choses sont infinies : l’univers et la bÃªtise humaine ; en ce qui
concerne l’univers, je n’en ai pas acquis la certitude absolue.

Albert Einstein

tekhne · August 3, 2006, 2:24pm

I only have this on my box.

[peterhickman]$ ocaml -version
The Objective Caml toplevel, version 3.09.2

Is this good enough for the version you wrote?

tekhne · August 3, 2006, 2:24pm

I have run the first Ocaml version on the same box as all the other
timings have come from and it sits between the C version with default
compiler settings and the C version optimised for speed. The web page
has been updated to include the Ocaml source and Kristof B.'s
faster Ruby version. Plus a simple table so that you can view all the
results.

I will take the revised Ocaml version and time it and check the output
tonight.

tekhne · August 3, 2006, 2:36pm

Peter H. wrote:

I only have this on my box.

[peterhickman]$ ocaml -version
The Objective Caml toplevel, version 3.09.2

Is this good enough for the version you wrote?

You probably also have:

$ ocamlopt -v
The Objective Caml native-code compiler, version 3.09.2
Standard library directory: /usr/lib/ocaml/3.09.2

In which case you just do:

$ ocamlopt latin.ml -o latin
$ time ./latin >/dev/null