For performance, write it in C - Part 3, Source code now ava

tekhne · August 4, 2006, 10:40pm

Isaac G. wrote:

Depending on the machine hardware, we should expect a vanilla Java
implementation to take 1.5x - 2.5x longer than C for this problem.

Only if by “vanilla” you mean optimised Java.

tekhne · August 5, 2006, 5:02am

Jon H. wrote:
-snip-

These languages are not the same. However, using “ordinary” IO via
System.out.println or cout or print_endline is the “vanilla” way of
outputting in all of these languages. Of these languages, only Java gives
appalling performance unless you switch to buffers.

I guess you’d fail a Java exam

(I’m not interested enough to find out, but I suspect stdout is at
minimum line-buffered.)

Even when you do use buffers in Java, it is still slower than unbuffered
C++, OCaml etc. on my machine.

No where near an order of magnitude slower - iirc you reported 1.85x C

I was using Sun’s J2SE 1.5. I also tried GNU’s GIJ, which is even slower
than OCaml’s bytecode.

You failed the Java exam again
For Java without JIT use the -Xint command line option with the Sun JVM
like this

time java -Xint Latin > /dev/null

And if the program ran for any amount of time we of course be using

time java -server Latin > /dev/null

tekhne · August 5, 2006, 8:08am

On Sat, Aug 05, 2006 at 07:22:57AM +0900, Thomas E Enebo wrote:

On Sat, 05 Aug 2006, Jon H. defenestrated me:

Only if by “vanilla” you mean optimised Java.

Did you look at Isaac’s implementation? It is almost exactly the same it
was before, but he is not printing out the latin squares as unicode (which
none of the other impls are doing to my knowledge) and it is writing to
a BufferedReader (which any Java programmer with a couple months of
experience would/should do).

Not to get embroiled in a nascent flamewar, or anything, but . . .

I don’t think the fact that everyone in one language’s community does
something that people in other languages’ communities don’t so they can
get better performance makes what they’re doing any less an
optimization. It just makes it an exceedingly common optimization (and
might indicate that language implementation defaults should be changed).

Then again, maybe not.

tekhne · August 5, 2006, 8:41am

Jon H. wrote:

Isaac G. wrote:

(I’m not interested enough to find out, but I suspect stdout is at
minimum line-buffered.)

Sure, the naive implementations in other languages are doing silly things as
well (like the OCaml flushing after every line) but they are still faster
than the most optimised Java to date.

afaict none of the OCaml implementations work with the data created by
that Perl script - like the C and Java implementations - why is that?

than OCaml’s bytecode.

You failed the Java exam again

For Java without JIT use the -Xint command line option with the Sun JVM
like this

time java -Xint Latin > /dev/null

That’s as slow as ocamlc.

as slow as? does that mean faster

tekhne · August 5, 2006, 5:06pm

Isaac G. wrote:

that Perl script - like the C and Java implementations - why is that?
The authors couldn’t be bothered because it is easier to generate that
data
in OCaml.

For Java without JIT use the -Xint command line option with the Sun JVM
like this

time java -Xint Latin > /dev/null

That’s as slow as ocamlc.

as slow as? does that mean faster

No, that means even slower.

tekhne · August 7, 2006, 10:42am

Thomas E Enebo wrote:

My only question on this thread about Java is what Java is being used
for the test (used on the webpage – a good addition to the page)? I ran
Java 1.4, Java 5 and Java 6 (beta but what the hell). The speedup was
surprising and exciting (obviously I do Java programming). I went from
1.1s to 0.84s between the versions.

-Tom

[peterhickman]$ javac -version
javac 1.5.0_06

I think that’s the current best version for the Mac. It does tend to lag
behind the rest of the world when it comes to releases.

I will stick a language versions page up tonight.

tekhne · August 5, 2006, 6:35am

Isaac G. wrote:

(I’m not interested enough to find out, but I suspect stdout is at
minimum line-buffered.)

Sure, the naive implementations in other languages are doing silly
things as
well (like the OCaml flushing after every line) but they are still
faster
than the most optimised Java to date.

Even when you do use buffers in Java, it is still slower than unbuffered
C++, OCaml etc. on my machine.

No where near an order of magnitude slower - iirc you reported 1.85x C

I get 0.604s unoptimised OCaml vs 1.740s unoptimised Java on x86_64.
That’s
2.9x slower.

I was using Sun’s J2SE 1.5. I also tried GNU’s GIJ, which is even slower
than OCaml’s bytecode.

You failed the Java exam again

For Java without JIT use the -Xint command line option with the Sun JVM
like this

time java -Xint Latin > /dev/null

That’s as slow as ocamlc.

And if the program ran for any amount of time we of course be using

time java -server Latin > /dev/null

I already tried and it improved performance slightly.

tekhne · August 8, 2006, 10:56am

New update to the web site. I have added the language versions to the
page and another timing.

Isaac G. created a Java version that output the data in binary format
so as to circumvent any overhead in the conversions. It didn’t seem to
improve much on his previous version but to be honest I no longer
believe that my system can reliably time such short runs.

The source code is available from
http://peterhi.dyndns.org/write_it_in_c/index.html

I started to get my AMD64 4400 box set up to the 6 x 6 grids. It is
probably going to take me a week or more running the tests to get the
timings for this. Is anyone actually that interested or shall we knock
this thread on the head?

I’ll leave the web page up.

tekhne · August 8, 2006, 11:56am

Chad P. wrote:

I don’t think the fact that everyone in one language’s community does
something that people in other languages’ communities don’t so they can
get better performance makes what they’re doing any less an
optimization. It just makes it an exceedingly common optimization (and
might indicate that language implementation defaults should be changed).

Precisely.

tekhne · August 9, 2006, 10:17pm

On 8/8/06, Jon H. [email protected] wrote:

Chad P. wrote:

I don’t think the fact that everyone in one language’s community does
something that people in other languages’ communities don’t so they can
get better performance makes what they’re doing any less an
optimization. It just makes it an exceedingly common optimization (and
might indicate that language implementation defaults should be changed).

Precisely.

It reminds me of Mark Twain’s observation about how stupid the French
were, since he could never make them understand their own language!

–
Rick DeNatale

IPMS/USA Region 12 Coordinator
http://ipmsr12.denhaven2.com/

Visit the Project Mercury Wiki Site
http://www.mercuryspacecraft.com/

tekhne · August 15, 2006, 2:22pm

On Sat, 5 Aug 2006, Peter H. wrote:

Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?

showing how it performs on a 2, 4, and 8 grid would show how it
scaled…

-a

tekhne · August 15, 2006, 2:22pm

[email protected] wrote:

On Sat, 5 Aug 2006, Peter H. wrote:

Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?

showing how it performs on a 2, 4, and 8 grid would show how it scaled…

-a
I’ll test the 6 x 6 in Ocaml, C and Java but I’m not sure that I will be
going for the best of 10 runs on this. 2 and 4 are simply too fast to
measure accurately except for the slowest code. Someone calculated that
some values higher than 6 would take months to run and 8 might be a
little too far into that ballpark.

I do have a very noisy and hot AMD 64 4400 with 2 Gb ram running Ubuntu.
Perhaps I will set this as the new benchmark machine.

tekhne · August 15, 2006, 2:22pm

[email protected] wrote:

On Sat, 5 Aug 2006, Peter H. wrote:

Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?

showing how it performs on a 2, 4, and 8 grid would show how it scaled…

-a

happiness is not something ready-made. it comes from your own actions.

h.h. the 14th dali lama

It scales badly

    0.496s gcc 5x5 (without print statements)

30055.098s gcc 6x6 (without print statements)

imo If Peter has figured out what he’s trying to do then it would be
smart to find a better way of doing it rather than scrabbling for small
percentage improvements.

tekhne · August 15, 2006, 2:22pm

[email protected] wrote:

On Sat, 5 Aug 2006, Peter H. wrote:

Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?

showing how it performs on a 2, 4, and 8 grid would show how it scaled…

-a

happiness is not something ready-made. it comes from your own actions.

h.h. the 14th dali lama

It scales badly

    0.496s gcc 5x5 (without print statements)

30055.098s gcc 6x6 (without print statements)

imo If Peter has figured out what he’s trying to do then it would be
smart to find a better way of doing it rather than scrabbling for small
percentage improvements.

tekhne · August 15, 2006, 2:22pm

On Sat, 5 Aug 2006, Isaac G. wrote:

happiness is not something ready-made. it comes from your own actions.

h.h. the 14th dali lama

It scales badly
   0.496s gcc 5x5 (without print statements)
30055.098s gcc 6x6 (without print statements)

imo If Peter has figured out what he’s trying to do then it would be
smart to find a better way of doing it rather than scrabbling for small
percentage improvements.

indeed - that’s my point exactly.

with that kind scaling a better ruby version could kill the c one!

-a

tekhne · August 15, 2006, 2:22pm

[email protected] wrote:

-a
smart to find a better way of doing it rather than scrabbling for small
percentage improvements.

indeed - that’s my point exactly.

At last, something we can agree about - that makes all this silliness
worthwhile

tekhne · August 15, 2006, 2:22pm

On Sat, 05 Aug 2006 08:31:30 +0900, ara.t.howard wrote:

-a
smart to find a better way of doing it rather than scrabbling for small
percentage improvements.

indeed - that’s my point exactly.

with that kind scaling a better ruby version could kill the c one!

Did you actually run the 6x6 version? I think my Ruby version might
actually be faster than the C one for large squares, since it only
computes normalised squares. A reasonable estimate indicates that
for 6x6 squares it will take about 23500 seconds on my computer. But I
didn’t test it, because that’s still longer than I am prepared to wait
for
it

Kristof

tekhne · August 15, 2006, 2:22pm

[email protected] wrote:

It scales badly
    0.496s gcc 5x5 (without print statements)
30055.098s gcc 6x6 (without print statements)

imo If Peter has figured out what he’s trying to do then it would be
smart to find a better way of doing it rather than scrabbling for small
percentage improvements.

Sure it would, though his web page implies that different algorithms
aren’t welcome. For example, http://jsnell.iki.fi/tmp/latin.lisp
will find the squares for a 6x6 in 180 seconds [*], compared to 13000
seconds for the C version with -O3. Both times are without output.

If the results are printed, the Lisp code will take about 1500
seconds, which doesn’t make for a very sensible “time taken for real
work” / “time taken for output” ratio.

[*] X2 3800+, SBCL 0.9.15.6

tekhne · August 15, 2006, 2:22pm

Actually this is a miss reading. The whole point of the first post was
to point out that the speed up from going from Perl (or any scripting
language) to C. I used the same Perl code to pre compute values to
create the C header file because I already had code at this point that
did the work and the impact of doing the pre compute was pretty much
insignificant to the overall run time. I then used to same technique to
get the Java implementation. I have never claimed that the C version was
the best possible implementations only that it delivered a massive
performance boost. I think that I have proven that.

There are optimisations that could be applied to the C version but they
are only going to get percentage improvements and are getting much
harder to reliably measure.

As to Ocaml versions, well they were submitted and proved to be faster
than my C version. C is quite a fragile language to develop in and so
any high level language that can deliver that performance is very
interesting. To me at least.

And all this Java stuff, well that comes from the post by Charles O
Nutter:

Write it in C is as valid as write it in Java (as someone else
mentioned).
Java is at least as fast as C for most algorithms.

It was the “at least as fast as” bit that I disputed and the whole Java
sub thread comes from that.

The only requirement is that I can get an implementation of the language
in question on my Macintosh.

tekhne · August 15, 2006, 2:22pm

Peter H. wrote:

Actually this is a miss reading.

What is a misreading?

The whole point of the first post was
to point out that the speed up from going from Perl (or any scripting
language) to C. I used the same Perl code to pre compute values to
create the C header file because I already had code at this point that
did the work and the impact of doing the pre compute was pretty much
insignificant to the overall run time. I then used to same technique to
get the Java implementation. I have never claimed that the C version was
the best possible implementations only that it delivered a massive
performance boost. I think that I have proven that.

Now I’m puzzled - isn’t that just conventional wisdom? Isn’t it
conventional wisdom that a program transformed by an optimizing
compiler will be faster than a program interpreted at runtime?

We can see that by looking at the same C program using both a compiler
and an interpreter like CINT.
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=all&lang=cint&lang2=gcc

CINT scripts can call compiled code, which takes us that next step -
it’s faster when the script glues together compiled functions, so it’s
good to have a large library of highly-optimised compiled functions.

We can figure that out without a programming language flame war

Write it in C is as valid as write it in Java (as someone else
mentioned).
Java is at least as fast as C for most algorithms.

It was the “at least as fast as” bit that I disputed and the whole Java
sub thread comes from that.

Here’s what Charles O Nutter said
http://groups.google.com/group/ruby-talk-google/msg/1adf7bdc7ab8aae0

It’s not obvious to me what he meant by “Write it in C is as valid as
write it in Java” but the points I’d take from X is at least as fast as
Y for most algorithms are

it can be true that using the particular implementations of languages
X and Y on his computer X is at least as fast as Y, and using the same
or different implementations of languages X and Y on your computer X is
not as fast as Y
we don’t care about “most algorithms” just the particular ones we
deal with

Now I’ve read that old post, I understand why he called your Java
program bogus. With apology, it must have seemed that you were
deliberately writing bad Java to prove your point.

The only requirement is that I can get an implementation of the language
in question on my Macintosh.

D and Clean are other languages you might look at, but using a
different language won’t make what you’re trying to do any more
feasible.

In your original post, you listed grid sizes with the number of
permutations but you only listed N! - the permutations of a single row
1…N

afaict your approach is to enumerate every distinct latin square of
side N, from Latin Square -- from Wolfram MathWorld

size : squares
1 : 1! x 0! x 1
2 : 2! x 1! x 1
3 : 3! x 2! x 1
4 : 4! x 3! x 4
5 : 5! x 4! x 56
6 : 6! x 5! x 9408
7 : 7! x 6! x 16942080
8 : 8! x 7! x 535281401856
9 : 9! x 8! x 377597570964258816

I don’t know much about algorithm analysis but I’d guess that
generating 5524751496156892842531225600 latin squares is a bad idea.

Maybe you can use some of the tricks from this paper “Enumerating
possible Sudoku grids” - the authors programs are available here