For performance, write it in C - Part 3, Source code now ava

William J. wrote:

File “latin-squares-H.ml”, line 30, characters 4-15:
This function is applied to too many arguments,
maybe you forgot a `;’

It lost a . for array access. It should be (slightly reformatted
trying to lose nothing):

let incompat =
let aux x y =
Array.iteri
(fun i px -> if px = perms.(y).(i) then raise Exit)
perms.(x);

William J. wrote:

A small point.
print_endline op
is shorter. Is it faster?

Not with the interpreter; it’s slower. Haven’t tested the compiler yet.

If anything, it’ll be slower, as print_endline flushes the buffer every
call, whereas print_string and print_char do not.

William J. wrote:

A small point.
print_endline op
is shorter. Is it faster?

Not with the interpreter; it’s slower. Haven’t tested the compiler yet.

Peter H. wrote:

Any views on Practical Ocaml by Joshua B. Smith, does it look like it
might be a good book to learn from?

It looks very good but not as good as my book. :wink:

http://www.ffconsultancy.com/products/ocaml_for_scientists/chapter1.html

Isaac G. wrote:

Simply writing Java that converts the double-byte Unicode Java
outputStrings to single-byte /once/ instead of everytime there’s a
print statement, and following the usual Java pattern of wrapping
output with a BufferedOutputStream, reduces that to

1.396s user+sys, sun-jdk-1.5.0.07

Post your code here.

Jon H. wrote:

OCaml 0.161s 0.626s ocamlopt -inline 100 -unsafe
OCaml2 0.165s 0.415s ocamlopt
OCaml2 0.165s 0.401s ocamlopt -unsafe
Java 1.565s 1.688s javac

That was 64-bit. Here are my 32-bit timings (language versions are the
same):

    Compile  Run

C 0.126s 0.386s gcc -O3 -Wall
OCaml2 0.089s 0.391s ocamlopt
OCaml2 0.010s 10.111s ocamlc
Java 1.615s 12.745s javac

Again, OCaml is basically as fast as C. Note that C and OCaml get slower
moving to 64-bit but Java gets >7x faster. Amazingly, OCaml’s
interpreted
bytecode is faster than Java on 32-bit, whilst being >160x faster to
compile!

Here’s my latest OCaml (I think we could revert to the simpler
list-based
permuter because no significant time is spent generating the
permutations):

let rec fact n = if n=0 then 1 else n*fact(n-1)

let size = 5

(* Permutation generator *)
let p = Array.make size 0
let xx = Array.init size (fun i -> if i<size-1 then 0 else -1)
let gen_perm() =
let i = ref (size - 1) in
while !i > 0 && xx.(!i) = !i do
xx.(!i) <- 0;
decr i
done;
if !i = 0 then 1 else begin
xx.(!i) <- xx.(!i) + 1;
p.(0) <- 1;
for i=0 to size - 1 do
p.(i) <- p.(i - xx.(i));
p.(i - xx.(i)) <- i+1
done;
0
end

let n = fact size

(* Permutations *)
let perms = Array.init n (fun _ -> ignore(gen_perm()); Array.copy p)

let incompat =
let rec aux (px : int array) py i =
i<size && (px.(i) = py.(i) || aux px py (i+1)) in
Array.init n (fun x -> Array.init n (fun y -> aux perms.(x) perms.(y)
0))

let join list = String.concat “” (Array.to_list (Array.map string_of_int
list))
let output_strings = Array.map join perms
let board = Array.make size 0

let op = String.make (size*(size+1)) ‘:’

let rec add_a_row row =
if row=size then begin
for i=0 to size-1 do
String.blit output_strings.(board.(i)) 0 op (i*(size+1)) size
done;
print_string op
end else
for latest = 0 to n - 1 do
let prev_row = ref 0 in
let incompat = incompat.(latest) in
while !prev_row < row && not incompat.(board.(!prev_row)) do
incr prev_row;
done;
if !prev_row = row then begin
board.(row) <- latest;
add_a_row (row + 1)
end
done

let () =
op.[size*(size+1)-1] <- ‘\n’;
add_a_row 0

Jon H. wrote:


Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

No.

That version is on Peter H.'s website, perhaps he’ll replace it
with the newer version on Friday.

Jon H. wrote:

let gen_perm() =
p.(i - xx.(i)) <- i+1
let rec aux (px : int array) py i =
let rec add_a_row row =
incr prev_row;
done;
if !prev_row = row then begin
board.(row) <- latest;
add_a_row (row + 1)
end
done

let () =
op.[size*(size+1)-1] <- ‘\n’;
add_a_row 0

Here’s a faster version of my program.

Eliminated a “not” in a loop by replacing the array of booleans
that shows incompatibility of two rows with an array that shows
compatibility.

Borrowed you idea of creating an output line ahead of time and
modifying it in place.

Timings:
1.11 yours
1.04 mine

(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe
*)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
| (hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
| x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ] ;;

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list

let perms = Array.of_list (permute list)
let n = Array.length perms

(* Used to determine if one row is compatible with another. *)
let compatible = Array.make_matrix n n true ;;
for x = 0 to n - 1 do
for y = 0 to n - 1 do
compatible.(x).(y) <-
List.for_all2 ( <> ) perms.(x) perms.(y)
done
done

let join list = String.concat “” (List.map string_of_int list)
let output_strings = Array.map join perms

(* For speed, create a string that’s the length of the lines
that we’ll print; the :'s that aren’t needed as separators
will later be overwritten. )
let output_line = String.make (size
(size+1)-1) ‘:’ ^ “\n”
let board = Array.make size 0

let rec add_a_row row =
if row = size then
( for i=0 to size-1 do
String.blit
output_strings.(board.(i)) 0 (* source )
output_line (i
(size+1)) (* dest )
size
done;
print_string output_line
)
else
for latest = 0 to n - 1 do
(
Create a changeable thing (variable). )
let prev_row = ref 0 in
(
The ! below fetches the variable’s value. *)
while (!prev_row < row) &&
(compatible.(latest).(board.(!prev_row))) do
incr prev_row
done;
if !prev_row = row then
( board.(row) <- latest ; add_a_row (row + 1) )
done
;;

add_a_row 0

Isaac G. wrote:

That version is on Peter H.'s website, perhaps he’ll replace it
with the newer version on Friday.

Curiously the two Java implementations on Peter’s site perform the same
on
my machine in x86 and they are >32x slower than OCaml. I know Java is
slow
but that is ridiculous…

On Fri, 04 Aug 2006 01:53:41 +0900, Peter H. wrote:

have will be enough so I am looking at removing the rotational and
mirrored versions of the grids from the output.

Why would you want to store all this data? If you want to store the 7x7
version, you will need 3187.2 TB to store all the squares. To store
only the reduced squares you still need 920 MB. To store the reduced
squares for 8x8 you will need 35.6TB. (I used the wikipedia article
on Latin square - Wikipedia to compute these numbers).

My program can easily show only the reduced squares by removing the
permutations from the solutions method, and it will be a lot faster too.
If you want to do this for large squares it would be better to rewrite
my
algorithm in C. But I am interested how this data could ever be useful.

Regards,
Kristof

Jon H. wrote:

Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

You’re probably doing something wrong.

William J. wrote:

Here’s a faster version of my program.

Eliminated a “not” in a loop by replacing the array of booleans
that shows incompatibility of two rows with an array that shows
compatibility.

Heh, should’ve thought of that. :slight_smile:

Borrowed you idea of creating an output line ahead of time and
modifying it in place.

Timings:
1.11 yours
1.04 mine

I get the opposite order:

0.394s yours
0.390s mine

but your timings are probably more accurate because your computer is so
slow. :wink:

You can make it slightly faster by factoring out the CSE “compatible
(latest)”, giving:

0.382s

That’s actually faster than the C here. Woohoo! :slight_smile:

Using bitvectors may also improve things, depending if skipping a bunch
of
comparisons in that loop is significant. However, I think it is time we
searched for a new task to optimise. Ray tracer anyone? :wink:

Jon H. wrote:

modifying it in place.
but your timings are probably more accurate because your computer is so
slow. :wink:

It’s 800MHz. I ran each program 4 times and took the average.

You can make it slightly faster by factoring out the CSE “compatible
(latest)”, giving:

0.382s

That’s actually faster than the C here. Woohoo! :slight_smile:

Good point. That cuts my time to 1.01 seconds.
Also cleaned up the code a bit.
Here’s the final version.

(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe
*)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
(hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ]

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list

let perms = Array.of_list (permute list)
let n = Array.length perms

(* Boolean array used to determine if one row is
compatible with another. *)
let compatible = Array.make_matrix n n true ;;
Array.iteri (fun x ex ->
Array.iteri (fun y ey ->
compatible.(x).(y) <- List.for_all2 (<>) ex ey) perms ) perms

let join list = String.concat “” (List.map string_of_int list)
let output_strings = Array.map join perms

(* For speed, create a string that’s the length of the lines
that we’ll print; the :'s that aren’t needed as separators
will later be overwritten. )
let output_line = String.make (size
(size+1)-1) ‘:’ ^ “\n”
let board = Array.make size 0

(* A recursive function. )
let rec add_a_row row =
if row = size then
( for i=0 to size-1 do
String.blit
output_strings.(board.(i)) 0 (
source )
output_line (i
(size+1)) (* dest )
size
done;
print_string output_line
)
else
for latest = 0 to n - 1 do
let compatible_slice = compatible.(latest) in
(
Create a changeable thing (variable). )
let prev_row = ref 0 in
(
The ! below fetches the variable’s value. *)
while !prev_row < row &&
compatible_slice.(board.(!prev_row)) do
incr prev_row
done;
if !prev_row = row then
( board.(row) <- latest ; add_a_row (row + 1) )
done
;;

add_a_row 0

Isaac G. wrote:

You’re probably doing something wrong.

Other people have posted not entirely dissimilar timings for Java and
I’m
doing exactly the same thing on x86_64 where is runs much faster.

Has anyone managed to get Java close to OCaml or C?

Jon H. wrote:

Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

Peter H.'s measurements were displayed on his website before you
asked for the Java source code - they show the current Java program
taking ~2.15x the time of the current OCaml program, on his machine.

On 8/3/06, Jon H. [email protected] wrote:

Isaac G. wrote:

That version is on Peter H.'s website, perhaps he’ll replace it
with the newer version on Friday.

Curiously the two Java implementations on Peter’s site perform the same on
my machine in x86 and they are >32x slower than OCaml. I know Java is slow
but that is ridiculous…

Java is not slow.

Jon H. wrote:

Here are my 32-bit timings (language versions are the same):

    Compile  Run

C 0.126s 0.386s gcc -O3 -Wall
OCaml2 0.089s 0.391s ocamlopt
OCaml2 0.010s 10.111s ocamlc
Java 1.615s 12.745s javac

Here’s a C++ implementation:

C++ 0.652s 0.608s g++ -O3 -Wall

(95% of time is spent in addRow())

#include
#include
#include

using namespace std;

int fact(int n) { return n == 0 ? 1 : n*fact(n-1); }

int size=5;
vector list(size);
int n = fact(size);
vector< vector > perms(n);
vector< vector > compat(n);
vector board(size);
string out(size*(size+1), ‘:’);

void print() {
for (int i=0; i<size; ++i)
for (int j=0; j<size; ++j)
out[i*(size+1) + j] = ‘1’ + perms[board[i]][j];
cout << out;
}

int next(vector &compat, int row) {
int prev_row = 0;
while (prev_row < row && compat[board[prev_row]])
++prev_row;
return prev_row;
}

void addRow(int row) {
if (row == size) print(); else
for (int latest=0; latest<n; ++latest) {
if (next(compat[latest], row) == row) {
board[row] = latest;
addRow(row+1);
}
}
}

int main() {
for (int i=0; i<size; ++i) list[i] = i;

for (int i=0; i<n; ++i) {
perms[i] = vector(list.begin(), list.end());
next_permutation(list.begin(), list.end());
}

for (int i=0; i<n; ++i) {
compat[i] = vector(n);
for (int j=0; j<n; ++j) {
compat[i][j] = true;
for (int k=0; k<size; ++k)
if (perms[i][k] == perms[j][k]) {
compat[i][j] = false;
break;
}
}
}

out[size*(size+1)-1] = ‘\n’;

addRow(0);
}

Jon H. wrote:

C++ 0.652s 0.608s g++ -O3 -Wall
For Java2 from the site and using Sun’s Java 1.5 instead of GIJ 1.4.2
(d’oh!), I get:

Java 0.543s 0.711s javac

That’s a much more respectable running time although the compile time is
comparatively poor.

Time for another update.

Isaac G. provided a Java implementation based on mine (ie still pre
computes the tables in Perl) that brought the times down to sub 9
seconds.

real 0m8.966s
user 0m5.815s
sys 0m1.488s

But the big news is that William J.’ revision of his previous Ocaml
version is now the fastest.

real 0m3.660s
user 0m1.958s
sys 0m1.421s

The source code for both are available on the web site for you to
examine.

Thanks for that fix. We now have an even faster Ocaml version

real 0m2.260s
user 0m1.867s
sys 0m0.221s