Great discussion!
I'm going to try to make my contribution…
I'm implementing a multi-dimensional array class (MDArray) in line with
NumPy, which, by the way, I'm thinking of giving to the community if
there is any interest (more on another thread). The solution uses the
multi-dimensional Java array implemented by the Unidata Java NetCDF
library. This library is freely available and the source code is
released under the (MIT-style) netCDF C library license:
http://www.unidata.ucar.edu/software/netcdf/copyright.html
I've been doing some performance testing and would like to share the
results. My machine: Intel Core i5-2400 CPU @ 3.10GHz, 4.00 GB of RAM,
64-bit Windows 7 with Cygwin.
The code below creates a new multi-dimensional array of type double with
4 dimensions of sizes 7, 500, 20, and 320. So the total number of
elements in the array is 7 x 500 x 20 x 320 = 22,400,000. The
fromfunction block receives the indices of each element and fills that
element with the resulting value. So in the example below,
@a[5, 10, 15, 20] = 5 + 10 + 15 + 20 = 50.

    @a = MDArray.fromfunction("double", [7, 500, 20, 320]) do |x, y, z, k|
      x + y + z + k
    end

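For readers without MDArray at hand, the intended semantics can be sketched in plain Ruby. This is only an illustration: from_function_sketch is a hypothetical helper, not part of MDArray or NetCDF, it stores values in a Hash keyed by index tuple, and it uses a much smaller shape so that it runs quickly.

```ruby
# Hypothetical plain-Ruby stand-in for MDArray.fromfunction.
# Not the MDArray implementation; values live in a Hash keyed by index tuple.
def from_function_sketch(shape)
  # One list of valid indices per dimension, e.g. [[0, 1], [0, 1, 2]].
  axes = shape.map { |n| (0...n).to_a }
  values = {}
  # Array#product enumerates every index tuple, last dimension fastest.
  axes.first.product(*axes.drop(1)).each do |index|
    values[index] = yield(*index)
  end
  values
end

a = from_function_sketch([2, 3, 4, 5]) { |x, y, z, k| x + y + z + k }
puts a.size            # 2 * 3 * 4 * 5 elements
puts a[[1, 2, 3, 4]]   # 1 + 2 + 3 + 4
```

A Hash is far too slow and memory-hungry for 22,400,000 elements; it is used here only to make the index-to-value mapping easy to read.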
The relevant Ruby code called to execute this method is:

    def set_block(*args)
      get_args(*args) do |op_iterator, shape, *other_args|
        block = other_args[0]
        while (op_iterator.has_next?)
          op_iterator.next
          op_iterator.set_current(block.call(op_iterator.get_current_counter))
        end if block
      end
    end

get_args just parses the arguments to fromfunction. In this case the
argument is the multi-dimensional array @a, and get_args calls the
internal block, giving it an iterator over @a (op_iterator), the shape
of the array, and the remaining args, if there are any.
We then iterate over all elements of @a and set the current value by
calling the block with the current counter value, e.g., [0, 0, 0, 0],
[0, 0, 0, 1], etc.
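The iteration described above can be mimicked in plain Ruby with a small stand-in iterator. CounterIteratorSketch and fill_sketch are hypothetical names; the class only models the four methods that set_block uses and makes no claim about how the NetCDF iterator works internally.

```ruby
# Hypothetical stand-in for the NetCDF index iterator (illustration only).
class CounterIteratorSketch
  attr_reader :data

  def initialize(shape)
    @shape = shape
    @total = shape.inject(1) { |acc, n| acc * n }
    @data = Array.new(@total)
    @pos = -1 # positioned before the first element
  end

  def has_next?
    @pos + 1 < @total
  end

  def next
    @pos += 1
  end

  # Index tuple of the current element, last dimension varying fastest.
  def get_current_counter
    rest = @pos
    @shape.reverse.map { |n| rest, i = rest.divmod(n); i }.reverse
  end

  def set_current(value)
    @data[@pos] = value
  end
end

# Same loop shape as set_block: advance, read the counter, store the result.
def fill_sketch(iterator, &block)
  while iterator.has_next?
    iterator.next
    iterator.set_current(block.call(iterator.get_current_counter))
  end
end

it = CounterIteratorSketch.new([2, 2])
fill_sketch(it) { |x, y| x + y }
p it.data # values for [0, 0], [0, 1], [1, 0], [1, 1]
```

Note that the block is called with a single array, which Ruby spreads over the block parameters; every element therefore costs one block call plus one counter array, all on the Ruby side.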
The methods set_current and get_current_counter are calls to the NetCDF
Java methods and should be fairly fast.
Running this code with JRuby 1.6.8 as:

    /jruby-1.6.8/bin/jruby --server \
      -J-Djruby.compile.frameless=true -J-Djruby.compile.fastops=true \
      -J-Xmn512m -J-Xms1024m -J-Xmx1024m $1

takes between 36 and 38 seconds.
Running the same code with JRuby 1.7.2 as:

    /jruby-1.7.2/bin/jruby --server -Xinvokedynamic.constants=true \
      -J-Xmn512m -J-Xms1024m -J-Xmx1024m $1

takes between 33 and 35 seconds.
So there is an improvement, but I wouldn't say it is very large, and I
don't know if we can generalize from it.
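The post does not say how the timings were taken. One possible way to collect a min/max range like the ones quoted is Benchmark from the Ruby standard library; time_range is a hypothetical helper and the workload below is just a placeholder, not the MDArray code.

```ruby
require "benchmark"

# Runs the given block `runs` times and returns [min, max] wall-clock seconds.
# Hypothetical helper, shown only to illustrate one way of measuring.
def time_range(runs)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  [times.min, times.max]
end

# Placeholder workload standing in for the fromfunction call.
fastest, slowest = time_range(3) { (0...200_000).inject(0) { |sum, i| sum + i } }
puts format("between %.3f and %.3f seconds", fastest, slowest)
```

On the JVM the first run in a process usually includes JIT warm-up, so repeating the measurement inside one process and timing separate process launches can give noticeably different ranges.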
35 seconds is actually a lot of time, and in order to improve on this I
created Java methods to execute the loop.
This is the Java method that does the same loop as the Ruby method
above:

    public static void setAll4(ArrayDouble array, D4 func) {
        IndexIterator iterator = array.getIndexIterator();
        int[] counter;
        while (iterator.hasNext()) {
            iterator.next();
            counter = iterator.getCurrentCounter();
            iterator.setDoubleCurrent(func.call(counter[0], counter[1],
                                                counter[2], counter[3]));
        }
    }

Unfortunately, I had to create separate methods called setAll1, setAll2,
setAll3, ..., setAll7 for efficiency reasons, as I don't think there is
a way for JRuby to find the proper method. In the example above, setAll4
will be called, as my array has 4 dimensions.
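How the Ruby side picks among these methods is not shown in the post. One possibility, sketched here in plain Ruby, is to build the method name from the number of dimensions. LoopHelperSketch and set_all_sketch are hypothetical stand-ins; under JRuby the real targets would be the static Java methods.

```ruby
# Hypothetical plain-Ruby stand-in for the Java helper class.
# Only ranks 1 and 2 are modelled to keep the sketch short.
module LoopHelperSketch
  def self.setAll1(shape, func)
    out = {}
    shape[0].times { |i| out[[i]] = func.call(i) }
    out
  end

  def self.setAll2(shape, func)
    out = {}
    shape[0].times { |i| shape[1].times { |j| out[[i, j]] = func.call(i, j) } }
    out
  end
end

# Chooses the helper by rank, e.g. a 2-dimensional shape maps to setAll2.
def set_all_sketch(shape, func)
  name = "setAll#{shape.size}"
  unless LoopHelperSketch.respond_to?(name)
    raise ArgumentError, "no helper for #{shape.size} dimensions"
  end
  LoopHelperSketch.public_send(name, shape, func)
end

result = set_all_sketch([2, 3], ->(i, j) { i + j })
puts result[[1, 2]] # 1 + 2
```

The dispatch happens once per array rather than once per element, so its cost should not matter next to the loop itself.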
Running this code with JRuby 1.6.8 with the same flags as before takes
4.23 seconds, with very minor differences between runs. So bringing the
loop into Java does make a huge difference.
With JRuby 1.7.2 and invokedynamic it takes 4.8 to 5.1 seconds. So it
actually performs worse than 1.6.8. Even changing the flags to the same
ones used for 1.6.8 does not improve performance.
So, any comments or ideas on why 1.7.2 is worse than 1.6.8 when the
loop is in Java?
Are there other interesting flags that should be used in order to
improve performance?
Thanks for all the comments and ideas…
Rodrigo