Announcement
MDArray version 0.5.4 has just been released. MDArray is a multidimensional array implemented for JRuby, inspired by NumPy (www.numpy.org) and Masahiro Tanaka's NArray (narray.rubyforge.org).
MDArray stands on the shoulders of Java-NetCDF and Parallel Colt. At this point MDArray has libraries for mathematical, trigonometric and descriptive statistics methods.
The NetCDF-Java Library is a Java interface to NetCDF files, as well as to many other types of scientific data formats. It is developed and distributed by Unidata (http://www.unidata.ucar.edu).
Parallel Colt (http://grepcode.com/snapshot/repo1.maven.org/maven2/net.sourceforge.parallelcolt/parallelcolt/0.10.0/) is a multithreaded version of Colt. Colt provides a set of open source libraries for high-performance scientific and technical computing in Java. Scientific and technical computing is characterized by demanding problem sizes and a need for high performance at a reasonably small memory footprint.
For more information and (some) documentation, please see the MDArray homepages listed below.
What's new:
NetCDF-3 File Support
From Wikipedia, the free encyclopedia:
"NetCDF (Network Common Data Form) is a set of software libraries and
self-describing,
machine-independent data formats that support the creation, access, and
sharing of array-oriented
scientific data. The project homepage is hosted by the Unidata program
at
the University
Corporation for Atmospheric Research (UCAR). They are also the chief
source
of netCDF software,
standards development, updates, etc. The format is an open standard.
NetCDF
Classic and 64-bit
Offset Format are an international standard of the Open Geospatial
Consortium.
The project is actively supported by UCAR. Version 4.0 (released in
2008)
allows the use of the
HDF5 data file format. Version 4.1 (2010) adds support for C and Fortran
client access to
specified subsets of remote data via OPeNDAP.
The format was originally based on the conceptual model of the Common
Data
Format developed by
NASA, but has since diverged and is not compatible with it."
This version of MDArray implements NetCDF-3 file support only; NetCDF-4 is not yet supported. At the end of this announcement we show the MDArray implementation of the NetCDF-3 file writing from the tutorial at:
http://www.unidata.ucar.edu/software/netcdf-java/tutorial/NetcdfWriting.html
MDArray and SciRuby:
MDArray subscribes fully to the SciRuby Manifesto (http://sciruby.com/).
Ruby has for some time had no equivalent to the beautifully constructed NumPy, SciPy, and matplotlib libraries for Python. We believe that the time for a Ruby science and visualization package has come. Sometimes, when a solution of sugar and water becomes super-saturated, from it precipitates a pure, delicious, and diabetes-inducing crystal of sweetness, induced by no more than the tap of a finger. So is occurring now, we believe, with numeric and visualization libraries for Ruby.
MDArray main properties are (a short usage sketch follows this list):
- Homogeneous multidimensional array, a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers;
- Easy calculation for large numerical multidimensional arrays;
- Basic types are: boolean, byte, short, int, long, float, double, string, structure;
- Based on JRuby, which allows importing Java libraries;
- Operators: +, -, *, /, %, **, >, >=, etc.;
- Functions: abs, ceil, floor, truncate, is_zero, square, cube, fourth;
- Binary operators: &, |, ^, ~ (binary ones' complement), <<, >>;
- Ruby Math functions: acos, acosh, asin, asinh, atan, atan2, atanh, cbrt, cos, erf, exp, gamma, hypot, ldexp, log, log10, log2, sin, sinh, sqrt, tan, tanh, neg;
- Boolean operations on boolean arrays: and, or, not;
- Fast descriptive statistics from Parallel Colt (complete list found below);
- Easy manipulation of arrays: reshape, reduce dimension, permute, section, slice, etc.;
- Support for reading and writing NetCDF-3 files;
- Reading of two-dimensional arrays from CSV files (mainly for debugging and simple testing purposes);
- StatList: a list that can grow/shrink and that can compute Parallel Colt descriptive statistics;
- Experimental lazy evaluation (still slower than eager evaluation).
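A minimal usage sketch of some of these properties (array creation, element-wise operators and functions). The constructors mirror the MDArray.float / MDArray.int / MDArray.fromfunction calls used in the NetCDF-3 example at the end of this announcement; exact method names should be checked against the documentation:

require 'mdarray'

# 2x3 double array built from a flat list of values
a = MDArray.double([2, 3], [1, 2, 3, 4, 5, 6])
# 2x3 double array built from a function of the indices
b = MDArray.fromfunction("double", [2, 3]) { |i, j| i + j }

c = a + b           # element-wise operator from the list above
d = a.square        # element-wise function from the list above
c.reshape!([3, 2])  # in-place reshape, as used in the NetCDF-3 example below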
Descriptive statistics methods imported from Parallel Colt (a short call sketch follows this list):
auto_correlation, correlation, covariance, durbin_watson, frequencies, geometric_mean, harmonic_mean, kurtosis, lag1, max, mean, mean_deviation, median, min, moment, moment3, moment4, pooled_mean, pooled_variance, product, quantile, quantile_inverse, rank_interpolated, rms, sample_covariance, sample_kurtosis, sample_kurtosis_standard_error, sample_skew, sample_skew_standard_error, sample_standard_deviation, sample_variance, sample_weighted_variance, skew, split, standard_deviation, standard_error, sum, sum_of_inversions, sum_of_logarithms, sum_of_powers, sum_of_power_deviations, sum_of_squares, sum_of_squared_deviations, trimmed_mean, variance, weighted_mean, weighted_rms, weighted_sums, winsorized_mean.
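A sketch of how these statistics might be called, assuming they are exposed as instance methods on the array (which is how the names above read; the exact call form should be checked against the documentation):

samples = MDArray.double([5], [1.5, 2.5, 3.5, 4.5, 5.5])
samples.mean                  # arithmetic mean of the five samples
samples.standard_deviation    # sample dispersion
samples.median                # 50% quantile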
Double and Float methods from Parallel Colt:
acos, asin, atan, atan2, ceil, cos, exp, floor, greater, IEEEremainder, inv, less, lg, log, log2, rint, sin, sqrt, tan.
Double, Float, Long and Int methods from Parallel Colt:
abs, compare, div, divNeg, equals, isEqual (is_equal), isGreater (is_greater), isLess (is_less), max, min, minus, mod, mult, multNeg (mult_neg), multSquare (mult_square), neg, plus (add), plusAbs (plus_abs), pow (power), sign, square.
Long and Int methods from Parallel Colt (a short call sketch for these element-wise methods follows):
and, dec, factorial, inc, not, or, shiftLeft (shift_left), shiftRightSigned (shift_right_signed), shiftRightUnsigned (shift_right_unsigned), xor.
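A sketch of calling the element-wise Parallel Colt methods listed above, assuming the snake_case names in parentheses are the Ruby-visible spellings (the binary-method signature is an assumption, not confirmed by this announcement):

x = MDArray.double([4], [1.0, 4.0, 9.0, 16.0])
y = MDArray.double([4], [2.0, 2.0, 2.0, 2.0])
x.sqrt            # element-wise square root (Double/Float list above)
x.neg             # element-wise negation
# binary methods such as plus (add) and isGreater (is_greater) presumably take a
# second array of the same shape; this is an assumption, check the documentation:
x.add(y)
x.is_greater(y)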
MDArray installation and download (a quick smoke test follows):
- Install JRuby
- jruby -S gem install mdarray
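After installation, a quick smoke test can be run from the command line. This is a sketch; it assumes the MDArray.double constructor used elsewhere in this announcement and a shape reader on the array:

jruby -e "require 'mdarray'; a = MDArray.double([2, 2], [1, 2, 3, 4]); p a.shape"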
MDArray Homepages:
Contributors:
Contributors are welcome.
MDArray History:
- 07/08/2013: Version 0.5.4 - Support for reading and writing NetCDF-3 files;
- 24/06/2013: Version 0.5.3 - Over 90% performance improvement for methods imported from Parallel Colt and over 40% performance improvement for all other methods (implemented in Ruby);
- 16/05/2013: Version 0.5.0 - All loops transferred to Java with over 50% performance improvement. Descriptive statistics from Parallel Colt;
- 19/04/2013: Version 0.4.3 - Fixes a simple but fatal bug in 0.4.2. No new features;
- 17/04/2013: Version 0.4.2 - Adds simple statistics and boolean operators;
- 05/04/2013: Version 0.4.0 - Initial release.
NetCDF-3 Writing with MDArray API
require 'mdarray'

class NetCDF

  attr_reader :dir, :filename, :max_strlen

  #---------------------------------------------------------------------------------------
  #---------------------------------------------------------------------------------------

  def initialize
    @dir = "~/tmp"
    @filename1 = "testWriter"
    @filename2 = "testWriteRecord2"
    @max_strlen = 80
  end
  #---------------------------------------------------------------------------------------
  # Define the NetCDF-3 file
  #---------------------------------------------------------------------------------------
  def define_file
    # We pass the directory, filename, filetype and optionally the outside_scope.
    #
    # I'm implementing in cygwin, hence the need for method cygpath that converts the
    # directory name to a Windows name. In another environment, just pass the directory
    # name.
    #
    # Inside a block we have another scope, so the block cannot access any variables, etc.
    # from the outside scope. If we pass the outside scope, in this case we are passing
    # self, we can access variables in the outside scope by using @outside_scope.
    NetCDF.define(cygpath(@dir), @filename1, "netcdf3", self) do
      # add dimensions
      dimension "lat", 64
      dimension "lon", 128
      # add variables and attributes
      # add variable double temperature(lat, lon)
      variable "temperature", "double", [@dim_lat, @dim_lon]
      variable_att @var_temperature, "units", "K"
      variable_att @var_temperature, "scale", [1, 2, 3]
      # add a string-valued variable: char svar(80)
      # note that this is created as a scalar variable although in NetCDF-3 there is no
      # string type and the string has to be represented as a char type.
      variable "svar", "string", [], {:max_strlen => @outside_scope.max_strlen}
      # add a 2D string-valued variable: char names(names, 80)
      dimension "names", 3
      variable "names", "string", [@dim_names], {:max_strlen => @outside_scope.max_strlen}
      # add a scalar variable
      variable "scalar", "double", []
      # add global attributes
      global_att "yo", "face"
      global_att "versionD", 1.2, "double"
      global_att "versionF", 1.2, "float"
      global_att "versionI", 1, "int"
      global_att "versionS", 2, "short"
      global_att "versionB", 3, "byte"
    end
  end
  #---------------------------------------------------------------------------------------
  # Write data to the file defined above
  #---------------------------------------------------------------------------------------
  def write_file
    NetCDF.write(cygpath(@dir), @filename1, self) do
      temperature = find_variable("temperature")
      shape = temperature.shape
      data = MDArray.fromfunction("double", shape) do |i, j|
        i * 1_000_000 + j * 1_000
      end
      write(temperature, data)
      svar = find_variable("svar")
      write_string(svar, "Two pairs of ladies stockings!")
      names = find_variable("names")
      # careful here with the shape of a string variable. A string variable has one
      # more dimension than it should, as there is no string type in NetCDF-3. As such,
      # if we look at names' shape it has 2 dimensions, but we need to create a
      # one-dimensional string array.
      data = MDArray.string([3], ["No pairs of ladies stockings!",
                                  "One pair of ladies stockings!",
                                  "Two pairs of ladies stockings!"])
      write_string(names, data)
      # write scalar data
      scalar = find_variable("scalar")
      write(scalar, 222.333)
    end
  end
  #---------------------------------------------------------------------------------------
  # Define a file for writing one record at a time
  #---------------------------------------------------------------------------------------
  def define_one_at_time
    NetCDF.define(cygpath(@dir), @filename2, "netcdf3", self) do
      dimension "lat", 3
      dimension "lon", 4
      # a zero-sized dimension is an unlimited dimension
      dimension "time", 0
      variable "lat", "float", [@dim_lat]
      variable_att @var_lat, "units", "degree_north"
      variable "lon", "float", [@dim_lon]
      variable_att @var_lon, "units", "degree_east"
      variable "rh", "int", [@dim_time, @dim_lat, @dim_lon]
      variable_att @var_rh, "long_name", "relative humidity"
      variable_att @var_rh, "units", "percent"
      variable "T", "double", [@dim_time, @dim_lat, @dim_lon]
      variable_att @var_t, "long_name", "surface temperature"
      variable_att @var_t, "units", "degC"
      variable "time", "int", [@dim_time]
      variable_att @var_time, "units", "hours since 1990-01-01"
    end
  end
  #---------------------------------------------------------------------------------------
  # Write the file defined above one record at a time
  #---------------------------------------------------------------------------------------
  def write_one_at_time
    NetCDF.write(cygpath(@dir), @filename2, self) do
      lat = find_variable("lat")
      lon = find_variable("lon")
      # write non-record data to the variables
      write(lat, MDArray.float([3], [41, 40, 39]))
      write(lon, MDArray.float([4], [-109, -107, -105, -103]))
      # get the record variables from the file
      rh = find_variable("rh")
      time = find_variable("time")
      t = find_variable("T")
      # there is no method find_dimension for NetcdfFileWriter, so we need to get the
      # dimension from a variable.
      rh_shape = rh.shape
      dim_lat = rh_shape[1]
      dim_lon = rh_shape[2]
      (0...10).each do |time_idx|
        # fill the rh_data array
        rh_data = MDArray.fromfunction("int", [dim_lat, dim_lon]) do |lat, lon|
          time_idx * lat * lon
        end
        # reshape rh_data so that it has the same shape as the rh variable.
        # Method reshape! reshapes the array in place without copying data.
        rh_data.reshape!([1, dim_lat, dim_lon])
        # fill the temp_data array
        temp_data = MDArray.fromfunction("double", [dim_lat, dim_lon]) do |lat, lon|
          time_idx * lat * lon / 3.14159
        end
        # reshape temp_data so that it has the same shape as the T variable.
        temp_data.reshape!([1, dim_lat, dim_lon])
        # write the variables at origin [time_idx, 0, 0]
        write(time, MDArray.int([1], [time_idx * 12]), [time_idx])
        write(rh, rh_data, [time_idx, 0, 0])
        write(t, temp_data, [time_idx, 0, 0])
      end # end time_idx loop
    end
  end

end
netcdf = NetCDF.new
netcdf.define_file
netcdf.write_file
netcdf.define_one_at_time
netcdf.write_one_at_time