I just watched the talk by Charles here: http://confreaks.com/videos/1235-aloharuby2012-why-jruby It was
really interesting, and Charles makes a very strong case for JRuby. I was
struck by one thing he said about running Rails, though. He mentioned
that if your Rails app does a lot of database calls, JRuby might not be
faster. Maybe I got that wrong, so I wanted to ask here.
Is this because the JDBC drivers are slower than the C drivers?
My guess (and it’s just a guess) is that the database would be the
bottleneck, and would make any performance differences between Ruby
implementations insignificant.
Now that 1.9-mode encoding support is improved in JRuby, has the
possibility of optimizing Java-Ruby interop with a UTF-16 internal
encoding been considered? In other words, calling a Java method that
returns java.lang.String could return a Ruby String in UTF-16 encoding,
wrapped rather than converted to bytes. Since java.lang.String is
immutable, it would still require some copy-on-write special handling.
Said another way, Ruby may support many internal encodings, but JRuby
might run most optimally when its internal encoding matches the JVM’s
UTF-16.
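To make the idea concrete, here is a pure-Ruby sketch (the variable names are mine, and the UTF-16 string is just a stand-in for a java.lang.String's char data): a Ruby String can already carry a UTF-16 encoding tag, so in principle the JVM's chars could be kept as-is rather than transcoded to UTF-8 bytes.

```ruby
# Stand-in for the char data a java.lang.String would hand us (UTF-16).
jvm_chars = "caf\u00e9".encode(Encoding::UTF_16BE)

# "Wrapping" here just means keeping the UTF-16 representation as-is:
# no transcode to UTF-8 bytes is needed for the String to be usable.
wrapped = jvm_chars.dup

puts wrapped.encoding   # UTF-16BE
puts wrapped.length     # 4 characters
puts wrapped.bytesize   # 8 bytes, two per character, same layout as the JVM's chars
```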
That is part of it, but we also sometimes pay a penalty with Java
Charset ↔ Ruby ByteList translation, which can make us slower. From
memory we were <2x slower for most generic queries (and at times
the same speed). These were queries where the actual query time itself
was minimal. For more ordinary queries (where DB engine time becomes
most of the total) this difference drops off to insignificance pretty
quickly.
On Mon, Oct 29, 2012 at 6:12 AM, Thomas E Enebo [email protected]
wrote:
> That is part of it, but we also sometimes pay a penalty with Java
> Charset ↔ Ruby bytelist translation which can make us slower. From
> my memory we were <2x slower for most generic queries (and at times
> the same speed). These were queries where actual query time itself
> was minimal. For more ordinary queries (where DB engine time becomes
> most of the time) this difference drops off to insignificance pretty
> quickly.
Thanks for the explanation. That’s quite interesting, and I guess
things like the number of fields, the number of text fields, etc. would
also make a difference.
We know that we can save that one transcode internally by saving the
UTF-16 bytes, but if the DB is returning strings as UTF-8 and Java is
retranslating that back to UTF-16, then we only save part of the
transcoding work. Still, it is something that could help a bit.
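The round trip described above can be sketched in plain Ruby (variable names are illustrative): UTF-8 bytes come off the wire, the driver/JVM decodes them into UTF-16, and JRuby then transcodes back to UTF-8, i.e. two full passes over the same data.

```ruby
wire     = "caf\u00e9".b.force_encoding(Encoding::UTF_8) # UTF-8 bytes off the wire
jvm      = wire.encode(Encoding::UTF_16BE)  # pass 1: driver decodes into Java's UTF-16
internal = jvm.encode(Encoding::UTF_8)      # pass 2: JRuby transcodes to its internal UTF-8

puts internal == wire   # true: the same bytes we started with, produced twice over
```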
We have talked about native adapters which capture the bytes off the
wire and don’t bother to translate anything. This obviously means not
using JDBC’s abstraction.
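A native adapter could instead just tag the raw bytes with the encoding the DB is known to use. In Ruby terms, that is the difference between force_encoding (a relabel) and encode (a walk-and-copy); a sketch, with made-up variable names:

```ruby
wire = "caf\u00e9".b   # raw bytes, as if read straight off the socket

# Zero-copy path: relabel the bytes with the encoding the DB is known to use.
tagged = wire.dup.force_encoding(Encoding::UTF_8)

# Transcoding path: walk every character and write out a new byte array.
transcoded = wire.dup.force_encoding(Encoding::UTF_8).encode(Encoding::UTF_16BE)

puts tagged.valid_encoding?   # true: the relabeled bytes are usable as-is
puts transcoded.bytesize      # 8: a second, larger buffer had to be allocated
```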
Some JDBC drivers do provide access to the raw bytes, but I’m not sure
we’ve ever attempted to use them directly. For example:
on ResultSet:

byte[] getBytes(int columnIndex) throws SQLException

Retrieves the value of the designated column in the current row of
this ResultSet object as a byte array in the Java programming
language. The bytes represent the raw values returned by the driver.
I wonder if those bytes are really the original from the wire, or if the
driver always decodes to characters, and in this case is re-encoding as
bytes?
Also, a JRuby optimized for UTF-16 Strings could yield performance
benefits outside of JDBC. The Nokogiri Java port comes to mind, if
memory serves.
Yes. Number and type of fields can easily show differences in
microbenches. As I said before, in isolation it always looks like a
huge difference, but in the picture of more complicated queries,
networking, and the rest of the stack, it is not so dramatic.
A little while ago I tried to get the lowdown on ActiveRecord caching.
From the scant documentation out there it seems like AR does not
execute the same query twice in the same request. If that’s true, then
that may also affect the benchmarks.
I guess if you are doing aggressive caching then this whole
translation thing might be moot as you are (hopefully) caching
translated results.
This is always turning Java strings into UTF-8 or whatever the default
internal encoding in JRuby is set to. Generally, it will allocate a
byte[] for a single-byte encoding and transcode to that using Java’s
Charset stuff.
Here’s what happens if we instead always transcode into UTF-16:
performance is on average worse. I would guess this is because:

- Transcoding still has to walk the same number of characters, even
though it can just dump them as two-byte chunks.
- UTF-16 allocates at least length * 2 bytes, where UTF-8 can usually
allocate something less than that.

So the savings we get from not transcoding into a one-byte format
don’t seem to outweigh the fact that we have to allocate and
populate a larger array.
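The allocation point is easy to see for the ASCII-heavy strings typical of SQL results (a minimal illustration; the sample query is made up):

```ruby
ascii = "SELECT id, name FROM users"

utf8  = ascii.encode(Encoding::UTF_8)
utf16 = ascii.encode(Encoding::UTF_16LE)

puts utf8.bytesize   # 26: one byte per ASCII character
puts utf16.bytesize  # 52: two bytes per character, double the allocation
```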
This does not say anything about cases where we can get the original
bytes first. That could be much faster.
I had a more radical (and possibly more misguided!) optimization in
mind: when calling Java methods that return String (or any
CharSequence), wrap the result as a Ruby String without copying and
serve at least the character read operations via
CharSequence.charAt(), etc. This Ruby String would then need to
support copy-on-write semantics for any mutation. Also, possibly any
Ruby String passed in as a Java String would have its conversion
preserved with the same representation, such that my manual
optimization in the second bench below would no longer yield any
benefit:
Yeah I would like to see us do this at some point. It’s tricky because
we have so much code built around byte[] (for obvious reasons) that
many/most things you might do to manipulate or search a String would
want to use the raw bytes anyway. We also cache (inconsistently) the
java.lang.String object created, which helps reduce overhead when
we’re not recreating the Ruby side every time as well.
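As a rough pure-Ruby illustration of the copy-on-write idea (the class name and its API are invented here, not JRuby internals): reads are served from the shared backing store, and the first mutation triggers a private copy.

```ruby
class CowString
  def initialize(backing)
    @backing = backing  # shared, treated as read-only (stands in for a java.lang.String)
    @copy = nil         # private mutable copy, created lazily
  end

  # Reads are served from the shared backing until a mutation happens.
  def [](index)
    (@copy || @backing)[index]
  end

  # The first mutation copies the backing; later mutations reuse the copy.
  def <<(other)
    @copy ||= @backing.dup
    @copy << other
    self
  end

  def to_s
    (@copy || @backing).dup
  end
end

s = CowString.new("abc")
puts s[1]     # "b" -- no copy made yet
s << "d"      # first mutation triggers the copy-on-write
puts s.to_s   # "abcd"; the original "abc" backing is untouched
```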
I wish Ruby had just made literal Strings immutable or
something… it would make our job a lot easier.
What was your experience wrt performance? I would hope the bytes
coming out of there are as close to “off the wire” as possible, but I
have not looked into the implementation of any specific JDBC driver to
see if that’s the case.
If we could get close-to-the-wire bytes to use for AR-JDBC, it could
mean a tremendous reduction in object overhead for walking a result
set.
I’ve used byte access with JDBC on the H2 database (a Java-based SQL
server). I was storing MD5 sums as bytes using Sequel as the ORM, then
later went down to JDBC prepared statements.
Please excuse my stupidity, but could you not create an
ImmutableString type or something?
We certainly could, but no existing code uses it. If we were to use it
for literal strings, all code that expects a literal string to create
a new mutable String object would start to fail.
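Ruby’s freeze already gives a taste of how such an ImmutableString would behave (a minimal sketch; the sample string is made up):

```ruby
literal = "select * from users".freeze

puts literal.frozen?   # true

begin
  literal << " limit 1"   # any mutation raises instead of silently succeeding
rescue => e
  puts e.class            # FrozenError on modern Rubies (RuntimeError on older ones)
end
```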
Mutable Strings by default just seems to be a bad decision.
Charlie