I just watched the talk by Charles here: http://confreaks.com/videos/1235-aloharuby2012-why-jruby It was
really interesting, and Charles makes a very strong case for JRuby. I was
struck by one thing he said about running Rails, though. He mentioned
that if your Rails app does a lot of database calls, JRuby might not be
faster. Maybe I got that wrong, so I wanted to ask here.
Is this because the JDBC drivers are slower than the C drivers?
My guess (and it’s just a guess) is that the database would be the
bottleneck, and would make any performance differences between Ruby
implementations insignificant.
Now that 1.9-mode encoding support is improved in JRuby, has the
possibility of optimizing Java-Ruby interop with a UTF-16 internal
encoding been considered? In other words, calling a Java method that
returns java.lang.String could return a Ruby String in UTF-16 encoding,
wrapped rather than converted to bytes. Since java.lang.String is
immutable, it would still require some copy-on-write special handling.
Said another way, Ruby may support many internal encodings, but JRuby
might run most optimally when its internal encoding matches the JVM’s
UTF-16.
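To make the idea concrete, here is a pure-Ruby sketch (the variable names are mine, and the UTF-16 string is just a stand-in for a java.lang.String's char data): a Ruby String can already carry a UTF-16 encoding tag, so in principle the JVM's chars could be kept as-is rather than transcoded to UTF-8 bytes.

```ruby
# Stand-in for the char data a java.lang.String would hand us (UTF-16).
jvm_chars = "caf\u00e9".encode(Encoding::UTF_16BE)

# "Wrapping" here just means keeping the UTF-16 representation as-is:
# no transcode to UTF-8 bytes is needed for the String to be usable.
wrapped = jvm_chars.dup

puts wrapped.encoding   # UTF-16BE
puts wrapped.length     # 4 characters
puts wrapped.bytesize   # 8 bytes, two per character, same layout as the JVM's chars
```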
That is part of it, but we also sometimes pay a penalty with Java
Charset ↔ Ruby ByteList translation, which can make us slower. From
memory we were <2x slower for most generic queries (and at times
the same speed). These were queries where the actual query time itself
was minimal. For more ordinary queries (where DB engine time becomes
most of the total) this difference drops off to insignificance pretty
quickly.
On Mon, Oct 29, 2012 at 6:12 AM, Thomas E Enebo [email protected]
wrote:
> That is part of it, but we also sometimes pay a penalty with Java
> Charset ↔ Ruby bytelist translation which can make us slower. From
> my memory we were <2x slower for most generic queries (and at times
> the same speed). These were queries where actual query time itself
> was minimal. For more ordinary queries (where DB engine time becomes
> most of the time) this difference drops off to insignificance pretty
> quickly.
Thanks for the explanation. That’s quite interesting, and I guess
things like the number of fields, the number of text fields, etc. would
also make a difference.
We know that we can save that one transcode internally by saving the
UTF-16 bytes, but if the DB is returning strings as UTF-8 and Java is
retranslating that back to UTF-16, then we only save part of the
transcoding work. Still, it is something that could help a bit.
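The round trip described above can be sketched in plain Ruby (variable names are illustrative): UTF-8 bytes come off the wire, the driver/JVM decodes them into UTF-16, and JRuby then transcodes back to UTF-8, i.e. two full passes over the same data.

```ruby
wire     = "caf\u00e9".b.force_encoding(Encoding::UTF_8) # UTF-8 bytes off the wire
jvm      = wire.encode(Encoding::UTF_16BE)  # pass 1: driver decodes into Java's UTF-16
internal = jvm.encode(Encoding::UTF_8)      # pass 2: JRuby transcodes to its internal UTF-8

puts internal == wire   # true: the same bytes we started with, produced twice over
```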
We have talked about native adapters which capture the bytes off the
wire and don’t bother to translate anything. This obviously means not
using JDBC’s abstraction.
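A native adapter could instead just tag the raw bytes with the encoding the DB is known to use. In Ruby terms, that is the difference between force_encoding (a relabel) and encode (a walk-and-copy); a sketch, with made-up variable names:

```ruby
wire = "caf\u00e9".b   # raw bytes, as if read straight off the socket

# Zero-copy path: relabel the bytes with the encoding the DB is known to use.
tagged = wire.dup.force_encoding(Encoding::UTF_8)

# Transcoding path: walk every character and write out a new byte array.
transcoded = wire.dup.force_encoding(Encoding::UTF_8).encode(Encoding::UTF_16BE)

puts tagged.valid_encoding?   # true: the relabeled bytes are usable as-is
puts transcoded.bytesize      # 8: a second, larger buffer had to be allocated
```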
Some JDBC drivers do provide access to the raw bytes, but I’m not sure
we’ve ever attempted to use them directly. For example:
on ResultSet:

byte[] getBytes(int columnIndex) throws SQLException

Retrieves the value of the designated column in the current row of
this ResultSet object as a byte array in the Java programming
language. The bytes represent the raw values returned by the driver.
I wonder if those bytes are really the original from the wire, or if the
driver always decodes to characters, and in this case is re-encoding as
bytes?
Also, a JRuby optimized for UTF-16 Strings could yield performance
benefits outside of JDBC. The Nokogiri Java port comes to mind, if
memory serves.
Yes. Number and type of fields can easily show differences in
microbenches. As I said before, in isolation it always looks like a
huge difference, but in the picture of more complicated queries,
networking, and the rest of the stack, it is not so dramatic.
A little while ago I tried to get the lowdown on ActiveRecord caching.
From the scant documentation out there it seems like AR does not
execute the same query twice in the same request. If that’s true, then
that may also affect the benchmarks.
I guess if you are doing aggressive caching then this whole
translation thing might be moot as you are (hopefully) caching
translated results.
This is always turning Java strings into UTF-8 or whatever the default
internal encoding in JRuby is set to. Generally, it will allocate a
byte[] for a single-byte encoding and transcode to that using Java’s
Charset stuff.
Here’s what happens if we instead always transcode into UTF-16:
performance is on average worse. I would guess this is because:

- Transcoding still has to walk the same number of characters, even
though it can just dump them as two-byte chunks.
- UTF-16 allocates at least length * 2 bytes, where UTF-8 can usually
allocate something less than that.

So the savings we get from not transcoding into a one-byte format
don’t seem to outweigh the fact that we have to allocate and
populate a larger array.
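The allocation point is easy to see for the ASCII-heavy strings typical of SQL results (a minimal illustration; the sample query is made up):

```ruby
ascii = "SELECT id, name FROM users"

utf8  = ascii.encode(Encoding::UTF_8)
utf16 = ascii.encode(Encoding::UTF_16LE)

puts utf8.bytesize   # 26: one byte per ASCII character
puts utf16.bytesize  # 52: two bytes per character, double the allocation
```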
This does not say anything about cases where we can get the original
bytes first. That could be much faster.
I had a more radical (and possibly more misguided!) optimization in
mind: when calling Java methods that return String (or any
CharSequence), wrap the result as a Ruby String without copying and
serve at least the character read operations via
CharSequence.charAt(), etc. This Ruby String would then need to
support copy-on-write semantics for any mutation. Also, possibly any
Ruby String passed in as a Java String would have its conversion
preserved with the same representation, such that my manual
optimization in the second bench below would no longer yield any
benefit:
Yeah I would like to see us do this at some point. It’s tricky because
we have so much code built around byte[] (for obvious reasons) that
many/most things you might do to manipulate or search a String would
want to use the raw bytes anyway. We also cache (inconsistently) the
java.lang.String object created, which helps reduce overhead when
we’re not recreating the Ruby side every time as well.
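As a rough pure-Ruby illustration of the copy-on-write idea (the class name and its API are invented here, not JRuby internals): reads are served from the shared backing store, and the first mutation triggers a private copy.

```ruby
class CowString
  def initialize(backing)
    @backing = backing  # shared, treated as read-only (stands in for a java.lang.String)
    @copy = nil         # private mutable copy, created lazily
  end

  # Reads are served from the shared backing until a mutation happens.
  def [](index)
    (@copy || @backing)[index]
  end

  # The first mutation copies the backing; later mutations reuse the copy.
  def <<(other)
    @copy ||= @backing.dup
    @copy << other
    self
  end

  def to_s
    (@copy || @backing).dup
  end
end

s = CowString.new("abc")
puts s[1]     # "b" -- no copy made yet
s << "d"      # first mutation triggers the copy-on-write
puts s.to_s   # "abcd"; the original "abc" backing is untouched
```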
I wish Ruby had just made literal Strings immutable or
something… it would make our job a lot easier.
What was your experience wrt performance? I would hope the bytes
coming out of there are as close to “off the wire” as possible, but I
have not looked into the implementation of any specific JDBC driver to
see if that’s the case.
If we could get close-to-the-wire bytes to use for AR-JDBC, it could
mean a tremendous reduction in object overhead for walking a result
set.
I’ve used byte access with JDBC on the H2 database (a Java-based SQL
server). I was storing MD5 sums as bytes using Sequel as the ORM, then
later went down to JDBC prepared statements.
Please excuse my stupidity, but could you not create an
ImmutableString type or something?
We certainly could, but no existing code uses it. If we were to use it
for literal strings, all code that expects a literal string to create
a new mutable String object would start to fail.
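Ruby’s freeze already gives a taste of how such an ImmutableString would behave (a minimal sketch; the sample string is made up):

```ruby
literal = "select * from users".freeze

puts literal.frozen?   # true

begin
  literal << " limit 1"   # any mutation raises instead of silently succeeding
rescue => e
  puts e.class            # FrozenError on modern Rubies (RuntimeError on older ones)
end
```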
Mutable Strings by default just seems to be a bad decision.
Charlie