JRuby performance issues on Thrift server benchmark

For those of you unaware of Thrift, it is an open-source,
language-agnostic distributed messaging system created by Facebook
(and currently being incubated as an Apache project). I recently
rewrote the Ruby libraries, including writing a brand-new threaded
server. I also wrote benchmarking code that spawns the server in one
process, then spawns 40 worker processes that each connect to the
server 5 times and send 50 “messages” over each connection. This means
there’s a maximum of 40 concurrent connections and 200 total
connections. It uses TCPServer/TCPSocket to accomplish this.
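
For a rough picture, the client side of the benchmark is structured
something like the sketch below. This is illustrative only: the port
number, the message format, and the line-based reply are made up, and
the real code lives in the repository linked further down.

  require 'socket'

  WORKERS     = 40  # worker processes => at most 40 concurrent connections
  CONNECTIONS = 5   # sequential connections per worker => 200 total connections
  CALLS       = 50  # "messages" sent over each connection

  pids = Array.new(WORKERS) do
    fork do
      CONNECTIONS.times do
        sock = TCPSocket.new('localhost', 9090)  # port is illustrative
        CALLS.times do
          sock.write("message\n")  # stand-in for a real Thrift call
          sock.gets                # wait for the reply before the next call
        end
        sock.close
      end
    end
  end
  pids.each { |pid| Process.wait(pid) }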

Running the benchmark under MRI takes roughly 9 seconds. Here’s some
sample output from the benchmark:

Thrift::NonblockingServer Benchmark (MRI):
Connection failures: 0
Average time per call: 0.0297 seconds
Average time per client (50 calls): 1.4857 seconds
Total time for all calls: 296.8446 seconds
Real time for benchmarking: 9.2467 seconds
Shortest call time: 0.0009 seconds
Longest call time: 0.1499 seconds
Shortest client time (50 calls): 0.0867 seconds
Longest client time (50 calls): 1.9158 seconds

Unfortunately, when I switch to jruby, I see a significant slowdown.

Thrift::NonblockingServer Benchmark (jruby)
Connection failures: 0
Average time per call: 0.0681 seconds
Average time per client (50 calls): 3.4094 seconds
Total time for all calls: 681.3200 seconds
Real time for benchmarking: 18.2711 seconds
Shortest call time: 0.0013 seconds
Longest call time: 2.2829 seconds
Shortest client time (50 calls): 1.2270 seconds
Longest client time (50 calls): 8.1265 seconds

In this run, it took twice as long as MRI to run the benchmark. The
best number I’ve seen is just over 15 seconds, but usually I see
roughly 19 seconds. This is under the exact same environment and load
as the MRI benchmark.

Does anybody have any suggestions to explain these performance issues?

If you wish to try it yourself, you first need a patched version of
jruby. The following two patches need to be applied to jruby 1.1.2 or
jruby svn:

http://jira.codehaus.org/secure/attachment/35101/ChannelStream-readpartial.patch
http://jira.codehaus.org/secure/attachment/35098/jruby-select.patch

Hopefully these patches will be applied to svn trunk tomorrow.

Anyway, once you have that, you can download my own thrift code at

http://github.com/kballard/thrift

Inside the thrift clone (or tarball) you can cd to lib/rb. From there,
run rake benchmark to test the MRI benchmark, or run
rake benchmark THRIFT_SERVER_INTERPRETER=jruby to test the jruby
benchmark. Note that when using the jruby benchmark, the worker clients
are still run using MRI; attempting to run them under jruby doesn’t
even work at the moment.

-Kevin B.



Hi Kevin,

Thanks for the interesting report. I’ll try to play with this today.
More comments below.

On Tue, Jun 10, 2008 at 11:34 PM, Kevin B. [email protected]
wrote:

If you wish to try it yourself, you first need a patched version of jruby.
The following two patches need to be applied to jruby 1.1.2 or jruby svn:

http://jira.codehaus.org/secure/attachment/35101/ChannelStream-readpartial.patch
http://jira.codehaus.org/secure/attachment/35098/jruby-select.patch

As of now, the patches are no longer needed; the reported bugs have
been fixed on SVN trunk. Kevin, thanks for the bug reports and initial
patches!

Thanks,
–Vladimir



Hi Kevin,

On Tue, Jun 10, 2008 at 11:34 PM, Kevin B. [email protected]
wrote:

In this run, it took twice as long as MRI to run the benchmark. The best
number I’ve seen is just over 15 seconds, but usually I see roughly 19
seconds. This is under the exact same environment and load as the MRI
benchmark.

Hmm, I don’t see such behavior. I tested on my Ubuntu Linux 8.04, JDK
1.6, latest JRuby from trunk.

For me, the MRI run is about 13.5 secs, while JRuby’s run is about 11
secs.

What OS do you use?

Thanks,
–Vladimir



On Jun 11, 2008, at 1:07 PM, Vladimir S. wrote:

Hmm, I don’t see such behavior. I tested on my Ubuntu Linux 8.04, JDK
1.6, latest JRuby from trunk.

For me, MRI run is about 13.5 secs, while JRuby’s run is about 11
secs.

What OS do you use?

OS X 10.5.3, JDK 1.5, latest JRuby from trunk.

Testing again with JDK 1.6 gets me 13.5 seconds (when run warm), which
is significantly better than 19 seconds but still worse than MRI’s 10
seconds. In addition, the stats are significantly worse: a higher
Shortest call time, a significantly higher Longest call time (2
seconds vs MRI’s 0.3 seconds), and roughly twice the Shortest and
Longest client times.

-Kevin B.



That definitely slows it down for me, from 13.5 to 16.5 seconds.

-Kevin

On Jun 11, 2008, at 1:45 PM, Vladimir S. wrote:

I also noticed that on my system the following command produces better
JRuby results:

rake benchmark THRIFT_SERVER_INTERPRETER='jruby --client'



Can you lengthen the duration of your test run to see how that affects
things… If it doesn’t improve our numbers more relative to MRI then it
clearly points to a bottleneck. If it does, then it may indicate that
we warm up too slowly. The fact that Vladimir gets better results with
--client makes me think that things still haven’t warmed up enough :(
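
(The idea in a minimal sketch, with a made-up workload: time one
throwaway pass first so the JIT has compiled the hot paths, then time
a second pass to see the steady-state speed.)

  require 'benchmark'

  def workload
    10_000.times { "message".reverse }  # made-up hot path
  end

  workload                              # warm-up pass: let the JIT compile
  puts Benchmark.realtime { workload }  # measured pass: steady-state speed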

-Tom

On Wed, Jun 11, 2008 at 3:48 PM, Kevin B. [email protected]
wrote:

Testing again with JDK 1.6 gets me 13.5 seconds (when run warm), which
is significantly better than 19 seconds but still worse than MRI’s 10
seconds.



Ok, that improves the numbers. If I bump to 10 clients, MRI goes to
17.5 seconds and jruby goes to 21 seconds. If I bump to 20 clients,
MRI goes to 33 seconds and jruby to 32.5.

So it looks like now I should focus all my attention on the fact that
jruby still can’t run the client part, just the server part. And a
significant chunk of my specs also fail under jruby.

-Kevin B.

On Jun 11, 2008, at 2:27 PM, Thomas E Enebo wrote:

Can you lengthen the duration of your test run to see how that affects
things… If it doesn’t improve our numbers more relative to MRI then it
clearly points to a bottleneck.


Hi Kevin,

On Wed, Jun 11, 2008 at 11:51 PM, Kevin B. [email protected]
wrote:

So it looks like now I should focus all my attention on the fact that jruby
still can’t run the client part, just the server part. And a significant
chunk of my specs also fail under jruby.

I see that you’ve selected the most productive way of dealing with
this, opening up good bug reports with simple test cases, much
appreciated! :)

I also noticed that you’ve raised a question about Fixnum range in
JRuby.

Yes, JRuby has a bigger Fixnum range than 32-bit MRI; we have 64 bits
for it. This is in line with 64-bit MRI.

Essentially, you can think of JRuby as yet another 64-bit, big-endian
implementation of Ruby, no matter what the underlying platform is,
whether it’s a 32-bit platform or a little-endian platform. In that
regard, JRuby’s behavior is in sync with MRI’s behavior on 64-bit
Solaris/SPARC. :)
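
For example (a quick check; the exact boundary doesn’t matter, only
that the same literal lands in a different class):

  # 2**40 needs more than 31 bits, so 32-bit MRI promotes it to Bignum,
  # while JRuby (and 64-bit MRI) still stores it as a Fixnum.
  puts (2**40).class   # 32-bit MRI: Bignum; JRuby and 64-bit MRI: Fixnum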

Thanks,
–Vladimir



Interesting…

I also noticed that on my system the following command produces better
JRuby results:

rake benchmark THRIFT_SERVER_INTERPRETER='jruby --client'

Not sure why though… :)

Thanks,
–Vladimir

On Wed, Jun 11, 2008 at 10:29 PM, Kevin B. [email protected]
wrote:

Testing again with JDK 1.6 gets me 13.5 seconds (when run warm), which
is significantly better than 19 seconds but still worse than MRI’s 10
seconds.

