Introduction: I'm using JRuby in a Java servlet container to run an application …that uses ActiveRecord. This is a fully multithreaded environment where a thread pool services http requests, possibly in parallel. I've been having problems with timeouts when obtaining a database connection with only moderate concurrent requests.
(Reducing the size of the thread pool to match that of the connection pool may be one option, but I am running multiple apps in the servlet container all sharing one thread pool... So for now, the thread pool is 32 and the connection pool is 5).
I'm putting this patch out there to solicit comments. I think it needs more testing. I may have a chance this week to test in a real application, but for now I only have my two benchmark scripts ar_test.rb (for JRuby) and ar_test_mri.rb. Notably, I have not tested the behavior when connections are removed explicitly or removed by the reaper.
I also do not understand the distinction between PoolFullError and ConnectionTimeoutError. I also introduced an @available array of connections ready to be checked out. At the time this seemed like a reasonable idea, but I had to make some inelegant additions (essentially everywhere @connections is modified, @available must be also, e.g. #clear_reloadable_connections).
I apologize for the length of this report.
tl;dr **the fairness patch reduces outliers in time to acquire a connection but needs review and testing**
The test suite passes, tested on Linux/Ruby-1.9.2-p290.
```
activerecord$ bundle exec rake test_sqlite3
Finished tests in 193.646555s, 17.1859 tests/s, 52.2653 assertions/s.
3328 tests, 10121 assertions, 0 failures, 0 errors, 12 skips
```
---
This patch makes the connection pool "fair" in the first-come, first-served sense. It also avoids using the Timeout class (which spawns a dedicated thread for each connection checkout attempt that needs to to wait with a timeout).
I've added two tests that fail before applying this patch. The first ensures the connection pool is fair by queuing up several waiting threads and ensuring they acquire connections in order as connections are made available. The connections are trickled in one by one because we don't actually care when two connections become available that order is strictly preserved.
The second queues up two groups of waiting threads, then checks in enough connections for group 1 all at once. The test ensures that only the group 1 threads acquired connections.
A third test is the money test but was removed because it was unreliable. It attempted to test latency in a Java-servlet-like environment by setting up a thread pool and having each thread check connections in and out.
Instead, I used ar_test.rb to obtain histograms of the time per simulated request (checkout connection; sleep 0.01s; checkin connection).
# Tests with ar_test.rb and JRuby 1.6.6
## ActiveRecord 3.2.3, JRuby 1.6.6
```
$ bundle exec jruby --1.9 ar_test.rb
10000 reqs in 24.666 = 405.4 req/s
min/max/mean 0.01/2.37/0.08
histogram
[0.00,0.03) ooooooooooooooo
[0.03,0.05) o
[0.05,0.08) oooooooooooooooooooooooooooooooo
[0.08,0.10) ooooo
[0.10,0.13) o
[0.13,0.15) oooooo
[0.15,0.18) o
[0.18,0.20) oo
[0.20,0.22) o
[0.22,0.25) oo
```
## ActiveRecord edge without fairness patch JRuby 1.6.6
```
Gave up after long time. Many errors like this (notice time much longer than 5 s timeout):
took 120.067s to acquire and release
execution expired /home/pat/dev/ruby/rails/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb:106:in `await'
```
## ActiveRecord edge with fairness patch, JRuby 1.6.6
```
$ bundle exec jruby --1.9 ar_test.rb
ActiveRecord::VERSION 4.0.0.beta
10000 reqs in 24.8 = 403.7 req/s
min/max/mean 0.06/1.48/0.08
histogram
[0.00,0.03)
[0.03,0.05)
[0.05,0.08) ooooooooooooooooooooooooooooooooooooooooooooooooo
[0.08,0.10) ooooooooooo
[0.10,0.13) o
[0.13,0.15) o
[0.15,0.18) o
[0.18,0.20) o
[0.20,0.22) o
[0.22,0.25) o
```
# Tests with ar_test_mri.rb and JRuby 1.6.6
## ActiveRecord 3.2.3, Ruby 1.9.2-p290
```
Sometimes errors like this:
could not obtain a database connection within 5 seconds (waited 5.000125524999999 seconds). The max pool size is currently 5; consider increasing it. /home/pat/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/activerecord-3.2.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:248:in `block (2 levels) in checkout'
$ bundle exec ruby ar_test_mri.rb # ruby 1.9.2-p290
ActiveRecord::VERSION::STRING 3.2.3
10000 reqs in 21.0 = 475.7 req/s
min/max/mean 0.01/4.74/0.07
histogram
[0.00,0.03) ooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
[0.03,0.05) o
[0.05,0.08) o
[0.08,0.10)
[0.10,0.12) o
[0.12,0.15) o
[0.15,0.17) o
[0.17,0.20) o
[0.20,0.22) o
[0.22,0.25) ooo
```
## ActiveRecord edge without fairness patch, Ruby 1.9.2-p290
```
Again, many "execution expired" errors.
```
## ActiveRecord edge with fairness patch, Ruby 1.9.2-p290
```
$ bundle exec ruby ar_test_mri.rb # ruby 1.9.2-p290
ActiveRecord::VERSION::STRING 4.0.0.beta
10000 reqs in 21.147658059 = 472.86559921202286 req/s
min/max/mean 0.01/0.12/0.07
histogram
[0.00,0.03) o
[0.03,0.05) o
[0.05,0.08) ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
[0.08,0.10) o
[0.10,0.12) o
[0.12,0.15)
[0.15,0.17)
[0.17,0.20)
[0.20,0.22)
[0.22,0.25)
```
So in both Ruby 1.9.2 and JRuby, ActiveRecord 3.2.3 has perhaps a better median but more variation with a few requests taking significantly longer than the others.
I'm only including ar_test_mri.rb here. The ar_test.rb script has only had cosmetic changes and is available in this other pull request: https://github.com/rails/rails/pull/6398
```
require 'active_record'
require 'thread'
SHUTDOWN = Object.new.freeze # used to shutdown threads in pool
config = {
'adapter' => 'sqlite3',
'database' => ':memory:'
}
ActiveRecord::Base.establish_connection(config)
ActiveRecord::Base.clear_active_connections!
times = []
times.extend(MonitorMixin)
work = Queue.new
thread_pool_size = 32
nrequests = 10000
threads = (1..thread_pool_size).map do |i|
Thread.new do
while work.pop != SHUTDOWN
t0 = Time.now
begin
ActiveRecord::Base.connection_pool.connection
sleep 0.010
ActiveRecord::Base.clear_active_connections!
rescue => err
puts "#{err.message} #{err.backtrace.join("\n\t")}"
ensure
times.synchronize { times << (Time.now - t0) }
end
end
end
end
t0 = Time.now
nrequests.times { |i| work.push i }
thread_pool_size.times { work.push SHUTDOWN }
threads.each(&:join)
elapsed = Time.now - t0
puts "ActiveRecord::VERSION::STRING #{ActiveRecord::VERSION::STRING}"
puts "#{nrequests} reqs in #{elapsed} = #{nrequests/elapsed} req/s"
puts "min/max/mean %0.2f/%0.2f/%0.2f" % [times.min, times.max, times.reduce(&:+)/times.size]
def clamp(n, min, max)
[[n, min].max, max].min
end
def hist(values, nbuckets = 10, min = values.min, max = values.max)
step = (max - min) / nbuckets.to_f
buckets = Array.new(nbuckets, 0)
values.each do |v|
i = clamp(((v - min) / step).floor, 0, nbuckets-1)
buckets[i] += 1
end
start = 0
buckets.each do |count|
print "[%0.2f,%0.2f) " % [start, start + step]
ntics = (count.to_f / values.size * 60).ceil # 60 screen cols
puts 'o' * ntics
start += step
end
end
puts 'histogram'
hist(times, 10, 0.00, 0.25)
```