I am just wondering why different hash keys result in such a
significantly different performance.
I have written 11 simple test cases, all using the following template:
num=ARGV[0].to_i
var=:key
a={:key => “test”}
num.times {
b=a[:key][0]
}
Beside this test_symbol.rb variant, I replace :key in the example
above with false (test_false.rb), an integer value 1 (test_int.rb),
with nil (test_nil.rb), with “id” (test_string.rb). In six other
variants I replace the key with var, and set var to a symbol
(test_var_symbol.rb), false (test_var_false.rb), an integer value
(test_var_int.rb), nil (test_var_nil.rb), “id” (test_var_string.rb),
and “id”.frozen (test_var_frozen.rb).
This is my performance chart using time (dual-core CPU, called with
num=30_000_000). I ran it once (dropped timing values), then three
consecutive times and added the times (which is the sort key as well):
test_int.rb 37.029
test_var_int.rb 37.659
test_symbol.rb 37.773
test_var_symbol.rb 37.785
test_var_frozen.rb 37.852
test_var_string.rb 40.515
test_var_false.rb 45.468
test_false.rb 45.687
test_var_nil.rb 45.894
test_nil.rb 46.068
test_string.rb 53.032
No surprise is that repeated lookup of a string as a key results in
the worst performance, and using a Fixnum gives best performance.
More interesting is a bulk of tests that relate to using symbols and
variables for int, symbol and frozen string, all within timing
accuracy. A noticable, but still small gap can be seen for a
nonfrozen string in a variable.
But here it comes: Using false and, even worse, nil (in either
literal or variable form) is way slower (a good 20%) than using
integers and symbols, even if they follow the same “immutable
object” philosophy of fixed object_ids.
Why is that? What’s so special about false and nil to have such a
degraded performance as a hash key?
- Matthias