Hash key performance

I am just wondering why different hash keys result in such a
significantly different performance.

I have written 11 simple test cases, all using the following template:

num=ARGV[0].to_i

var=:key

a={:key => “test”}

num.times {
b=a[:key][0]
}

Beside this test_symbol.rb variant, I replace :key in the example
above with false (test_false.rb), an integer value 1 (test_int.rb),
with nil (test_nil.rb), with “id” (test_string.rb). In six other
variants I replace the key with var, and set var to a symbol
(test_var_symbol.rb), false (test_var_false.rb), an integer value
(test_var_int.rb), nil (test_var_nil.rb), “id” (test_var_string.rb),
and “id”.frozen (test_var_frozen.rb).

This is my performance chart using time (dual-core CPU, called with
num=30_000_000). I ran it once (dropped timing values), then three
consecutive times and added the times (which is the sort key as well):

test_int.rb 37.029
test_var_int.rb 37.659
test_symbol.rb 37.773
test_var_symbol.rb 37.785
test_var_frozen.rb 37.852
test_var_string.rb 40.515
test_var_false.rb 45.468
test_false.rb 45.687
test_var_nil.rb 45.894
test_nil.rb 46.068
test_string.rb 53.032

No surprise is that repeated lookup of a string as a key results in
the worst performance, and using a Fixnum gives best performance.

More interesting is a bulk of tests that relate to using symbols and
variables for int, symbol and frozen string, all within timing
accuracy. A noticable, but still small gap can be seen for a
nonfrozen string in a variable.

But here it comes: Using false and, even worse, nil (in either
literal or variable form) is way slower (a good 20%) than using
integers and symbols, even if they follow the same “immutable
object” philosophy of fixed object_ids.

Why is that? What’s so special about false and nil to have such a
degraded performance as a hash key?

  • Matthias

Matthias Wächter wrote:

But here it comes: Using false and, even worse, nil (in either
literal or variable form) is way slower (a good 20%) than using
integers and symbols, even if they follow the same “immutable
object” philosophy of fixed object_ids.

Why is that? What’s so special about false and nil to have such a
degraded performance as a hash key?

hash.c:rb_any_hash is the answer to your question (funcall(‘hash’) is
discarded only for Fixnums, Symbols and Strings. Note that String
numbers would be way better if it’s hash was cached until first
destructive operation.

lopex

On 02.10.2007 16:55, Marcin Mielżyński wrote:

hash.c:rb_any_hash is the answer to your question (funcall(‘hash’) is
discarded only for Fixnums, Symbols and Strings.

So the first two cases of “switch (TYPE(a))” should be enhanced with
checks for nil, true and false like this:

switch (TYPE(a)) {
  case T_FIXNUM:
  case T_SYMBOL:
  • case T_NIL:
    
  • case T_FALSE:
    
  • case T_TRUE:
    
    return (int)a;
    break;

Objections?

  • Matthias

On Oct 2, 2007, at 23:46 , Matthias Wächter wrote:

  • case T_NIL:
    
  • case T_FALSE:
    
  • case T_TRUE:
    
    return (int)a;
    break;

Send that and your original timings to ruby-core@ and/or file a bug
on rubyforge in the ruby project.