Hash order bug?

I have this piece of simple code:


def foo
return 5
end

a = {:gr => false, :und => false, :det => true, :sta => false, :inv =>
false}
puts a.inspect

when I execute it i get a totally unordered hash:


ruby pro.rb
{:inv=>false, :gr=>false, :und=>false, :det=>true, :sta=>false}

Now, i delete the foo function from the code (the foo function don’t do
nothing at all), and i get a well ordered hash:


ruby pro.rb
{:gr=>false, :und=>false, :det=>true, :sta=>false, :inv=>false}

What’s happening? all my code is behaving wrong because of that.

Hi –

On Tue, 25 Jul 2006, Javier V. wrote:


ruby pro.rb
{:gr=>false, :und=>false, :det=>true, :sta=>false, :inv=>false}

What’s happening? all my code is behaving wrong because of that.

Hashes are unordered. If you need an ordered collection, you’ll need
to use an array.

David

On 7/25/06, Javier V. [email protected] wrote:


when I execute it i get a totally unordered hash:

by definition a hash table is not ordered.

[…]

[email protected] wrote:


What’s happening? all my code is behaving wrong because of that.

Hashes are unordered. If you need an ordered collection, you’ll need
to use an array.

David

Oh my god, i didn’t know it, sorry. Is that a missing feature?

No, its just not a feature. You can check
http://rubyforge.org/projects/arrayfields if you want a ordered hash.

j`ey
http://www.eachmapinject.com

Hi –

On Tue, 25 Jul 2006, Javier V. wrote:

return 5
ruby pro.rb

Oh my god, i didn’t know it, sorry. Is that a missing feature?

Debate rages on this point :slight_smile: It’s mostly a speed issue, I think.
At least my memory is that Matz’s latest statement was to the effect
that he would do it if it could be done efficiently.

The question came up during a lunch here at OSCON (with me, Jim
Weirich, Pat Eyler, and a fellow named Nicolas whose last name I’m
afraid I don’t remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It’s a case where the feature would be purely additive.

David

Nicolas Desprès wrote:

false}
puts a.inspect

when I execute it i get a totally unordered hash:

by definition a hash table is not ordered.

[…]

Well, i don’t mind ordering alphabetically, just to maintain their
construction order, like an array.
Anyway, thanks all for quick responses, hope to see that implemented in
core-ruby soon :slight_smile:

Thanks

[snip]

The question came up during a lunch here at OSCON (with me, Jim
Weirich, Pat Eyler, and a fellow named Nicolas whose last name I’m
afraid I don’t remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It’s a case where the feature would be purely additive.

I guess there won’t be many ppl using algorithms that use the unsorted
aspect of a Hash. Using Kernel#rand instead of Object#hash seems
better. printing a few hash-values suggest they’re not that random
anyway…

Other than that, would you sort the hash (or iterator) on its keys, its
values or insertion sequence?

NB: why did I read “addictive” at first pass? The question has been
raised
more often; and I admit I have overridden Hash#keys (and Hash each_pair,
iirc) to achieve a (partially) sorted effect… The problem is that you
can
not generally sort keys (specifically not symbols).

On 25/07/06, [email protected] [email protected] wrote:

The question came up during a lunch here at OSCON (with me, Jim
Weirich, Pat Eyler, and a fellow named Nicolas whose last name I’m
afraid I don’t remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It’s a case where the feature would be purely additive.

I think that would depend on whether {:a => 1, :b => 2} == {:b => 2,
:a => 1}. If that’s the same, everything should be fine. If it’s
different, I know that it would break a number of my tests, if nothing
else.

Paul.

On Wed, Jul 26, 2006 at 03:50:05AM +0900, William J. wrote:

What language has ordered hashes?

Java, in the form of the TreeMap collection. The usefulness of them is
limited though, As anecdotal evidence, I’ve used it twice in the past
four years.

K.

Javier V. wrote:

return 5
ruby pro.rb

Oh my god, i didn’t know it, sorry. Is that a missing feature?

No. What language has ordered hashes?

When you say
a = {:gr => false, :und => false, :det => true, :sta => false, :inv
=> false}
you are saying that you want to access the values in this fashion:
a[:und]
So the order in which they are stored is irrelevant.
If you want to access them this way
a[1]
then you set them up like this
[ false, false, true, false, false ]

If you want to get a sequence of values in a certain order:
a.values_at( :und, :det, :sta )

If you want to have your cake and eat it too, you could
use an association list:

as = [[:foo,22], [:bar,33], [:baz,44]]
=> [[:foo, 22], [:bar, 33], [:baz, 44]]

as.assoc(:bar)
=> [:bar, 33]

as.rassoc( 44 )
=> [:baz, 44]

as[0]
=> [:foo, 22]

If the list has a large number of elements, access will be slow.

Javier,

I don’t believe it’s a missing feature. In every language I’ve used,
Hashes
are not ordered. It has to do with how they are implemented under the
hood. If you really need an ordered Hash, you could create a class that
uses an Array to maintain the order of the keys, and store the values in
a
Hash.

/Doug

On 7/25/06, Javier V. - [email protected] <+ruby-

Kero wrote:

Weirich, Pat Eyler, and a fellow named Nicolas whose last name I’m
afraid I don’t remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It’s a case where the feature would be purely additive.

I guess there won’t be many ppl using algorithms that use the unsorted
aspect of a Hash.
It’s not that it’s unordered, it’s that it’s anordered. The concept
doesn’t apply. It’s a bucket, not a queue. There are many, many
applications that don’t care about entry order.

Using Kernel#rand instead of Object#hash seems
better. printing a few hash-values suggest they’re not that random anyway…
The order is unpredictable and implementation dependent.

No. What language has ordered hashes?

PHP (shudder). Since PHP only has a single array type that supports
both sequential and keyed accessing any string keys remain in order.
It can be handy as PHP’s array construction is needlessly verbose, for
example in Ruby I’d do this:

[[‘item1’, 1],[‘item2’,2]]

In PHP it’d look like:
array(array(‘item1’, 1), array(‘item2’, 2));
or
array(‘item1’=>1, ‘item2’=>2);

Which is easier to read?

On 7/25/06, Keith G. [email protected] wrote:

On Wed, Jul 26, 2006 at 03:50:05AM +0900, William J. wrote:

What language has ordered hashes?

Java, in the form of the TreeMap collection. The usefulness of them is
limited though, As anecdotal evidence, I’ve used it twice in the past
four years.

And even in the TreeMap case, you sacrifice performance for features. An
unordered hash will be the fastest for lookup and insertion. Adding any
additional features will slow it down out of the necessity of imposing
an
order on either keys or values. With TreeMap, I think it’s something
like
O(log n) lookup times rather than the theoretical O(1) with a straight
hash,
and that’s not counting any tree rebalancing needed after future
insertions.
At minimum, you’ll double up memory storing an array and a hash
together, or
you’ll sacrifice performance maintaining and traversing a balanced tree.

Leave hash alone; custom data structures are fine when the peculiar
beast
known as an “ordered hash” is actually useful.

On Wed, Jul 26, 2006 at 05:03:50AM +0900, Charles O Nutter wrote:

And even in the TreeMap case, you sacrifice performance for features.
Very true. While a peek at the source code will reveal that the
Red-Black tree it uses is well-implemented, nothing comes for free.

An unordered hash will be the fastest for lookup and insertion.

That depends, though it is generally true. However, your
datastructures should suit the problem domain and the constraints you’re
working under.

Adding any
additional features will slow it down out of the necessity of imposing an
order on either keys or values. With TreeMap, I think it’s something like
O(log n) lookup times rather than the theoretical O(1) with a straight hash,
and that’s not counting any tree rebalancing needed after future insertions.

Theoretically. However, while looking at speed through the lens of time
complexity can be a good rule of thumb, it’s sometimes somewhat
misleading.

For instance, say you had a dataset that you had to do lookups upon, and
you’d two choices as to the method to use for lookups. The first is a
linear scan of the unordered data, and the other is some keyed lookup.
The former would give you O(n) performance, while the latter would give
O(1) performance.

If you didn’t have any more information, you’d be forgiven for choosing
the latter method, but say scanning each element takes only .01ms,
whereas the calculation required in the latter method takes 2s per
element, it’s a no-brainer to pick the the O(n) method over the O(1)
method if your dataset is of the right size.

In the case of hashes, you only get O(1) if (a) the hash can be
calculated in constant time and (b) you’re using a perfect hash
function. Strictly speaking, just as Quicksort’s performance is O(n^2),
insertion and lookup for hashtables is O(n+m) where the hash degrades to
something resembling a bunch of keyed linked lists and n is the mean
time to convert the hash key to an index, and m is the mean time
required to do a linear scan of the appropriate linked list. But such
cases are rare, but inherent in the way hashes are implemented.

Big-O notation is only meant as a guide to algorithm choice and is no
substitute for and understanding of the tradeoffs involved with each
potential algorithm and how they actually perform when used with your
datasets.

Trees are good when you (a) want to guarantee how long it will take to
add,
lookup, or remove an element or (b) need the keys to be ordered by some
property. Most of the time, we don’t need these properties.

At minimum, you’ll double up memory storing an array and a hash together, or
you’ll sacrifice performance maintaining and traversing a balanced tree.

Leave hash alone; custom data structures are fine when the peculiar beast
known as an “ordered hash” is actually useful.

Agreed. There’s no sense in adding additional complexity to the language
to cope with something like that which most of us only use rarely, if
ever. That is, after all, what libraries are for.

K.

On Wed, Jul 26, 2006 at 05:41:49AM +0900, Phillip H. wrote:

array(array(‘item1’, 1), array(‘item2’, 2));
or
array(‘item1’=>1, ‘item2’=>2);

Which is easier to read?

That’s not really fair on either Ruby or PHP. You’re taking a
syntactical element and using it to bash an unrelated part of the
language. If PHP was to introduce the use of square brackets as
syntactic sugar for array(), this argument would go away in a puff of
smoke.

K.

P.S. My .sig generator seems to think I’m talking about ASP.NET :slight_smile:

That’s not really fair on either Ruby or PHP. You’re taking a
syntactical element and using it to bash an unrelated part of the
language. If PHP was to introduce the use of square brackets as
syntactic sugar for array(), this argument would go away in a puff of
smoke.

Disclaimer: My day job is working with a large PHP webapp. Maybe I’m
just jaded.

I believe it’s fair. It’s one of my reasons for liking Ruby (and
Python, and JavaScript for that matter). Arrays and hashes are one of
the most widely used structures in my development, and to have such a
clumsy constructor is a pain.

However, there are much deeper issues in PHP’s implementation. There
are a few array operations (eg array_merge) that will preserve key
mappings on string keys, but not integer keys. Guess what happens if
you want a hash with integer keys?

On 2006-07-25, Nicolas Desprès [email protected] wrote:

puts a.inspect

when I execute it i get a totally unordered hash:

by definition a hash table is not ordered.

Indeed. Here’s one reference:

I’m not sure I like the idea (discussed elsewhere in this thread) that
the
Hash class may be changed in the future to be ordered.

Paul B. wrote:

I think that would depend on whether {:a => 1, :b => 2} == {:b => 2,
:a => 1}. If that’s the same, everything should be fine. If it’s
different, I know that it would break a number of my tests, if nothing
else.

In previous discussions, Matz said these would still be equal.

That suits me: I could still do h1.to_a == h2.to_a if needed.

Hal