Sort_by is not stable?

Dobai-Pataky_BSSSSl · November 7, 2010, 11:22am

sort_by is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

_md

micheldogger · November 7, 2010, 11:48am

On Sun, Nov 7, 2010 at 12:22 PM, Michel D. [email protected]
wrote:

sort_by is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

Without code, I have to guess. If I understood your issue correctly,
then you need to specify both fields in sort_by block:

a = [[“g”, 3], [“c”, 3], [“a”, 3], [“f”, 3], [“e”, 2], [“d”, 3], [“b”, 1]]
=> [[“g”, 3], [“c”, 3], [“a”, 3], [“f”, 3], [“e”, 2], [“d”, 3], [“b”,
1]]
a.sort_by {|e| [e.last, e.first]}
=> [[“b”, 1], [“e”, 2], [“a”, 3], [“c”, 3], [“d”, 3], [“f”, 3], [“g”,
3]]

Regards,
Ammar

micheldogger · November 7, 2010, 12:15pm

Ammar A. wrote in post #959889:

If I understood your issue correctly,
Yes, it is exactly that. Your example shows it :

a = [[“g”, 3], [“c”, 3], [“a”, 3], [“f”, 3], [“e”, 2], [“d”, 3], [“b”,
1]]
b = a.sort_by{ |letter, number| letter}

= > [[“a”, 3], [“b”, 1], [“c”, 3], [“d”, 3], [“e”, 2], [“f”, 3], [“g”,

3]]
c = b.sort_by{ |letter, number| number}

= > [[“b”, 1], [“e”, 2], [“d”, 3], [“c”, 3], [“a”, 3], [“f”, 3], [“g”,

3]]

Without code, I have to guess
The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).
_md

micheldogger · November 7, 2010, 1:09pm

On 07.11.2010 12:39, Ammar A. wrote:

3]]
Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby’s qsort is naive.
AFAIK, the majority of qsort implementations are.

IIRC quicksort is instable by definition. There may be stable variants
though.

In this case it’s not really needed IMHO. You can either do

x.sort_by {|let,dig| [dig, let]}

or, if you don’t want to create all those temporary arrays

x.sort {|(la,da),(lb,db)| da == db ? la <=> lb : da <=> db}
x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Kind regards

robert

micheldogger · November 7, 2010, 12:39pm

On Sun, Nov 7, 2010 at 1:15 PM, Michel D. [email protected]
wrote:

c = b.sort_by{ |letter, number| number}

= > [[“b”, 1], [“e”, 2], [“d”, 3], [“c”, 3], [“a”, 3], [“f”, 3], [“g”,

3]]

Without code, I have to guess
The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby’s qsort is naive.
AFAIK, the majority of qsort implementations are.

Maybe this is a question/request for the core mailing list.

Regards,
Ammar

micheldogger · November 7, 2010, 2:08pm

Phillip G. wrote in post #959912:

However, QSort is (again, so says Wikipedia), only maintains the
relative order of records, if those are equal.
[“a”,3] and [“b”,3] aren’t equal as far as a computer is concerned,
resulting in the OP’s problem.

@Phillip : I am not sure I agree. When sorting on the second half, the
couple ([“a”, 3], [“b”, 3]) falls under the “equal” case of the choice.

Robert K. wrote :

In this case it’s not really needed IMHO

@Robert : yes in this precise case with two fields, you can go around.
But in general, if you work with many fields and have to sort by one of
them, you should not have to remember the previous sorts and specify all
the unconcerned fields.
_md

micheldogger · November 7, 2010, 3:10pm

On Sun, Nov 7, 2010 at 2:08 PM, Michel D. [email protected]
wrote:

Phillip G. wrote in post #959912:
@Phillip : I am not sure I agree. When sorting on the second half, the
couple ([“a”, 3], [“b”, 3]) falls under the “equal” case of the choice.

“a” != “b” results in true, thus the couplet is not equal.

–
Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

micheldogger · November 7, 2010, 4:10pm

On 07.11.2010 15:10, Phillip G. wrote:

On Sun, Nov 7, 2010 at 2:08 PM, Michel D.[email protected] wrote:

Phillip G. wrote in post #959912:
@Phillip : I am not sure I agree. When sorting on the second half, the
couple ([“a”, 3], [“b”, 3]) falls under the “equal” case of the choice.

“a” != “b” results in true, thus the couplet is not equal.

No, Michael is right. Every sorting algorithm only ever looks at sort
keys. So since our sort key is the second array member (the number)
both mentioned records are equivalent. If you include the other field a
stable sort isn’t really interesting here since then all fields are part
of the sort key - stable and instable algorithms will return the same
then.

Kind regards

robert

micheldogger · November 7, 2010, 6:13pm

Robert K. wrote in post #959935:

No, Michael is right. Every sorting algorithm only ever looks at sort
keys. So since our sort key is the second array member (the number)
both mentioned records are equivalent.

Coming back to the beginning, quicksort correctly coded is stable (cut
recursively in three parts, with the middle part made by those elements
equivalent to the choosen pivot). Even the parallel version is stable.
In NESL :

Which means that ruby’s is badly coded.
_md

micheldogger · November 7, 2010, 1:18pm

Please, people, trim your quotes.

On Sun, Nov 7, 2010 at 1:07 PM, Robert K.
[email protected] wrote:

IIRC quicksort is instable by definition. There may be stable variants
though.

It’s only unstable if the implementation is inefficient (says
Wikipedia, so apply your truthiness filters).

However, QSort is (again, so says Wikipedia), only maintains the
relative order of records, if those are equal.
[“a”,3] and [“b”,3] aren’t equal as far as a computer is concerned,
resulting in the OP’s problem.

–
Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

micheldogger · November 7, 2010, 7:20pm

On Nov 7, 4:22am, Michel D. [email protected] wrote:

sort_by is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

I’ve noticed that it’s unstable under 1.8.7:

%w(foo bar car war raw rat cat or bat is).sort.
sort_by{|s| s.size}
==>[“is”, “or”, “bar”, “cat”, “foo”, “car”, “bat”, “rat”, “raw”,
“war”]
VERSION
==>“1.8.7”

micheldogger · November 7, 2010, 7:18pm

On Sun, Nov 7, 2010 at 8:07 PM, Robert K.
[email protected]wrote:

x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Kind regards
   robert
–
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

From a practical viewpoint, stable sort should be implemented as stable
sort.
Not a multiple keys sort… or something else.

micheldogger · November 7, 2010, 7:35pm

Good Morning,

On Sun, Nov 7, 2010 at 9:13 AM, Michel D. [email protected]
wrote:

Coming back to the beginning, quicksort correctly coded is stable (cut
recursively in three parts, with the middle part made by those elements
equivalent to the choosen pivot).

…

Which means that ruby’s is badly coded.

The rest of the sane world begs to differ.

Sorting algorithm - Wikipedia - “…Efficient
implementations of quicksort (with in-place partitioning) are typically
unstable sorts…”

sort - perl pragma to control sort() behaviour - Perldoc Browser - “…Mergesort is stable, quicksort
is
not…”

http://www.codecogs.com/reference/c/stdlib.h/qsort.php - “…The
algorithms
implemented by qsort, qsort_r and heapsort are not stable…”

Linux Man page for qsort - “…If two members compare as equal, their
order
in the sorted array is undefined…”

Python in all fairness says they have stable sorts - but they really
make no
mention of the algorithm used so I’m unclear as to whether they are
using
quicksort at all.

I’d rather have the efficient implementation that one can easily make
stable
by using the secondary comparison (as you can easily do in this case)
than
an inefficient version that really accomplishes nothing that the
efficient
version cannot.

John

P.S. There is nothing in the world stopping anyone from submitting a
patch
to add a stable sort option to Ruby. But to pronounce that the hard work
of
others is badly coded when it clearly conforms to the normal standard of
multiple mainstream languages is just plain ignorant at best.

micheldogger · November 7, 2010, 8:55pm

John W Higgins wrote in post #959968:

… just plain ignorant at best.

Message received.
_md

micheldogger · November 7, 2010, 10:00pm

On Nov 7, 12:34pm, John W. Higgins [email protected] wrote:

I’d rather have the efficient implementation that one can easily make stable
by using the secondary comparison (as you can easily do in this case) than
an inefficient version that really accomplishes nothing that the efficient
version cannot.

I’d rather have a stable and efficient sort. If Python can do it,
then there’s no excuse for Ruby.

micheldogger · November 7, 2010, 8:58pm

In fact, python uses Timsort. Java SE and Android too.
(wikipedia::Timsort http://en.wikipedia.org/wiki/Timsort)
It’s very efficient, like quicksort.
(btw, quicksort isn’t most efficient compare-based sort. see
wikipedia::introsort. http://en.wikipedia.org/wiki/Introsort)

micheldogger · November 7, 2010, 10:38pm

Good Afternoon

On Sun, Nov 7, 2010 at 1:00 PM, w_a_x_man [email protected] wrote:

I’d rather have a stable and efficient sort. If Python can do it,
then there’s no excuse for Ruby.

First, there are two issues here - one is whether or not to use
quicksort as
the default sort and then whether or not there is an expectation as to
whether or not quicksort should be stable.

The issue I took was the statement that Ruby’s quicksort was “badly
coded”.
This is not a correct statement given the many other instances of
quicksort
that follow the same principal as Ruby’s version in that it’s not
stable. I
think Ruby is just fine with respect to its quicksort implementation.

As to the concept of whether or not another algorithm should be used -
it’s
rather interesting to look here -
http://redmine.ruby-lang.org/issues/show/1089 and especially the
comments of
Mr. Nutter who while working on JRuby had an apparently clean canvas to
work
with and still went with an unstable sort. It’s sparse on details and
I’m
not trying to put words in his mouth either.

Again, anyone is free to put a patch on the table and really start a
good
conversation as to how it would fit in - and quite possibly a
stable_sort
method would be an outstanding option here. But there just is a huge
difference between someone coming to the table patch in hand to begin a
conversation - and just saying Ruby is “badly coded”. This is a sort
algorithm - it shouldn’t be rocket science for someone to create a
different
algorithm patch if they wanted to. There are plenty of C reference
implementations available as starting points.

John

micheldogger · November 8, 2010, 10:31am

Robert K.:

You can either do

x.sort_by {|let,dig| [dig, let]}

or, if you don’t want to create all those temporary arrays

x.sort {|(la,da),(lb,db)| da == db ? la <=> lb : da <=> db}
x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Or use #nonzero? and #<=> chaining like this:

x.sort { |(la, da), (lb, db)| (da <=> db).nonzero? or la <=> lb }

— Piotr S.

micheldogger · November 19, 2010, 7:43am

On Sun, Nov 7, 2010 at 3:37 PM, John W Higgins [email protected]
wrote:

As to the concept of whether or not another algorithm should be used - it’s
rather interesting to look here -
http://redmine.ruby-lang.org/issues/show/1089 and especially the comments of
Mr. Nutter who while working on JRuby had an apparently clean canvas to work
with and still went with an unstable sort. It’s sparse on details and I’m
not trying to put words in his mouth either.

I can clarify.

JRuby used to just use the JVM-provided mergesort, which was stable
but markedly slower than Ruby 1.8.6/7’s unstable quicksort. Because we
got a couple reports about sort being slower, we opted to do the
“wrong thing” and have a contributor implement a hybrid sort that
performed better than Ruby 1.8.6/7 but was unstable.

About a month later, someone raised the issue about MRI not having a
stable sort, and it was decided to make it stable. AARGH. I have not
kept up with discussions, however, and I believe sorts are still
unstable in 1.9.2, correct?

At this point I almost want there to be both options, since our
contributor (qbproger … sorry man, your name escapes me at the
moment!) went to a lot of effort to get our current sort running well.

In any case, it wasn’t an arbitrary decision. We debate almost
endlessly over these sorts of things, and I think I’ve taken a good
ten years off my life due to the related stress

Charlie

micheldogger · November 19, 2010, 9:25am

Charles Nutter wrote in post #962516:

In any case, it wasn’t an arbitrary decision. We debate almost
endlessly over these sorts of things, and I think I’ve taken a good
ten years off my life due to the related stress

Just to come back to my initial post : it was not about sort, but about
sort_by.

For sort_by, you need to compute the keys, and sort the indices
according to the keys. It is not in-place sorting. Yes, optimal in-place
quicksort is unstable. But for sort_by, one could use a stable version.

Am I wrong ?
_md