This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author elliot.gorokhovsky
Recipients elliot.gorokhovsky, mdk, serhiy.storchaka, tim.peters, vstinner
Date 2017-03-10.17:19:54
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <CANZJz4iQp1=WU4ERn8HQTvK_HGNrVCZPhfPP0cONnWwqySx+HA@mail.gmail.com>
In-reply-to <1489130764.0.0.155408111751.issue28685@psf.upfronthosting.co.za>
Content
On Fri, Mar 10, 2017 at 12:26 AM Serhiy Storchaka <report@bugs.python.org>
wrote:

>
> Serhiy Storchaka added the comment:
>
> The issue shouldn't be closed until it resolved or rejected.
>

Ya, sorry about that. This is my first time contributing.

>
>
I like the idea, and benchmarking results for randomized lists look nice.
> But could you please run benchmarks for already sorted lists?
>

David Mertz asked for the same thing on python-ideas. Here's what I replied
(you can also find these numbers in my pull request description):

***
You are entirely correct, as the benchmarks below demonstrate. I used the
benchmark lists from Objects/listsort.txt, which are:

      \sort: descending data
      /sort: ascending data
      3sort: ascending, then 3 random exchanges
      +sort: ascending, then 10 random at the end
      %sort: ascending, then randomly replace 1% of elements w/ random
values
      ~sort: many duplicates
      =sort: all equal

My results are below (the script can be found at
https://github.com/embg/python-fastsort-benchmark/blob/master/bench-structured.py
):
Homogeneous ([int]):
\sort: 54.6%
/sort: 56.5%
3sort: 53.5%
+sort: 55.3%
%sort: 52.4%
~sort: 48.0%
=sort: 45.2%

Heterogeneous ([int]*n + [0.0]):
\sort: -17.2%
/sort: -19.8%
3sort: -18.0%
+sort: -18.8%
%sort: -10.0%
~sort: -2.1%
=sort: -21.3%

As you can see, because there's a lot less non-comparison overhead in the
structured lists, the impact of the optimization is much greater, both in
performance gain and in worst-case cost. However, I would argue that these
data do not invalidate the utility of my patch: the probability of
encountering a type-heterogeneous list is certainly less than 5% in
practice. So the expected savings, even for structured lists, is something
like (5%)(-20%) + (95%)(50%) = 46.5%. And, of course, not *all* the lists
one encounters in practice are structured; certainly not *this* structured.
So, overall, I would say the numbers above are extremely encouraging.
Thanks for pointing out the need for this benchmark, though!
***

Thanks for the feedback!
History
Date User Action Args
2017-03-10 17:19:55elliot.gorokhovskysetrecipients: + elliot.gorokhovsky, tim.peters, vstinner, serhiy.storchaka, mdk
2017-03-10 17:19:55elliot.gorokhovskylinkissue28685 messages
2017-03-10 17:19:54elliot.gorokhovskycreate