Message 289394 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	elliot.gorokhovsky
Recipients	elliot.gorokhovsky, mdk, serhiy.storchaka, tim.peters, vstinner
Date	2017-03-10.17:19:54
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<CANZJz4iQp1=WU4ERn8HQTvK_HGNrVCZPhfPP0cONnWwqySx+HA@mail.gmail.com>
In-reply-to	<1489130764.0.0.155408111751.issue28685@psf.upfronthosting.co.za>

Content
On Fri, Mar 10, 2017 at 12:26 AM Serhiy Storchaka <report@bugs.python.org> wrote: > > Serhiy Storchaka added the comment: > > The issue shouldn't be closed until it resolved or rejected. > Ya, sorry about that. This is my first time contributing. > > I like the idea, and benchmarking results for randomized lists look nice. > But could you please run benchmarks for already sorted lists? > David Mertz asked for the same thing on python-ideas. Here's what I replied (you can also find these numbers in my pull request description): *** You are entirely correct, as the benchmarks below demonstrate. I used the benchmark lists from Objects/listsort.txt, which are: \sort: descending data /sort: ascending data 3sort: ascending, then 3 random exchanges +sort: ascending, then 10 random at the end %sort: ascending, then randomly replace 1% of elements w/ random values ~sort: many duplicates =sort: all equal My results are below (the script can be found at https://github.com/embg/python-fastsort-benchmark/blob/master/bench-structured.py ): Homogeneous ([int]): \sort: 54.6% /sort: 56.5% 3sort: 53.5% +sort: 55.3% %sort: 52.4% ~sort: 48.0% =sort: 45.2% Heterogeneous ([int]n + [0.0]): \sort: -17.2% /sort: -19.8% 3sort: -18.0% +sort: -18.8% %sort: -10.0% ~sort: -2.1% =sort: -21.3% As you can see, because there's a lot less non-comparison overhead in the structured lists, the impact of the optimization is much greater, both in performance gain and in worst-case cost. However, I would argue that these data do not invalidate the utility of my patch: the probability of encountering a type-heterogeneous list is certainly less than 5% in practice. So the expected savings, even for structured lists, is something like (5%)(-20%) + (95%)(50%) = 46.5%. And, of course, not all* the lists one encounters in practice are structured; certainly not this structured. So, overall, I would say the numbers above are extremely encouraging. Thanks for pointing out the need for this benchmark, though! *** Thanks for the feedback!

On Fri, Mar 10, 2017 at 12:26 AM Serhiy Storchaka <report@bugs.python.org>
wrote:

>
> Serhiy Storchaka added the comment:
>
> The issue shouldn't be closed until it resolved or rejected.
>

Ya, sorry about that. This is my first time contributing.

>
>
I like the idea, and benchmarking results for randomized lists look nice.
> But could you please run benchmarks for already sorted lists?
>

David Mertz asked for the same thing on python-ideas. Here's what I replied
(you can also find these numbers in my pull request description):

***
You are entirely correct, as the benchmarks below demonstrate. I used the
benchmark lists from Objects/listsort.txt, which are:

      \sort: descending data
      /sort: ascending data
      3sort: ascending, then 3 random exchanges
      +sort: ascending, then 10 random at the end
      %sort: ascending, then randomly replace 1% of elements w/ random
values
      ~sort: many duplicates
      =sort: all equal

My results are below (the script can be found at
https://github.com/embg/python-fastsort-benchmark/blob/master/bench-structured.py
):
Homogeneous ([int]):
\sort: 54.6%
/sort: 56.5%
3sort: 53.5%
+sort: 55.3%
%sort: 52.4%
~sort: 48.0%
=sort: 45.2%

Heterogeneous ([int]*n + [0.0]):
\sort: -17.2%
/sort: -19.8%
3sort: -18.0%
+sort: -18.8%
%sort: -10.0%
~sort: -2.1%
=sort: -21.3%

As you can see, because there's a lot less non-comparison overhead in the
structured lists, the impact of the optimization is much greater, both in
performance gain and in worst-case cost. However, I would argue that these
data do not invalidate the utility of my patch: the probability of
encountering a type-heterogeneous list is certainly less than 5% in
practice. So the expected savings, even for structured lists, is something
like (5%)(-20%) + (95%)(50%) = 46.5%. And, of course, not *all* the lists
one encounters in practice are structured; certainly not *this* structured.
So, overall, I would say the numbers above are extremely encouraging.
Thanks for pointing out the need for this benchmark, though!
***

Thanks for the feedback!

History
Date	User	Action	Args
2017-03-10 17:19:55	elliot.gorokhovsky	set	recipients: + elliot.gorokhovsky, tim.peters, vstinner, serhiy.storchaka, mdk
2017-03-10 17:19:55	elliot.gorokhovsky	link	issue28685 messages
2017-03-10 17:19:54	elliot.gorokhovsky	create