This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: list.sort() should do quick exit if len(list) <= 1
Type: performance Stage: patch review
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Sergey.Kirpichev, benjamin.peterson, mark.dickinson, matrixise, paul.moore, r.david.murray, rhettinger, tim.peters
Priority: normal Keywords: patch

Created on 2015-04-29 13:30 by Sergey.Kirpichev, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
trivial-sorting-py3.patch Sergey.Kirpichev, 2015-04-29 13:30 patch for python 3.[2-6]
0001-list.sort-Add-quick-exit-if-length-of-list-1.patch Sergey.Kirpichev, 2015-04-29 20:27 updated patch (with test) review
Messages (16)
msg242222 - (view) Author: Sergey B Kirpichev (Sergey.Kirpichev) * Date: 2015-04-29 13:30
If there is nothing to sort (i.e. one item), why call key function at all?

In my practical situation, simplest key() function will lead to recursion in case of such trivial lists.  I can make similar cmp-type function (i.e. mycmp=lambda a, b: cmp(key(a), key(b))) and then wrap this with cmp_to_key.  But that looks silly.

Simple test case:
$ cat a.py
a = [1]

def spam(x):
    raise Exception

a.sort(key=spam)
print(a)
$ python a.py 
Traceback (most recent call last):
  File "a.py", line 6, in <module>
    a.sort(key=spam)
  File "a.py", line 4, in spam
    raise Exception
Exception
msg242223 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2015-04-29 13:47
The patch is ok for me,
msg242226 - (view) Author: Sergey B Kirpichev (Sergey.Kirpichev) * Date: 2015-04-29 14:23
should I add a regression test?

If so, where?  ./Lib/test/test_sort.py?
msg242227 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2015-04-29 14:31
Yep, add a regression test.
msg242228 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2015-04-29 14:32
Why does your key function depend on the size of the list? That seems like the root of the problem. Considering calling the key function is observable behavior, I don't think this should be changed. The patch makes behavior list.sort() inconsistent.
msg242230 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-04-29 14:54
FWIW, I don't think this is worth special casing.
msg242231 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2015-04-29 14:55
> Considering calling the key function is observable behavior, I don't think this should be changed.

+1
msg242232 - (view) Author: Sergey B Kirpichev (Sergey.Kirpichev) * Date: 2015-04-29 15:03
On Wed, Apr 29, 2015 at 02:32:34PM +0000, Benjamin Peterson wrote:
> Why does your key function depend on the size of the list?

Because it's a real life.  Here is the code:
https://github.com/skirpichev/omg/blob/gruntz-use-subs/sympy/series/gruntz.py#L337

Algorithm is recursive and computation of MRV set will be finished only
if I add an exceptional case for len(Omega).  Or even more ugly solution
with cmp-style function:
https://github.com/skirpichev/omg/blob/efc70377639f78fc0f246629989056cb80a71923/sympy/series/gruntz.py#L339

> Considering calling the key function is observable behavior, I don't think this
> should be changed. The patch makes behavior list.sort() inconsistent.

Yes, in some sense.

On another hand, it's reasonable to believe that key function will be
called iff we need one.  But if there is nothing to sort at all - why do
sorting, why you want to call key function?  Looks irrational to me.

Last by not least.  Why the return value of the key function *must*
be defined in this case?  Just a hidden, undocumented restriction and
has no practical value.

BTW, why this issue was closed?
msg242233 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2015-04-29 15:25
On Wed, Apr 29, 2015, at 11:03, Sergey B Kirpichev wrote:
> 
> Sergey B Kirpichev added the comment:
> 
> On Wed, Apr 29, 2015 at 02:32:34PM +0000, Benjamin Peterson wrote:
> > Why does your key function depend on the size of the list?
> 
> Because it's a real life.  Here is the code:
> https://github.com/skirpichev/omg/blob/gruntz-use-subs/sympy/series/gruntz.py#L337
> 
> Algorithm is recursive and computation of MRV set will be finished only
> if I add an exceptional case for len(Omega).  Or even more ugly solution
> with cmp-style function:
> https://github.com/skirpichev/omg/blob/efc70377639f78fc0f246629989056cb80a71923/sympy/series/gruntz.py#L339

So, basically you need a base case for recursion? What's wrong with
explicitly writing that out?

> 
> > Considering calling the key function is observable behavior, I don't think this
> > should be changed. The patch makes behavior list.sort() inconsistent.
> 
> Yes, in some sense.
> 
> On another hand, it's reasonable to believe that key function will be
> called iff we need one.  But if there is nothing to sort at all - why do
> sorting, why you want to call key function?  Looks irrational to me.
> 
> Last by not least.  Why the return value of the key function *must*
> be defined in this case?  Just a hidden, undocumented restriction and
> has no practical value.

It's practical if you have a broken key function and test it with a one
element list.

> 
> BTW, why this issue was closed?

3 of us agreed this doesn't seem like a suitable change.
msg242236 - (view) Author: Sergey B Kirpichev (Sergey.Kirpichev) * Date: 2015-04-29 17:25
On Wed, Apr 29, 2015 at 03:25:19PM +0000, Benjamin Peterson wrote:
> So, basically you need a base case for recursion? What's wrong with
> explicitly writing that out?

Because it's complex (and costly).  This is not a trivial test and
I don't see reasons to fix that is not broken.  And it will be difficult
to explain for readers: remember, I need this exceptional case only in
the world with a strange Python's convention (Python try to sort a list
when it doesn't make sense).

Mathematical algorithm is not broken - programming language is.

Here is C:
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/msort.c;#l45
Here is Ruby:
https://github.com/ruby/ruby/blob/trunk/array.c#L2454

> It's practical if you have a broken key function and test it with a one
> element list.

It's silly to test key function on a single-element list *only*.

> > BTW, why this issue was closed?
> 
> 3 of us agreed this doesn't seem like a suitable change.

And 1 seems to be ok with patch.  Is this just a question of
number of votes?

At least, please consider this as a documentation issue.  That ...
feature may be obvious for a Python developer, but not for
mathematician (as well as ordinary Python user).

When key function value has no sense at all - it's not clear from
the documentation, that the key function will be called.
msg242237 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2015-04-29 17:44
I think the documentation is fine:

"""
The key corresponding to each item in the list is calculated once and then used for the entire sorting process.
"""

This corresponds with the standard "decorate-sort-undecorate" approach to handling key functions in sorts. It's a common computer science technique, possibly not as familiar to a mathematician, but regardless, the docs clearly state that the key is calculated for each item.

Your existing code, with a check for Omega having length 1 and omitting the sort in that case, looks entirely reasonable to me. Maybe you could add a comment "Avoid a costly calculation of the key when length is 1, as we know we don't need to sort then" and leave it at that.
msg242239 - (view) Author: Sergey B Kirpichev (Sergey.Kirpichev) * Date: 2015-04-29 18:42
On Wed, Apr 29, 2015 at 05:44:22PM +0000, Paul Moore wrote:
> I think the documentation is fine:
> """
> The key corresponding to each item in the list is calculated once and then used for the entire sorting process.
> """

Does any "sorting process" make sense for [1] or []?!  No, it
isn't.  So, it's not clear if this "process" started at all.

This not just mine opinion - most computer languages
implement the quick exit in question (see examples above).

> It's a common computer science technique

Could you provide any language that avoid this optimization?

Here is Perl 5:
http://perl5.git.perl.org/perl.git/blob/HEAD:/pp_sort.c#l367

(third example)

> Your existing code, with a check for Omega having length 1 and omitting
> the sort in that case, looks entirely reasonable to me.

(Well, then I should look for other languages, if Python devs
insist in doing useless work...)

> Maybe you could add a comment "Avoid a costly calculation of the
> key when length is 1, as we know we don't need to sort then"

I sure, for most people - the idea of sorting list with one
element will look crazy.  There is no room for any "costly
calculations".  (Common sense, nothing more.)  So, such comment
will add more questions...
msg242240 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2015-04-29 18:51
On 29 April 2015 at 19:42, Sergey B Kirpichev <report@bugs.python.org> wrote:
>> It's a common computer science technique
>
> Could you provide any language that avoid this optimization?
>
> Here is Perl 5:
> http://perl5.git.perl.org/perl.git/blob/HEAD:/pp_sort.c#l367
>
> (third example)

But that's a sort without a key. In Perl you do a key sort via:

@sorted = map  { $_->[0] }
          sort { $a->[1] <=> $b->[1] } # use numeric comparison
          map  { [$_, length($_)] }    # calculate the length of the string
               @unsorted;

(From http://en.wikipedia.org/wiki/Schwartzian_transform). That
computes the keys first, and would compute the key for a list of
length 1, just like Python does. It's just that Python bundles that
whole construct into the "key=" argument.

But it's your choice - if this is a big enough deal to put you off
Python, I guess no-one will be able to stop you. The fact of the
matter is that what Python does is documented behaviour, and the
benefit (small) isn't worth the cost of making a change (which would
only be in Python 3.5 and later anyway, as it's a backward
incompatible change, not a bug fix).
msg242245 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2015-04-29 19:40
On Wed, Apr 29, 2015, at 13:25, Sergey B Kirpichev wrote:
> 
> Sergey B Kirpichev added the comment:
> 
> On Wed, Apr 29, 2015 at 03:25:19PM +0000, Benjamin Peterson wrote:
> > So, basically you need a base case for recursion? What's wrong with
> > explicitly writing that out?
> 
> Because it's complex (and costly).  This is not a trivial test and
> I don't see reasons to fix that is not broken.  And it will be difficult
> to explain for readers: remember, I need this exceptional case only in
> the world with a strange Python's convention (Python try to sort a list
> when it doesn't make sense).
> 
> Mathematical algorithm is not broken - programming language is.
> 
> Here is C:
> https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/msort.c;#l45
> Here is Ruby:
> https://github.com/ruby/ruby/blob/trunk/array.c#L2454

I don't understand the analogy, since neither of these two have key
functions.

> 
> > It's practical if you have a broken key function and test it with a one
> > element list.
> 
> It's silly to test key function on a single-element list *only*.
> 
> > > BTW, why this issue was closed?
> > 
> > 3 of us agreed this doesn't seem like a suitable change.
> 
> And 1 seems to be ok with patch.  Is this just a question of
> number of votes?

I should also clarify that Raymond and Mark and responsible for
maintaining most of the algorithmic/data structure code in Python.

> 
> At least, please consider this as a documentation issue.  That ...
> feature may be obvious for a Python developer, but not for
> mathematician (as well as ordinary Python user).

This is probably impossible to prove either way.
msg242246 - (view) Author: Sergey B Kirpichev (Sergey.Kirpichev) * Date: 2015-04-29 20:25
On Wed, Apr 29, 2015 at 06:51:21PM +0000, Paul Moore wrote:
> But that's a sort without a key.

Why it does matter?  It have quick exit.  For same reasons - Python could...

> In Perl you do a key sort via:

That's just your implementation.  But we could add here a
quick exit as well.

> The fact of the matter is that what Python does is documented behaviour

No.  Unless you absolutely sure - all readers think that "sorting
process" starts even for trivial lists.  No reasons to believe in that
nonsense - as you could see from sorting implementations in other languages.

> benefit (small) isn't worth the cost of making a change (which would
> only be in Python 3.5 and later anyway

It's easy for users (i.e. me) to backport this feature (i.e. make wrapper for
sorted()).  Benefit is small, I admit, but why not remove unnecessary
restrictions from the language?  I hope, I did my best to explain why.
msg242258 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2015-04-30 03:31
> I should also clarify that Raymond and Mark and responsible for
maintaining most of the algorithmic/data structure code in Python.

Well, Raymond at least.  I plead not guilty; I think you're confusing me with someone else. :-)

But for this issue, this mathematician prefers the simple invariant that the key function is called exactly once per list item. If your key function doesn't work for one of those items for some reason, that sounds like a problem with the key function rather than a reason to change the sorting implementation.
History
Date User Action Args
2022-04-11 14:58:16adminsetgithub: 68263
2015-04-30 03:31:33mark.dickinsonsetmessages: + msg242258
2015-04-29 20:27:49Sergey.Kirpichevsetfiles: + 0001-list.sort-Add-quick-exit-if-length-of-list-1.patch
2015-04-29 20:25:30Sergey.Kirpichevsetmessages: + msg242246
2015-04-29 19:40:07benjamin.petersonsetmessages: + msg242245
2015-04-29 18:51:21paul.mooresetmessages: + msg242240
2015-04-29 18:42:18Sergey.Kirpichevsetmessages: + msg242239
2015-04-29 17:44:22paul.mooresetnosy: + paul.moore
messages: + msg242237
2015-04-29 17:25:36Sergey.Kirpichevsetmessages: + msg242236
2015-04-29 15:25:19benjamin.petersonsetmessages: + msg242233
2015-04-29 15:03:54Sergey.Kirpichevsetmessages: + msg242232
2015-04-29 14:59:59benjamin.petersonsetstatus: open -> closed
resolution: rejected
2015-04-29 14:55:50mark.dickinsonsetnosy: + mark.dickinson
messages: + msg242231
2015-04-29 14:54:14rhettingersetmessages: + msg242230
2015-04-29 14:32:34benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg242228
2015-04-29 14:31:41matrixisesetmessages: + msg242227
2015-04-29 14:27:14serhiy.storchakasetnosy: + tim.peters, rhettinger
stage: patch review
type: behavior -> performance

versions: - Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.6
2015-04-29 14:23:13Sergey.Kirpichevsetmessages: + msg242226
2015-04-29 13:47:16matrixisesetnosy: + r.david.murray, matrixise
messages: + msg242223
2015-04-29 13:30:43Sergey.Kirpichevcreate