Issue24075
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2015-04-29 13:30 by Sergey.Kirpichev, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
trivial-sorting-py3.patch | Sergey.Kirpichev, 2015-04-29 13:30 | patch for python 3.[2-6] | ||
0001-list.sort-Add-quick-exit-if-length-of-list-1.patch | Sergey.Kirpichev, 2015-04-29 20:27 | updated patch (with test) | review |
Messages (16) | |||
---|---|---|---|
msg242222 - (view) | Author: Sergey B Kirpichev (Sergey.Kirpichev) * | Date: 2015-04-29 13:30 | |
If there is nothing to sort (i.e. one item), why call key function at all? In my practical situation, simplest key() function will lead to recursion in case of such trivial lists. I can make similar cmp-type function (i.e. mycmp=lambda a, b: cmp(key(a), key(b))) and then wrap this with cmp_to_key. But that looks silly. Simple test case: $ cat a.py a = [1] def spam(x): raise Exception a.sort(key=spam) print(a) $ python a.py Traceback (most recent call last): File "a.py", line 6, in <module> a.sort(key=spam) File "a.py", line 4, in spam raise Exception Exception |
|||
msg242223 - (view) | Author: Stéphane Wirtel (matrixise) * | Date: 2015-04-29 13:47 | |
The patch is ok for me, |
|||
msg242226 - (view) | Author: Sergey B Kirpichev (Sergey.Kirpichev) * | Date: 2015-04-29 14:23 | |
should I add a regression test? If so, where? ./Lib/test/test_sort.py? |
|||
msg242227 - (view) | Author: Stéphane Wirtel (matrixise) * | Date: 2015-04-29 14:31 | |
Yep, add a regression test. |
|||
msg242228 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2015-04-29 14:32 | |
Why does your key function depend on the size of the list? That seems like the root of the problem. Considering calling the key function is observable behavior, I don't think this should be changed. The patch makes behavior list.sort() inconsistent. |
|||
msg242230 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2015-04-29 14:54 | |
FWIW, I don't think this is worth special casing. |
|||
msg242231 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2015-04-29 14:55 | |
> Considering calling the key function is observable behavior, I don't think this should be changed. +1 |
|||
msg242232 - (view) | Author: Sergey B Kirpichev (Sergey.Kirpichev) * | Date: 2015-04-29 15:03 | |
On Wed, Apr 29, 2015 at 02:32:34PM +0000, Benjamin Peterson wrote: > Why does your key function depend on the size of the list? Because it's a real life. Here is the code: https://github.com/skirpichev/omg/blob/gruntz-use-subs/sympy/series/gruntz.py#L337 Algorithm is recursive and computation of MRV set will be finished only if I add an exceptional case for len(Omega). Or even more ugly solution with cmp-style function: https://github.com/skirpichev/omg/blob/efc70377639f78fc0f246629989056cb80a71923/sympy/series/gruntz.py#L339 > Considering calling the key function is observable behavior, I don't think this > should be changed. The patch makes behavior list.sort() inconsistent. Yes, in some sense. On another hand, it's reasonable to believe that key function will be called iff we need one. But if there is nothing to sort at all - why do sorting, why you want to call key function? Looks irrational to me. Last by not least. Why the return value of the key function *must* be defined in this case? Just a hidden, undocumented restriction and has no practical value. BTW, why this issue was closed? |
|||
msg242233 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2015-04-29 15:25 | |
On Wed, Apr 29, 2015, at 11:03, Sergey B Kirpichev wrote: > > Sergey B Kirpichev added the comment: > > On Wed, Apr 29, 2015 at 02:32:34PM +0000, Benjamin Peterson wrote: > > Why does your key function depend on the size of the list? > > Because it's a real life. Here is the code: > https://github.com/skirpichev/omg/blob/gruntz-use-subs/sympy/series/gruntz.py#L337 > > Algorithm is recursive and computation of MRV set will be finished only > if I add an exceptional case for len(Omega). Or even more ugly solution > with cmp-style function: > https://github.com/skirpichev/omg/blob/efc70377639f78fc0f246629989056cb80a71923/sympy/series/gruntz.py#L339 So, basically you need a base case for recursion? What's wrong with explicitly writing that out? > > > Considering calling the key function is observable behavior, I don't think this > > should be changed. The patch makes behavior list.sort() inconsistent. > > Yes, in some sense. > > On another hand, it's reasonable to believe that key function will be > called iff we need one. But if there is nothing to sort at all - why do > sorting, why you want to call key function? Looks irrational to me. > > Last by not least. Why the return value of the key function *must* > be defined in this case? Just a hidden, undocumented restriction and > has no practical value. It's practical if you have a broken key function and test it with a one element list. > > BTW, why this issue was closed? 3 of us agreed this doesn't seem like a suitable change. |
|||
msg242236 - (view) | Author: Sergey B Kirpichev (Sergey.Kirpichev) * | Date: 2015-04-29 17:25 | |
On Wed, Apr 29, 2015 at 03:25:19PM +0000, Benjamin Peterson wrote: > So, basically you need a base case for recursion? What's wrong with > explicitly writing that out? Because it's complex (and costly). This is not a trivial test and I don't see reasons to fix that is not broken. And it will be difficult to explain for readers: remember, I need this exceptional case only in the world with a strange Python's convention (Python try to sort a list when it doesn't make sense). Mathematical algorithm is not broken - programming language is. Here is C: https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/msort.c;#l45 Here is Ruby: https://github.com/ruby/ruby/blob/trunk/array.c#L2454 > It's practical if you have a broken key function and test it with a one > element list. It's silly to test key function on a single-element list *only*. > > BTW, why this issue was closed? > > 3 of us agreed this doesn't seem like a suitable change. And 1 seems to be ok with patch. Is this just a question of number of votes? At least, please consider this as a documentation issue. That ... feature may be obvious for a Python developer, but not for mathematician (as well as ordinary Python user). When key function value has no sense at all - it's not clear from the documentation, that the key function will be called. |
|||
msg242237 - (view) | Author: Paul Moore (paul.moore) * | Date: 2015-04-29 17:44 | |
I think the documentation is fine: """ The key corresponding to each item in the list is calculated once and then used for the entire sorting process. """ This corresponds with the standard "decorate-sort-undecorate" approach to handling key functions in sorts. It's a common computer science technique, possibly not as familiar to a mathematician, but regardless, the docs clearly state that the key is calculated for each item. Your existing code, with a check for Omega having length 1 and omitting the sort in that case, looks entirely reasonable to me. Maybe you could add a comment "Avoid a costly calculation of the key when length is 1, as we know we don't need to sort then" and leave it at that. |
|||
msg242239 - (view) | Author: Sergey B Kirpichev (Sergey.Kirpichev) * | Date: 2015-04-29 18:42 | |
On Wed, Apr 29, 2015 at 05:44:22PM +0000, Paul Moore wrote: > I think the documentation is fine: > """ > The key corresponding to each item in the list is calculated once and then used for the entire sorting process. > """ Does any "sorting process" make sense for [1] or []?! No, it isn't. So, it's not clear if this "process" started at all. This not just mine opinion - most computer languages implement the quick exit in question (see examples above). > It's a common computer science technique Could you provide any language that avoid this optimization? Here is Perl 5: http://perl5.git.perl.org/perl.git/blob/HEAD:/pp_sort.c#l367 (third example) > Your existing code, with a check for Omega having length 1 and omitting > the sort in that case, looks entirely reasonable to me. (Well, then I should look for other languages, if Python devs insist in doing useless work...) > Maybe you could add a comment "Avoid a costly calculation of the > key when length is 1, as we know we don't need to sort then" I sure, for most people - the idea of sorting list with one element will look crazy. There is no room for any "costly calculations". (Common sense, nothing more.) So, such comment will add more questions... |
|||
msg242240 - (view) | Author: Paul Moore (paul.moore) * | Date: 2015-04-29 18:51 | |
On 29 April 2015 at 19:42, Sergey B Kirpichev <report@bugs.python.org> wrote: >> It's a common computer science technique > > Could you provide any language that avoid this optimization? > > Here is Perl 5: > http://perl5.git.perl.org/perl.git/blob/HEAD:/pp_sort.c#l367 > > (third example) But that's a sort without a key. In Perl you do a key sort via: @sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } # use numeric comparison map { [$_, length($_)] } # calculate the length of the string @unsorted; (From http://en.wikipedia.org/wiki/Schwartzian_transform). That computes the keys first, and would compute the key for a list of length 1, just like Python does. It's just that Python bundles that whole construct into the "key=" argument. But it's your choice - if this is a big enough deal to put you off Python, I guess no-one will be able to stop you. The fact of the matter is that what Python does is documented behaviour, and the benefit (small) isn't worth the cost of making a change (which would only be in Python 3.5 and later anyway, as it's a backward incompatible change, not a bug fix). |
|||
msg242245 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2015-04-29 19:40 | |
On Wed, Apr 29, 2015, at 13:25, Sergey B Kirpichev wrote: > > Sergey B Kirpichev added the comment: > > On Wed, Apr 29, 2015 at 03:25:19PM +0000, Benjamin Peterson wrote: > > So, basically you need a base case for recursion? What's wrong with > > explicitly writing that out? > > Because it's complex (and costly). This is not a trivial test and > I don't see reasons to fix that is not broken. And it will be difficult > to explain for readers: remember, I need this exceptional case only in > the world with a strange Python's convention (Python try to sort a list > when it doesn't make sense). > > Mathematical algorithm is not broken - programming language is. > > Here is C: > https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/msort.c;#l45 > Here is Ruby: > https://github.com/ruby/ruby/blob/trunk/array.c#L2454 I don't understand the analogy, since neither of these two have key functions. > > > It's practical if you have a broken key function and test it with a one > > element list. > > It's silly to test key function on a single-element list *only*. > > > > BTW, why this issue was closed? > > > > 3 of us agreed this doesn't seem like a suitable change. > > And 1 seems to be ok with patch. Is this just a question of > number of votes? I should also clarify that Raymond and Mark and responsible for maintaining most of the algorithmic/data structure code in Python. > > At least, please consider this as a documentation issue. That ... > feature may be obvious for a Python developer, but not for > mathematician (as well as ordinary Python user). This is probably impossible to prove either way. |
|||
msg242246 - (view) | Author: Sergey B Kirpichev (Sergey.Kirpichev) * | Date: 2015-04-29 20:25 | |
On Wed, Apr 29, 2015 at 06:51:21PM +0000, Paul Moore wrote: > But that's a sort without a key. Why it does matter? It have quick exit. For same reasons - Python could... > In Perl you do a key sort via: That's just your implementation. But we could add here a quick exit as well. > The fact of the matter is that what Python does is documented behaviour No. Unless you absolutely sure - all readers think that "sorting process" starts even for trivial lists. No reasons to believe in that nonsense - as you could see from sorting implementations in other languages. > benefit (small) isn't worth the cost of making a change (which would > only be in Python 3.5 and later anyway It's easy for users (i.e. me) to backport this feature (i.e. make wrapper for sorted()). Benefit is small, I admit, but why not remove unnecessary restrictions from the language? I hope, I did my best to explain why. |
|||
msg242258 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2015-04-30 03:31 | |
> I should also clarify that Raymond and Mark and responsible for maintaining most of the algorithmic/data structure code in Python. Well, Raymond at least. I plead not guilty; I think you're confusing me with someone else. :-) But for this issue, this mathematician prefers the simple invariant that the key function is called exactly once per list item. If your key function doesn't work for one of those items for some reason, that sounds like a problem with the key function rather than a reason to change the sorting implementation. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:16 | admin | set | github: 68263 |
2015-04-30 03:31:33 | mark.dickinson | set | messages: + msg242258 |
2015-04-29 20:27:49 | Sergey.Kirpichev | set | files: + 0001-list.sort-Add-quick-exit-if-length-of-list-1.patch |
2015-04-29 20:25:30 | Sergey.Kirpichev | set | messages: + msg242246 |
2015-04-29 19:40:07 | benjamin.peterson | set | messages: + msg242245 |
2015-04-29 18:51:21 | paul.moore | set | messages: + msg242240 |
2015-04-29 18:42:18 | Sergey.Kirpichev | set | messages: + msg242239 |
2015-04-29 17:44:22 | paul.moore | set | nosy:
+ paul.moore messages: + msg242237 |
2015-04-29 17:25:36 | Sergey.Kirpichev | set | messages: + msg242236 |
2015-04-29 15:25:19 | benjamin.peterson | set | messages: + msg242233 |
2015-04-29 15:03:54 | Sergey.Kirpichev | set | messages: + msg242232 |
2015-04-29 14:59:59 | benjamin.peterson | set | status: open -> closed resolution: rejected |
2015-04-29 14:55:50 | mark.dickinson | set | nosy:
+ mark.dickinson messages: + msg242231 |
2015-04-29 14:54:14 | rhettinger | set | messages: + msg242230 |
2015-04-29 14:32:34 | benjamin.peterson | set | nosy:
+ benjamin.peterson messages: + msg242228 |
2015-04-29 14:31:41 | matrixise | set | messages: + msg242227 |
2015-04-29 14:27:14 | serhiy.storchaka | set | nosy:
+ tim.peters, rhettinger stage: patch review type: behavior -> performance versions: - Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.6 |
2015-04-29 14:23:13 | Sergey.Kirpichev | set | messages: + msg242226 |
2015-04-29 13:47:16 | matrixise | set | nosy:
+ r.david.murray, matrixise messages: + msg242223 |
2015-04-29 13:30:43 | Sergey.Kirpichev | create |