classification
Title: SystemError or crash in sorted(iterable=[])
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: python-dev, rhettinger, serhiy.storchaka, vstinner
Priority: high Keywords: 3.6regression, patch

Created on 2017-01-19 17:31 by serhiy.storchaka, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
sorted.diff serhiy.storchaka, 2017-01-19 18:29 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (17)
msg285817 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-19 17:31
In Python 3.6:

>>> sorted(iterable=[])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/tupleobject.c:81: bad argument to internal function

In Python 3.5 and 2.7:

>>> sorted(iterable=[])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'iterable' is an invalid keyword argument for this function
msg285826 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-19 18:08
Seems this regression was introduced in issue27809.
msg285827 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-19 18:11
In debug build Python is crashed.

>>> sorted(iterable=[])
python: Objects/abstract.c:2317: _PyObject_FastCallDict: Assertion `nargs >= 0' failed.
Aborted (core dumped)
msg285829 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-19 18:29
Proposed patch fixes the bug.
msg285848 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-01-19 21:55
Please be careful with all of these AC changes.  They need to have tests.  They need to not change APIs.  They need to not introduce bugs.  The whole effort seems to be being pushed forward in a cavalier and aggressive manner.

In this case, there was an API change and bugs introduced for basically zero benefit.
msg285849 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-01-19 22:20
Oh, this issue is very subtle.

Since the list.sorted() class method became a builtin sorted() method (Python 2.4.0, change c06b570adf12 by Raymond Hettinger), the sorted() function accepts an iterable as a keyword argument, whereas list.sort() doesn't. sorted(iterable=[]) fails on calling internally list.sort(iterable=[]), not when sorted() parses its arguments.

The change 6219aa966a5f (issue #20184) converted sorted() to Argument Clinic. This change was released in Python 3.5.3 and 3.6.0... but it didn't introduce the bug. It's not the fault of Argument Clinic!

The change b34d2ef5c412 (issue #27809) replacing METH_VARARGS|METH_KEYWORDS with METH_FASTCALL didn't introduce the bug neither.

It's the change 15eab21bf934 (issue #27809) which replaced PyEval_CallObjectWithKeywords() with _PyObject_FastCallDict(). This change also replaces PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1) which introduced the bug. I didn't notice that args can be an empty tuple. I never tried to call sorted(iterable=seq), I didn't know the name of the first parameter :-)

I also replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1) in methoddescr_call(), but this function make sure that we have at least one positional argument and so doesn't have this bug. Moreover, this method doesn't use Argument Clinic (yet? ;-)). I'm quite sure that I didn't replace PyTuple_GetSlice() in other functions.
msg285851 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-01-19 22:34
While Python 3.5 doesn't crash, I consider that it has also the bug. So I added Python 3.5 in Versions.

sorted.diff LGTM. And thank you for the unit test!
msg285855 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-01-19 22:50
Raymond: "Please be careful with all of these AC changes.  They need to have tests.  They need to not change APIs.  They need to not introduce bugs."

The bug was not introduced by an Argument Clinic change.

I agree that switching to AC must not change the API.

I didn't look much at the "recent" AC changes (it started somethings like 2 years ago, no?), but it seems like the most common trap are default values of optional positional arguments. Sometimes, there are inconsistencies between .rst doc (in Doc/ directory), the docstring and the C implementation. The trap is when the default is documented as None, the C code uses NULL and passing None behaves differently...


Raymond: "The whole effort seems to be being pushed forward in a cavalier and aggressive manner. In this case, there was an API change and bugs introduced for basically zero benefit."

I converted a few functions to AC. I used regular reviews for that, since I noticed that it's easy to make mistakes. My hope is that the AC conversion will also help to fix inconsistencies.

The benefit of switching to AC is a better docstring and a correct signature for inspect.signature(func). Python 3.5:
---
sorted(...)
    sorted(iterable, key=None, reverse=False) --> new sorted list
---
versus Python 3.7
---
sorted(iterable, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.
---

IHMO Python 3.7 docstring looks better. (Sadly, sorted() doesn't use AC yet, the "AC code" was genereted manually. AC should support **kwargs, issue #20291.)



I guess "cavalier" and "aggressive" is more related to my FASTCALL work. I'm waiting for reviews for major API changes. I agree that I pushed a lot of code without reviews when I considered that the change was safe and simple. It became hard to get a review, there are too few reviewers, especially on the C code.

The benefit of FASTCALL are better performances. I'm trying to provide benchmarks whenever possible, but since I modified dozens of functions, sometimes I didn't publish all benchmark results (sometimes even to not spam an issue).


Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
---
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)

haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
---

It's not easy nor interesting to measure the speedup of the changes limited to sorted(). Sadly, FASTCALL requires to modify a lot of code to show its full power. Otherwise, the gain is much smaller.


The latest major "API change" was made by you 14 years ago. Yes, I introduced a bug, and I'm sorry about that. Shit happens.

Let's be more constructive: to avoid bugs, the best is to get reviews. I have dozens of patches waiting for your review, so please come to help me to spot bugs before releases ;-)
msg285873 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-01-20 03:50
A few random thoughts that may or may not be helpful:

* We now have two seasoned developers and one new core developer that collectively are creating many non-trivial patches to core parts of Python at an unprecedented rate of change.  The patches are coming in much faster than they can reasonably be reviewed and carefully considered, especially by devs such as myself who have very limited time available.  IMO, taken as whole, these changes are destabilizing the language.  Python is so successful and widely adopted that we can't afford a "shit happens" attitude.  Perhaps that works in corners of the language, infrequently used modules, but it makes less sense when touching the critical paths that have had slow and careful evolution over 26 years.

* Besides the volume of patches, one other reason that reviews are hard to come by is that they apply new APIs that I don't fully understand yet.  There are perhaps two people on the planet who could currently give thoughtful, correct, and critical evaluation of all those patches.  Everyone else is just watching them all fly by and hoping that something good is happening.

* One other reason for the lack of review comments in the enthusiasm and fervor surrounding the patches.  I feel like there is a cost of questioning whether the patches should be done or how they are done, like I am burning little karma every time.  Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.

* Historically, if there was creator or maintainer of the code who was still active, that person would always be consulted and have a final say on whether a change should be applied.  Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to application of a patch without consulting the author of _randommodule.c and the maintainer of random.py, or this change to sorted(), or the changes to decimal, etc).

* In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits.  Of late, that rule seems to have been lost.

* The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops.  For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort.  For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it.  There was no actual problem being solved, just a desire to broadly apply new optimizations.

* Historically, we've relied on core developers showing restraint.  Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it.  Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather sweeping through modules and altering other people's carefully crafted code.

* FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.

* Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability.   In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).

* It was no long ago that you fought tooth-and-nail against a single line patch optimization I submitted.  The code was clearly correct and had a simple disassembly to prove its benefit.  Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs".  In the end, your opposition killed the patch.  But now, the AC and FASTCALL patches don't seem to mind any of these considerations.

* AC is supposed to be a CPython-only concept.  But along the way APIs are being changed without discussion.  I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument).  That decision was reversed unilaterally without consulting the author and without a test.  Also as AC is being applied, the variable names are being changed.  I never really liked the "mp" that used in dicts and prefer the use of "self" better, but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.

* FWIW, the claim that the help is much better is specious.  AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list" which has been clear since the day it was released.   It is some of the new strings the are causing problems with users (my students frequently are tripped-up by the / notation for example; no one seems to be able to intuit what it means without it being explained first).

* FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn.   Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree and altering core parts of the language for function calls, to new opcodes, the memory allocators, etc.  Very few people on the planet are competent to review these changes, make reasonable assessments about whether the complexity and churn are worth it.  An fewer still have the time to keep up with the volume of changes.

* Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the addition complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care without code that hasn't been reviewed because that lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long existing stable code without consulting the authors and with weak LGTM reviews?)
msg285880 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-20 06:04
> While Python 3.5 doesn't crash, I consider that it has also the bug. So I
> added Python 3.5 in Versions.

The patch uses new feature of 3.6 -- supporting positional-only parameters in 
PyArg_ParseTupleAndKeywords()  (see issue26282). Since passing an iterable as 
a keyword argument never worked, it is safe to make the first parameter 
positional-only.

Actually I think that argument parsing code of sorted() and list.sort() can be 
simplified, but this is separate issue. I tried to make the bugfix patch simple.
msg285881 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-01-20 06:16
Serhiy, this patch passes tests and looks fine.  I think you can go ahead with this patch.
msg285883 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-20 06:24
> Serhiy, this patch passes tests and looks fine.  I think you can go ahead
> with this patch.

Right now I'm building 3.6 with applied patch before final testing and 
committing. My netbook is very slow. :(
msg285884 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-20 06:29
The title is spoiled when reply by email. I suppose [] has special meaning in email subject.
msg285885 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2017-01-20 06:35
New changeset 1827b64cfce8 by Serhiy Storchaka in branch '3.6':
Issue #29327: Fixed a crash when pass the iterable keyword argument to sorted().
https://hg.python.org/cpython/rev/1827b64cfce8

New changeset f44f44b14dfc by Serhiy Storchaka in branch 'default':
Issue #29327: Fixed a crash when pass the iterable keyword argument to sorted().
https://hg.python.org/cpython/rev/f44f44b14dfc
msg285889 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-01-20 07:49
"The patch uses new feature of 3.6 -- supporting positional-only
parameters in PyArg_ParseTupleAndKeywords()  (see issue26282)."

Oh, I didn't recall that it's a new feature. It's a nice feature by
the way, thanks for implementing it :-)

In that case, yeah it's ok to leave Python 3.5 unchanged. I proposed
to fix 3.5 because I expected that you could use exactly the same
patch.
msg285899 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-01-20 10:57
Raymond Hettinger: "A few random thoughts that may or may not be helpful: (...)"

I replied on the python-committers mailing list:
https://mail.python.org/pipermail/python-committers/2017-January/004129.html

I'm not sure that this specific issue is the best place to discuss, I propose to continue the discussion there.
msg285900 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-20 11:01
Thank you for your reviews Victor and Raymond.

As for more general Raymond comments, I agreed with many of them in principle, but this isn't directly related to this issue. Victor opened a topic on the python-committers mailing list:

https://mail.python.org/pipermail/python-committers/2017-January/004129.html
History
Date User Action Args
2017-03-31 16:36:14dstufftsetpull_requests: + pull_request895
2017-01-20 11:02:15vstinnersettitle: SystemError or crash in sorted(iterable= -> SystemError or crash in sorted(iterable=[])
2017-01-20 11:01:32serhiy.storchakasetstatus: open -> closed
versions: - Python 3.5
messages: + msg285900

resolution: fixed
stage: patch review -> resolved
2017-01-20 10:57:37vstinnersetmessages: + msg285899
2017-01-20 07:49:46vstinnersetmessages: + msg285889
title: SystemError or crash in sorted(iterable=[]) -> SystemError or crash in sorted(iterable=
2017-01-20 06:35:37python-devsetnosy: + python-dev
messages: + msg285885
2017-01-20 06:29:27serhiy.storchakasetmessages: + msg285884
title: SystemError or crash in sorted(iterable= -> SystemError or crash in sorted(iterable=[])
2017-01-20 06:24:59serhiy.storchakasetmessages: + msg285883
title: SystemError or crash in sorted(iterable=[]) -> SystemError or crash in sorted(iterable=
2017-01-20 06:16:22rhettingersetassignee: serhiy.storchaka
messages: + msg285881
2017-01-20 06:09:34serhiy.storchakasettitle: SystemError or crash in sorted(iterable= -> SystemError or crash in sorted(iterable=[])
2017-01-20 06:04:20serhiy.storchakasetmessages: + msg285880
title: SystemError or crash in sorted(iterable=[]) -> SystemError or crash in sorted(iterable=
2017-01-20 03:50:50rhettingersetmessages: + msg285873
2017-01-19 22:50:38vstinnersetmessages: + msg285855
2017-01-19 22:34:58vstinnersetmessages: + msg285851
versions: + Python 3.5
2017-01-19 22:20:43vstinnersetmessages: + msg285849
2017-01-19 21:55:08rhettingersetpriority: normal -> high
nosy: + rhettinger
messages: + msg285848

2017-01-19 18:29:16serhiy.storchakasetfiles: + sorted.diff

title: SystemError in sorted(iterable=[]) -> SystemError or crash in sorted(iterable=[])
keywords: + patch
nosy: + vstinner

messages: + msg285829
stage: patch review
2017-01-19 18:11:24serhiy.storchakasettype: behavior -> crash
messages: + msg285827
2017-01-19 18:08:46serhiy.storchakasetmessages: + msg285826
2017-01-19 17:31:08serhiy.storchakacreate