Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SystemError or crash in sorted(iterable=[]) #73513

Closed
serhiy-storchaka opened this issue Jan 19, 2017 · 17 comments
Closed

SystemError or crash in sorted(iterable=[]) #73513

serhiy-storchaka opened this issue Jan 19, 2017 · 17 comments
Assignees
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@serhiy-storchaka
Copy link
Member

BPO 29327
Nosy @rhettinger, @vstinner, @serhiy-storchaka
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • sorted.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2017-01-20.11:01:32.442>
    created_at = <Date 2017-01-19.17:31:08.008>
    labels = ['interpreter-core', '3.7', 'type-crash']
    title = 'SystemError or crash in sorted(iterable=[])'
    updated_at = <Date 2017-03-31.16:36:14.142>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:14.142>
    actor = 'dstufft'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2017-01-20.11:01:32.442>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2017-01-19.17:31:08.008>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['46344']
    hgrepos = []
    issue_num = 29327
    keywords = ['patch', '3.6regression']
    message_count = 17.0
    messages = ['285817', '285826', '285827', '285829', '285848', '285849', '285851', '285855', '285873', '285880', '285881', '285883', '285884', '285885', '285889', '285899', '285900']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'vstinner', 'python-dev', 'serhiy.storchaka']
    pr_nums = ['552']
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue29327'
    versions = ['Python 3.6', 'Python 3.7']

    @serhiy-storchaka
    Copy link
    Member Author

    In Python 3.6:

    >>> sorted(iterable=[])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    SystemError: Objects/tupleobject.c:81: bad argument to internal function

    In Python 3.5 and 2.7:

    >>> sorted(iterable=[])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'iterable' is an invalid keyword argument for this function

    @serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jan 19, 2017
    @serhiy-storchaka
    Copy link
    Member Author

    Seems this regression was introduced in bpo-27809.

    @serhiy-storchaka
    Copy link
    Member Author

    In debug build Python is crashed.

    >>> sorted(iterable=[])
    python: Objects/abstract.c:2317: _PyObject_FastCallDict: Assertion `nargs >= 0' failed.
    Aborted (core dumped)

    @serhiy-storchaka serhiy-storchaka added type-crash A hard crash of the interpreter, possibly with a core dump and removed type-bug An unexpected behavior, bug, or error labels Jan 19, 2017
    @serhiy-storchaka
    Copy link
    Member Author

    Proposed patch fixes the bug.

    @serhiy-storchaka serhiy-storchaka changed the title SystemError in sorted(iterable=[]) SystemError or crash in sorted(iterable=[]) Jan 19, 2017
    @rhettinger
    Copy link
    Contributor

    Please be careful with all of these AC changes. They need to have tests. They need to not change APIs. They need to not introduce bugs. The whole effort seems to be being pushed forward in a cavalier and aggressive manner.

    In this case, there was an API change and bugs introduced for basically zero benefit.

    @vstinner
    Copy link
    Member

    Oh, this issue is very subtle.

    Since the list.sorted() class method became a builtin sorted() method (Python 2.4.0, change c06b570adf12 by Raymond Hettinger), the sorted() function accepts an iterable as a keyword argument, whereas list.sort() doesn't. sorted(iterable=[]) fails on calling internally list.sort(iterable=[]), not when sorted() parses its arguments.

    The change 6219aa966a5f (issue bpo-20184) converted sorted() to Argument Clinic. This change was released in Python 3.5.3 and 3.6.0... but it didn't introduce the bug. It's not the fault of Argument Clinic!

    The change b34d2ef5c412 (issue bpo-27809) replacing METH_VARARGS|METH_KEYWORDS with METH_FASTCALL didn't introduce the bug neither.

    It's the change 15eab21bf934 (issue bpo-27809) which replaced PyEval_CallObjectWithKeywords() with _PyObject_FastCallDict(). This change also replaces PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1) which introduced the bug. I didn't notice that args can be an empty tuple. I never tried to call sorted(iterable=seq), I didn't know the name of the first parameter :-)

    I also replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1) in methoddescr_call(), but this function make sure that we have at least one positional argument and so doesn't have this bug. Moreover, this method doesn't use Argument Clinic (yet? ;-)). I'm quite sure that I didn't replace PyTuple_GetSlice() in other functions.

    @vstinner
    Copy link
    Member

    While Python 3.5 doesn't crash, I consider that it has also the bug. So I added Python 3.5 in Versions.

    sorted.diff LGTM. And thank you for the unit test!

    @vstinner
    Copy link
    Member

    Raymond: "Please be careful with all of these AC changes. They need to have tests. They need to not change APIs. They need to not introduce bugs."

    The bug was not introduced by an Argument Clinic change.

    I agree that switching to AC must not change the API.

    I didn't look much at the "recent" AC changes (it started somethings like 2 years ago, no?), but it seems like the most common trap are default values of optional positional arguments. Sometimes, there are inconsistencies between .rst doc (in Doc/ directory), the docstring and the C implementation. The trap is when the default is documented as None, the C code uses NULL and passing None behaves differently...

    Raymond: "The whole effort seems to be being pushed forward in a cavalier and aggressive manner. In this case, there was an API change and bugs introduced for basically zero benefit."

    I converted a few functions to AC. I used regular reviews for that, since I noticed that it's easy to make mistakes. My hope is that the AC conversion will also help to fix inconsistencies.

    The benefit of switching to AC is a better docstring and a correct signature for inspect.signature(func). Python 3.5:
    ---

    sorted(...)
        sorted(iterable, key=None, reverse=False) --> new sorted list

    versus Python 3.7
    ---

    sorted(iterable, key=None, reverse=False)
        Return a new list containing all items from the iterable in ascending order.
        
        A custom key function can be supplied to customize the sort order, and the
        reverse flag can be set to request the result in descending order.

    IHMO Python 3.7 docstring looks better. (Sadly, sorted() doesn't use AC yet, the "AC code" was genereted manually. AC should support **kwargs, issue bpo-20291.)

    I guess "cavalier" and "aggressive" is more related to my FASTCALL work. I'm waiting for reviews for major API changes. I agree that I pushed a lot of code without reviews when I considered that the change was safe and simple. It became hard to get a review, there are too few reviewers, especially on the C code.

    The benefit of FASTCALL are better performances. I'm trying to provide benchmarks whenever possible, but since I modified dozens of functions, sometimes I didn't publish all benchmark results (sometimes even to not spam an issue).

    Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
    ---
    haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v
    Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)

    haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v
    Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
    ---

    It's not easy nor interesting to measure the speedup of the changes limited to sorted(). Sadly, FASTCALL requires to modify a lot of code to show its full power. Otherwise, the gain is much smaller.

    The latest major "API change" was made by you 14 years ago. Yes, I introduced a bug, and I'm sorry about that. Shit happens.

    Let's be more constructive: to avoid bugs, the best is to get reviews. I have dozens of patches waiting for your review, so please come to help me to spot bugs before releases ;-)

    @rhettinger
    Copy link
    Contributor

    A few random thoughts that may or may not be helpful:

    • We now have two seasoned developers and one new core developer that collectively are creating many non-trivial patches to core parts of Python at an unprecedented rate of change. The patches are coming in much faster than they can reasonably be reviewed and carefully considered, especially by devs such as myself who have very limited time available. IMO, taken as whole, these changes are destabilizing the language. Python is so successful and widely adopted that we can't afford a "shit happens" attitude. Perhaps that works in corners of the language, infrequently used modules, but it makes less sense when touching the critical paths that have had slow and careful evolution over 26 years.

    • Besides the volume of patches, one other reason that reviews are hard to come by is that they apply new APIs that I don't fully understand yet. There are perhaps two people on the planet who could currently give thoughtful, correct, and critical evaluation of all those patches. Everyone else is just watching them all fly by and hoping that something good is happening.

    • One other reason for the lack of review comments in the enthusiasm and fervor surrounding the patches. I feel like there is a cost of questioning whether the patches should be done or how they are done, like I am burning little karma every time. Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.

    • Historically, if there was creator or maintainer of the code who was still active, that person would always be consulted and have a final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to application of a patch without consulting the author of _randommodule.c and the maintainer of random.py, or this change to sorted(), or the changes to decimal, etc).

    • In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits. Of late, that rule seems to have been lost.

    • The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops. For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it. There was no actual problem being solved, just a desire to broadly apply new optimizations.

    • Historically, we've relied on core developers showing restraint. Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it. Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather sweeping through modules and altering other people's carefully crafted code.

    • FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.

    • Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability. In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).

    • It was no long ago that you fought tooth-and-nail against a single line patch optimization I submitted. The code was clearly correct and had a simple disassembly to prove its benefit. Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs". In the end, your opposition killed the patch. But now, the AC and FASTCALL patches don't seem to mind any of these considerations.

    • AC is supposed to be a CPython-only concept. But along the way APIs are being changed without discussion. I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument). That decision was reversed unilaterally without consulting the author and without a test. Also as AC is being applied, the variable names are being changed. I never really liked the "mp" that used in dicts and prefer the use of "self" better, but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.

    • FWIW, the claim that the help is much better is specious. AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list" which has been clear since the day it was released. It is some of the new strings the are causing problems with users (my students frequently are tripped-up by the / notation for example; no one seems to be able to intuit what it means without it being explained first).

    • FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn. Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree and altering core parts of the language for function calls, to new opcodes, the memory allocators, etc. Very few people on the planet are competent to review these changes, make reasonable assessments about whether the complexity and churn are worth it. An fewer still have the time to keep up with the volume of changes.

    • Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the addition complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care without code that hasn't been reviewed because that lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long existing stable code without consulting the authors and with weak LGTM reviews?)

    @serhiy-storchaka
    Copy link
    Member Author

    While Python 3.5 doesn't crash, I consider that it has also the bug. So I
    added Python 3.5 in Versions.

    The patch uses new feature of 3.6 -- supporting positional-only parameters in
    PyArg_ParseTupleAndKeywords() (see bpo-26282). Since passing an iterable as
    a keyword argument never worked, it is safe to make the first parameter
    positional-only.

    Actually I think that argument parsing code of sorted() and list.sort() can be
    simplified, but this is separate issue. I tried to make the bugfix patch simple.

    @serhiy-storchaka serhiy-storchaka changed the title SystemError or crash in sorted(iterable=[]) SystemError or crash in sorted(iterable= Jan 20, 2017
    @serhiy-storchaka serhiy-storchaka changed the title SystemError or crash in sorted(iterable= SystemError or crash in sorted(iterable=[]) Jan 20, 2017
    @rhettinger
    Copy link
    Contributor

    Serhiy, this patch passes tests and looks fine. I think you can go ahead with this patch.

    @serhiy-storchaka
    Copy link
    Member Author

    Serhiy, this patch passes tests and looks fine. I think you can go ahead
    with this patch.

    Right now I'm building 3.6 with applied patch before final testing and
    committing. My netbook is very slow. :(

    @serhiy-storchaka serhiy-storchaka changed the title SystemError or crash in sorted(iterable=[]) SystemError or crash in sorted(iterable= Jan 20, 2017
    @serhiy-storchaka
    Copy link
    Member Author

    The title is spoiled when reply by email. I suppose [] has special meaning in email subject.

    @serhiy-storchaka serhiy-storchaka changed the title SystemError or crash in sorted(iterable= SystemError or crash in sorted(iterable=[]) Jan 20, 2017
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 20, 2017

    New changeset 1827b64cfce8 by Serhiy Storchaka in branch '3.6':
    Issue bpo-29327: Fixed a crash when pass the iterable keyword argument to sorted().
    https://hg.python.org/cpython/rev/1827b64cfce8

    New changeset f44f44b14dfc by Serhiy Storchaka in branch 'default':
    Issue bpo-29327: Fixed a crash when pass the iterable keyword argument to sorted().
    https://hg.python.org/cpython/rev/f44f44b14dfc

    @vstinner
    Copy link
    Member

    "The patch uses new feature of 3.6 -- supporting positional-only
    parameters in PyArg_ParseTupleAndKeywords() (see bpo-26282)."

    Oh, I didn't recall that it's a new feature. It's a nice feature by
    the way, thanks for implementing it :-)

    In that case, yeah it's ok to leave Python 3.5 unchanged. I proposed
    to fix 3.5 because I expected that you could use exactly the same
    patch.

    @vstinner vstinner changed the title SystemError or crash in sorted(iterable=[]) SystemError or crash in sorted(iterable= Jan 20, 2017
    @vstinner
    Copy link
    Member

    Raymond Hettinger: "A few random thoughts that may or may not be helpful: (...)"

    I replied on the python-committers mailing list:
    https://mail.python.org/pipermail/python-committers/2017-January/004129.html

    I'm not sure that this specific issue is the best place to discuss, I propose to continue the discussion there.

    @serhiy-storchaka
    Copy link
    Member Author

    Thank you for your reviews Victor and Raymond.

    As for more general Raymond comments, I agreed with many of them in principle, but this isn't directly related to this issue. Victor opened a topic on the python-committers mailing list:

    https://mail.python.org/pipermail/python-committers/2017-January/004129.html

    @vstinner vstinner changed the title SystemError or crash in sorted(iterable= SystemError or crash in sorted(iterable=[]) Jan 20, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants