Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_PyStack_AsDict(): Don't check if all keys are strings nor if keys are unique #73546

Closed
vstinner opened this issue Jan 24, 2017 · 16 comments
Closed
Assignees
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

BPO 29360
Nosy @vstinner, @methane, @serhiy-storchaka
Files
  • pystack_asdict.patch
  • pystack_asdict-2.patch
  • bug.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/vstinner'
    closed_at = <Date 2017-01-24.15:24:20.202>
    created_at = <Date 2017-01-24.08:49:44.278>
    labels = ['interpreter-core', '3.7']
    title = "_PyStack_AsDict(): Don't check if all keys are strings nor if keys are unique"
    updated_at = <Date 2017-01-24.15:24:20.201>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2017-01-24.15:24:20.201>
    actor = 'vstinner'
    assignee = 'vstinner'
    closed = True
    closed_date = <Date 2017-01-24.15:24:20.202>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2017-01-24.08:49:44.278>
    creator = 'vstinner'
    dependencies = []
    files = ['46403', '46404', '46408']
    hgrepos = []
    issue_num = 29360
    keywords = ['patch']
    message_count = 16.0
    messages = ['286157', '286158', '286159', '286162', '286163', '286164', '286166', '286169', '286175', '286176', '286177', '286180', '286183', '286185', '286186', '286187']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'methane', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue29360'
    versions = ['Python 3.7']

    @vstinner
    Copy link
    Member Author

    While running the test suite on issues bpo-29259 (tp_fastcall) and bpo-29358 (tp_fastnew and tp_fastinit), a few test failed on the following assertion of _PyStack_AsDict():

    assert(PyUnicode_CheckExact(key));

    For example, test_dict calls dict(**{1: 2}) which must raise a TypeError. I'm not sure who is responsible to check if all keys are strings: _PyStack_AsDict(), PyArg_ParseXXX() or the final function?

    The check is already implemented in dict_init(), dict_update_common() calls PyArg_ValidateKeywordArguments():

    >>> dict(**{1:2})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: keyword arguments must be strings

    In Python 3.7, PyArg_ParseTupleAndKeywords(), _PyArg_ParseTupleAndKeywordsFast() and _PyArg_ParseStackAndKeywords() and PyArg_ValidateKeywordArguments() raise an error if a key of keyword argumens is not a string:

        if (!PyUnicode_Check(key)) {
            PyErr_SetString(PyExc_TypeError,
                            "keywords must be strings");
            return cleanreturn(0, &freelist);
        }

    IMHO the CALL_FUNCTION_EX instruction and the FASTCALL calling convention must not check if all keys are string: the check must only be done in the function (which can be a type constructor like dict_init()), for performance. Almost all functions use one the PyArg_ParseXXX() function. Only very special cases like dict_init() use more specific code to handle keyword arguments.

    By the way, _PyObject_FastCallKeywords() already contains the following comment:

    /* kwnames must only contains str strings, no subclass, and all keys must
       be unique: these checks are implemented in [Python/ceval.c](https://github.com/python/cpython/blob/main/Python/ceval.c) and
       _PyArg_ParseStackAndKeywords(). */
    

    Note: Technically, I know that it's possible to put non-string keys in a dictionary and in a type dictionary. PyPy (and other Python implementations?) don't support non-string in type dictionary. Do you know use cases where func(**kwargs) must accept non-string keys?

    At the end, this long issue is a simple patch replacing an assertion with a comment in _PyStack_AsDict(): see attached patch ;-)

    @vstinner vstinner added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jan 24, 2017
    @methane
    Copy link
    Member

    methane commented Jan 24, 2017

    patch LGTM.

    Python 2 supported dict(**{1:3}). But I don't know there are any functions
    supporting non-string keyword argument.

    PyPy (and other Python implementations?) don't support non-string in type dictionary.

    Wow! I thought it's Python's language.
    If we can prohibit non string name in all namespace, there might be possible
    optimization.

    @serhiy-storchaka
    Copy link
    Member

    The assertion was valid while all keywords did came from a constant keywords tuple in CALL_FUNCTION_KW. But if the FASTCALL protocol is extended for var-keyword arguments, keywords can be arbitrary and the assertion can fail. The assertion on the next line can fail too.

    Calling _PyStack_AsDict() with non-string or non-unique keywords means that the var-keyword argument was first unpacked to arrays of keyword names and values and then converted back to a dict. Seems something is done non-efficiently.

    @vstinner
    Copy link
    Member Author

    Serhiy Storchaka: "The assertion was valid while all keywords did came from a constant keywords tuple in CALL_FUNCTION_KW. But if the FASTCALL protocol is extended for var-keyword arguments, keywords can be arbitrary and the assertion can fail. The assertion on the next line can fail too."

    Oh right, I didn't notice that. Since it's an issue of the tp_fastnew/tp_fastinit patch, I replied there:
    http://bugs.python.org/issue29358#msg286161

    @vstinner
    Copy link
    Member Author

    Serhiy Storchaka: "The assertion was valid while all keywords did came from a constant keywords tuple in CALL_FUNCTION_KW. But if the FASTCALL protocol is extended for var-keyword arguments, keywords can be arbitrary and the assertion can fail. The assertion on the next line can fail too."

    The second _PyStack_AsDict() assertion is:

    assert(PyDict_GetItem(kwdict, key) == NULL);

    This assertion fails if kwnames (tuple) argument of _PyStack_AsDict() contains duplicated keys, or at least two keys which have the same hash value and are equal.

    If someone pass kwnames with duplicates keys on purpose (for whatever) reasons, the assertion fails. I'm not sure that the assertion makes sense since it's a feature of Python to be able to replace dictionary values:

    >>> def func(**kw): print(kw)
    ... 
    >>> func(**{'x': 1, 'x': 2})
    {'x': 2}
    >>> {'x': 1, 'x': 2}
    {'x': 2}

    Are you suggesting to remove "assert(PyDict_GetItem(kwdict, key) == NULL);" too?

    Note: In Python 3.6, CALL_FUNCTION_EX already uses FASTCALL for C functions using METH_FASTCALL. In this case, _PyCFunction_FastCallDict() uses _PyStack_UnpackDict(). For example, open() already uses METH_FASTCALL in Python 3.6. In there an issue with _PyStack_UnpackDict()? (I don't think so.)

    @vstinner
    Copy link
    Member Author

    Victor> PyPy (and other Python implementations?) don't support non-string in type dictionary.
    INADA Naoki> Wow! I thought it's Python's language. If we can prohibit non string name in all namespace, there might be possible optimization.

    That's a different topic :-) It would be a backward incompatible change. Well... I don't think that anyone relies on this strange feature :-) IMHO it's more a side effect of the implementation, not a deliberate feature.

    @serhiy-storchaka
    Copy link
    Member

    We should either ensure that _PyStack_AsDict() is called only by CALL_FUNCTION_KW, and not by other way, or remove both assertions. I prefer the second option. They were here only for self-testing.

    @serhiy-storchaka
    Copy link
    Member

    func(**{'x': 1, 'x': 2})

    In this example the dictionary contains only one item. You need more complex example for making the assertion failing. Keys with non-constant hashes or equality, so first they are different, and later they become equal.

    @vstinner
    Copy link
    Member Author

    Attached pystack_asdict-2.patch removes the two assertions and rewrites _PyStack_AsDict() documentation to describe better how it handles non-string and duplicated keys (don't check keys type and merge duplicated keys).

    We should either ensure that _PyStack_AsDict() is called only by CALL_FUNCTION_KW, and not by other way, or remove both assertions. I prefer the second option. They were here only for self-testing.

    CALL_FUNTION_EX doesn't use _PyStack_AsDict() (at least, currently, I should carefully review my tp_fast{new,init,call} patches).

    When I wrote _PyStack_AsDict(), I added the GetItem==NULL assertion because I didn't know how to handle this corner case.

    For best performances, I'm in favor of not calling GetItem in _PyStack_AsDict() and so merge duplicated keys. But before removing the assertion, I wanted to make sure that it doesn't break the Python semantics.

    The problem is that I'm unable to find any concrete tcase in Python 3.6 nor Python 3.7 where it would be possible to call _PyStack_AsDict() with duplicate keys. See my analysis below.

    I agree to remove both assertion. I just checked, _PyStack_AsDict() already explicitly explain that keys must be strings and must be unique, but are not checked. My patch now also mentions that duplicated keys are merged.

    --

    (1) When a Python is called by _PyFunction_FastCallDict(), a TypeError is raised if an argument is passed as a positional argument and a keyword argument. Example: "def func(x): pass; func(1, x=2)".

    (2) If for some reasons, two keys of keyword arguments are equal (same hash value and are equals) but go into "**kwargs", the last value is kept: _PyEval_EvalCodeWithName() calls PyDict_SetItem() which replaces existing key to the new value. I failed to find an example here.

    (3) Calling other functions using kwargs dictionary from a FASTCALL function converts stack+kwnames arguments to kwargs dictionary using _PyStack_AsDict(). _PyCFunction_FastCallKeywords() implements that for the METH_VARARGS|METH_KEYWORDS calling convention. For example, round(3.14, ndigits=1) calls _PyCFunction_FastCallKeywords() with kwnames=('ndigits',) and args=[3.14, 1], _PyStack_AsDict() creates {'ndigits': 1} dictionary.

    The tricky question is how (3) handles duplicated keys in dictionary.

    "round(3.14, ndigits=1)" uses CALL_FUNCTION_KW which creates kwnames=('ndigits,') constant tuple. The compiler raises a SyntaxError if a function is called with a duplicate key. Without modifying a code object, I don't see any obvious way to pass duplicate keys in kwnames to CALL_FUNCTION_KW.

    CALL_FUNCTION_EX expects a dictionary on the stack. For METH_VARARGS|METH_KEYWORDS C functions, the dictionary is passed through, there is not conversion.

    If CALL_FUNCTION_EX gets a mapping which is not exactly the dict type, PyDict_New() + PyDict_Update() is used to convert the mapping into a regular dict. This function merges duplicated keys.

    Without modifying code objects nor writing buggy C code using _PyObject_FastCallKeywords(), I'm not sure that it's possible to pass duplicated keys to _PyStack_AsDict(). It seems that the issue is more theorical, no?

    @vstinner vstinner changed the title Don't check if all keys are strings in _PyStack_AsDict() _PyStack_AsDict(): Don't check if all keys are strings nor if keys are unique Jan 24, 2017
    @serhiy-storchaka
    Copy link
    Member

    Try following keys:

    class Key:
        def __eq__(self, other):
            return self is other or random.getrandbits(1)
        def __hash__(self):
            return random.getrandbits(1)

    @vstinner
    Copy link
    Member Author

    Serhiy Storchaka: "Try following keys: (...)"

    To which function am I supposed to pass such keys? I was unable to find a simple way how to pass arbitrary keys to _PyStack_AsDict().

    @serhiy-storchaka
    Copy link
    Member

    I don't know. To that that is crashed when pass non-string keys.

    @vstinner
    Copy link
    Member Author

    I don't know. To that that is crashed when pass non-string keys.

    Oh ok. I found a way to create a dictionary with equal keys and pass it to dict(**kw) in my fast_init branch (I already removed the PyUnicode_Check() asssertion in this branch):

    haypo@selma$ ./python bug.py
    call with: {'a': 1, 'b': 2}
    python: Objects/abstract.c:2471: _PyStack_AsDict: Assertion `PyDict_GetItem(kwdict, key) == NULL' failed.
    Aborted

    So yes, with special objects, the PyDict_GetItem()==NULL assertion fails. With pystack_asdict-2.patch, the duplicated (equal) keys are merged as expected:

    $ ./python bug.py 
    call with: {'a': 1, 'b': 2}
    got: {'a': 2}

    Again, there is also a bug in my fast_init branch, I should avoid double conversion dict=>kwnames=>dict to call a type.

    @serhiy-storchaka
    Copy link
    Member

    pystack_asdict-2.patch LGTM.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 24, 2017

    New changeset 2d2210b36b25 by Victor Stinner in branch 'default':
    Issue bpo-29360: _PyStack_AsDict() doesn't check kwnames
    https://hg.python.org/cpython/rev/2d2210b36b25

    @vstinner
    Copy link
    Member Author

    Thanks Serhiy and Naoki. I really wanted more eyes on this weird issue :-)

    It's also a reminder that we should be careful when converting keyword arguments, especially when duplicates are not allowed.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants