classification
Title: _PyStack_AsDict(): Don't check if all keys are strings nor if keys are unique
Type: Stage: commit review
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: haypo Nosy List: haypo, inada.naoki, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-01-24 08:49 by haypo, last changed 2017-01-24 15:24 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
pystack_asdict.patch haypo, 2017-01-24 08:49 review
pystack_asdict-2.patch haypo, 2017-01-24 11:18 review
bug.py haypo, 2017-01-24 13:33
Messages (16)
msg286157 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-24 08:49
While running the test suite on issues #29259 (tp_fastcall) and #29358 (tp_fastnew and tp_fastinit), a few test failed on the following assertion of _PyStack_AsDict():

   assert(PyUnicode_CheckExact(key));

For example, test_dict calls dict(**{1: 2}) which must raise a TypeError. I'm not sure who is responsible to check if all keys are strings: _PyStack_AsDict(), PyArg_ParseXXX() or the final function?


The check is already implemented in dict_init(), dict_update_common() calls PyArg_ValidateKeywordArguments():

>>> dict(**{1:2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings


In Python 3.7, PyArg_ParseTupleAndKeywords(), _PyArg_ParseTupleAndKeywordsFast() and _PyArg_ParseStackAndKeywords() and PyArg_ValidateKeywordArguments() raise an error if a key of keyword argumens is not a string:

    if (!PyUnicode_Check(key)) {
        PyErr_SetString(PyExc_TypeError,
                        "keywords must be strings");
        return cleanreturn(0, &freelist);
    }


IMHO the CALL_FUNCTION_EX instruction and the FASTCALL calling convention must not check if all keys are string: the check must only be done in the function (which can be a type constructor like dict_init()), for performance. Almost all functions use one the PyArg_ParseXXX() function. Only very special cases like dict_init() use more specific code to handle keyword arguments.

By the way, _PyObject_FastCallKeywords() already contains the following comment:

    /* kwnames must only contains str strings, no subclass, and all keys must
       be unique: these checks are implemented in Python/ceval.c and
       _PyArg_ParseStackAndKeywords(). */


Note: Technically, I know that it's possible to put non-string keys in a dictionary and in a type dictionary. PyPy (and other Python implementations?) don't support non-string in type dictionary. Do you know use cases where func(**kwargs) must accept non-string keys?


At the end, this long issue is a simple patch replacing an assertion with a comment in _PyStack_AsDict(): see attached patch ;-)
msg286158 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-01-24 09:02
patch LGTM.

Python 2 supported `dict(**{1:3})`.  But I don't know there are any functions
supporting non-string keyword argument.

> PyPy (and other Python implementations?) don't support non-string in type dictionary.

Wow! I thought it's Python's language.
If we can prohibit non string name in all namespace, there might be possible
optimization.
msg286159 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-24 09:06
The assertion was valid while all keywords did came from a constant keywords tuple in CALL_FUNCTION_KW. But if the FASTCALL protocol is extended for var-keyword arguments, keywords can be arbitrary and the assertion can fail. The assertion on the next line can fail too.

Calling _PyStack_AsDict() with non-string or non-unique keywords means that the var-keyword argument was first unpacked to arrays of keyword names and values and then converted back to a dict. Seems something is done non-efficiently.
msg286162 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-24 09:41
Serhiy Storchaka: "The assertion was valid while all keywords did came from a constant keywords tuple in CALL_FUNCTION_KW. But if the FASTCALL protocol is extended for var-keyword arguments, keywords can be arbitrary and the assertion can fail. The assertion on the next line can fail too."

Oh right, I didn't notice that. Since it's an issue of the tp_fastnew/tp_fastinit patch, I replied there:
http://bugs.python.org/issue29358#msg286161
msg286163 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-24 09:42
Serhiy Storchaka: "The assertion was valid while all keywords did came from a constant keywords tuple in CALL_FUNCTION_KW. But if the FASTCALL protocol is extended for var-keyword arguments, keywords can be arbitrary and the assertion can fail. The assertion on the next line can fail too."

The second _PyStack_AsDict() assertion is:

   assert(PyDict_GetItem(kwdict, key) == NULL);

This assertion fails if kwnames (tuple) argument of _PyStack_AsDict() contains duplicated keys, or at least two keys which have the same hash value and are equal.

If someone pass kwnames with duplicates keys on purpose (for whatever) reasons, the assertion fails. I'm not sure that the assertion makes sense since it's a feature of Python to be able to replace dictionary values:

>>> def func(**kw): print(kw)
... 
>>> func(**{'x': 1, 'x': 2})
{'x': 2}
>>> {'x': 1, 'x': 2}
{'x': 2}

Are you suggesting to remove "assert(PyDict_GetItem(kwdict, key) == NULL);" too?


Note: In Python 3.6, CALL_FUNCTION_EX already uses FASTCALL for C functions using METH_FASTCALL. In this case, _PyCFunction_FastCallDict() uses _PyStack_UnpackDict(). For example, open() already uses METH_FASTCALL in Python 3.6. In there an issue with _PyStack_UnpackDict()? (I don't think so.)
msg286164 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-24 09:45
Victor> PyPy (and other Python implementations?) don't support non-string in type dictionary.
INADA Naoki> Wow! I thought it's Python's language. If we can prohibit non string name in all namespace, there might be possible optimization.

That's a different topic :-) It would be a backward incompatible change. Well... I don't think that anyone relies on this strange feature :-) IMHO it's more a side effect of the implementation, not a deliberate feature.
msg286166 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-24 09:52
We should either ensure that _PyStack_AsDict() is called only by CALL_FUNCTION_KW, and not by other way, or remove both assertions. I prefer the second option. They were here only for self-testing.
msg286169 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-24 09:57
> func(**{'x': 1, 'x': 2})

In this example the dictionary contains only one item. You need more complex example for making the assertion failing. Keys with non-constant hashes or equality, so first they are different, and later they become equal.
msg286175 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-24 11:18
Attached pystack_asdict-2.patch removes the two assertions and rewrites _PyStack_AsDict() documentation to describe better how it handles non-string and duplicated keys (don't check keys type and merge duplicated keys).


> We should either ensure that _PyStack_AsDict() is called only by CALL_FUNCTION_KW, and not by other way, or remove both assertions. I prefer the second option. They were here only for self-testing.

CALL_FUNTION_EX doesn't use _PyStack_AsDict() (at least, currently, I should carefully review my tp_fast{new,init,call} patches).

When I wrote _PyStack_AsDict(), I added the GetItem==NULL assertion because I didn't know how to handle this corner case.

For best performances, I'm in favor of not calling GetItem in _PyStack_AsDict() and so merge duplicated keys. But before removing the assertion, I wanted to make sure that it doesn't break the Python semantics.

The problem is that I'm unable to find any concrete tcase in Python 3.6 nor Python 3.7 where it would be possible to call _PyStack_AsDict() with duplicate keys. See my analysis below.

I agree to remove both assertion. I just checked, _PyStack_AsDict() already explicitly explain that keys must be strings and must be unique, but are not checked. My patch now also mentions that duplicated keys are merged.

--

(1) When a Python is called by _PyFunction_FastCallDict(), a TypeError is raised if an argument is passed as a positional argument and a keyword argument. Example: "def func(x): pass; func(1, x=2)".

(2) If for some reasons, two keys of keyword arguments are equal (same hash value and are equals) but go into "**kwargs", the last value is kept: _PyEval_EvalCodeWithName() calls PyDict_SetItem() which replaces existing key to the new value. I failed to find an example here.

(3) Calling other functions using kwargs dictionary from a FASTCALL function converts stack+kwnames arguments to kwargs dictionary using _PyStack_AsDict(). _PyCFunction_FastCallKeywords() implements that for the METH_VARARGS|METH_KEYWORDS calling convention. For example, round(3.14, ndigits=1) calls _PyCFunction_FastCallKeywords() with kwnames=('ndigits',) and args=[3.14, 1], _PyStack_AsDict() creates {'ndigits': 1} dictionary.


The tricky question is how (3) handles duplicated keys in dictionary.


"round(3.14, ndigits=1)" uses CALL_FUNCTION_KW which creates kwnames=('ndigits,') constant tuple. The compiler raises a SyntaxError if a function is called with a duplicate key. Without modifying a code object, I don't see any obvious way to pass duplicate keys in kwnames to CALL_FUNCTION_KW.

CALL_FUNCTION_EX expects a dictionary on the stack. For METH_VARARGS|METH_KEYWORDS C functions, the dictionary is passed through, there is not conversion.

If CALL_FUNCTION_EX gets a mapping which is not exactly the dict type, PyDict_New() + PyDict_Update() is used to convert the mapping into a regular dict. This function merges duplicated keys.


Without modifying code objects nor writing buggy C code using _PyObject_FastCallKeywords(), I'm not sure that it's possible to pass duplicated keys to _PyStack_AsDict(). It seems that the issue is more theorical, no?
msg286176 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-24 11:27
Try following keys:

class Key:
    def __eq__(self, other):
        return self is other or random.getrandbits(1)
    def __hash__(self):
        return random.getrandbits(1)
msg286177 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-24 11:42
Serhiy Storchaka: "Try following keys: (...)"

To which function am I supposed to pass such keys? I was unable to find a simple way how to pass arbitrary keys to _PyStack_AsDict().
msg286180 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-24 12:02
I don't know. To that that is crashed when pass non-string keys.
msg286183 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-24 13:33
> I don't know. To that that is crashed when pass non-string keys.

Oh ok. I found a way to create a dictionary with equal keys and pass it to dict(**kw) in my fast_init branch (I already removed the PyUnicode_Check() asssertion in this branch):

haypo@selma$ ./python bug.py 
call with: {'a': 1, 'b': 2}
python: Objects/abstract.c:2471: _PyStack_AsDict: Assertion `PyDict_GetItem(kwdict, key) == NULL' failed.
Aborted


So yes, with special objects, the PyDict_GetItem()==NULL assertion fails. With pystack_asdict-2.patch, the duplicated (equal) keys are merged as expected:

$ ./python bug.py 
call with: {'a': 1, 'b': 2}
got: {'a': 2}

Again, there is also a bug in my fast_init branch, I should avoid double conversion dict=>kwnames=>dict to call a type.
msg286185 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-24 13:39
pystack_asdict-2.patch LGTM.
msg286186 - (view) Author: Roundup Robot (python-dev) Date: 2017-01-24 14:09
New changeset 2d2210b36b25 by Victor Stinner in branch 'default':
Issue #29360: _PyStack_AsDict() doesn't check kwnames
https://hg.python.org/cpython/rev/2d2210b36b25
msg286187 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-24 14:10
Thanks Serhiy and Naoki. I really wanted more eyes on this weird issue :-)

It's also a reminder that we should be careful when converting keyword arguments, especially when duplicates are not allowed.
History
Date User Action Args
2017-01-24 15:24:20hayposetstatus: open -> closed
resolution: fixed
2017-01-24 14:10:46hayposetmessages: + msg286187
2017-01-24 14:09:14python-devsetnosy: + python-dev
messages: + msg286186
2017-01-24 13:39:41serhiy.storchakasetassignee: haypo
messages: + msg286185
stage: commit review
2017-01-24 13:33:42hayposetfiles: + bug.py

messages: + msg286183
2017-01-24 12:02:08serhiy.storchakasetmessages: + msg286180
2017-01-24 11:42:16hayposetmessages: + msg286177
2017-01-24 11:27:07serhiy.storchakasetmessages: + msg286176
2017-01-24 11:18:41hayposettitle: Don't check if all keys are strings in _PyStack_AsDict() -> _PyStack_AsDict(): Don't check if all keys are strings nor if keys are unique
2017-01-24 11:18:15hayposetfiles: + pystack_asdict-2.patch

messages: + msg286175
2017-01-24 09:57:27serhiy.storchakasetmessages: + msg286169
2017-01-24 09:52:50serhiy.storchakasetmessages: + msg286166
2017-01-24 09:45:59hayposetmessages: + msg286164
2017-01-24 09:42:35hayposetmessages: + msg286163
2017-01-24 09:41:41hayposetmessages: + msg286162
2017-01-24 09:06:31serhiy.storchakasetmessages: + msg286159
2017-01-24 09:02:13inada.naokisetmessages: + msg286158
2017-01-24 08:49:51hayposetnosy: + inada.naoki, serhiy.storchaka
2017-01-24 08:49:44haypocreate