classification
Title: kwargs regression when there are multiple entries with the same key
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: ammar2, iceboy, remi.lapeyre, serhiy.storchaka, terry.reedy, xtreak
Priority: normal Keywords: patch, patch, patch

Created on 2019-01-02 01:04 by iceboy, last changed 2019-01-12 08:13 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 11438 merged serhiy.storchaka, 2019-01-05 16:42
PR 11438 merged serhiy.storchaka, 2019-01-05 16:42
PR 11438 merged serhiy.storchaka, 2019-01-05 16:42
Messages (8)
msg332849 - (view) Author: iceboy (iceboy) * Date: 2019-01-02 01:04
Using the multidict package on pypi to illustrate the problem.

Python 3.5.3 (default, Sep 27 2018, 17:25:39) 
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import multidict
>>> d = multidict.CIMultiDict([('a', 1), ('a', 2)])
>>> def foo(**kwargs): pass
... 
>>> foo(**d)
>>> foo(**{}, **d)

Python 3.6.7 (default, Oct 21 2018, 08:08:16) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import multidict
>>> d = multidict.CIMultiDict([('a', 1), ('a', 2)])
>>> def foo(**kwargs): pass
... 
>>> foo(**d)
>>> foo(**{}, **d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() got multiple values for keyword argument 'a'

(1) foo(**d)
(2) foo(**{}, **d)

(1) works fine in both versions but (2) only works in Python 3.5 but raises TypeError in Python 3.6. This should be a regression. We should either make both expressions work or raises error.
msg332859 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2019-01-02 08:03
This change in difference is caused by https://github.com/python/cpython/commit/e036ef8fa29f27d57fe9f8cef8d931d4122d8223

The old code checked for duplicate arguments by essentially running `set().intersection(d)` and since `set().intersection(['a', 'a'])` is the empty set, it doesn't register as a duplicated argument. The newer code iterates over the keys in order to merge the dictionaries.

Note however that 3.5 is now is in security only mode: https://devguide.python.org/#branchstatus so its unlikely this behavior will be back-ported.
msg332860 - (view) Author: iceboy (iceboy) * Date: 2019-01-02 10:07
I feel like we should not check the argument and allow overriding. If the argument checking is desired, can we also check when there is only a single kwargs? Currently `foo(**d)` still works in Python 3.6 with duplicated keys.
msg332874 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2019-01-02 13:05
I prepared a patch to make the check when there is only one dict, it may be the less surprising change.

BTW, I got lost when looking for the root cause  of this change because grepping for the error message did not work as it has line breaks:

-                            PyErr_Format(PyExc_TypeError,
-                                    "%.200s%.200s got multiple "
-                                    "values for keyword argument '%U'",
-                                    PyEval_GetFuncName(func),
-                                    PyEval_GetFuncDesc(func),
-                                    key);

Can we add a note like https://www.kernel.org/doc/html/v4.12/process/coding-style.html#breaking-long-lines-and-strings to the PEP7?

> However, never break user-visible strings such as printk messages, because that breaks the ability to grep for them.

It would help in such situations.
msg333002 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-01-04 21:17
Serhiy, nosying you because Ammar identified your commit as relevant.
https://github.com/python/cpython/commit/e036ef8fa29f27d57fe9f8cef8d931d4122d8223
---

3.6 is also security-fix only.

Normally, code bug reports need a minimal, reproducible, initially-failing test case that only uses the stdlib, not 3rd party code.  A test case would have to include a simplified version of a multidict.

The problem here is that a multidict with duplicate keys is not a proper mapping, in spite of having a MutableMapping interface.  (Just curious, what does d['a'] return?). 

The purpose of the package is to meet the needs of HTTP Headers and URL query strings (and other situations) where the 'keys' are value-type tags, not true mapping keys.  Another example would be a bibliography entry that tags each of multiple author names with 'Author:'.  (Aside: A deficiency of git (github) is allowing only 1 author key and only one github username as the value.)

Is an object with duplicate keys legal for **expression in calls?
https://docs.python.org/3/reference/expressions.html#index-48
says that expression must evaluate to a 'mapping'.  The glossary entry
https://docs.python.org/3/glossary.html#term-mapping says
"A container object that supports arbitrary key lookups and implements the methods specified in the Mapping or MutableMapping abstract base classes." followed by unique-key examples.  To me, 'key lookup' implies unique keys.  The Mapping functions include 'keys' and 'items'.  What signature?  To return set-like views, the keys should be unique.

If we take the call to be a bug, is CPython *obligated* to immediately raise an exception?  In other words, must every call with **mapping take the time to check for duplicates because someone might pass a dup-key 'mapping'.
msg333050 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-05 08:54
Ammar is right. This is a consequence of issue27358. Issue27358 was merely an optimization necessary for bytecode changes in 3.6. It was not noticed that it changes also the behavior for multidict kwargs. But I think that it was correct change. Specifying multiple keyword arguments with the same name is an error, and it does not matter whether names are duplicated in different kwargs or in the same kwarg. I think it is worth to ensure that duplicated names are forbidden in the case of a single kwarg too. Since kwarg should be converted to a dict, this check should not add significant overhead.

3.5 and 3.6 are in security-only fixes mode. And I think it is not worth to do such changes in 3.7. This could break working user code (even if can be considered incorrect), and we should avoid a breakage in bugfix releases without need.

I take it.
msg333053 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-01-05 12:15
> (Just curious, what does d['a'] return?)

I was curious too and some results

$ python
Python 3.7.1rc2 (v3.7.1rc2:6c06ef7dc3, Oct 13 2018, 05:10:29)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import multidict
>>> d = multidict.CIMultiDict([('a', 1), ('a', 2)])
>>> d['a']
1
>>> d.keys()
_KeysView('a', 'a')
>>> d.values()
_ValuesView(1, 2)
>>> d.items()
_ItemsView('a': 1, 'a': 2)
>>> dict(d)
{'a': 1}

In the original issue where PEP 448 was implemented there were some discussions around duplicates in kwargs. msg234413 for Guido's call on duplicates and the messages below discuss some more scenarios about overriding/rejecting duplicates. PEP 448 also has a note on duplicates in https://www.python.org/dev/peps/pep-0448/#specification
msg333522 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-12 08:12
New changeset f1ec3cefad4639797c37eaa8c074830188fa0a44 by Serhiy Storchaka in branch 'master':
bpo-35634: Raise an error when first passed kwargs contains duplicated keys. (GH-11438)
https://github.com/python/cpython/commit/f1ec3cefad4639797c37eaa8c074830188fa0a44
History
Date User Action Args
2019-01-12 08:13:21serhiy.storchakasetkeywords: patch, patch, patch
status: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-01-12 08:12:26serhiy.storchakasetmessages: + msg333522
2019-01-05 16:42:58serhiy.storchakasetkeywords: + patch
stage: test needed -> patch review
pull_requests: + pull_request10877
2019-01-05 16:42:46serhiy.storchakasetkeywords: + patch
stage: test needed -> test needed
pull_requests: + pull_request10876
2019-01-05 16:42:35serhiy.storchakasetkeywords: + patch
stage: test needed -> test needed
pull_requests: + pull_request10875
2019-01-05 12:15:27xtreaksetnosy: + xtreak
messages: + msg333053
2019-01-05 08:54:06serhiy.storchakasetassignee: serhiy.storchaka
messages: + msg333050
components: + Interpreter Core
versions: - Python 3.7
2019-01-04 21:17:23terry.reedysetversions: + Python 3.7, Python 3.8, - Python 3.6
nosy: + serhiy.storchaka, terry.reedy

messages: + msg333002

stage: test needed
2019-01-02 13:05:44remi.lapeyresetnosy: + remi.lapeyre
messages: + msg332874
2019-01-02 10:07:28iceboysetmessages: + msg332860
2019-01-02 08:03:07ammar2settype: behavior

messages: + msg332859
nosy: + ammar2
2019-01-02 01:04:33iceboycreate