This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Incorrect dictionary unpacking when calling str.format
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.11
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Akos Kiss, ammar2, eric.smith, iritkatriel, rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2020-02-20 10:03 by Akos Kiss, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (14)
msg362304 - (view) Author: Akos Kiss (Akos Kiss) Date: 2020-02-20 10:03
My understanding was that in function calls, the keys in an **expression had to be strings. However, str.format seems to deviate from that and allows non-string keys in the mapping (and silently ignores them).

Please, see the transcript below:

>>> def f(): pass
... 
>>> def g(): pass
... 
>>> x = {None: ''}
>>> y = {1: ''}
>>> f(**x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() keywords must be strings
>>> f(**y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() keywords must be strings
>>> g(**x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: g() keywords must be strings
>>> g(**y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: g() keywords must be strings
>>> ''.format(**x)
''
>>> ''.format(**y)
''

I could reproduce this (incorrect?) behavior on macOS with python 3.4-3.7 and on Ubuntu 18.04 with python 3.6.
msg362306 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2020-02-20 10:48
Can re-create, this is because the **kwargs handling inside unicode format only performs a lookup when needed https://github.com/python/cpython/blob/c4cacc8c5eab50db8da3140353596f38a01115ca/Objects/stringlib/unicode_format.h#L393-L424 and doesn't eagerly check it like here https://github.com/python/cpython/blob/ffd9753a944916ced659b2c77aebe66a6c9fbab5/Objects/call.c#L985-L1010

This might not be worth fixing though because of the very common pattern of 

  "{abcd}".format(**locals())

This would take a pretty dramatic performance hit for a relatively uncommon edge case.
msg362318 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-02-20 14:04
Is this causing some practical problem? I can't think of any way to reference non-string kwargs in a format string, but maybe I'm not being creative enough. And if there is such a way, then changing this wouldn't be backward compatible.
msg362320 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-02-20 14:16
See similar issue for SimpleNamespace: https://bugs.python.org/issue31655.

We should add a similar check in str.format(). Accepting a non-string keys is an implementation detail. There is no guarantee that it works in other implementations.

It should not hit performance, because the kwargs dict is usually small, the check is cheap, and formatting itself can be expensive. For the case when you need to pass a huge dict and use only few items from it, you can use str.format_map().
msg362322 - (view) Author: Akos Kiss (Akos Kiss) Date: 2020-02-20 14:43
Re: Eric

I had a code where `'sometemplate'.format()` got a dictionary from outside (as a parameter, not from `locals()`), and that dictionary had `None` as a key. Actually, I wasn't even aware of that `None` key until I tried to execute the same code in PyPy where it failed with a `TypeError`. That's when I started to dig down to the root of the incompatibility.

Now, my code has a workaround to have an `if key is not None` filter in the comprehension that constructs the dict. (I'm not sure whether it can be called a workaround, since this is how it should have been written in the first place. It turns out I was just (ab)using an implementation detail / deviation.)

So much about real-world use cases.

There is one more use case that comes to my mind, but it is admittedly theoretical (for now, at least). If I ever wanted to patch/wrap/mock `str.format` then the wrapped version would behave differently from the built-in version (throw an error when the original does not). But this is hypothetical at the moment because I "can't set attributes of built-in/extension type 'str'" and don't have the time to experiment with complex mocking/patching/wrapping approaches.
msg362326 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-02-20 15:57
Okay, then I agree with Serhiy on the change.

And I agree that .format(**locals()) is an anti-pattern, and should use .format_map(locals()) instead. Not that it's related directly to this issue, I just like to point it out where I can.
msg362327 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-02-20 16:02
I'm with Eric and don't see this as a problem in practice.

Also, the kwarg handling for functions is fundamentally different because all arguments need to be matched.  In contrast, the kwargs for format() are only used on-demand:

    >>> '{found}'.format(found='used', not_found='ignored')
    'used'

I recommend this be left as-is.
msg362331 - (view) Author: Akos Kiss (Akos Kiss) Date: 2020-02-20 16:30
I couldn't find any discussion in the language reference about fundamental differences between calls to built-in functions and user-defined functions. (I tried to read Sections 6.3.4. "Calls" and 8.6. "Function definitions" carefully.) Until now I had the impression that built-in and user-defined functions should be as similar as possible.

BTW, a user-defined function can also implement support for unused/excess keyword arguments (or arguments used on-demand): it only has to add **kwargs at the end of its signature. And that's exactly how the documentation specifies the signature of format: str.format(*args, **kwargs). That API documentation signals (to me at least) that this function should be called just like any other function that has a (*args, **kwargs) signature, be it built-in of user-defined.
msg362404 - (view) Author: Akos Kiss (Akos Kiss) Date: 2020-02-21 12:03
I've come up with some extra examples (use cases?):

```py
import collections

class str2(str):
    def format(self, *args, **kwargs):
        return super().format(*args, **kwargs)

def unpacknonekey(s):
    print('input:', type(s), s)
    try:
        print('str key:', s.format(**{'bar': 'qux'}))
        print('none key:', s.format(**{'bar': 'qux', None: ''}))
    except TypeError as e:
        print('error:', e)

template = 'foo {bar} baz'
unpacknonekey(template)
unpacknonekey(str2(template))
unpacknonekey(collections.UserString(template))
```

The above script gives the following output:

```
input: <class 'str'> foo {bar} baz
str key: foo qux baz
none key: foo qux baz
input: <class '__main__.str2'> foo {bar} baz
str key: foo qux baz
error: format() keywords must be strings
input: <class 'collections.UserString'> foo {bar} baz
str key: foo qux baz
error: format() keywords must be strings
```

This shows inconsistency between `format` of `str` and subclasses of `str` or the standard library's `UserString`.
msg408047 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-12-08 22:06
Reproduced on 3.11.

>>> x = {None: ''}
>>> ''.format(**x)
''
>>> '{x}'.format(**x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'x'
msg408051 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-12-08 22:43
In most cases the test is cheap even for large dict (it has constant complexity if the dict never contained non-string keys).
msg408067 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-12-09 02:41
IMO, there is no actual problem being solved here.  Instead there is just a concern that something doesn't feel right.   Given that there is no problem in practice, I recommend closing this rather than cluttering docs, tests, or the C code for a non-issue.  

The format() method looksup keywords on demand and it can only lookup strings.  Anything not looked up is ignored.  We have a long history of that working out just fine:

     >>> 'The answer is %(answer)s.' % \
         {'answer': 'correct', 10: 'never used'}
     'The answer is correct.'

     >>> 'The answer is {answer}.'.format(
          **{'answer': 'correct', 10: 'never used'})
     'The answer is correct.'

     >>> 'The answer is {answer}.'.format_map(
         {'answer': 'correct', 10: 'never used'})
     'The answer is correct.

One could argue that making any of the above raise an error for a non-string in a dict is backwards incompatible and would only serve to break code that is already working fine.

I'm going to close this one.  If another core dev feels strongly that this is a problem in practice, go ahead and reopen it.  There was a similar change to SimpleNamespace but arguably that shouldn't have been done either, but at least it had much less risk of breaking existing code that has worked fine for over a decade.
msg408068 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-12-09 02:55
I concur with Raymond.
msg408101 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-12-09 10:07
I disagree, but does not insists if other core developers have other opinion.

This is a special case which breaks the rules. The Python language specification does not support non-string keywords. You cannot implement this "feature" in Python. And it is not compatible with other implementations (PyPy):

>>>> ''.format(**{1:2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: format() keywords must be strings, not 'int'

It would be understandable to not enforce this limitation if it has significant costs. But in this case it costs almost nothing.
History
Date User Action Args
2022-04-11 14:59:26adminsetgithub: 83875
2021-12-09 10:07:38serhiy.storchakasetmessages: + msg408101
2021-12-09 02:55:45eric.smithsetmessages: + msg408068
2021-12-09 02:41:39rhettingersetstatus: open -> closed
resolution: not a bug
messages: + msg408067

stage: resolved
2021-12-08 22:43:29serhiy.storchakasetmessages: + msg408051
2021-12-08 22:06:20iritkatrielsetversions: + Python 3.11, - Python 3.5, Python 3.6, Python 3.7
nosy: + iritkatriel

messages: + msg408047

components: + Library (Lib)
2020-03-02 17:40:24vstinnersetnosy: - vstinner
2020-02-21 12:03:28Akos Kisssetmessages: + msg362404
2020-02-20 16:30:09Akos Kisssetmessages: + msg362331
2020-02-20 16:02:49rhettingersetnosy: + rhettinger
messages: + msg362327
2020-02-20 15:57:34eric.smithsetmessages: + msg362326
2020-02-20 14:43:09Akos Kisssetmessages: + msg362322
2020-02-20 14:16:16serhiy.storchakasetmessages: + msg362320
2020-02-20 14:04:57eric.smithsetmessages: + msg362318
2020-02-20 10:48:15ammar2setnosy: + vstinner, serhiy.storchaka, ammar2, eric.smith
messages: + msg362306
2020-02-20 10:03:23Akos Kisscreate