Title: argparse.Namespace __repr__ does not handle reserved keywords
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.9
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: gousaiyang, rhettinger
Priority: normal Keywords:

Created on 2020-04-12 19:46 by gousaiyang, last changed 2020-04-13 08:06 by rhettinger. This issue is now closed.

Messages (4)
msg366265 - (view) Author: Saiyang Gou (gousaiyang) * Date: 2020-04-12 19:46
It is generally a convention to design the repr string such that `eval(repr(obj))` can reproduce the object. Issue 24360 handles the special situation when arg name passed to `argparse.Namespace` is not a valid identifier:

>>> from argparse import Namespace
>>> Namespace(**{')': 3})
Namespace(**{')': 3})

However there is one more corner case to handle: the arg name could be a reserved keyword.

>>> Namespace(**{'return': 3})
>>> Namespace(return=3)
  File "<stdin>", line 1
SyntaxError: invalid syntax

I noticed that the documentation of `str.isidentifier` tells me to also check for keywords with `keyword.iskeyword`. However `__debug__` is not considered a keyword by `keyword.iskeyword`, but cannot be assigned to:

>>> keyword.iskeyword('__debug__')
>>> Namespace(**{'__debug__':3})
>>> Namespace(__debug__=3)
  File "<stdin>", line 1
SyntaxError: cannot assign to __debug__

I propose to enhance the arg name check in `argparse._AttributeHolder` from `if name.isidentifier():` to `if name.isidentifier() and not keyword.iskeyword(name) and name != '__debug__:'` to fix this. However this may slow down the argparse library since it will need to `import keyword` at the beginning, and `__repr__` will also be slower due to the added arg name checks.

By the way, when I came across issue 39076, I noticed that `types.SimpleNamespace` is not considering this problem at all:

>>> types.SimpleNamespace(**{')': 1, 'return': 2})
namespace()=1, return=2)
msg366282 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-13 01:55
ISTM this a purely theoretical problem since "ns.return" is already a syntax error.

The primary function of the Namespace class is to hold valid attributes and to allow the dot operator to access those attributes.  Handling invalid attribute names is beyond its scope.
msg366284 - (view) Author: Saiyang Gou (gousaiyang) * Date: 2020-04-13 05:39
> The primary function of the Namespace class is to hold valid attributes and to allow the dot operator to access those attributes.

I acknowledge this. Invalid attribute names are not useful in production. However, some crazy people like me would define arguments like these, producing an repr string that does not round-trip:

>>> from argparse import ArgumentParser
>>> parser = ArgumentParser()
>>> parser.add_argument('-8')
>>> parser.add_argument('-@')
>>> parser.add_argument('--return')
>>> parser.parse_args(['-8', 'y', '-@', '', '--return', 'foo'])
Namespace(return='foo', **{'8': 'y', '@': ''})

(I know I can use the `dest` argument to define an alternative name to store in the namespace.)

The reason why I open this issue is that I see issue 24360 already tried to improve the repr string for invalid names by using the `isidentifier` check. (Which is almost "as unuseful as this issue" for production use, since repr is almost only used for development instead of production.) The original author thought the improvement was enough to be round-trippable but missed the special case of keywords, and my patch will fix this.
msg366286 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-13 08:06
Saiyang, thank you for the suggestion, but I'm going to decline.  The other referenced issue had to do with readability of otherwise incomprehensible output and that's what made it worthwhile.  In this issue, you're prioritizing round-tripping at the expense of readability.  To me, that is a step backwards.  Since people don't normally run eval() on the repr() of a Namespace, round-tripping should take a backseat to readability.  And, as you noted, there is already an option to set *dest* if needed.
Date User Action Args
2020-04-13 08:06:48rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg366286

stage: resolved
2020-04-13 05:39:37gousaiyangsetmessages: + msg366284
2020-04-13 01:55:29rhettingersetassignee: rhettinger

messages: + msg366282
nosy: + rhettinger
2020-04-12 19:46:18gousaiyangcreate