classification
Title: untokenize documentation is not correct
Type: Stage:
Components: Documentation Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Zachary McCord, csernazs, docs@python, donovick, utkarsh2102
Priority: normal Keywords:

Created on 2018-11-22 21:25 by csernazs, last changed 2019-10-16 20:26 by Zachary McCord.

Messages (3)
msg330281 - (view) Author: Zsolt Cserna (csernazs) * Date: 2018-11-22 21:25
untokenize documentation (https://docs.python.org/3/library/tokenize.html#tokenize.untokenize) states the following:

"""
Converts tokens back into Python source code. The iterable must return sequences with at least two elements, the token type and the token string. Any additional sequence elements are ignored.
"""

This last sentence is not accurate. Here:
https://github.com/python/cpython/blob/master/Lib/tokenize.py#L242

the code checks the length of each input token and behaves differently, in terms of whitespace, depending on whether it is given an iterator of 2-tuples or of tuples with more elements. When the tuples have more than two elements, the function uses the extra position information to render the whitespace exactly as it was present in the original source.

So this code:

tokenize.untokenize(tokenize.tokenize(source.readline))

and this:

tokenize.untokenize([x[:2] for x in tokenize.tokenize(source.readline)])

produce different results.
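The difference can be observed directly with a small in-memory snippet (a sketch; the exact spacing that the 2-tuple fallback produces is an implementation detail):

```python
import io
import tokenize

source = b"x = 1 + 2\n"

# Full 5-tuples carry start/end positions, so untokenize() can
# reproduce the original whitespace exactly.
full = tokenize.untokenize(
    tokenize.tokenize(io.BytesIO(source).readline))
print(full)  # b'x = 1 + 2\n'

# Truncating every token to (type, string) forces the undocumented
# "compatibility mode", which invents its own whitespace.
compat = tokenize.untokenize(
    t[:2] for t in tokenize.tokenize(io.BytesIO(source).readline))
print(compat)  # spacing differs from the original
```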

I don't know whether this is a documentation issue or a bug in the module itself, so I created this bug report to ask for guidance.
msg340469 - (view) Author: Utkarsh Gupta (utkarsh2102) * Date: 2019-04-18 06:08
I am not sure if that's a documentation problem, is it?
If so, I'll be happy to send a PR :)
msg354816 - (view) Author: Zachary McCord (Zachary McCord) Date: 2019-10-16 20:26
I think anyone using the tokenize module to programmatically edit Python source wants to use (and probably does use) the undocumented behavior, which should therefore be documented.

I ran into this issue because it manifested for me as a crash:

$ python3
>>> import tokenize
>>> tokenize.untokenize([(tokenize.STRING, "''", (1, 0), (1, 0), None)])
"''"
>>> tokenize.untokenize([(tokenize.STRING, "''", None, None, None)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/<snip>/virtualenv/lib/python3.6/tokenize.py", line 338, in untokenize
    out = ut.untokenize(iterable)
  File "/<snip>/virtualenv/lib/python3.6/tokenize.py", line 272, in untokenize
    self.add_whitespace(start)
  File "/<snip>/virtualenv/lib/python3.6/tokenize.py", line 231, in add_whitespace
    row, col = start
TypeError: 'NoneType' object is not iterable

The second call is giving untokenize() input that is documented to be valid, yet which causes a crash.
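One workaround (a sketch of one option, not an endorsed API contract) is to drop the position fields entirely, so that untokenize() takes the compatibility-mode path, which never reads them:

```python
import tokenize

# A 5-tuple with None positions crashes untokenize() (see the
# traceback above); the same token truncated to (type, string)
# goes through compatibility mode, which ignores positions.
tok = (tokenize.STRING, "''", None, None, None)
result = tokenize.untokenize([tok[:2]])
print(result)  # result == "''"
```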
History
Date                 User            Action  Args
2019-10-16 20:26:13  Zachary McCord  set     nosy: + Zachary McCord; messages: + msg354816
2019-04-18 06:08:30  utkarsh2102     set     nosy: + utkarsh2102; messages: + msg340469
2019-04-16 22:40:12  donovick        set     nosy: + donovick
2018-11-22 21:25:34  csernazs        set     versions: + Python 3.6
2018-11-22 21:25:02  csernazs        create