classification
Title: untokenize documentation is not correct
Type: Stage:
Components: Documentation Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Zachary McCord, csernazs, docs@python, donovick, utkarsh2102
Priority: normal Keywords:

Created on 2018-11-22 21:25 by csernazs, last changed 2019-10-16 20:26 by Zachary McCord.

Messages (3)
msg330281 - (view) Author: Zsolt Cserna (csernazs) * Date: 2018-11-22 21:25
untokenize documentation (https://docs.python.org/3/library/tokenize.html#tokenize.untokenize) states the following:

"""
Converts tokens back into Python source code. The iterable must return sequences with at least two elements, the token type and the token string. Any additional sequence elements are ignored.
"""

This last sentence is not accurate. Here:
https://github.com/python/cpython/blob/master/Lib/tokenize.py#L242

the code checks the length of each input token and behaves differently, in terms of whitespace, depending on whether it is given an iterator of 2-tuples or of tuples with more elements. When the tuples have more than two elements, the function uses the extra position information to render the whitespace exactly as it was present in the original source.

So this code:

tokenize.untokenize(tokenize.tokenize(source.readline))

and this:

tokenize.untokenize([x[:2] for x in tokenize.tokenize(source.readline)])

produce different results.
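The difference can be observed directly with a small in-memory snippet (a sketch; the exact spacing that the 2-tuple fallback produces is an implementation detail):

```python
import io
import tokenize

source = b"x = 1 + 2\n"

# Full 5-tuples carry start/end positions, so untokenize() can
# reproduce the original whitespace exactly.
full = tokenize.untokenize(
    tokenize.tokenize(io.BytesIO(source).readline))
print(full)  # b'x = 1 + 2\n'

# Truncating every token to (type, string) forces the undocumented
# "compatibility mode", which invents its own whitespace.
compat = tokenize.untokenize(
    t[:2] for t in tokenize.tokenize(io.BytesIO(source).readline))
print(compat)  # spacing differs from the original
```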

I don't know whether this is a documentation issue or a bug in the module itself, so I created this bug report to ask for guidance.
msg340469 - (view) Author: Utkarsh Gupta (utkarsh2102) * Date: 2019-04-18 06:08
I am not sure if that's a documentation problem, is it?
If so, I'll be happy to send a PR :)
msg354816 - (view) Author: Zachary McCord (Zachary McCord) Date: 2019-10-16 20:26
I think anyone using the tokenize module to programmatically edit Python source wants to use (and probably does use) the undocumented behavior, which should therefore be documented.

I ran into this issue because it manifested for me as a crash:

$ python3
>>> import tokenize
>>> tokenize.untokenize([(tokenize.STRING, "''", (1, 0), (1, 0), None)])
"''"
>>> tokenize.untokenize([(tokenize.STRING, "''", None, None, None)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/<snip>/virtualenv/lib/python3.6/tokenize.py", line 338, in untokenize
    out = ut.untokenize(iterable)
  File "/<snip>/virtualenv/lib/python3.6/tokenize.py", line 272, in untokenize
    self.add_whitespace(start)
  File "/<snip>/virtualenv/lib/python3.6/tokenize.py", line 231, in add_whitespace
    row, col = start
TypeError: 'NoneType' object is not iterable

The second call is giving untokenize() input that is documented to be valid, yet which causes a crash.
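One workaround (a sketch of one option, not an endorsed API contract) is to drop the position fields entirely, so that untokenize() takes the compatibility-mode path, which never reads them:

```python
import tokenize

# A 5-tuple with None positions crashes untokenize() (see the
# traceback above); the same token truncated to (type, string)
# goes through compatibility mode, which ignores positions.
tok = (tokenize.STRING, "''", None, None, None)
result = tokenize.untokenize([tok[:2]])
print(result)  # result == "''"
```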
History
Date                 User            Action  Args
2019-10-16 20:26:13  Zachary McCord  set     nosy: + Zachary McCord; messages: + msg354816
2019-04-18 06:08:30  utkarsh2102     set     nosy: + utkarsh2102; messages: + msg340469
2019-04-16 22:40:12  donovick        set     nosy: + donovick
2018-11-22 21:25:34  csernazs        set     versions: + Python 3.6
2018-11-22 21:25:02  csernazs        create