classification
Title: tokenize.untokenize first token missing failure case
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: georg.brandl, rb
Priority: normal Keywords:

Created on 2010-04-21 02:18 by rb, last changed 2010-05-25 16:57 by rb.

Messages (2)
msg103799 - Author: (rb) Date: 2010-04-21 02:18
When tokens are altered and therefore passed without location information, tokenize.untokenize sometimes drops the first token. Failure case below.

Expected output: 'import foo ,bar \n'
Actual output: 'foo ,bar \n'

$ python
Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55) 
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import StringIO, tokenize
>>> 
>>> def strip(iterable):
...     for t_type, t_str, (srow, scol), (erow, ecol), line in iterable:
...         yield t_type, t_str
... 
>>> source = StringIO.StringIO('import foo, bar\n')
>>> print repr(tokenize.untokenize(strip(tokenize.generate_tokens(source.readline))))
'foo ,bar \n'
>>> source.seek(0)
>>> print repr(tokenize.untokenize(tokenize.generate_tokens(source.readline)))
'import foo, bar\n'
>>>
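For anyone checking a Python 3 interpreter, the same reproduction can be ported roughly as below. This is a sketch, not part of the original report: `io.StringIO` replaces the Python 2 `StringIO` module, `strip` is kept from the session above, and `SOURCE`, `type_string_pairs` and `untokenize_from_generator` are hypothetical helper names. Because compat-mode spacing is not meant to match the input, it compares (type, string) token pairs instead of raw text.

```python
import io
import tokenize

SOURCE = 'import foo, bar\n'

def strip(tokens):
    """Drop position info, yielding only (type, string) 2-tuples."""
    for tok in tokens:
        yield tok[0], tok[1]

def type_string_pairs(text):
    """Tokenize text and return its (type, string) pairs as a list."""
    readline = io.StringIO(text).readline
    return list(strip(tokenize.generate_tokens(readline)))

def untokenize_from_generator(text):
    """Hand untokenize a generator of 2-tuples, as in the report."""
    readline = io.StringIO(text).readline
    return tokenize.untokenize(strip(tokenize.generate_tokens(readline)))

result = untokenize_from_generator(SOURCE)
print(repr(result))
# True if the output tokenizes back to the same pairs as the input;
# False if the leading 'import' was dropped, as in the 2.6.4 session.
print(type_string_pairs(result) == type_string_pairs(SOURCE))
```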
msg106450 - Author: (rb) Date: 2010-05-25 16:57
I've looked into this in some more depth.

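The problem is that Untokenizer.compat assumes that iterable can be restarted from the beginning, when Untokenizer.untokenize has already taken the first element out of it. So it works with a list, which yields the first element again on a fresh iteration, but not with a generator, which carries on from the second element.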
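The interaction can be modelled without tokenize at all. In this minimal sketch, `untokenize_like` and `compat_like` are hypothetical stand-ins for Untokenizer.untokenize and Untokenizer.compat, reduced to the pattern described above (not a copy of the library code): the outer loop pulls one element, then passes the same iterable to a helper that never emits that element and simply loops over the iterable again.

```python
def compat_like(_first, iterable, out):
    """Stand-in for compat: the already-consumed first element is ignored,
    on the assumption that looping over 'iterable' will yield it again."""
    for item in iterable:
        out.append(item)

def untokenize_like(iterable):
    """Stand-in for untokenize: take the first element, then delegate."""
    out = []
    for first in iterable:
        compat_like(first, iterable, out)
        break
    return out

tokens = ['import', 'foo', ',', 'bar']
print(untokenize_like(tokens))        # ['import', 'foo', ',', 'bar'] (a list restarts)
print(untokenize_like(iter(tokens)))  # ['foo', ',', 'bar'] (an iterator carries on)
```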

In particular, untokenize is broken for any generator input whose tokens are 2-tuples (token type and string only, with no position information).
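One possible way to remove the assumption, sketched on the same toy model rather than as a patch to tokenize: take one explicit iterator up front and push the consumed element back in front of it with `itertools.chain`. The names `untokenize_fixed` and `compat_fixed` are hypothetical.

```python
import itertools

def compat_fixed(first, it, out):
    """Re-attach the already-consumed first element instead of assuming
    that 'it' will yield it again."""
    for item in itertools.chain([first], it):
        out.append(item)

def untokenize_fixed(iterable):
    out = []
    # One explicit iterator, so lists and generators behave the same and
    # the chained first element is never seen twice.
    it = iter(iterable)
    for first in it:
        compat_fixed(first, it, out)
        break
    return out

tokens = ['import', 'foo', ',', 'bar']
print(untokenize_fixed(tokens))        # ['import', 'foo', ',', 'bar']
print(untokenize_fixed(iter(tokens)))  # ['import', 'foo', ',', 'bar']
```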

Workaround: never pass untokenize a generator; expand it into a list first.
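Applied to the original failure case, the workaround looks roughly like this. It is written for Python 3 (`io.StringIO` in place of the `StringIO` module), and `strip` is the same helper as in the first message.

```python
import io
import tokenize

def strip(tokens):
    """Drop position info, yielding only (type, string) 2-tuples."""
    for tok in tokens:
        yield tok[0], tok[1]

readline = io.StringIO('import foo, bar\n').readline
# Expand the generator into a list before calling untokenize, so that
# iterating over the input again starts from the first token.
pairs = list(strip(tokenize.generate_tokens(readline)))
text = tokenize.untokenize(pairs)
print(repr(text))
```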
History
Date                 User          Action  Args
2010-05-25 16:57:26  rb            set     messages: + msg106450
2010-04-21 18:18:30  georg.brandl  set     assignee: georg.brandl
                                           nosy: + georg.brandl
2010-04-21 17:36:41  rb            set     type: behavior
2010-04-21 02:18:30  rb            create