Title: tokenize.untokenize first token missing failure case
Messages (7)
msg103799 - (view) Author: (rb) * Date: 2010-04-21 02:18
When tokens are altered and therefore passed without location information, tokenize.untokenize sometimes drops the first token. Failure case below.

Expected output: 'import foo ,bar \n'
Actual output: 'foo ,bar \n'

$ python
Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55) 
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import StringIO, tokenize
>>> def strip(iterable):
...     for t_type, t_str, (srow, scol), (erow, ecol), line in iterable:
...         yield t_type, t_str
... 
>>> source = StringIO.StringIO('import foo, bar\n')
>>> print repr(tokenize.untokenize(strip(tokenize.generate_tokens(source.readline))))
'foo ,bar \n'
>>> source.seek(0)  # rewind; the previous call read the stream to the end
>>> print repr(tokenize.untokenize(tokenize.generate_tokens(source.readline)))
'import foo, bar\n'
msg106450 - (view) Author: (rb) * Date: 2010-05-25 16:57
I've looked into this in some more depth.

The problem is that Untokenizer.compat assumes it can restart iteration over iterable from the beginning, but by then Untokenizer.untokenize has already consumed the first element. So it works with a list, but not with a generator.

In particular, untokenize is broken for any generator input that yields 2-tuples, that is, only the first two elements (token type and token string) of each token.
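
A minimal sketch of this failure mode, not the actual tokenize source: buggy_join and fixed_join are hypothetical stand-ins for the hand-off between Untokenizer.untokenize and Untokenizer.compat. The outer loop takes the first item, then the inner loop iterates over the same object again. A list restarts from the beginning, while an iterator resumes at the second item.

```python
import itertools

def buggy_join(iterable):
    """Hypothetical model of the bug: the first item is never appended directly."""
    parts = []
    for item in iterable:            # consumes the first item
        # Hand-off: iterate over the same object again. A list restarts at
        # index 0; an iterator resumes after the item consumed above.
        for _, text in iterable:
            parts.append(text)
        break
    return "".join(parts)

def fixed_join(iterable):
    """One possible repair: iterate once and put the first item back in front."""
    it = iter(iterable)
    parts = []
    for item in it:
        for _, text in itertools.chain([item], it):
            parts.append(text)
        break
    return "".join(parts)

# Illustrative (type, string) pairs; the type names here are plain strings.
pairs = [("NAME", "import "), ("NAME", "foo "), ("OP", ","),
         ("NAME", "bar "), ("NEWLINE", "\n")]

print(repr(buggy_join(pairs)))        # list input: all tokens present
print(repr(buggy_join(iter(pairs))))  # iterator input: first token missing
print(repr(fixed_join(iter(pairs))))  # iterator input: all tokens present
```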

Workaround: never hand untokenize a generator. Expand generators to lists first instead.
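
A sketch of the workaround, written for Python 3 (io.StringIO in place of StringIO.StringIO) rather than the 2.6 session above. On interpreters that already contain a fix, the list() call is harmless and both forms should give the same result.

```python
import io
import tokenize

def strip(tokens):
    """Keep only (type, string), discarding position information."""
    for tok in tokens:
        yield tok[0], tok[1]

source = "import foo, bar\n"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)

# Workaround from this message: expand the generator to a list first,
# so untokenize never receives a partially consumed iterator.
result = tokenize.untokenize(list(strip(tokens)))
print(repr(result))
```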
msg172191 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-10-06 12:51
Attaching patch.  Actually both versions of untokenize() were broken; the version used for "full input" (5-tuples) had a flipped inequality sign in an assert.

Other changes in the patch:

* Docs fixed to describe both modes
* Tests fixed to exercise both modes
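
The two modes can be contrasted with a short Python 3 sketch. It is illustrative only and not taken from the patch; the helper names tokens and pairs are made up for this example. Full mode receives 5-tuples and reproduces the spacing of this simple input, while the 2-tuple mode is only expected to produce text that tokenizes back to the same types and strings.

```python
import io
import tokenize

source = "import foo, bar\n"

def tokens():
    # Fresh token list on every call, so no stream needs rewinding.
    return list(tokenize.generate_tokens(io.StringIO(source).readline))

def pairs(text):
    # (type, string) pairs for a piece of source text.
    return [tok[:2] for tok in tokenize.generate_tokens(io.StringIO(text).readline)]

# Full mode: 5-tuples carry positions, so column spacing can be restored.
full = tokenize.untokenize(tokens())

# 2-tuple mode: no positions, so spacing between tokens may change.
compat = tokenize.untokenize([tok[:2] for tok in tokens()])

print(repr(full))
print(repr(compat))
print(pairs(compat) == pairs(source))  # round trip on type and string only
```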
msg180589 - (view) Author: Thomas Kluyver (takluyver) * Date: 2013-01-25 14:25
#16224 appears to be a duplicate.

There seem to be several quite major issues with untokenize - see also #12691 - with patches made to fix them. Is there anything I can do to help push these forwards?
msg211448 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-02-17 21:50
New changeset c896d292080a by Terry Jan Reedy in branch '2.7':
Untokenize: A logically incorrect assert tested user input validity.

New changeset 51e5a89afb3b by Terry Jan Reedy in branch '3.3':
Untokenize: A logically incorrect assert tested user input validity.
msg211475 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-02-18 04:17
New changeset c2517a37c13a by Terry Jan Reedy in branch '2.7':
Issue #8478: Untokenizer.compat now processes first token from iterator input.

New changeset b6d6ca792b64 by Terry Jan Reedy in branch '3.3':
Issue #8478: Untokenizer.compat now processes first token from iterator input.
msg212041 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-02-23 23:01
New changeset 8d6dd02a973f by Terry Jan Reedy in branch '3.3':
Issue #20750, Enable roundtrip tests for new 5-tuple untokenize. The
