classification
Title: tokenize.untokenize first token missing failure case
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: terry.reedy
Nosy List: Arfrever, eric.snow, georg.brandl, python-dev, rb, takluyver, terry.reedy
Priority: normal
Keywords: patch

Created on 2010-04-21 02:18 by rb, last changed 2014-02-23 23:01 by python-dev.

Files
File name        Uploaded                        Description
untokenize.diff  georg.brandl, 2012-10-06 12:51
Messages (7)
msg103799 - Author: (rb) Date: 2010-04-21 02:18
When tokens are altered and therefore passed without location information (as 2-tuples of type and string), tokenize.untokenize sometimes drops the first token. Failure case below.

Expected output: 'import foo ,bar \n'
Actual output: 'foo ,bar \n'

$ python
Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55) 
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import StringIO, tokenize
>>> 
>>> def strip(iterable):
...     for t_type, t_str, (srow, scol), (erow, ecol), line in iterable:
...         yield t_type, t_str
... 
>>> source = StringIO.StringIO('import foo, bar\n')
>>> print repr(tokenize.untokenize(strip(tokenize.generate_tokens(source.readline))))
'foo ,bar \n'
>>> source.seek(0)
>>> print repr(tokenize.untokenize(tokenize.generate_tokens(source.readline)))
'import foo, bar\n'
>>>
msg106450 - Author: (rb) Date: 2010-05-25 16:57
I've looked into this in some more depth.

The problem is that Untokenizer.compat assumes the iterable can be restarted from the beginning, even though Untokenizer.untokenize has already consumed the first element. So it works with a list, but not with a generator.

In particular, untokenize is broken for any generator input that supplies only the first two elements of each token (type and string).
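
The difference between a list and a generator described above can be sketched in isolation. This is not the tokenize source; `join_tokens` is a hypothetical stand-in that only mimics the pattern of an outer loop taking the first element and an inner loop iterating over the same object again (written for Python 3):

```python
def join_tokens(iterable):
    """Hypothetical stand-in for the untokenize/compat hand-off."""
    parts = []
    for first in iterable:
        # The outer loop has taken `first`. The inner loop now iterates over
        # the same object: a list starts again at index 0, while a generator
        # resumes after `first`, so `first` is never appended.
        for item in iterable:
            parts.append(item)
        break
    return parts


tokens = ["import", "foo", ",", "bar"]
from_list = join_tokens(tokens)
from_generator = join_tokens(tok for tok in tokens)

print(from_list)       # ['import', 'foo', ',', 'bar']
print(from_generator)  # ['foo', ',', 'bar']
```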

Workaround: never hand untokenize a generator. Expand generators to lists first instead.
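
A minimal sketch of this workaround, assuming Python 3 (`io.StringIO` in place of the `StringIO` module used in the original report); `strip` mirrors the helper from the failure case. The exact spacing of the result is left unprinted in comments because compatibility mode does not promise to preserve it:

```python
import io
import tokenize


def strip(tokens):
    # Keep only (type, string) and discard the position information.
    for tok in tokens:
        yield tok[0], tok[1]


source = io.StringIO("import foo, bar\n")

# Expand the generator to a list before handing it to untokenize.
pairs = list(strip(tokenize.generate_tokens(source.readline)))
result = tokenize.untokenize(pairs)
print(repr(result))
```

With a list as input, the first token is retained on affected versions as well, because the list can be iterated again from the start.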
msg172191 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-10-06 12:51
Attaching patch. Actually both versions of untokenize() were broken; the version used for "full input" (5-tuples) had a flipped inequality sign in an assert.

Other changes in the patch:

* Docs fixed to describe both modes
* Tests fixed to exercise both modes
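
For readers unfamiliar with the two modes, a round-trip check of each might look roughly like the following. This is only a sketch for Python 3 and is not taken from the attached patch or the test suite; `tokens_of` is a hypothetical helper:

```python
import io
import tokenize

source = "import foo, bar\n"


def tokens_of(text):
    # Hypothetical helper: tokenize a string into a list of full tokens.
    return list(tokenize.generate_tokens(io.StringIO(text).readline))


full = tokens_of(source)                          # 5-tuples ("full input")
pairs = [(tok.type, tok.string) for tok in full]  # 2-tuples (compatibility mode)

full_result = tokenize.untokenize(full)
compat_result = tokenize.untokenize(pairs)

# Full mode is expected to reproduce the source text exactly.
print(full_result == source)

# Compatibility mode may change spacing, but tokenizing its output should
# give back the same (type, string) pairs.
print([(tok.type, tok.string) for tok in tokens_of(compat_result)] == pairs)
```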
msg180589 - Author: Thomas Kluyver (takluyver) * Date: 2013-01-25 14:25
#16224 appears to be a duplicate.

There seem to be several quite major issues with untokenize - see also #12691 - with patches made to fix them. Is there anything I can do to help push these forwards?
msg211448 - Author: Roundup Robot (python-dev) Date: 2014-02-17 21:50
New changeset c896d292080a by Terry Jan Reedy in branch '2.7':
Untokenize: A logically incorrect assert tested user input validity.
http://hg.python.org/cpython/rev/c896d292080a

New changeset 51e5a89afb3b by Terry Jan Reedy in branch '3.3':
Untokenize: A logically incorrect assert tested user input validity.
http://hg.python.org/cpython/rev/51e5a89afb3b
msg211475 - Author: Roundup Robot (python-dev) Date: 2014-02-18 04:17
New changeset c2517a37c13a by Terry Jan Reedy in branch '2.7':
Issue #8478: Untokenizer.compat now processes first token from iterator input.
http://hg.python.org/cpython/rev/c2517a37c13a

New changeset b6d6ca792b64 by Terry Jan Reedy in branch '3.3':
Issue #8478: Untokenizer.compat now processes first token from iterator input.
http://hg.python.org/cpython/rev/b6d6ca792b64
msg212041 - Author: Roundup Robot (python-dev) Date: 2014-02-23 23:01
New changeset 8d6dd02a973f by Terry Jan Reedy in branch '3.3':
Issue #20750, Enable roundtrip tests for new 5-tuple untokenize. The
http://hg.python.org/cpython/rev/8d6dd02a973f
History
Date                 User          Action  Args
2014-02-23 23:01:28  python-dev    set     messages: + msg212041
2014-02-18 04:17:22  python-dev    set     messages: + msg211475
2014-02-17 21:50:19  python-dev    set     nosy: + python-dev; messages: + msg211448
2014-02-17 21:18:32  terry.reedy   set     assignee: terry.reedy; stage: patch review; nosy: + terry.reedy; versions: + Python 2.7, Python 3.3, Python 3.4, - Python 2.6
2013-03-28 10:05:04  georg.brandl  set     assignee: georg.brandl -> (no value)
2013-01-25 14:25:46  takluyver     set     messages: + msg180589
2013-01-24 01:01:59  takluyver     set     nosy: + takluyver
2012-11-13 06:44:17  eric.snow     set     nosy: + eric.snow
2012-10-06 18:03:17  Arfrever      set     nosy: + Arfrever
2012-10-06 12:51:33  georg.brandl  set     files: + untokenize.diff; keywords: + patch; messages: + msg172191
2010-05-25 16:57:26  rb            set     messages: + msg106450
2010-04-21 18:18:30  georg.brandl  set     assignee: georg.brandl; nosy: + georg.brandl
2010-04-21 17:36:41  rb            set     type: behavior
2010-04-21 02:18:30  rb            create