classification
Title: tokenize.untokenize() misbehaves when moved to "compatibility mode"
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: eric.snow, takluyver, terry.reedy
Priority: normal Keywords: patch

Created on 2012-10-14 06:04 by eric.snow, last changed 2014-02-18 01:49 by terry.reedy. This issue is now closed.

Files
File name  Uploaded  Description
untokenize_compat_force_iter.diff  eric.snow, 2012-10-14 06:18
Messages (4)
msg172851 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-10-14 06:04
When tokenize.untokenize() encounters a 2-tuple, it moves to compatibility mode, where only the token type and string are used from that point forward.  There are two closely related problems:

* when the iterable is a sequence, the portion of the sequence prior to the 2-tuple is traversed a second time under compatibility mode.
* when the iterable is an iterator, the first 2-tuple encountered is essentially gobbled up (see issue16221).

Either an explicit "iterable = iter(iterable)" or "iterable = list(iterable)" should happen at the very beginning of Untokenizer.untokenize().  If the former, Untokenizer.compat() should be fixed to not treat that first token differently.  If the latter, self.tokens should be cleared at the beginning of Untokenizer.compat().

I'll put up a patch with the second option when I get a chance.
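[Editor's note: a minimal sketch, not part of the original message, using only the public tokenize API. It illustrates the round-trip guarantee that either fix is meant to restore: passing untokenize() an *iterator* of (type, string) 2-tuples should not swallow the first token (the issue16221 symptom), and re-tokenizing the output should reproduce the same token stream.]

```python
import io
import tokenize

src = "1 + 2\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))

# Keep only (type, string) pairs, so untokenize() runs entirely in
# compatibility mode from the first token onward.
pairs = [(t.type, t.string) for t in toks]

# Pass an iterator, not a list -- the shape of input that triggered
# the "gobbled first token" behavior described above.
result = tokenize.untokenize(iter(pairs))

# Compatibility mode does not preserve exact spacing, but re-tokenizing
# the output must yield the same (type, string) stream we started with.
round_trip = [(t.type, t.string)
              for t in tokenize.generate_tokens(io.StringIO(result).readline)]
assert round_trip == pairs
```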
msg172853 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-10-14 06:18
Actually, here's a patch with the first option.  It preserves iterators as iterators, rather than dumping them into a list.  I've also rolled the tests from issue16221 into this patch.  Consequently, if the patch is suitable, that issue can be closed.
msg180587 - Author: Thomas Kluyver (takluyver) * Date: 2013-01-25 14:20
I think this is a duplicate of #8478.
msg211469 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-02-18 01:49
While I am closing this as a duplicate, I will use some of your patch, including one test, and credit you as well.

Switching from 5-tuples to 2-tuples, as in one of your test cases, is not currently a supported use case. Compat currently re-iterates the entire token list, and that does not work if some tokens have already been processed. While iter(iterable) makes your toy example pass, switching still does not work because of the problem of initializing compat.

   indents = []
This could only work with switching by making it an instance attribute that is also updated in the 5-tuple case. It is needed in tokenize anyway to support tab indents (#20383), but it would only need to become an attribute instead of a local to support switching.
 
    startline = token[0] in (NEWLINE, NL) (my replacement for 3 lines)
This is odd, as the file starts at the start of a line whether or not the first token is \n. On the other hand, the initial value of startline is irrelevant (as long as it has some value) because it is not used until there has been an indent. It would also have to become an attribute to support switching, and then it would be relevant, since indents might not be initially empty. But I do not currently see the need for a tuple-length switching feature.

    prevstring = False
This does not matter even if wrong, since a wrong value only means adding an extra space.
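[Editor's note: a minimal sketch, not actual Lib/tokenize.py code and with a hypothetical class name, of the refactor discussed above: the three compat() locals promoted to instance attributes so their state would survive a mid-stream switch from 5-tuples to 2-tuples.]

```python
from tokenize import NEWLINE, NL

class UntokenizerStateSketch:
    """Hypothetical sketch only: the three locals discussed above
    (indents, startline, prevstring) held as instance state instead
    of being re-initialized inside compat()."""

    def __init__(self):
        self.indents = []        # would also be maintained in the 5-tuple path
        self.startline = False   # initial value is irrelevant until an indent
        self.prevstring = False  # a wrong value only costs an extra space

    def note_token(self, toknum, tokval):
        # Minimal mirror of what compat() tracks per token.
        if toknum in (NEWLINE, NL):
            self.startline = True
```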
History
Date User Action Args
2014-02-18 01:49:52  terry.reedy  set  status: open -> closed; assignee: eric.snow -> terry.reedy; versions: - Python 3.2; nosy: + terry.reedy; messages: + msg211469; resolution: duplicate
2013-01-25 14:20:37  takluyver  set  nosy: + takluyver; messages: + msg180587
2012-10-14 06:20:58  eric.snow  link  issue16221 superseder
2012-10-14 06:18:57  eric.snow  set  files: + untokenize_compat_force_iter.diff; keywords: + patch; messages: + msg172853; stage: test needed -> patch review
2012-10-14 06:04:53  eric.snow  create