classification
Title: tokenize.untokenize() misbehaves when moved to "compatibility mode"
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: eric.snow, takluyver, terry.reedy
Priority: normal Keywords: patch

Created on 2012-10-14 06:04 by eric.snow, last changed 2014-02-18 01:49 by terry.reedy. This issue is now closed.

Files
File name  Uploaded  Description
untokenize_compat_force_iter.diff  eric.snow, 2012-10-14 06:18
Messages (4)
msg172851 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-10-14 06:04
When tokenize.untokenize() encounters a 2-tuple, it moves to compatibility mode, where only the token type and string are used from that point forward.  There are two closely related problems:

* when the iterable is a sequence, the portion of the sequence prior to the 2-tuple is traversed a second time under compatibility mode.
* when the iterable is an iterator, the first 2-tuple encountered is essentially gobbled up (see issue16221).

Either an explicit "iterable = iter(iterable)" or "iterable = list(iterable)" should happen at the very beginning of Untokenizer.untokenize().  If the former, Untokenizer.compat() should be fixed to not treat that first token differently.  If the latter, self.tokens should be cleared at the beginning of Untokenizer.compat().

I'll put up a patch with the second option when I get a chance.
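[Editor's note: a minimal sketch, not part of the original message, using only the public tokenize API. It illustrates the round-trip guarantee that either fix is meant to restore: passing untokenize() an *iterator* of (type, string) 2-tuples should not swallow the first token (the issue16221 symptom), and re-tokenizing the output should reproduce the same token stream.]

```python
import io
import tokenize

src = "1 + 2\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))

# Keep only (type, string) pairs, so untokenize() runs entirely in
# compatibility mode from the first token onward.
pairs = [(t.type, t.string) for t in toks]

# Pass an iterator, not a list -- the shape of input that triggered
# the "gobbled first token" behavior described above.
result = tokenize.untokenize(iter(pairs))

# Compatibility mode does not preserve exact spacing, but re-tokenizing
# the output must yield the same (type, string) stream we started with.
round_trip = [(t.type, t.string)
              for t in tokenize.generate_tokens(io.StringIO(result).readline)]
assert round_trip == pairs
```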
msg172853 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-10-14 06:18
Actually, here's a patch with the first option.  It preserves iterators as iterators, rather than dumping them into a list.  I've also rolled the tests from issue16221 into this patch.  Consequently, if the patch is suitable, that issue can be closed.
msg180587 - Author: Thomas Kluyver (takluyver) * Date: 2013-01-25 14:20
I think this is a duplicate of #8478.
msg211469 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-02-18 01:49
While I am closing this as a duplicate, I will use some of your patch, including one test, and credit you as well.

Switching from 5-tuples to 2-tuples, as in one of your test cases, is not currently a supported use case. Compat currently re-iterates the entire token list, and that does not work if some tokens have already been processed. While iter(iterable) makes your toy example pass, switching still does not work because of the problem of initializing compat.

   indents = []
This could only work with switching by making it an instance attribute that is also updated in the 5-tuple case. It is needed in tokenize anyway to support tab indents (#20383), but it would only need to become an attribute instead of a local to support switching.
 
    startline = token[0] in (NEWLINE, NL) (my replacement for 3 lines)
This is odd, as the file starts at the start of a line whether or not the first token is \n. On the other hand, the initial value of startline is irrelevant (as long as it has some value) because it is not used until there has been an indent. It would also have to become an attribute to support switching, and then it would be relevant, since indents might not be initially empty. But I do not currently see the need for a tuple-length switching feature.

    prevstring = False
This does not matter even if wrong, since a wrong value only means adding an extra space.
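[Editor's note: a minimal sketch, not actual Lib/tokenize.py code and with a hypothetical class name, of the refactor discussed above: the three compat() locals promoted to instance attributes so their state would survive a mid-stream switch from 5-tuples to 2-tuples.]

```python
from tokenize import NEWLINE, NL

class UntokenizerStateSketch:
    """Hypothetical sketch only: the three locals discussed above
    (indents, startline, prevstring) held as instance state instead
    of being re-initialized inside compat()."""

    def __init__(self):
        self.indents = []        # would also be maintained in the 5-tuple path
        self.startline = False   # initial value is irrelevant until an indent
        self.prevstring = False  # a wrong value only costs an extra space

    def note_token(self, toknum, tokval):
        # Minimal mirror of what compat() tracks per token.
        if toknum in (NEWLINE, NL):
            self.startline = True
```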
History
Date User Action Args
2014-02-18 01:49:52  terry.reedy  set  status: open -> closed; assignee: eric.snow -> terry.reedy; versions: - Python 3.2; nosy: + terry.reedy; messages: + msg211469; resolution: duplicate
2013-01-25 14:20:37  takluyver  set  nosy: + takluyver; messages: + msg180587
2012-10-14 06:20:58  eric.snow  link  issue16221 superseder
2012-10-14 06:18:57  eric.snow  set  files: + untokenize_compat_force_iter.diff; keywords: + patch; messages: + msg172853; stage: test needed -> patch review
2012-10-14 06:04:53  eric.snow  create