Message 245547 - Python tracker

Author	gumblex
Recipients	Arfrever, gumblex, jaraco, terry.reedy
Date	2015-06-20.07:13:45
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1434784426.71.0.297913391897.issue20387@psf.upfronthosting.co.za>
In-reply-to

Content
Sorry for the inconvenience. I failed to find this old bug.

I think there is another problem. The docs for `untokenize` say "The iterable must return sequences with **at least** two elements, the token type and the token string. Any additional sequence elements are ignored.", so if I feed in, say, a 3-tuple, `untokenize` should accept it and treat it as `tok[:2]`.
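To illustrate the behaviour the docs describe, here is a minimal sketch that truncates every token to `tok[:2]` before handing it to `untokenize`. The helper name `untokenize_pairs` and the sample source are made up for this example and are not part of the attached patch; passing the 3-tuples directly is the case this report is about, so the sketch does not rely on it.

```python
import io
import tokenize


def untokenize_pairs(tokens):
    """Hypothetical helper: keep only (type, string) from each token."""
    return tokenize.untokenize(tok[:2] for tok in tokens)


source = "x = 1 + 2\n"

# Build 3-tuples (type, string, start) to mimic the input from the report.
triples = [
    (tok.type, tok.string, tok.start)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# With only two elements per token, untokenize cannot restore the original
# spacing, but the result should tokenize back to the same types and strings.
rebuilt = untokenize_pairs(triples)
print(repr(rebuilt))
```

Since only the token type and string survive, the spacing of `rebuilt` may differ from `source`; the documented guarantee covers the token stream, not the column positions.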

The attached patch should address the problems above.

While preparing the patch, I found a bug in tokenize. Consider the newly attached tab.py: the tabs between code and comments, and the tabs between expressions, are lost during tokenization, so when untokenizing, position information is used to produce the equivalent number of spaces instead of tabs.
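A small sketch of the effect, using a made-up one-line source rather than tab.py: the token strings carry no inter-token whitespace, only start and end positions, so `untokenize` has nothing but column offsets to rebuild the gaps from. On the CPython versions I know of, it fills those gaps with spaces.

```python
import io
import tokenize

# Tabs separate the expressions and the trailing comment.
source = "a\t=\t1\t# c\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# The tabs appear in tok.line, but not in any tok.string.
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

# Full 5-tuples: whitespace is reconstructed from the positions alone.
rebuilt = tokenize.untokenize(tokens)
print(repr(rebuilt))
```

The original tabs are still available in each token's `line` attribute, which is one place a fix could look them up; whether the patch does so is not something this sketch assumes.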

Despite the tokenization problem, the patch produces syntactically correct code and reproduces the original source as accurately as it can.

PEP 8 recommends spaces for indentation, but the use of tabs should not be ignored.

New tab.py (as a Python string literal):

'#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\ndef foo():\n\t"""\n\tTests tabs in tokenization\n\t\tfoo\n\t"""\n\tpass\n\tpass\n\tif 1:\n\t\t# not indent correctly\n\t\tpass\n\t\t# correct\ttab\n\t\tpass\n\tpass\n\tbaaz = {\'a\ttab\':\t1,\n\t\t\t\'b\': 2}\t\t# also fails\n\npass\n#if 2:\n\t#pass\n#pass\n'

History
Date	User	Action	Args
2015-06-20 07:13:46	gumblex	set	recipients: + gumblex, terry.reedy, jaraco, Arfrever
2015-06-20 07:13:46	gumblex	set	messageid: <1434784426.71.0.297913391897.issue20387@psf.upfronthosting.co.za>
2015-06-20 07:13:46	gumblex	link	issue20387 messages
2015-06-20 07:13:46	gumblex	create