Issue 42687: tokenize module does not recognize Barry as FLUFL

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/86853

classification

Title:	tokenize module does not recognize Barry as FLUFL
Type:	enhancement	Stage:	resolved
Components:		Versions:	Python 3.10

process

Status:	closed	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	BTaskaya, esoma, terry.reedy
Priority:	normal	Keywords:	patch

Created on 2020-12-19 15:46 by esoma, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 23857	closed	esoma, 2020-12-19 16:01

Messages (3)
msg383384 - (view)	Author: Erik Soma (esoma) *	Date: 2020-12-19 15:46
'<>' is not recognized by the tokenize module as a single token; instead it is emitted as two tokens.

```
$ python -c "import tokenize; import io; import pprint; pprint.pprint(list(tokenize.tokenize(io.BytesIO(b'<>').readline)))"
[TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
 TokenInfo(type=54 (OP), string='<', start=(1, 0), end=(1, 1), line='<>'),
 TokenInfo(type=54 (OP), string='>', start=(1, 1), end=(1, 2), line='<>'),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 2), end=(1, 3), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
```

I would expect:

```
[TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
 TokenInfo(type=54 (OP), string='<>', start=(1, 0), end=(1, 2), line='<>'),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 2), end=(1, 3), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
```

This is the behavior of the CPython tokenizer, which the tokenize module tries "to match the working of".
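The report above can be checked with a short script rather than a one-liner. The following is only a sketch: the helper name `op_tokens` is made up for illustration, and whether '<>' comes back as one OP token or two depends on the Python version in use, so no particular output is claimed here.

```python
import io
import tokenize


def op_tokens(source):
    """Return the strings of all OP tokens that tokenize reports for source.

    Hypothetical helper for illustration; not part of the tokenize module.
    """
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP
    ]


if __name__ == "__main__":
    # One element means '<>' was kept whole; two means it was split.
    print(op_tokens("a <> b\n"))
    # '!=' is shown for comparison.
    print(op_tokens("a != b\n"))
```

A result of `['<', '>']` corresponds to the behavior described in this report, and `['<>']` to the behavior the reporter expected.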
msg383787 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2020-12-26 00:56
I strongly disagree. '<>' is not a legal operator any more. It is a parse-time syntax error. Whatever historical artifact is left in the CPython tokenizer, recognizing '<>' is not exposed to Python code.

>>> p = ast.parse('a <> b')
Traceback (most recent call last):
...
    a <> b
       ^
SyntaxError: invalid syntax

When '<>' was legal, we may presume that tokenize recognized it, so that not recognizing it was an intentional change. Reverting this would be a disservice to users.

I think that the PR and this issue should be closed. If the historical artifact bothers you, propose removing it instead of introducing a bug into tokenize.
msg383794 - (view)	Author: Batuhan Taskaya (BTaskaya) *	Date: 2020-12-26 07:13
I concur with Terry.
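
For readers unfamiliar with the title: "Barry as FLUFL" refers to the `barry_as_FLUFL` entry in the `__future__` module, an April Fools feature from PEP 401 under which CPython accepts '<>' as the inequality operator. The sketch below is not taken from the thread; the helper name `compiles` is made up, and it assumes a CPython interpreter. It compares compiling '<>' with and without that feature's compiler flag.

```python
import __future__


def compiles(source, flags=0):
    """Return True if source compiles, False if it raises SyntaxError.

    Hypothetical helper for illustration only.
    """
    try:
        # dont_inherit=True keeps this module's own future flags out of the call.
        compile(source, "<flufl sketch>", "exec", flags=flags, dont_inherit=True)
    except SyntaxError:
        return False
    return True


# Compiler flag that CPython associates with `from __future__ import barry_as_FLUFL`.
FLUFL_FLAG = __future__.barry_as_FLUFL.compiler_flag

if __name__ == "__main__":
    # Ordinary code: '<>' is rejected at parse time, as noted in msg383787.
    print(compiles("2 <> 3\n"))
    # With the future feature requested and its flag passed explicitly.
    print(compiles("from __future__ import barry_as_FLUFL\n2 <> 3\n", FLUFL_FLAG))
```

The flag is passed explicitly alongside the import line so the sketch does not rely on the import alone enabling the feature within a single source string.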

History
Date	User	Action	Args
2022-04-11 14:59:39	admin	set	github: 86853
2020-12-27 01:05:55	esoma	set	status: open -> closed stage: patch review -> resolved
2020-12-26 07:13:41	BTaskaya	set	nosy: + BTaskaya messages: + msg383794
2020-12-26 00:56:19	terry.reedy	set	versions: - Python 3.9 nosy: + terry.reedy messages: + msg383787 type: enhancement
2020-12-19 16:01:57	esoma	set	keywords: + patch stage: patch review pull_requests: + pull_request22722
2020-12-19 15:46:27	esoma	create