Message 324148 - Python tracker


Author	monson
Recipients	monson
Date	2018-08-27.06:47:47
Message-id	<1535352467.5.0.56676864532.issue34515@psf.upfronthosting.co.za>
In-reply-to

Content
Python 3.0 allows identifiers to contain characters from outside the ASCII range (see PEP 3131 and https://docs.python.org/3/reference/lexical_analysis.html#identifiers).

But lib2to3 can't tokenize them correctly:
```
$ echo '中 = 1' | python3.7 -m lib2to3.pgen2.tokenize
1,0-1,1:	ERRORTOKEN	'中'
1,2-1,3:	OP	'='
1,4-1,5:	NUMBER	'1'
1,5-1,6:	NEWLINE	'\n'
2,0-2,0:	ENDMARKER	''
```
'中' should be tokenized as NAME instead of ERRORTOKEN.
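
For comparison, here is a minimal sketch of the expected behavior using the standard-library `tokenize` module rather than lib2to3. The `token_names` helper is a hypothetical name introduced only for this illustration, and the sketch assumes a Python 3 interpreter.

```python
import io
import tokenize

# Same source line as in the report above.
source = "中 = 1\n"


def token_names(src):
    """Return (token type name, token string) pairs for a source string.

    Hypothetical helper for illustration only; it wraps the stdlib
    tokenizer, not lib2to3.pgen2.tokenize.
    """
    readline = io.StringIO(src).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


tokens = token_names(source)
for name, text in tokens:
    print(name, repr(text))

# str.isidentifier() applies the identifier rules from the language
# reference, so it gives a quick check that '中' is a valid name.
print("中".isidentifier())
```

Here the standard-library tokenizer should report the first token as NAME, which is the result lib2to3's tokenizer would be expected to give as well.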

History
Date	User	Action	Args
2018-08-27 06:47:47	monson	set	recipients: + monson
2018-08-27 06:47:47	monson	set	messageid: <1535352467.5.0.56676864532.issue34515@psf.upfronthosting.co.za>
2018-08-27 06:47:47	monson	link	issue34515 messages
2018-08-27 06:47:47	monson	create