Author monson
Recipients monson
Date 2018-08-27.06:47:47
Message-id <1535352467.5.0.56676864532.issue34515@psf.upfronthosting.co.za>
In-reply-to
Content
Python 3.0 allows identifiers to contain characters from outside the ASCII range (see PEP 3131 and https://docs.python.org/3/reference/lexical_analysis.html#identifiers).

But lib2to3 can't tokenize them correctly:
```
$ echo '中 = 1' | python3.7 -m lib2to3.pgen2.tokenize
1,0-1,1:	ERRORTOKEN	'中'
1,2-1,3:	OP	'='
1,4-1,5:	NUMBER	'1'
1,5-1,6:	NEWLINE	'\n'
2,0-2,0:	ENDMARKER	''
```
'中' should be tokenized as NAME instead of ERRORTOKEN.
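
For comparison, here is a minimal sketch of the expected behavior. It uses the standard library's `tokenize` module rather than lib2to3's pgen2 tokenizer, and `first_token` is a hypothetical helper name introduced only for this illustration. It is a way to check the expected token type, not a fix for lib2to3.

```python
import io
import token
import tokenize


def first_token(source):
    """Return (token type name, token string) for the first token in source."""
    # generate_tokens takes a readline callable that returns str lines.
    readline = io.StringIO(source).readline
    tok = next(tokenize.generate_tokens(readline))
    return token.tok_name[tok.type], tok.string


if __name__ == "__main__":
    name, string = first_token("中 = 1\n")
    print(name, repr(string))  # prints: NAME '中'
    # PEP 3131 identifiers are also recognized by str.isidentifier().
    print("中".isidentifier())  # prints: True
```

The stdlib tokenizer reports '中' as NAME, which is the result lib2to3's tokenizer should match.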
History
Date User Action Args
2018-08-27 06:47:47  monson  set  recipients: + monson
2018-08-27 06:47:47  monson  set  messageid: <1535352467.5.0.56676864532.issue34515@psf.upfronthosting.co.za>
2018-08-27 06:47:47  monson  link  issue34515 messages
2018-08-27 06:47:47  monson  create