
classification
Title: lib2to3 doesn't parse Python 3 identifiers containing non-spacing marks
Type: Stage: resolved
Components: 2to3 (2.x to 3.x conversion tool), Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder: Close 2to3 issues and list them here
View: 45544
Assigned To: Nosy List: BTaskaya, JustinTArthur, benjamin.peterson
Priority: normal Keywords:

Created on 2019-09-04 23:21 by JustinTArthur, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
badvar.py JustinTArthur, 2019-09-04 23:21 Module demonstrating non-word continuation characters
Messages (8)
msg351153 - (view) Author: Justin Arthur (JustinTArthur) * Date: 2019-09-04 23:21
Python 3 code with an identifier that has a non-spacing mark in it does not get tokenized by lib2to3 and will result in an exception thrown in the parsing process.

Parsing the attached file (badvar.py) results in `ParseError: bad token: type=58, value='̇', context=('', (1, 1))`

This happens because the Name pattern regular expression in lib2to3 is `r'\w+'` and the word character class doesn't contain non-spacing marks (and possibly other [continuation characters allowed in Python 3 identifiers](https://docs.python.org/3/reference/lexical_analysis.html#identifiers)).
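
A minimal sketch of the mismatch, assuming the identifier in question is `i` followed by U+0307 COMBINING DOT ABOVE (the sample string and the variable names below are assumptions for illustration, not the contents of badvar.py). It only exercises the `\w+` pattern quoted above with the standard `re` module; it does not call lib2to3 itself:

```python
import re
import unicodedata

# The Name pattern quoted above, compiled with the standard re module.
NAME_PATTERN = re.compile(r"\w+")

# Assumed sample: "i" followed by U+0307 COMBINING DOT ABOVE.
identifier = "i\u0307"

# Python 3 accepts the whole string as an identifier...
is_valid_identifier = identifier.isidentifier()

# ...because the trailing character is a non-spacing mark (category Mn),
# which is allowed as an identifier continuation character.
mark_category = unicodedata.category(identifier[1])

# \w+ stops before the mark, so only the leading "i" is consumed.
matched = NAME_PATTERN.match(identifier).group()
leftover = identifier[len(matched):]

print(is_valid_identifier, mark_category, repr(matched), ascii(leftover))
```

The leftover character is what a tokenizer built on `\w+` would see as a stray token at column 1, which is consistent with the `context=('', (1, 1))` in the error above.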

(reported by energizer in the Python IRC channel)
msg351215 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-09-05 17:39
"2to3 is a Python program that reads Python 2.x source code and applies a series of fixers to transform it into valid Python 3.x code."

The example you supply, badvar.py, is not a valid Python 2.x program. Python 2 identifiers cannot contain such characters.

https://docs.python.org/3/library/2to3.html
https://docs.python.org/2/reference/lexical_analysis.html#identifiers
https://docs.python.org/3/reference/lexical_analysis.html#identifiers
msg351220 - (view) Author: Justin Arthur (JustinTArthur) * Date: 2019-09-05 18:57
Ned, can you confirm that 2to3 is not intended for cumulative/incremental runs over the same codebase?

If it's not intended to be run on previously ported code, this will just need to be fixed on the lib2to3 downstream projects like awpa and Black that are encountering this issue.
msg351224 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-09-06 02:48
Benjamin, can you answer Justin's question above?
msg351226 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2019-09-06 04:10
2to3 should be able to parse valid Python 3 code.
msg351227 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-09-06 04:26
> 2to3 should be able to parse valid Python 3 code.

OK, then should the original behavior here be treated as a bug and fixed?  If so, this issue should be re-opened.
msg356957 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2019-11-19 09:50
Is there a consensus about fixing this? By the way, the current tokenizer doesn't accept this either:
1,0-1,2:            NAME           'iÌ'
1,2-1,3:            ERRORTOKEN     '‡'
1,4-1,5:            OP             '='
1,6-1,7:            NUMBER         '5'
1,7-1,8:            NEWLINE        '\n'
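
A listing like the one above can be reproduced with a sketch along these lines. It assumes the source line is `i` plus U+0307 followed by ` = 5`, which is a guess based on the tokens shown, and it uses only the standard `tokenize` module. The result depends on the interpreter version: older pure-Python tokenizers are expected to report the mark as an ERRORTOKEN, while newer versions may fold it into the NAME token.

```python
import io
import tokenize

# Assumed sample line: "i" + U+0307 COMBINING DOT ABOVE, then " = 5".
source = "i\u0307 = 5\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    span = f"{tok.start[0]},{tok.start[1]}-{tok.end[0]},{tok.end[1]}:"
    print(f"{span:<20}{tokenize.tok_name[tok.type]:<15}{tok.string!r}")
```

Passing the source as an already-decoded `str` avoids any dependence on how the file or terminal encoding is interpreted.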
msg377798 - (view) Author: Justin Arthur (JustinTArthur) * Date: 2020-10-02 04:52
Not sure if there is consensus on how to fix this, but fixing #12731 will, as a side effect, fix most of the cases I've seen complaints about.
History
Date User Action Args
2022-04-11 14:59:19 admin set github: 82213
2021-10-20 22:55:19 iritkatriel set status: open -> closed
superseder: Close 2to3 issues and list them here
resolution: wont fix
stage: needs patch -> resolved
2020-10-02 04:52:07 JustinTArthur set messages: + msg377798
2019-11-19 09:50:19 BTaskaya set nosy: + BTaskaya
messages: + msg356957
2019-09-06 04:38:14 ned.deily set nosy: - ned.deily
stage: resolved -> needs patch
versions: + Python 3.9, - Python 3.5, Python 3.6
2019-09-06 04:30:11 benjamin.peterson set status: closed -> open
resolution: not a bug -> (no value)
2019-09-06 04:26:42 ned.deily set messages: + msg351227
2019-09-06 04:10:06 benjamin.peterson set messages: + msg351226
2019-09-06 02:48:28 ned.deily set nosy: + benjamin.peterson
messages: + msg351224
2019-09-05 18:57:03 JustinTArthur set messages: + msg351220
2019-09-05 17:39:39 ned.deily set status: open -> closed
nosy: + ned.deily
messages: + msg351215
resolution: not a bug
stage: resolved
2019-09-04 23:21:42 JustinTArthur create