This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: unicode name accepts a punctuation glyph
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: ezio.melotti, julien.tayon, r.david.murray
Priority: normal Keywords:

Created on 2012-10-16 15:32 by julien.tayon, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg173049 - Author: julien tayon (julien.tayon) Date: 2012-10-16 15:32
I assumed Unicode variable names were restricted to letters, and that symbols and punctuation (except _) would be rejected.

I have tested other dot-like punctuation characters, and they don't work.

Oddly enough, only MIDDLE DOT (U+00B7) has worked so far:
http://www.fileformat.info/info/unicode/char/00b7/index.htm


$ python3.2
Python 3.2.3 (default, Sep 10 2012, 18:14:40) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> foo⋅bar=42
  File "<stdin>", line 1
    foo⋅bar=42
            ^
SyntaxError: invalid character in identifier
>>> print(ord("foo⋅bar"[3]))
8901
>>> foo·bar = 42
>>> print(ord("foo·bar"[3]))
183

I have randomly sampled other characters from the same block as MIDDLE DOT, and they seem to behave correctly.
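A minimal sketch that reproduces the session above without the interactive interpreter, assuming a Python 3 build. `str.isidentifier()` checks a string against the language's identifier definition, and `unicodedata` reports each character's name and general category:

```python
import unicodedata

# The two dot-like characters from the session above:
# U+22C5 (ord 8901) and U+00B7 (ord 183).
for ch in ("\u22c5", "\u00b7"):
    name = "foo" + ch + "bar"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"({unicodedata.category(ch)}): isidentifier={name.isidentifier()}")
```

On a current CPython this reports DOT OPERATOR (category Sm) as rejected and MIDDLE DOT (category Po) as accepted, matching the SyntaxError and the successful assignment in the session.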
msg173052 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-10-16 15:41
The rules for python identifiers are documented here:

  http://docs.python.org/dev/reference/lexical_analysis.html#identifiers

Are you saying that the behavior does not match the documentation?
msg173055 - Author: julien tayon (julien.tayon) Date: 2012-10-16 15:56
http://www.fileformat.info/info/unicode/char/b7/index.htm

Its Unicode category is Po (Punctuation, other).

Empirically, it cannot start a variable name, so according to the rules given in the lexical analysis documentation its category should be one of Mn, Mc, Nd, Pc.

That is not the case: Po is not in [Mn, Mc, Nd, Pc].

Unless my reasoning is wrong, this does not seem right.
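The reasoning above can be checked with a deliberately simplified helper. `category_only_identifier` is hypothetical, written only for illustration: it applies the general categories listed in the documentation and ignores the Other_ID_Start/Other_ID_Continue exceptions and NFKC normalization.

```python
import unicodedata

# Categories named in the lexical-analysis docs, WITHOUT the
# Other_ID_Start / Other_ID_Continue exceptions (deliberately simplified).
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}


def category_only_identifier(name: str) -> bool:
    """Hypothetical helper: validate `name` using general categories alone."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # '_' is category Pc, so it needs an explicit exception as a first character.
    if first != "_" and unicodedata.category(first) not in ID_START_CATEGORIES:
        return False
    allowed = ID_START_CATEGORIES | ID_CONTINUE_EXTRA
    return all(unicodedata.category(ch) in allowed for ch in rest)


for name in ("foo_bar", "foo\u00b7bar"):
    print(name, category_only_identifier(name), name.isidentifier())
```

Under this category-only reading MIDDLE DOT is rejected, while `str.isidentifier()` accepts it, which is the mismatch described above.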
msg173056 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-16 16:02
The characters with the Other_ID_Continue property are also included in id_continue, i.e.:

00B7          ; Other_ID_Continue # Po       MIDDLE DOT
0387          ; Other_ID_Continue # Po       GREEK ANO TELEIA
1369..1371    ; Other_ID_Continue # No   [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
19DA          ; Other_ID_Continue # No       NEW TAI LUE THAM DIGIT ONE

See http://unicode.org/Public/UNIDATA/PropList.txt
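A minimal sketch, assuming only the 12 code points quoted from PropList.txt above (later Unicode releases may list additional ones), that checks each is accepted after a leading letter but not as the first character of an identifier:

```python
import unicodedata

# Other_ID_Continue code points as quoted above from the 2012 PropList.txt;
# later Unicode versions may add more.
OTHER_ID_CONTINUE = [0x00B7, 0x0387, *range(0x1369, 0x1372), 0x19DA]

for cp in OTHER_ID_CONTINUE:
    ch = chr(cp)
    continues = ("a" + ch).isidentifier()  # allowed after a letter?
    starts = ch.isidentifier()             # allowed as the first character?
    print(f"U+{cp:04X} {unicodedata.category(ch)} {unicodedata.name(ch)}: "
          f"continue={continues} start={starts}")
```

Every entry should print `continue=True start=False`, which is exactly the behaviour reported for MIDDLE DOT: usable inside a name, never at its start.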
History
Date                 User            Action  Args
2022-04-11 14:57:37  admin           set     github: 60453
2012-10-16 16:02:24  ezio.melotti    set     status: open -> closed
                                             assignee: ezio.melotti
                                             nosy: + ezio.melotti
                                             messages: + msg173056
                                             resolution: not a bug
                                             stage: resolved
2012-10-16 15:56:51  julien.tayon    set     messages: + msg173055
2012-10-16 15:41:18  r.david.murray  set     nosy: + r.david.murray
                                             messages: + msg173052
2012-10-16 15:32:35  julien.tayon    create