classification
Title: Python and the Unicode Character Database
Type:
Stage:
Components:
Versions:

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Alexander.Belopolsky, belopolsky
Priority: normal
Keywords:

Created on 2010-11-28 20:18 by Alexander.Belopolsky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (2)
msg122718 - Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2010-11-28 20:18
Two recently reported issues brought to light the fact that the Python
language definition is closely tied to character properties maintained
by the Unicode Consortium. [1,2]  For example, when Python switches to
Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two
additional characters that Python can use in identifiers. [3]

With Python 3.1:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    ೱ = 1
      ^
SyntaxError: invalid character in identifier

but with Python 3.2a4:

1
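
A minimal sketch of how one might check this on a given interpreter. The
character literal is copied from the traceback above; the names `ch`,
`namespace`, `is_ident` and `value` are illustrative only. The result
depends on the Unicode Character Database version bundled with the
interpreter, so no output is shown.

```python
import unicodedata

# Character copied from the traceback above.
ch = "ೱ"

# Version of the Unicode Character Database bundled with this interpreter.
print("UCD version:", unicodedata.unidata_version)

# Code point, name and general category as seen by this interpreter.
category = unicodedata.category(ch)
print("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"), category)

# str.isidentifier() follows the UCD-derived identifier rules.
is_ident = ch.isidentifier()
print("valid identifier:", is_ident)

value = None
if is_ident:
    # Bind the name and read it back, using a private namespace
    # so the experiment does not touch the module globals.
    namespace = {}
    exec(ch + " = 1", namespace)
    value = eval(ch, namespace)
    print("value:", value)
```

On an interpreter whose database predates Unicode 6.0.0, `is_ident` should
be false and the `exec` step is skipped rather than raising SyntaxError.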

Of course, the likelihood is low that this change will affect any
user, but the change in str.isspace() reported in [1] is likely to
cause some trouble:

[u'A', u'B']

[u'A\u200bB']
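
The two results above differ only in whether U+200B is treated as
whitespace. A rough sketch of how to see which behaviour an interpreter
has, and of one way to split that does not depend on the UCD, follows.
The helper `split_ascii_whitespace` is hypothetical, not an existing
builtin, and the set of separators it uses is my assumption.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE
text = "A" + ZWSP + "B"

# Whether this interpreter's character database classifies U+200B as a space.
treated_as_space = ZWSP.isspace()

# str.split() with no argument splits on whatever isspace() accepts.
parts = text.split()

def split_ascii_whitespace(s):
    """Split on ASCII whitespace only (hypothetical helper)."""
    return [piece for piece in re.split(r"[ \t\n\r\f\v]+", s) if piece]

# This result does not change when the Unicode database is updated.
ascii_parts = split_ascii_whitespace(text)

print(treated_as_space, parts, ascii_parts)
```

Code that must behave the same across Python versions can use an explicit
separator set like this instead of relying on the default split().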

While we have little choice but to follow the UCD in defining
str.isidentifier(), I think Python can promise users more stability in
what it treats as a space or as a digit in its builtins.  For example,
I don't think that supporting

1234.56

is more important than assuring users that once their program has
accepted some text as a number, they can assume that the text is
ASCII.
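
To make the trade-off concrete, here is a sketch of the kind of guarantee
I have in mind. It assumes, for illustration only, that the input is
written with Arabic-Indic digits (U+0661 to U+0666); the helper
`parse_ascii_float` is hypothetical and not a proposed API.

```python
def parse_ascii_float(text):
    """Convert text to float, rejecting any non-ASCII input (hypothetical helper)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("non-ASCII text rejected: %r" % text) from None
    return float(text)

# "1234.56" written with Arabic-Indic digits (an assumed example input).
arabic_indic = "\u0661\u0662\u0663\u0664.\u0665\u0666"

# The builtin accepts any Unicode decimal digits.
builtin_result = float(arabic_indic)

# The strict variant only accepts ASCII text.
try:
    parse_ascii_float(arabic_indic)
    strict_rejected = False
except ValueError:
    strict_rejected = True

strict_result = parse_ascii_float("1234.56")
print(builtin_result, strict_rejected, strict_result)
```

With the strict variant, a caller that gets a number back knows the text
was ASCII; with the builtin, it does not.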

[1] http://bugs.python.org/issue10567
[2] http://bugs.python.org/issue10557
[3] http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes
msg122720 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-28 20:25
This was meant as a python-dev post, not an issue. (Sent to the wrong address by mistake.)
History
Date                 User                  Action  Args
2022-04-11 14:57:09  admin                 set     github: 54777
2010-11-28 20:25:52  belopolsky            set     status: open -> closed; nosy: + belopolsky; messages: + msg122720; resolution: not a bug
2010-11-28 20:18:44  Alexander.Belopolsky  create