
Title: access to unicodedata (via codepoints or 2-char surrogates)
Type: enhancement Stage:
Components: Unicode Versions:
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: doerwalter, vlbrom
Priority: normal Keywords:

Created on 2007-04-24 10:47 by vlbrom, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg55091 - Author: vbr (vlbrom) Date: 2007-04-24 10:47
Currently, most functions of the unicodedata module require a unichr (a unicode string of length 1) as a parameter. For most uses this is fine, but when working with characters outside the BMP (code points above U+FFFF) on a narrow Python build, it would be quite handy to access the properties of these characters simply by code point (ordinal). On a narrow build, unichr(x) only works for x <= 0xFFFF, so the other Unicode planes are inaccessible this way.

I believe the Unicode database is already indexed internally by some numerical value such as the code point, or is that not the case?

With this improvement, the whole database would be effectively accessible on narrow Python builds too, where it isn't possible to pass a one-character string for code points above U+FFFF. (Even if the explicit limitation of unichr is bypassed, e.g. by writing a unicode literal u'\Uxxxxxxxx', the resulting string consists of a surrogate pair and therefore has length 2.)
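For illustration, here is a minimal sketch of what code-point-based access could look like. It assumes a current Python 3, where chr() accepts the full range up to 0x10FFFF, so it does not reproduce the narrow-build limitation described above. The helper name name_from_codepoint is hypothetical and is not part of the unicodedata module.

```python
import unicodedata


def name_from_codepoint(codepoint, default=None):
    """Return the Unicode name for an integer code point.

    Hypothetical helper, not part of the unicodedata module.
    """
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError("code point out of range: %r" % (codepoint,))
    # Assumption: Python 3, where chr() covers every plane and always
    # yields a string of length 1.
    return unicodedata.name(chr(codepoint), default)


# Look up a character outside the BMP directly by its ordinal.
print(name_from_codepoint(0x1D11E))
```

The same wrapping pattern would apply to other lookups such as unicodedata.category(); only name() is shown to keep the sketch short.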

Alternatively, the respective functions could also accept a two-character string, provided that the sequence can be correctly interpreted as a surrogate-pair representation of a valid Unicode code point.
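The surrogate-pair interpretation mentioned here follows the standard UTF-16 arithmetic, which can be sketched as below. This is only an illustration of the decoding step, not the code that was eventually committed. The function name codepoint_from_surrogate_pair is hypothetical, and the example assumes Python 3, where the pair has to be written out as two lone surrogates.

```python
def codepoint_from_surrogate_pair(pair):
    """Decode a two-character surrogate pair into one integer code point.

    Hypothetical helper; raises ValueError if the pair is not well formed.
    """
    if len(pair) != 2:
        raise ValueError("expected exactly two code units")
    high, low = ord(pair[0]), ord(pair[1])
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first unit is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second unit is not a low surrogate")
    # Each surrogate carries 10 bits; the combined value is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# D834 DD1E is the UTF-16 encoding of U+1D11E.
print(hex(codepoint_from_surrogate_pair("\ud834\udd1e")))
```

A function accepting such a pair would first need this validation, so that two arbitrary characters are rejected rather than silently combined.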

Currently such behaviour (e.g. code point access) can be emulated with custom datasets derived from the Unicode database, but I believe it should be possible to access the already present data somehow (also on narrow builds), rather than having to duplicate it.

msg67639 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2008-06-02 20:41
Fixed for 2.6 in r63899.
msg67671 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2008-06-03 19:09
Fixed for 3.0 in r63918
Date                 User        Action  Args
2022-04-11 14:56:23  admin       set     github: 44891
2008-06-03 19:09:13  doerwalter  set     messages: + msg67671
2008-06-02 20:41:34  doerwalter  set     status: open -> closed; resolution: fixed; messages: + msg67639; nosy: + doerwalter
2007-04-24 10:47:21  vlbrom      create