classification
Title: access to unicodedata (via codepoints or 2-char surrogates)
Type: feature request Stage:
Components: Unicode Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: doerwalter, vlbrom (2)
Priority: normal Keywords:

Created on 2007-04-24 10:47 by vlbrom, last changed 2008-06-03 19:09 by doerwalter.

Messages (3)
msg55091 - Author: vbr (vlbrom) Date: 2007-04-24 10:47
Currently, most functions of the unicodedata module require a unichr (a unicode string of length 1) as a parameter. For most uses that is fine, but when working with characters outside the BMP (code points above FFFF) on a narrow Python build, it would be quite handy to access the properties of these characters simply by code point or ordinal. The simple unichr(x) only works for x <= FFFF on a narrow build, so the other Unicode planes are inaccessible this way.

I believe the Unicode database is probably already indexed by some numerical value such as the code point, or is that not the case?

With this improvement, the whole database would be effectively accessible on narrow Python builds too, where it isn't possible to pass a one-character string for code points above FFFF (even if the explicit limitation of unichr is bypassed, e.g. by creating a unicode literal u'\Uxxxxxxxx', the resulting string consists of a surrogate pair and obviously has length 2).
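
To illustrate the surrogate-pair point, here is a minimal sketch of the UTF-16 arithmetic that splits a code point above FFFF into the two code units a narrow build stores. The function name to_surrogate_pair is hypothetical, and the code is written for a current Python 3 rather than the 2.x narrow builds discussed here.

```python
def to_surrogate_pair(codepoint):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = codepoint - 0x10000       # 20-bit value to distribute over two units
    high = 0xD800 + (offset >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1D504)
    print(hex(high), hex(low))  # 0xd835 0xdd04
```

This is why a literal for such a character has length 2 on a narrow build: each of the two values becomes its own string element.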

Alternatively, the respective functions could also accept a two-character string, provided that this sequence can be correctly interpreted as a surrogate-pair representation of a valid Unicode code point.
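
Both variants of the request can be sketched roughly as follows: a helper that resolves either a single character or a two-character surrogate pair to one code point, and a lookup that also takes a plain integer. The names to_codepoint and lookup_name are hypothetical and not part of unicodedata. The sketch assumes a current Python 3, where chr() covers the full range; it does not show how the fix was actually implemented.

```python
import unicodedata


def to_codepoint(text):
    """Return the code point for a 1-char string or a 2-char surrogate pair."""
    if len(text) == 1:
        return ord(text)
    if len(text) == 2:
        high, low = ord(text[0]), ord(text[1])
        # Only a high surrogate followed by a low surrogate forms a valid pair.
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    raise TypeError("need a single character or a valid surrogate pair")


def lookup_name(text_or_codepoint):
    """Look up a character name by integer code point, character, or surrogate pair."""
    if isinstance(text_or_codepoint, int):
        codepoint = text_or_codepoint
    else:
        codepoint = to_codepoint(text_or_codepoint)
    return unicodedata.name(chr(codepoint))


if __name__ == "__main__":
    print(lookup_name(0x41))
    print(lookup_name("\ud835\udd04"))
```

Any other two-character string is rejected with the same kind of error that a too-long argument would raise.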

Currently such behaviour (e.g. code point access) can be emulated with custom datasets derived from the Unicode database, but I believe it should be possible to access the already present data somehow (also on narrow builds), rather than having to duplicate it.

msg67639 - Author: Walter Dörwald (doerwalter) Date: 2008-06-02 20:41
Fixed for 2.6 in r63899.

msg67671 - Author: Walter Dörwald (doerwalter) Date: 2008-06-03 19:09
Fixed for 3.0 in r63918.
History
Date | User | Action | Args
2008-06-03 19:09:13 | doerwalter | set | messages: + msg67671
2008-06-02 20:41:34 | doerwalter | set | status: open -> closed; resolution: fixed; messages: + msg67639; nosy: + doerwalter
2007-04-24 10:47:21 | vlbrom | create |