Author amaury.forgeotdarc
Recipients amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date 2009-02-03.23:25:46
Message-id <1233703549.72.0.567542361919.issue5127@psf.upfronthosting.co.za>
Content
> I must be missing some detail, but what does the Unicode database
> have to do with the unicodeobject.c C API ?

Ah, now I understand your concern. My suggestion is to change only the 20 functions in 
unicodectype.c (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase, ...) and to leave 
unicodeobject.c untouched.
Each of these functions takes a single code point as its argument, and some also return a single code point.
Changing these functions is backwards compatible.

I am attaching a patch so that we can discuss concrete code (tests are still missing).

Another effect of the patch: unicodedata.numeric('\N{AEGEAN NUMBER TWO}') can now return 2.0.
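
As a rough illustration of that effect, the snippet below looks up the same character from Python. It is only a sketch and assumes a current CPython 3 interpreter, where a non-BMP character is always a single code point in a str; it does not exercise the attached patch itself. The names `ch`, `code_point` and `value` are made up for the example.

```python
import unicodedata

# AEGEAN NUMBER TWO lies outside the Basic Multilingual Plane.
ch = "\N{AEGEAN NUMBER TWO}"
code_point = ord(ch)

# The numeric lookup needs the full code point, not one half of a
# surrogate pair, to find the character in the Unicode database.
value = unicodedata.numeric(ch)

print(hex(code_point), code_point > 0xFFFF, value)
```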

str.isalpha() and the similar str methods are unchanged: they still split surrogate pairs.
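
To make "splitting surrogate pairs" concrete, here is a minimal sketch of the UTF-16 arithmetic involved. The helpers `to_surrogates` and `from_surrogates` are hypothetical names written for this example, not functions from the patch or from CPython, and the sample character (U+10400, DESERET CAPITAL LETTER LONG I) is my own choice. It assumes a current CPython 3 interpreter.

```python
def to_surrogates(cp):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Join a UTF-16 surrogate pair back into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x10400  # DESERET CAPITAL LETTER LONG I, an alphabetic character
high, low = to_surrogates(cp)

# Looked up as a whole code point, the character is alphabetic.
whole_is_alpha = chr(cp).isalpha()

# Looked up one UTF-16 unit at a time, neither half is alphabetic,
# which is what a method that splits the pair ends up seeing.
halves_are_alpha = (chr(high).isalpha(), chr(low).isalpha())

print(hex(high), hex(low), whole_is_alpha, halves_are_alpha)
```

A function that receives the joined code point (as `from_surrogates` produces) can answer correctly, while one that receives each half separately cannot.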
History
Date User Action Args
2009-02-03 23:25:50  amaury.forgeotdarc  set  recipients: + amaury.forgeotdarc, lemburg, vstinner, ezio.melotti, bupjae
2009-02-03 23:25:49  amaury.forgeotdarc  set  messageid: <1233703549.72.0.567542361919.issue5127@psf.upfronthosting.co.za>
2009-02-03 23:25:48  amaury.forgeotdarc  link  issue5127 messages
2009-02-03 23:25:47  amaury.forgeotdarc  create