Message81115
> I must be missing some detail, but what does the Unicode database
> have to do with the unicodeobject.c C API ?
Ah, now I understand your concern. My suggestion is to change only the 20 functions in
unicodectype.c (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase, ...) and to leave
unicodeobject.c untouched.
They all take a single code point as argument, and some also return a single code point.
Changing these functions is backwards compatible.
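
To make the "one code point in, one result out" shape concrete, here is a minimal Python 3 sketch. It is not the patch and not the C code in unicodectype.c: is_alpha and to_lowercase are hypothetical stand-ins built on the unicodedata module, and the letter test assumes the general categories documented for str.isalpha().

```python
import unicodedata

# General categories treated as "Letter" (assumption: same set str.isalpha() documents).
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_alpha(code_point):
    """Hypothetical analogue of _PyUnicode_IsAlpha: takes one code point."""
    return unicodedata.category(chr(code_point)) in LETTER_CATEGORIES


def to_lowercase(code_point):
    """Hypothetical analogue of _PyUnicode_ToLowercase: code point in, code point out."""
    lowered = chr(code_point).lower()
    # Only simple one-to-one mappings are modelled; keep the input otherwise.
    return ord(lowered) if len(lowered) == 1 else code_point


# The same calls work for a BMP code point and for one above U+FFFF.
print(is_alpha(0x41), is_alpha(0x10400))   # LATIN and DESERET capital letters
print(hex(to_lowercase(0x10400)))
```

Because the argument is a plain integer, accepting values above 0xFFFF does not change what existing callers get for the values they already pass.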
I am attaching a patch so we can discuss concrete code (tests are still missing).
Another effect of the patch: unicodedata.numeric('\N{AEGEAN NUMBER TWO}') can return 2.0.
The str.isalpha() method (and the others) did not change: they still split surrogate pairs.
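
For illustration, the sketch below shows what "splitting the surrogate pair" means for AEGEAN NUMBER TWO (U+10108). It assumes Python 3, where str is indexed by code points, so the UTF-16 code units of a narrow build are simulated by encoding; to_utf16_units and join_surrogates are hypothetical helper names, not functions from the patch.

```python
import unicodedata


def to_utf16_units(text):
    """Return the UTF-16 code units of text (simulating narrow-build storage)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def join_surrogates(high, low):
    """Combine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


units = to_utf16_units("\N{AEGEAN NUMBER TWO}")

# Looked at one unit at a time, only surrogates (category Cs) are visible.
per_unit_categories = [unicodedata.category(chr(u)) for u in units]

# Joined back into one code point, the database lookup succeeds.
code_point = join_surrogates(*units)
value = unicodedata.numeric(chr(code_point))

print([hex(u) for u in units], per_unit_categories, hex(code_point), value)
```

A method that walks the string unit by unit only ever sees the two surrogates, which is why the string methods keep their old behaviour even if the per-code-point functions accept the joined value.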
Date | User | Action | Args
2009-02-03 23:25:50 | amaury.forgeotdarc | set | recipients: + amaury.forgeotdarc, lemburg, vstinner, ezio.melotti, bupjae
2009-02-03 23:25:49 | amaury.forgeotdarc | set | messageid: <1233703549.72.0.567542361919.issue5127@psf.upfronthosting.co.za>
2009-02-03 23:25:48 | amaury.forgeotdarc | link | issue5127 messages
2009-02-03 23:25:47 | amaury.forgeotdarc | create |