Message124174
On Fri, Dec 10, 2010 at 6:09 PM, Daniel Stutzbach
<report@bugs.python.org> wrote:
..
> The second check for surrogates in Py_UNICODE_PUT_NEXT is necessary, unless you can prove that
> Py_UNICODE_SOME_TRANSFORMATION will never transform characters < 0x10000 into characters >
> 0x10000 or vice versa.
>
> Can we prove that will always be the case, for current and future versions of Unicode, for all or
> almost all of the transformations we care about?
>
Certainly not for all, but for some important transformations, I
believe the Unicode Standard does promise that the transformation maps
BMP characters to BMP characters and supplementary characters to
supplementary characters. Case folding and normalization are two
important examples.
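
A claim like this can be checked empirically against whatever Unicode
database a given interpreter ships with. The following is only a minimal
sketch, assuming Python 3.3 or later (where chr() accepts every code
point); plane_crossings and nfc are hypothetical helper names, not part
of any patch on this issue. It applies a str -> str transformation to
each code point in a range and collects the cases where the result lands
on the other side of 0x10000.

```python
import sys
import unicodedata

BMP_LIMIT = 0x10000  # first code point outside the Basic Multilingual Plane


def plane_crossings(transform, start=0, stop=sys.maxunicode + 1):
    """Return (source, result) pairs where `transform` moves a code point
    across the BMP/supplementary boundary.

    `transform` is any str -> str function; it may return several characters.
    """
    crossings = []
    for cp in range(start, stop):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        src = chr(cp)
        out = transform(src)
        src_is_bmp = cp < BMP_LIMIT
        # Flag the pair if any output character is on the other side.
        if any((ord(ch) < BMP_LIMIT) != src_is_bmp for ch in out):
            crossings.append((src, out))
    return crossings


def nfc(s):
    """NFC normalization as a plain str -> str function."""
    return unicodedata.normalize("NFC", s)
```

Calling plane_crossings(str.casefold) or plane_crossings(nfc) over the
full range lists any counter-examples for the Unicode version bundled
with the running interpreter. An empty list says nothing about future
versions of the standard.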
> Answering that question and figuring out what to do about it are probably more trouble than it's worth.
> If a particular point proves to be a bottleneck, we can always specialize the code there later.
Agreed. It is even more likely that applications that have to deal
with lots of supplementary characters will be better off using a wide
Unicode build than relying on such specialization.

Date | User | Action | Args
2010-12-17 02:13:49 | belopolsky | set | recipients: + belopolsky, lemburg, loewis, doerwalter, rhettinger, amaury.forgeotdarc, Rhamphoryncus, pitrou, vstinner, eric.smith, stutzbach, ezio.melotti
2010-12-17 02:13:48 | belopolsky | link | issue10542 messages
2010-12-17 02:13:47 | belopolsky | create |