Message 124174 - Python tracker

Author	belopolsky
Recipients	Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, lemburg, loewis, pitrou, rhettinger, stutzbach, vstinner
Date	2010-12-17.02:13:47
Message-id	<AANLkTinHHweu3LFWnCxsXHNdKCz0icp8T2i1P7Tdzf-C@mail.gmail.com>
In-reply-to	<1292022566.21.0.51776102591.issue10542@psf.upfronthosting.co.za>

Content

On Fri, Dec 10, 2010 at 6:09 PM, Daniel Stutzbach
<report@bugs.python.org> wrote:
..
> The second check for surrogates in Py_UNICODE_PUT_NEXT is necessary, unless you can prove that
> Py_UNICODE_SOME_TRANSFORMATION will never transform characters < 0x10000 into characters >
> 0x10000 or vice versa.
>
> Can we prove that will always be the case, for current and future versions of Unicode, for all or almost all of the
> transformations we care about?
>
Certainly not for all, but for some important transformations, I
believe the Unicode Standard does promise that the transformation maps
BMP characters to BMP characters and supplementary characters to
supplementary characters.  Case folding and normalization are two
important examples.
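
A claim like this can be checked empirically rather than taken on
trust.  The following is a minimal sketch, assuming Python 3.3 or later
(where chr() covers the full code point range and str.casefold()
exists).  The helper names crosses_bmp_boundary and find_crossings are
made up for illustration and are not part of any patch on this issue.

```python
import sys
import unicodedata

BMP_LIMIT = 0x10000  # first code point outside the Basic Multilingual Plane


def crosses_bmp_boundary(cp, transform):
    """True if transform maps a BMP code point to any supplementary
    code point, or a supplementary code point to any BMP code point."""
    source_in_bmp = cp < BMP_LIMIT
    return any((ord(ch) < BMP_LIMIT) != source_in_bmp
               for ch in transform(chr(cp)))


def find_crossings(transform):
    """Scan every non-surrogate code point and collect the ones that cross."""
    return [cp for cp in range(sys.maxunicode + 1)
            if not 0xD800 <= cp <= 0xDFFF
            and crosses_bmp_boundary(cp, transform)]


if __name__ == "__main__":
    transforms = {
        "casefold": str.casefold,
        "NFC": lambda s: unicodedata.normalize("NFC", s),
        "NFD": lambda s: unicodedata.normalize("NFD", s),
    }
    # Results depend on the Unicode database bundled with the interpreter.
    print("Unicode database version:", unicodedata.unidata_version)
    for name, transform in transforms.items():
        crossings = find_crossings(transform)
        print(name, len(crossings), [hex(cp) for cp in crossings[:5]])
```

The result only describes the Unicode version shipped with the
interpreter, so it says nothing about future versions.  The CJK
compatibility ideographs are one area worth looking at closely, since
their canonical decompositions may land in a different plane than the
source character.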

> Answering that question and figuring out what to do about it are probably more trouble than it's worth.
> If a particular point proves to be a bottleneck, we can always specialize the code there later.

Agreed.  It is even more likely that applications that have to deal
with lots of supplementary characters will be better off using a wide
Unicode build than relying on such specialization.
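
To make the narrow versus wide distinction concrete, here is a small
sketch in Python.  The function names are hypothetical; only
sys.maxunicode is a real interpreter attribute.  It shows how a build
can be identified and which UTF-16 code units a narrow build has to
store for a given code point, which is the case the surrogate check in
the quoted macro guards against.

```python
import sys


def is_wide_build():
    """True when the interpreter can hold any code point in one unit.

    Narrow builds report sys.maxunicode == 0xFFFF; wide builds (and
    every Python from 3.3 on) report 0x10FFFF.
    """
    return sys.maxunicode > 0xFFFF


def to_utf16_units(cp):
    """Encode one code point as the UTF-16 code units a narrow build stores."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp < 0x10000:
        return [cp]  # BMP: a single code unit, no surrogates needed
    offset = cp - 0x10000
    # Supplementary: high surrogate carries the top 10 bits,
    # low surrogate carries the bottom 10 bits.
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
```

For example, to_utf16_units(0x41) gives a single unit, while
to_utf16_units(0x10400) gives the pair 0xD801, 0xDC00.  On a wide build
the second branch never applies, which is why no per-character
surrogate check is needed there.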
