Author pitrou
Recipients Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, georg.brandl, lemburg, loewis, pitrou, rhettinger, stutzbach, tchrist, vstinner
Date 2011-08-16.09:18:45
Message-id <1313486199.3542.3.camel@localhost.localdomain>
In-reply-to <1313485930.8.0.601749695449.issue10542@psf.upfronthosting.co.za>
Content
> I think the 4 macros:
>  #define _Py_UNICODE_ISSURROGATE
>  #define _Py_UNICODE_ISHIGHSURROGATE
>  #define _Py_UNICODE_ISLOWSURROGATE
>  #define _Py_UNICODE_JOIN_SURROGATES
> are quite straightforward and can avoid using the trailing _.

I don't want to bikeshed, but can we have proper consistent word
separation?
_Py_UNICODE_IS_HIGH_SURROGATE, not _Py_UNICODE_ISHIGHSURROGATE
(etc.)
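
For illustration, the four macros with that word separation might look like the sketch below. The bodies are an assumption based on the standard UTF-16 surrogate ranges, not a copy of any patch on this issue, and join_surrogates_checked is a hypothetical helper added only to show the macros in use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical definitions using the word-separated names.
   Ranges are the standard ones: high (lead) surrogates are
   U+D800..U+DBFF, low (trail) surrogates are U+DC00..U+DFFF. */
#define _Py_UNICODE_IS_SURROGATE(ch) \
    (0xD800 <= (ch) && (ch) <= 0xDFFF)
#define _Py_UNICODE_IS_HIGH_SURROGATE(ch) \
    (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define _Py_UNICODE_IS_LOW_SURROGATE(ch) \
    (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high/low pair into one code point above U+FFFF. */
#define _Py_UNICODE_JOIN_SURROGATES(high, low) \
    (0x10000 + ((((uint32_t)(high) & 0x03FF) << 10) | \
                ((uint32_t)(low) & 0x03FF)))

/* Hypothetical debug wrapper: checks both halves before joining. */
static uint32_t join_surrogates_checked(uint16_t high, uint16_t low)
{
    assert(_Py_UNICODE_IS_HIGH_SURROGATE(high));
    assert(_Py_UNICODE_IS_LOW_SURROGATE(low));
    return _Py_UNICODE_JOIN_SURROGATES(high, low);
}
```

For example, the pair 0xD83D, 0xDE00 joins to U+1F600.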

> > we will still have to deal with surrogates in codecs,
> > which is where these macros will get used
> 
> They will also be used in many str methods and afaiu PEP 393 should
> address that.  I'm not sure it addresses codecs and builtin functions
> like chr() and ord() too.

AFAIU, PEP 393 avoids producing surrogate pairs in the canonical
internal representation (that's one of its selling points). Only the
UTF-16 codecs would need to deal with surrogate pairs, in the encoded
form.
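
To make "in the encoded form" concrete, here is a minimal sketch of what a UTF-16 codec has to do per code point. It is not CPython's codec: utf16_encode and utf16_decode are hypothetical names, and rejecting lone surrogates is a choice made for this sketch. The internal value is always a single code point; the pair exists only in the 16-bit output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 code units.
   Returns the number of units written (1 or 2), or 0 if cp is
   not encodable (out of range, or itself a surrogate). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* Decode one code point from UTF-16 code units.
   Returns the number of units consumed (1 or 2), or 0 on empty
   or malformed input (lone or misordered surrogate). */
static size_t utf16_decode(const uint16_t *in, size_t len, uint32_t *cp)
{
    assert(in != NULL && cp != NULL);
    if (len == 0)
        return 0;
    if (in[0] < 0xD800 || in[0] > 0xDFFF) {
        *cp = in[0];                            /* BMP, not a surrogate */
        return 1;
    }
    if (in[0] <= 0xDBFF && len >= 2 &&
        in[1] >= 0xDC00 && in[1] <= 0xDFFF) {
        *cp = 0x10000 + (((uint32_t)(in[0] & 0x3FF) << 10) |
                         (uint32_t)(in[1] & 0x3FF));
        return 2;
    }
    return 0;
}
```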
History
Date                 User    Action  Args
2011-08-16 09:18:46  pitrou  set     recipients: + pitrou, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, vstinner, eric.smith, stutzbach, ezio.melotti, tchrist
2011-08-16 09:18:45  pitrou  link    issue10542 messages
2011-08-16 09:18:45  pitrou  create