Author pitrou
Recipients Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, georg.brandl, lemburg, loewis, pitrou, rhettinger, stutzbach, tchrist, vstinner
Date 2011-08-16.09:18:45
Message-id <1313486199.3542.3.camel@localhost.localdomain>
In-reply-to <1313485930.8.0.601749695449.issue10542@psf.upfronthosting.co.za>
Content
> I think the 4 macros:
>  #define _Py_UNICODE_ISSURROGATE
>  #define _Py_UNICODE_ISHIGHSURROGATE
>  #define _Py_UNICODE_ISLOWSURROGATE
>  #define _Py_UNICODE_JOIN_SURROGATES
> are quite straightforward and can avoid using the trailing _.

I don't want to bikeshed, but can we have proper consistent word
separation?
_Py_UNICODE_IS_HIGH_SURROGATE, not _Py_UNICODE_ISHIGHSURROGATE
(etc.)
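
For illustration, here is a minimal sketch of how the four macros might look with underscore-separated words. The names follow the pattern proposed above. The bodies are an assumption derived from the standard Unicode surrogate ranges (high: U+D800..U+DBFF, low: U+DC00..U+DFFF), not taken from any patch on this issue.

```c
#include <stdint.h>

/* Sketch only: the bodies are assumed from the Unicode surrogate ranges
   and are not copied from any patch attached to this issue. */
#define _Py_UNICODE_IS_SURROGATE(ch)      (0xD800 <= (ch) && (ch) <= 0xDFFF)
#define _Py_UNICODE_IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define _Py_UNICODE_IS_LOW_SURROGATE(ch)  (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high/low pair into one code point in U+10000..U+10FFFF.
   The caller is expected to have checked both halves first. */
#define _Py_UNICODE_JOIN_SURROGATES(high, low)            \
    (0x10000 + ((((uint32_t)(high) & 0x03FF) << 10) |     \
                 ((uint32_t)(low) & 0x03FF)))
```

With these definitions, joining 0xD83D and 0xDE00 gives 0x1F600.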

> > we will still have to deal with surrogates in codecs,
> > which is where these macros will get used
> 
> They will also be used in many str methods and afaiu PEP 393 should
> address that.  I'm not sure it addresses codecs and builtin functions
> like chr() and ord() too.

AFAIU, PEP 393 avoids producing surrogate pairs in the canonical
internal representation (that's one of its selling points). Only the
UTF-16 codecs would need to deal with surrogate pairs, in the encoded
form.
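
To make "deal with surrogate pairs in the encoded form" concrete, below is a rough sketch of the pairing step a UTF-16 decoder has to perform. The function name `utf16_to_code_points` and its error convention are hypothetical, chosen only for this example. It is not the CPython codec.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only (not CPython's codec).
   Decodes n UTF-16 code units from `in` into code points in `out`,
   which must have room for n entries.
   Returns the number of code points written, or -1 if an unpaired
   surrogate is found. */
static long utf16_to_code_points(const uint16_t *in, size_t n, uint32_t *out)
{
    size_t i = 0;
    long count = 0;

    while (i < n) {
        uint32_t unit = in[i++];

        if (unit >= 0xD800 && unit <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (i == n || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;
            unit = 0x10000 + (((unit & 0x03FF) << 10) | (in[i++] & 0x03FF));
        }
        else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return -1;
        }
        out[count++] = unit;
    }
    return count;
}
```

The output holds only whole code points, so a representation that never stores surrogate pairs would need this kind of logic at the codec boundary only.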
History
Date User Action Args
2011-08-16 09:18:46 pitrou set recipients: + pitrou, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, vstinner, eric.smith, stutzbach, ezio.melotti, tchrist
2011-08-16 09:18:45 pitrou link issue10542 messages
2011-08-16 09:18:45 pitrou create