Message 142177 - Python tracker

Author	pitrou
Recipients	Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, georg.brandl, lemburg, loewis, pitrou, rhettinger, stutzbach, tchrist, vstinner
Date	2011-08-16.09:18:45
Message-id	<1313486199.3542.3.camel@localhost.localdomain>
In-reply-to	<1313485930.8.0.601749695449.issue10542@psf.upfronthosting.co.za>

Content

> I think the 4 macros:
>  #define _Py_UNICODE_ISSURROGATE
>  #define _Py_UNICODE_ISHIGHSURROGATE
>  #define _Py_UNICODE_ISLOWSURROGATE
>  #define _Py_UNICODE_JOIN_SURROGATES
> are quite straightforward and can avoid using the trailing _.

I don't want to bikeshed, but can we have proper consistent word
separation?
_Py_UNICODE_IS_HIGH_SURROGATE, not _Py_UNICODE_ISHIGHSURROGATE
(etc.)
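
For illustration only, the four macros with consistent word separation could look roughly like the sketch below. Only _Py_UNICODE_IS_HIGH_SURROGATE is spelled out above; the other three names are extrapolated from the "(etc.)", and the macro bodies and the join_checked() helper are assumptions based on the standard UTF-16 surrogate ranges, not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bodies: the surrogate ranges come from the Unicode standard,
   not from the patch under discussion. */
#define _Py_UNICODE_IS_SURROGATE(ch)      (0xD800 <= (ch) && (ch) <= 0xDFFF)
#define _Py_UNICODE_IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define _Py_UNICODE_IS_LOW_SURROGATE(ch)  (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high/low pair into the code point it encodes (>= 0x10000). */
#define _Py_UNICODE_JOIN_SURROGATES(high, low) \
    (0x10000 + ((((uint32_t)(high) & 0x03FF) << 10) | \
                ((uint32_t)(low) & 0x03FF)))

/* Hypothetical helper, only here to show the macros used together. */
static inline uint32_t join_checked(uint32_t high, uint32_t low)
{
    assert(_Py_UNICODE_IS_HIGH_SURROGATE(high));
    assert(_Py_UNICODE_IS_LOW_SURROGATE(low));
    return _Py_UNICODE_JOIN_SURROGATES(high, low);
}
```

With these definitions, join_checked(0xD83D, 0xDE00) yields 0x1F600.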

> > we will still have to deal with surrogates in codecs,
> > which is where these macros will get used
> 
> They will also be used in many str methods and afaiu PEP 393 should
> address that.  I'm not sure it addresses codecs and builtin functions
> like chr() and ord() too.

AFAIU, PEP 393 avoids producing surrogate pairs in the canonical
internal representation (that's one of its selling points). Only the
UTF-16 codecs would need to deal with surrogate pairs, in the encoded
form.
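
To make "in the encoded form" concrete, here is a minimal sketch of what the encoding side of a UTF-16 codec has to do for one code point when the internal representation stores whole code points. The function name utf16_encode_code_point() and its signature are hypothetical; this is not CPython's codec code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one code point as UTF-16 code units.
   `out` must have room for two units; returns the number written. */
static size_t utf16_encode_code_point(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);
    /* This sketch rejects lone surrogates; a real codec would apply
       its error handler here instead. */
    assert(cp < 0xD800 || cp > 0xDFFF);

    if (cp < 0x10000) {
        /* BMP code point: a single code unit, no surrogates involved. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Non-BMP code point: split the 20-bit offset into a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Under this sketch, surrogate pairs appear only in the output buffer of the codec, never in the string being encoded.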

History
Date	User	Action	Args
2011-08-16 09:18:46	pitrou	set	recipients: + pitrou, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, vstinner, eric.smith, stutzbach, ezio.melotti, tchrist
2011-08-16 09:18:45	pitrou	link	issue10542 messages
2011-08-16 09:18:45	pitrou	create