Message142177
> I think the 4 macros:
> #define _Py_UNICODE_ISSURROGATE
> #define _Py_UNICODE_ISHIGHSURROGATE
> #define _Py_UNICODE_ISLOWSURROGATE
> #define _Py_UNICODE_JOIN_SURROGATES
> are quite straightforward and can avoid using the trailing _.
I don't want to bikeshed, but can we have proper consistent word
separation?
_Py_UNICODE_IS_HIGH_SURROGATE, not _Py_UNICODE_ISHIGHSURROGATE
(etc.)
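For illustration, here is a minimal sketch of how the four macros might look with the word-separated spelling. Only the names come from this discussion; the bodies are an assumption based on the standard UTF-16 surrogate ranges (high: 0xD800-0xDBFF, low: 0xDC00-0xDFFF), not the actual patch.

```c
#include <stdint.h>

/* Hypothetical definitions: names as proposed above, bodies assumed
   from the UTF-16 surrogate ranges rather than taken from the patch. */
#define _Py_UNICODE_IS_SURROGATE(ch) \
    (0xD800 <= (ch) && (ch) <= 0xDFFF)
#define _Py_UNICODE_IS_HIGH_SURROGATE(ch) \
    (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define _Py_UNICODE_IS_LOW_SURROGATE(ch) \
    (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high/low pair into one code point in U+10000..U+10FFFF.
   The caller is expected to have checked both halves first. */
#define _Py_UNICODE_JOIN_SURROGATES(high, low) \
    (0x10000 + ((((uint32_t)(high) & 0x03FF) << 10) | \
                ((uint32_t)(low) & 0x03FF)))
```

With these definitions, the pair 0xD83D, 0xDE00 joins to 0x1F600.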
> > we will still have to deal with surrogates in codecs,
> > which is where these macros will get used
>
> They will also be used in many str methods and afaiu PEP 393 should
> address that. I'm not sure it addresses codecs and builtin functions
> like chr() and ord() too.
AFAIU, PEP 393 avoids producing surrogate pairs in the canonical
internal representation (that's one of its selling points). Only the
UTF-16 codecs would need to deal with surrogate pairs, in the encoded
form.
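To make "in the encoded form" concrete, here is a rough sketch of the one place a surrogate pair would still be produced: turning a single code point into UTF-16 code units. The function name `encode_utf16_unit` and its return convention are hypothetical, chosen for this example only; this is not code from PEP 393 or from the codecs.

```c
#include <stdint.h>

/* Hypothetical helper: write the UTF-16 code units for one code point
   into out[] and return how many were written (1 or 2).
   Returns 0 if cp is itself a surrogate or lies above U+10FFFF. */
static int encode_utf16_unit(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                       /* not an encodable code point */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* BMP: a single code unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits left to split */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x03FF)); /* low surrogate */
    return 2;
}
```

Under this sketch the internal string holds U+1F600 as one code point, and the pair 0xD83D, 0xDE00 only appears in the encoder's output.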

Date | User | Action | Args
2011-08-16 09:18:46 | pitrou | set | recipients: + pitrou, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, vstinner, eric.smith, stutzbach, ezio.melotti, tchrist
2011-08-16 09:18:45 | pitrou | link | issue10542 messages
2011-08-16 09:18:45 | pitrou | create |