Message 123290 - Python tracker

Author	lemburg
Recipients	Rhamphoryncus, amaury.forgeotdarc, belopolsky, eric.smith, ezio.melotti, lemburg, loewis, pitrou, rhettinger, vstinner
Date	2010-12-03.20:12:37
Message-id	<4CF94F33.8050805@egenix.com>
In-reply-to	<AANLkTi=KyP7XoanUJ=qwM5OV2uGHX5RUcb5eV-rhLnft@mail.gmail.com>

Content
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> On Sat, Nov 27, 2010 at 6:38 PM, Raymond Hettinger
> <report@bugs.python.org> wrote:
> ..
>> I suggest Py_UNICODE_ADVANCE() to avoid false suggestion that the iterator protocol is being used.
>>
> 
> As a data point, ICU defines U16_NEXT() for similar purpose.  I also
> like ICU terminology for surrogates ("lead" and "trail") better than
> the backward "high" and "low". 

"High" and "low" are Unicode standard terms, so we should use
those.

Regarding Py_UCS4_READ_CODE_POINT: you're right that surrogates
are code points, so how about Py_UCS4_READ_NEXT() ?!

Regarding Py_UCS4_READ_NEXT() vs. Py_UNICODE_READ_NEXT(): the return
value of the macro is a Py_UCS4 value, not a Py_UNICODE value. The
first argument of the macro can be any array, not just Py_UNICODE*,
but also Py_UCS4* or even int*.

Py_UCS2_READ_NEXT() would be plain wrong :-) Also note that Python
does have a Py_UCS4 type; it doesn't have a Py_UCS2 type.

That's why we should use *Py_UCS4*_READ_NEXT().
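To make the naming argument concrete, here is a minimal sketch of what such a macro could look like. It is not the macro from the patch under discussion: the name UCS4_READ_NEXT_SKETCH, the `end` parameter, and the `ucs4_t` typedef (a stand-in for Py_UCS4) are assumptions made for illustration only. It shows the two properties argued for above: the result is always a 32-bit value, and the pointer argument can point to any integer array.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Python's Py_UCS4 type; an assumption for this sketch. */
typedef uint32_t ucs4_t;

/* "High" and "low" follow the Unicode standard's terminology. */
#define IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define IS_LOW_SURROGATE(ch)  (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Hypothetical macro: read the next code point at ptr and advance ptr.
 * A high surrogate followed by a low surrogate is combined into one
 * code point; anything else, including a lone surrogate, is returned
 * unchanged. ptr may point to any integer type; the result is ucs4_t. */
#define UCS4_READ_NEXT_SKETCH(ptr, end)                                  \
    ((IS_HIGH_SURROGATE((ptr)[0]) && (ptr) + 1 < (end) &&                 \
      IS_LOW_SURROGATE((ptr)[1]))                                         \
         ? ((ptr) += 2,                                                   \
            (ucs4_t)0x10000 + (((ucs4_t)(ptr)[-2] - 0xD800) << 10) +      \
                ((ucs4_t)(ptr)[-1] - 0xDC00))                             \
         : (ucs4_t)*(ptr)++)

/* Example use: count code points in a buffer of 16-bit code units. */
static size_t count_code_points(const uint16_t *buf, size_t len)
{
    const uint16_t *p = buf;
    const uint16_t *end = buf + len;
    size_t n = 0;
    while (p < end) {
        (void)UCS4_READ_NEXT_SKETCH(p, end);
        n++;
    }
    return n;
}
```

A combined surrogate pair yields a value above 0xFFFF, which does not fit in 16 bits. That is the practical reason a UCS2-flavoured name would be misleading for the return value.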

History
Date	User	Action	Args
2010-12-03 20:12:53	lemburg	set	recipients: + lemburg, loewis, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, pitrou, vstinner, eric.smith, ezio.melotti
2010-12-03 20:12:37	lemburg	link	issue10542 messages
2010-12-03 20:12:37	lemburg	create