Message 142720 - Python tracker

Author	ezio.melotti
Recipients	Arfrever, Rhamphoryncus, amaury.forgeotdarc, belopolsky, ezio.melotti, lemburg, tchrist, vstinner
Date	2011-08-22.12:13:58

Content
It turned out that this can't be fixed in 2.7 unless we backport the patch in #5127 (it's in 3.2/3.3 but not in 2.7).

IIUC the macro works fine and joins surrogate pairs into a Py_UCS4 char, but since the Py_UNICODE_IS* macros still expect Py_UCS2 on narrow builds on 2.7, the higher bits get truncated and the macros return wrong results.

So, for example
    >>> u'\ud801\udc29'.isupper()
    True
because \ud801 + \udc29 = \U00010429  →  \U00010429 gets truncated to \u0429  →  \u0429 is CYRILLIC CAPITAL LETTER SHCHA  →  .isupper() returns True, even though \U00010429 is a lowercase letter.

The current behavior is broken in a different way, because it checks u'\ud801'.isupper() and u'\udc29'.isupper() separately.
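
The surrogate arithmetic and the 16-bit truncation described above can be reproduced in a few lines. This is only a sketch in Python 3, not the C macros themselves: `join_surrogates` and `truncate_to_ucs2` are hypothetical helper names, and the example uses U+10429, whose UTF-16 surrogate pair is \ud801\udc29.

```python
import unicodedata


def join_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one code point (hypothetical helper)."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def truncate_to_ucs2(code_point):
    """Keep only the low 16 bits, mimicking a 16-bit character type (hypothetical helper)."""
    return code_point & 0xFFFF


full = join_surrogates(0xD801, 0xDC29)   # the joined Py_UCS4-style value
truncated = truncate_to_ucs2(full)       # what a 16-bit macro argument would see

print(hex(full), chr(full).isupper())
print(hex(truncated), unicodedata.name(chr(truncated)), chr(truncated).isupper())
```

The full code point is not uppercase, while the truncated value is, which is the kind of wrong answer the message describes.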

Would it make sense to backport #5127 or should I just give up and leave it broken?
