
classification
Title: narrow build incorrectly translates cases for non-BMP code points
Type: behavior
Stage: test needed
Components: Interpreter Core, Unicode
Versions: Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder:
Assigned To:
Nosy List: Rhamphoryncus, amaury.forgeotdarc, exarkun, ezio.melotti, lemburg, loewis
Priority: normal
Keywords:

Created on 2010-01-10 05:27 by exarkun, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg97500 - Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-01-10 05:27
This issue may extend beyond just unicode.upper() and unicode.lower(), but it's very clear with these two methods, at least.

For example, consider DESERET SMALL LETTER EW. On a UTF-16 build, calling upper() on a string containing this character doesn't convert it to its capital form (DESERET CAPITAL LETTER EW):

>>> u'\N{DESERET SMALL LETTER EW}'.upper() == u'\N{DESERET SMALL LETTER EW}'
True

The character isn't even recognized as lower case:

>>> u'\N{DESERET SMALL LETTER EW}'.islower()
False

With a UTF-32 build, however, the expected behavior (i.e., the behavior one would get for a code point in the BMP with small and capital variations) is provided.
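
The difference between the two builds can be illustrated without a narrow build at hand. The following is a minimal sketch in Python 3 (which no longer has narrow builds), assuming the cause is that a narrow build stores a non-BMP code point as two 16-bit surrogate code units and applies the case mapping to each unit separately. The helper names utf16_units and upper_per_unit are hypothetical and exist only for this illustration; they are not part of CPython.

```python
def utf16_units(text):
    """Return the 16-bit code units a narrow (UTF-16) build would store."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def upper_per_unit(text):
    """Uppercase each 16-bit unit on its own (assumed narrow-build behavior).

    Surrogate code units have no case mapping, so a non-BMP character
    split into a surrogate pair passes through unchanged.
    """
    per_unit = "".join(chr(unit).upper() for unit in utf16_units(text))
    # Re-join adjacent surrogates into whole code points for comparison.
    return per_unit.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


small = "\N{DESERET SMALL LETTER EW}"
capital = "\N{DESERET CAPITAL LETTER EW}"

# Two code units: a high surrogate followed by a low surrogate.
print([hex(unit) for unit in utf16_units(small)])

# Per-unit mapping leaves the character untouched, as in the report.
print(upper_per_unit(small) == small)

# Neither surrogate is considered lower case on its own.
print(any(chr(unit).islower() for unit in utf16_units(small)))

# Mapping the whole code point gives the capital letter.
print(small.upper() == capital)
```

Under these assumptions the per-unit path reproduces both symptoms reported above (upper() returns the input unchanged, and the string is not seen as lower case), while mapping the full code point behaves like the UTF-32 build.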
msg97501 - Author: Adam Olsen (Rhamphoryncus) Date: 2010-01-10 05:32
See also issue5127.
msg97529 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-01-10 18:00
This is a duplicate of http://bugs.python.org/issue5127
msg122569 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-11-27 22:20
This is not yet fixed but will be addressed in #10521 and #10542.
History
Date  User  Action  Args
2022-04-11 14:56:56  admin  set  github: 51912
2010-11-27 23:31:48  ezio.melotti  set  title: UCS4 build incorrectly translates cases for non-BMP code points -> narrow build incorrectly translates cases for non-BMP code points
2010-11-27 22:20:38  ezio.melotti  set  messages: + msg122569
2010-01-10 18:04:27  ezio.melotti  link  issue5127 superseder
2010-01-10 18:00:12  lemburg  set  status: open -> closed; title: UTF-16 build incorrectly translates cases for non-BMP code points -> UCS4 build incorrectly translates cases for non-BMP code points; nosy: + lemburg; messages: + msg97529; resolution: duplicate
2010-01-10 16:59:59  ezio.melotti  set  nosy: + loewis, ezio.melotti, amaury.forgeotdarc; priority: normal; components: + Unicode; type: behavior; stage: test needed
2010-01-10 05:32:35  Rhamphoryncus  set  nosy: + Rhamphoryncus; messages: + msg97501
2010-01-10 05:27:27  exarkun  create