classification
Title: string method .upper() converts 'ß' to 'SS' instead of 'ẞ'
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Germany made the upper case ß official. 'ß'.upper() should now return ẞ.
View: 30810
Assigned To: Nosy List: Marc Richter, steven.daprano, xtreak
Priority: normal Keywords:

Created on 2018-10-08 09:49 by Marc Richter, last changed 2018-10-08 10:30 by xtreak. This issue is now closed.

Messages (5)
msg327336 - Author: Marc Richter (Marc Richter) Date: 2018-10-08 09:49
There's a special letter in German orthography called "eszett" (ß). For hundreds of years this letter had no uppercase variant. In 2017, an uppercase variant called "capital eszett" (ẞ) was added to the official German orthography [1].

Python's .upper() string method still translates this to "SS" (which was correct before 2017):

~ $ python3.7.0
Python 3.7.0 (default, Aug 29 2018, 17:15:17) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 'gruß'.upper()
'GRUSS'
>>>

The result of this example should have been 'GRUẞ' instead.
That being said, it's fair to point out that this letter is still quite unpopular in Germany; it cannot even be typed on German keyboards yet. Anyway, since this has become official orthography, I think Python's job is to follow clear rules rather than adopt common usage.

I'm not sure whether this affects .casefold() as well, since I don't fully understand that method's scope.

BR,
Marc Richter


[1]: https://en.wikipedia.org/wiki/Capital_%E1%BA%9E
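The conversions described in the report, and the open question about .casefold(), can be checked against the Unicode Character Database that ships with the interpreter. The following is a minimal sketch. `case_mappings` is a hypothetical helper written only for this illustration, and the exact results depend on the Unicode version bundled with your Python build.

```python
import unicodedata


def case_mappings(ch):
    """Hypothetical helper: collect what Python's built-in case methods return for ch."""
    return {
        "name": unicodedata.name(ch),  # official Unicode character name
        "upper": ch.upper(),           # full uppercase mapping (may change length)
        "lower": ch.lower(),           # lowercase mapping
        "casefold": ch.casefold(),     # full case folding, meant for caseless matching
    }


# Small sharp s (U+00DF) and capital sharp s (U+1E9E).
for ch in ("ß", "ẞ"):
    print(f"U+{ord(ch):04X}", case_mappings(ch))
```

On CPython 3.x this shows that 'ß' uppercases to 'SS' and casefolds to 'ss'. The capital 'ẞ' (U+1E9E) exists as a separate character: it lowercases to 'ß' and is left unchanged by .upper(). Because .casefold() folds both letters to 'ss', it never produces 'ẞ' either.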
msg327337 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-10-08 10:14
We match the Unicode specification, not arbitrary language rules. (Austrian and Swiss German are, I believe, phasing out ß altogether, and haven't added an uppercase variant.)

Until the Unicode consortium change their case conversion rules, it is still correct for .upper() to convert 'ß' to 'SS'. The eszett is just one of the many annoying anomalies in case conversion, like Turkish dotted and dotless i. Natural language is hard, and messy.

http://unicode.org/faq/casemap_charprop.html
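Because str.upper() follows the default Unicode mapping, an application that wants the post-2017 capital eszett has to apply that choice itself. The following is a minimal sketch, assuming that replacing every 'ß' before uppercasing is acceptable for the text at hand. `upper_with_capital_eszett` is a hypothetical name, not part of the standard library.

```python
def upper_with_capital_eszett(text):
    """Uppercase text, rendering 'ß' as capital 'ẞ' instead of 'SS'.

    Hypothetical helper: this is an opt-in, caller-side choice, not what
    str.upper() does under the default Unicode case mapping.
    """
    # Swap in the capital sharp s first. 'ẞ'.upper() is 'ẞ', so it survives upper().
    return text.replace("ß", "ẞ").upper()


print(upper_with_capital_eszett("gruß"))
print("gruß".upper())
```

The replacement has to happen before .upper(). Afterwards the information is gone: "Maße".upper() and "Masse".upper() both give "MASSE".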
msg327338 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-10-08 10:24
Thanks for the report and details, but I think this exact case was already discussed in issue30810.
msg327339 - Author: Marc Richter (Marc Richter) Date: 2018-10-08 10:28
Sorry then; that did not show up in my search :/
Yes, seems like this is duplicating that one.
msg327340 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-10-08 10:30
No problem. Thanks for the confirmation; I am closing this as a duplicate.
History
Date                 User            Action  Args
2018-10-08 10:30:57  xtreak          set     status: open -> closed
                                             superseder: Germany made the upper case ß official. 'ß'.upper() should now return ẞ.
                                             messages: + msg327340
                                             resolution: duplicate
                                             stage: resolved
2018-10-08 10:28:44  Marc Richter    set     messages: + msg327339
2018-10-08 10:24:47  xtreak          set     nosy: + xtreak
                                             messages: + msg327338
2018-10-08 10:14:30  steven.daprano  set     nosy: + steven.daprano
                                             messages: + msg327337
2018-10-08 09:49:32  Marc Richter    create