Message339568
str.capitalize appears to uppercase the first character of the string, which is okay for ASCII but not for non-English letters.
For example, the letter NJ in Croatian appears as Nj at the start of words when the first character is capitalized:
Njemačka ('Germany'), not NJemačka.
(In ASCII, that's Njemacka not NJemacka.)
https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet#Digraphs
But using any of:
U+01CA LATIN CAPITAL LETTER NJ
U+01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J
U+01CC LATIN SMALL LETTER NJ
we get the wrong result with capitalize:
py> 'Ǌemačka'.capitalize()
'Ǌemačka'
py> 'ǋemačka'.capitalize()
'Ǌemačka'
py> 'ǌemačka'.capitalize()
'Ǌemačka'
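The difference comes down to which case mapping is applied to the first code point. As a rough check, the sketch below prints the uppercase and titlecase mappings of the three code points listed above, using only the standard unicodedata module. The helper name case_mappings is made up for illustration.

```python
import unicodedata

def case_mappings(ch):
    """Return the Unicode names of ch, ch.upper() and ch.title().

    Assumes ch is a single character whose case mappings are also
    single characters, which holds for the three NJ digraph code points.
    """
    return (
        unicodedata.name(ch),
        unicodedata.name(ch.upper()),
        unicodedata.name(ch.title()),
    )

# U+01CA, U+01CB and U+01CC, written as escapes to avoid font/encoding confusion.
for ch in "\u01ca\u01cb\u01cc":
    name, upper, title = case_mappings(ch)
    print(f"{name}: upper -> {upper}; title -> {title}")
```

For all three inputs the uppercase mapping is LATIN CAPITAL LETTER NJ, while the titlecase mapping is LATIN CAPITAL LETTER N WITH SMALL LETTER J, which is the form wanted at the start of a capitalized word.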
I believe that the correct behaviour is to titlecase the first code point and lowercase the rest, which is what the Apache library here does:
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#capitalize-java.lang.String-
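A minimal pure-Python sketch of the proposed behaviour (titlecase the first code point, lowercase the rest) might look like the following. The function name titlecase_capitalize is hypothetical, and this is an illustration of the idea, not the interpreter's implementation.

```python
def titlecase_capitalize(s):
    """Titlecase the first code point of s and lowercase the remainder.

    s[:1] is used instead of s[0] so the empty string is handled
    without raising an IndexError.
    """
    return s[:1].title() + s[1:].lower()

# The three digraph spellings from the report, written as escapes.
for word in ("\u01caemačka", "\u01cbemačka", "\u01ccemačka"):
    print(ascii(titlecase_capitalize(word)))
```

Each of the three inputs comes out starting with U+01CB, and plain ASCII input behaves the same as str.capitalize. Python 3.8 and later document str.capitalize itself as titlecasing the first character, so a helper like this mainly matters on older versions.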
Date | User | Action | Args
2019-04-07 10:40:51 | steven.daprano | set | recipients: + steven.daprano
2019-04-07 10:40:51 | steven.daprano | set | messageid: <1554633651.74.0.204071668882.issue36549@roundup.psfhosted.org>
2019-04-07 10:40:51 | steven.daprano | link | issue36549 messages
2019-04-07 10:40:51 | steven.daprano | create |