str.upper converts to title #56413
specification
str.upper()
str.isupper()
>>> '\u1ff3'
'ῳ'
>>> '\u1ff3'.islower()
True
>>> '\u1ff3'.upper()
'ῼ'
>>> '\u1ff3'.upper().isupper()
False
'\u1ff3'.upper() returns '\u1ffc', so we have:

The entries for these two chars in the UnicodeData.txt [0] files are:

U+1FF3 has U+1FFC in both the third last and last field (Simple_Uppercase_Mapping and Simple_Titlecase_Mapping respectively -- see [1]), so .upper() is doing the right thing here.

The Unicode Standard Annex #44 [2] defines the Lt category as:

I'm not sure there's anything to fix here: both functions behave as documented, and it might indeed be the case that .upper() returns chars with category Lt, which then return False with .isupper().
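To make the category argument concrete, here is a minimal sketch that inspects the two code points directly with the standard `unicodedata` module. The helper name `describe` is only illustrative. It deliberately looks at U+1FFC itself rather than at the result of `.upper()`, since newer Python versions apply full case mappings and may return a different string from `'\u1ff3'.upper()` than the one shown in the report.

```python
import unicodedata


def describe(ch):
    """Return (code point, general category, isupper, istitle) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isupper(), ch.istitle())


small = "\u1ff3"  # GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI
title = "\u1ffc"  # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI

for ch in (small, title):
    # Print the general category next to what the str predicates report.
    print(describe(ch))
```

U+1FFC is reported with category Lt (titlecase letter), which is why `.isupper()` is False for it even though it is the simple uppercase mapping of U+1FF3.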
Ezio Melotti wrote:
I think there's a misunderstanding here: title cased characters ...

Note that .upper() also does not guarantee to return an upper ...

The German ß is such a character (U+00DF). It doesn't have ...

The character is normally mapped to 'SS' when converting it ...

I suggest to close this ticket as invalid or to add a note ...
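The ß case can be checked the same way. This is a hedged sketch, not part of the original comment: the helper name `upper_roundtrips` is made up for illustration, and the result of `.upper()` depends on the interpreter version (the 'SS' mapping is applied by Python versions that use full case mappings, while older ones leave the character unchanged).

```python
import unicodedata


def upper_roundtrips(s):
    """Return True if s.upper() is itself reported as uppercase by isupper()."""
    return s.upper().isupper()


eszett = "\u00df"  # LATIN SMALL LETTER SHARP S

print(unicodedata.category(eszett))  # general category of the character
print(eszett.islower())              # whether it counts as lowercase
print(repr(eszett.upper()))          # version dependent: 'SS' or the unchanged character
print(upper_roundtrips(eszett))      # whether upper() followed by isupper() holds here
```

Either way, the point of the comment stands: `.upper()` maps characters according to the Unicode data, and `.isupper()` on the result is a separate question.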
A note sounds good.
Here's a patch.
New patch that factors out the definition of cased characters adding it to a footnote.
Patch looks good, with one issue: I’ve never encountered “cased character” before, is it an accepted term or an invention in our docs?
I think it's an invention, but its meaning is quite clear to me.
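For readers wondering what "cased character" amounts to in practice, here is a small sketch. It assumes the definition is the one I understand the docs footnote to use, namely characters whose general category is Lu, Ll or Lt; the names `CASED_CATEGORIES`, `is_cased` and `cased_chars` are hypothetical helpers, not anything in the standard library.

```python
import unicodedata

# Assumed definition: uppercase, lowercase and titlecase letters.
CASED_CATEGORIES = {"Lu", "Ll", "Lt"}


def is_cased(ch):
    """Return True if ch falls in one of the assumed cased categories."""
    return unicodedata.category(ch) in CASED_CATEGORIES


def cased_chars(s):
    """Return the cased characters of s, in order."""
    return [ch for ch in s if is_cased(ch)]


print(cased_chars("a1-B"))
print(is_cased("\u1ffc"))  # the titlecase letter from the report
```

Under this reading, U+1FFC is cased but not uppercase, which matches the False result from `.isupper()` in the report.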
New changeset 16edc5cf4a79 by Ezio Melotti in branch '3.2':
New changeset fb49394f75ed by Ezio Melotti in branch '2.7':
New changeset c821e3a54930 by Ezio Melotti in branch 'default':
Fixed, thanks for the report!
Are you sure this should have been backported? Are there any apps that may be working now but won't be after the next point release?
This is only a doc patch, maybe you are confusing this issue with bpo-12266?
Right. I was looking at the other patches that went in in the last 24 hours.
It's unlikely that bpo-12266 might break apps. The behavior changed only for fairly unusual characters, and the old behavior was clearly wrong.