utf-8 or utf8 or utf_8 (codec display name inconsistency) #58121
Comments
Since Python 3.2.2 (I don't have an earlier version to test with),
>>> "\udc80".encode("utf-8")
UnicodeEncodeError: *utf-8* codec can't encode character '\udc80'...
but
>>> b"\xff".decode("utf-8")
UnicodeDecodeError: *utf8* codec can't decode byte 0xff in position 0
and the table in the documentation of the codecs module suggests *utf_8* as the name of the codec, which I believe to be equivalent to "utf_8" because '-' is not a valid character in an identifier.

Can we at least make the above two consistent? I would go for "utf-8", which was probably introduced for rejecting surrogates, but "utf8" has been there for years. What do we do?

I am happy to submit patches for all branches. These are one-liners anyway. The backward compatibility risk should be pretty low, as you usually don't read the encoding from these errors, and I don't see any use of PyUnicode(Encode|Decode)Error_GetEncoding in trunk, although I'm using it for issue bpo-12892.

Also, "latin_1" displays as *latin-1* but "iso2022-jp" displays as *iso2022_jp*. I care less about this nit though. |
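A minimal sketch of how the reported names could be inspected from Python code rather than read off the error message. It relies on the standard `encoding` attribute of `UnicodeEncodeError` and `UnicodeDecodeError`; the helper name `error_encoding_name` is made up for illustration. The spelling printed depends on the Python version in use, so no particular output is assumed here.

```python
import codecs


def error_encoding_name(action):
    """Run `action` and return the codec name stored on the Unicode error it raises.

    Returns None if no encode/decode error is raised.
    (Hypothetical helper, not part of the standard library.)
    """
    try:
        action()
    except (UnicodeEncodeError, UnicodeDecodeError) as exc:
        return exc.encoding
    return None


# The two failing operations from the report.
encode_name = error_encoding_name(lambda: "\udc80".encode("utf-8"))
decode_name = error_encoding_name(lambda: b"\xff".decode("utf-8"))

print("encode error reports:", encode_name)
print("decode error reports:", decode_name)

# Whatever spelling each error carries, both names resolve to the same codec.
print(codecs.lookup(encode_name).name == codecs.lookup(decode_name).name)
```

Comparing through `codecs.lookup()` instead of comparing the raw strings keeps such a check independent of which display spelling a given interpreter version uses.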
typo: equivalent to "utf_8" → equivalent to "utf-8". |
New changeset c861c0a7f40c by Victor Stinner in branch '3.2': New changeset af1a9508f7fa by Victor Stinner in branch 'default': |
Use codecs.lookup(alias).name to get the normalized name of a codec. Examples:
>>> import codecs
>>> codecs.lookup('utf-8').name
'utf-8'
>>> codecs.lookup('iso-8859-1').name
'iso8859-1'
>>> codecs.lookup('latin1').name
'iso8859-1'
>>> codecs.lookup('iso2022_jp').name
'iso2022_jp'
All issues appear to be addressed, so I am closing the issue. Thanks for the report! |
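The `codecs.lookup(alias).name` suggestion above can be wrapped in small helpers when code needs to compare codec names that may be spelled differently. This is only a sketch: `canonical_name` and `same_codec` are hypothetical names chosen for illustration, and the only library call used is `codecs.lookup()`.

```python
import codecs


def canonical_name(alias, default=None):
    """Return the normalized codec name for `alias`, or `default` if unknown."""
    try:
        return codecs.lookup(alias).name
    except LookupError:
        # codecs.lookup() raises LookupError for unregistered encodings.
        return default


def same_codec(first, second):
    """Return True if both aliases resolve to the same registered codec."""
    name = canonical_name(first)
    return name is not None and name == canonical_name(second)


for alias in ("utf8", "UTF-8", "utf_8", "latin1", "iso-8859-1"):
    print(alias, "->", canonical_name(alias))

print(same_codec("utf8", "utf_8"))
print(same_codec("latin1", "utf-8"))
```

Returning a default instead of propagating `LookupError` is a design choice for this sketch; callers that need to distinguish unknown encodings may prefer to let the exception through.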
You need to update test_pep3120: http://www.python.org/dev/buildbot/all/builders/AMD64%20Gentoo%20Wide%203.2/builds/910/steps/test/logs/stdio/text |
New changeset 5b8f146103fa by Victor Stinner in branch '3.2': New changeset 170a224ce01e by Victor Stinner in branch 'default': |
New changeset 824ddf6a30f2 by Victor Stinner in branch '3.2': New changeset 2cfba214c243 by Victor Stinner in branch 'default': |