
classification
Title: utf-8 or utf8 or utf_8 (codec display name inconsistency)
Type: enhancement
Stage:
Components: Unicode
Versions: Python 3.3

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.araujo, ezio.melotti, kennyluck, python-dev, vstinner
Priority: low
Keywords:

Created on 2012-01-31 17:27 by kennyluck, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg152399 - Author: Kang-Hao (Kenny) Lu (kennyluck) Date: 2012-01-31 17:27
Since Python 3.2.2 (I don't have an earlier version to test with),

>>> "\udc80".encode("utf-8")
UnicodeEncodeError: *utf-8* codec can't encode character '\udc80'...

but

>>> b"\xff".decode("utf-8")
UnicodeDecodeError: *utf8* codec can't decode byte 0xff in position 0

and the table on the documentation of the codec module suggests *utf_8* as the name of the codec, which I believe to be equivalent to "utf_8" because '-' is not a valid character of an identifier.

Can we at least make the above two consistent? I would go for "utf-8", which was probably introduced for rejecting surrogates, but "utf8" has been there for years. What do we do? I am happy to submit patches for all branches. These are one-liners anyway.

The backward compatibility risk should be pretty low, as you usually don't read the encoding from these errors, and I don't see any use of PyUnicode(Encode|Decode)Error_GetEncoding in trunk, although I'm using it for issue #12892.

Also, "latin_1" displays as *latin-1* but "iso2022-jp" displays as *iso2022_jp*. I care less about this nit though.
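
The name shown in these error messages is also available from Python as the `encoding` attribute of the exception, which is one way to check whether the encoder and decoder agree on a given interpreter. The following is a minimal sketch; the helper name `reported_codec_name` is hypothetical and not part of the report. On Python 3.2.2 the two values differ as described above; on versions that include the fix for this issue they should match.

```python
def reported_codec_name(action):
    """Run `action` and return the codec name stored on the Unicode
    error it raises, or None if no such error is raised."""
    try:
        action()
    except (UnicodeEncodeError, UnicodeDecodeError) as exc:
        # Both exception types expose the codec name as `.encoding`.
        return exc.encoding
    return None


# A lone surrogate cannot be encoded to UTF-8.
encode_name = reported_codec_name(lambda: "\udc80".encode("utf-8"))
# 0xff is never a valid start byte in UTF-8.
decode_name = reported_codec_name(lambda: b"\xff".decode("utf-8"))

print("encoder reports:", encode_name)
print("decoder reports:", decode_name)
print("consistent:", encode_name == decode_name)
```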
msg152421 - Author: Kang-Hao (Kenny) Lu (kennyluck) Date: 2012-02-01 00:42
> and the table on the documentation of the codec module suggests *utf_8*
> as the name of the codec, which I believe to be equivalent to "utf_8"
> because '-' is not a valid character of an identifier.

typo: equivalent to "utf_8" → equivalent to "utf-8".
msg153308 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-02-14 00:17
New changeset c861c0a7f40c by Victor Stinner in branch '3.2':
Issue #13913: normalize utf-8 codec name in UTF-8 decoder
http://hg.python.org/cpython/rev/c861c0a7f40c

New changeset af1a9508f7fa by Victor Stinner in branch 'default':
(Merge 3.2) Issue #13913: normalize utf-8 codec name in UTF-8 decoder
http://hg.python.org/cpython/rev/af1a9508f7fa
msg153309 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-02-14 00:19
Use codecs.lookup(alias).name to get the normalized name of a codec. Examples:

>>> import codecs
>>> codecs.lookup('utf-8').name
'utf-8'
>>> codecs.lookup('iso-8859-1').name
'iso8859-1'
>>> codecs.lookup('latin1').name
'iso8859-1'
>>> codecs.lookup('iso2022_jp').name
'iso2022_jp'

All issues look to be addressed, so I close the issue. Thanks for the report!
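
Building on the `codecs.lookup(alias).name` suggestion above, here is a small sketch of comparing two spellings of a codec name by their normalized form. The helper names `normalized_codec_name` and `same_codec` are hypothetical, introduced only for illustration; `codecs.lookup` raises `LookupError` for unknown names, which the sketch lets propagate.

```python
import codecs


def normalized_codec_name(alias):
    """Return the normalized name registered for a codec alias."""
    return codecs.lookup(alias).name


def same_codec(alias_a, alias_b):
    """Return True if both aliases resolve to the same normalized name."""
    return normalized_codec_name(alias_a) == normalized_codec_name(alias_b)


for a, b in [("utf8", "utf_8"), ("latin1", "iso-8859-1"), ("utf-8", "latin1")]:
    print(a, b, same_codec(a, b))
```

Comparing normalized names rather than raw strings avoids treating "utf8", "utf-8", and "utf_8" as different encodings.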
msg153417 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-02-15 17:09
You need to update test_pep3120: http://www.python.org/dev/buildbot/all/builders/AMD64%20Gentoo%20Wide%203.2/builds/910/steps/test/logs/stdio/text
msg153437 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-02-15 21:25
New changeset 5b8f146103fa by Victor Stinner in branch '3.2':
Issue #13913: Fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/5b8f146103fa

New changeset 170a224ce01e by Victor Stinner in branch 'default':
(Merge 3.2) Issue #13913: Fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/170a224ce01e
msg153446 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-02-15 22:44
New changeset 824ddf6a30f2 by Victor Stinner in branch '3.2':
Issue #13913: Another fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/824ddf6a30f2

New changeset 2cfba214c243 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #13913: Another fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/2cfba214c243

History
Date                 User         Action  Args
2022-04-11 14:57:26  admin        set     github: 58121
2012-02-15 22:44:40  python-dev   set     messages: + msg153446
2012-02-15 21:25:02  python-dev   set     messages: + msg153437
2012-02-15 17:09:06  eric.araujo  set     nosy: + eric.araujo
                                          messages: + msg153417
2012-02-14 00:19:45  vstinner     set     status: open -> closed
                                          nosy: + vstinner
                                          messages: + msg153309
                                          resolution: fixed
2012-02-14 00:17:36  python-dev   set     nosy: + python-dev
                                          messages: + msg153308
2012-02-04 08:21:56  eric.araujo  set     priority: normal -> low
                                          type: behavior -> enhancement
                                          versions: - Python 2.7, Python 3.2
2012-02-01 00:42:29  kennyluck    set     messages: + msg152421
2012-01-31 17:28:42  kennyluck    set     type: behavior
2012-01-31 17:27:56  kennyluck    create