
utf-8 or utf8 or utf-8 (codec display name inconsistency) #58121

Closed
kennyluck mannequin opened this issue Jan 31, 2012 · 7 comments
Labels: topic-unicode, type-feature

Comments


kennyluck mannequin commented Jan 31, 2012

BPO 13913
Nosy @vstinner, @ezio-melotti, @merwok

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2012-02-14.00:19:45.958>
created_at = <Date 2012-01-31.17:27:56.095>
labels = ['type-feature', 'expert-unicode']
title = 'utf-8 or utf8 or utf-8 (codec display name inconsistency)'
updated_at = <Date 2012-02-15.22:44:40.620>
user = 'https://bugs.python.org/kennyluck'

bugs.python.org fields:

activity = <Date 2012-02-15.22:44:40.620>
actor = 'python-dev'
assignee = 'none'
closed = True
closed_date = <Date 2012-02-14.00:19:45.958>
closer = 'vstinner'
components = ['Unicode']
creation = <Date 2012-01-31.17:27:56.095>
creator = 'kennyluck'
dependencies = []
files = []
hgrepos = []
issue_num = 13913
keywords = []
message_count = 7.0
messages = ['152399', '152421', '153308', '153309', '153417', '153437', '153446']
nosy_count = 5.0
nosy_names = ['vstinner', 'ezio.melotti', 'eric.araujo', 'python-dev', 'kennyluck']
pr_nums = []
priority = 'low'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue13913'
versions = ['Python 3.3']


kennyluck mannequin commented Jan 31, 2012

Since Python 3.2.2 (I don't have an earlier version to test with):

>>> "\udc80".encode("utf-8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80'...

but

>>> b"\xff".decode("utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0

and the table on the documentation of the codec module suggests *utf_8* as the name of the codec, which I believe to be equivalent to "utf_8" because '-' is not a valid character of an identifier.

Can we at least make the above two consistent? I would go for "utf-8", which was probably introduced along with the rejection of surrogates, but "utf8" has been there for years. Which should we pick? I am happy to submit patches for all branches; these are one-liners anyway.

The backward-compatibility risk should be low: code rarely reads the encoding back from these exceptions, and I don't see any use of PyUnicode(Encode|Decode)Error_GetEncoding in trunk (although I'm using it for issue bpo-12892).

Also, "latin_1" displays as *latin-1* but "iso2022-jp" displays as *iso2022_jp*. I care less about this nit, though.
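At the Python level, the codec name returned by PyUnicodeEncodeError_GetEncoding and PyUnicodeDecodeError_GetEncoding is exposed as the `encoding` attribute of UnicodeEncodeError and UnicodeDecodeError. A minimal sketch that collects that attribute for the two failing calls above (the function name is hypothetical, for illustration only):

```python
# Minimal sketch: read the codec name carried by UTF-8 encode/decode errors.
# codec_names_from_errors is a hypothetical helper, not a stdlib function.
def codec_names_from_errors():
    """Return the codec names reported by a UTF-8 encode error and a UTF-8 decode error."""
    try:
        "\udc80".encode("utf-8")  # a lone surrogate is rejected by the strict UTF-8 encoder
    except UnicodeEncodeError as exc:
        encode_name = exc.encoding
    try:
        b"\xff".decode("utf-8")  # byte 0xff never appears in well-formed UTF-8
    except UnicodeDecodeError as exc:
        decode_name = exc.encoding
    return encode_name, decode_name


print(codec_names_from_errors())
```

On a current Python 3 both names come back as 'utf-8'; on 3.2.2 the decode side reported 'utf8', which is the inconsistency described above.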

kennyluck mannequin added the topic-unicode and type-bug labels Jan 31, 2012

kennyluck mannequin commented Feb 1, 2012

> and the table on the documentation of the codec module suggests *utf_8*
> as the name of the codec, which I believe to be equivalent to "utf_8"
> because '-' is not a valid character of an identifier.

typo: equivalent to "utf_8" → equivalent to "utf-8".

merwok added the type-feature label and removed the type-bug label Feb 4, 2012

python-dev mannequin commented Feb 14, 2012

New changeset c861c0a7f40c by Victor Stinner in branch '3.2':
Issue bpo-13913: normalize utf-8 codec name in UTF-8 decoder
http://hg.python.org/cpython/rev/c861c0a7f40c

New changeset af1a9508f7fa by Victor Stinner in branch 'default':
(Merge 3.2) Issue bpo-13913: normalize utf-8 codec name in UTF-8 decoder
http://hg.python.org/cpython/rev/af1a9508f7fa

vstinner commented

Use codecs.lookup(alias).name to get the normalized name of a codec. Examples:

>>> import codecs
>>> codecs.lookup('utf-8').name
'utf-8'
>>> codecs.lookup('iso-8859-1').name
'iso8859-1'
>>> codecs.lookup('latin1').name
'iso8859-1'
>>> codecs.lookup('iso2022_jp').name
'iso2022_jp'

All issues appear to be addressed, so I'm closing the issue. Thanks for the report!
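Following that suggestion, a minimal sketch that compares encoding names through codecs.lookup() rather than as raw strings, so aliases such as "utf8" vs "UTF-8" or "latin_1" vs "iso-8859-1" compare equal. It also normalizes the name carried by a caught UnicodeEncodeError. The helper names canonical_codec_name, same_codec and normalized_error_codec are hypothetical and not part of the codecs module:

```python
import codecs


def canonical_codec_name(name):
    """Hypothetical helper: map any registered alias to the codec's canonical name."""
    # codecs.lookup() raises LookupError if no codec is registered under this name.
    return codecs.lookup(name).name


def same_codec(a, b):
    """Hypothetical helper: True if two encoding names refer to the same codec."""
    return canonical_codec_name(a) == canonical_codec_name(b)


def normalized_error_codec(text, encoding):
    """Hypothetical helper: encode text; on UnicodeEncodeError return the
    canonical name of the codec recorded in the exception, else None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return canonical_codec_name(exc.encoding)
    return None


print(same_codec("utf8", "UTF-8"))                  # both resolve to 'utf-8'
print(same_codec("latin_1", "iso-8859-1"))          # both resolve to 'iso8859-1'
print(normalized_error_codec("\u20ac", "latin_1"))  # U+20AC is outside Latin-1
```

Comparing canonical names instead of the strings shown in exception messages sidesteps display-name differences such as those raised in this issue.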


merwok commented Feb 15, 2012


python-dev mannequin commented Feb 15, 2012

New changeset 5b8f146103fa by Victor Stinner in branch '3.2':
Issue bpo-13913: Fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/5b8f146103fa

New changeset 170a224ce01e by Victor Stinner in branch 'default':
(Merge 3.2) Issue bpo-13913: Fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/170a224ce01e


python-dev mannequin commented Feb 15, 2012

New changeset 824ddf6a30f2 by Victor Stinner in branch '3.2':
Issue bpo-13913: Another fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/824ddf6a30f2

New changeset 2cfba214c243 by Victor Stinner in branch 'default':
(Merge 3.2) Issue bpo-13913: Another fix test_pep3120 for the UTF-8 codec name
http://hg.python.org/cpython/rev/2cfba214c243

ezio-melotti transferred this issue from another repository Apr 10, 2022