
Author bodograumann
Recipients bodograumann, ezio.melotti, vstinner
Date 2021-07-23.10:18:59
Message-id <1627035539.3.0.918716906127.issue44723@roundup.psfhosted.org>
In-reply-to
Content
This is a follow-up to https://bugs.python.org/issue37751 concerning the normalization of codec names.

First of all, the changes made therein are not documented correctly.
The implementation describes the normalization as follows:
| Normalization works as follows: all non-alphanumeric
| characters except the dot used for Python package names are
| collapsed and replaced with a single underscore, e.g. '  -;#'
| becomes '_'. Leading and trailing underscores are removed.
Cf. [encodings/__init__.py](https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Lib/encodings/__init__.py#L47-L50)
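
As a rough sketch of what that description amounts to (my own illustration, not the CPython code; `sketch_normalize` is a hypothetical name, and the lowercasing is an assumption added so the result matches the name a search function sees, as in the `ascii_translit` example further down):

```python
import re

def sketch_normalize(name):
    """Approximate the normalization described in the quoted text."""
    # Collapse each run of characters that are neither ASCII alphanumerics
    # nor dots into a single underscore.
    collapsed = re.sub(r"[^0-9A-Za-z.]+", "_", name)
    # Remove leading and trailing underscores, then lowercase (assumption).
    return collapsed.strip("_").lower()

for raw in ("ASCII//TRANSLIT", "  UTF-8 ", "encodings.utf_8"):
    print(repr(raw), "->", repr(sketch_normalize(raw)))
```

Note that the `//` separator iconv relies on cannot be recovered from the result.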

The documentation, however, only states:
| Search functions are expected to take one argument, being the encoding name in all lower case letters with hyphens and spaces converted to underscores
Cf. https://docs.python.org/3/library/codecs.html#codecs.register

Secondly, this change breaks many iconv codecs when used through the python-iconv binding. For example, `ASCII//TRANSLIT` is now normalized to `ascii_translit`, which iconv does not understand. Codec names that contain hyphens also break and, if I'm not mistaken, not all of them have hyphen-free aliases in iconv.
Cf. [python-iconv #4](https://github.com/bodograumann/python-iconv/issues/4)
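
A minimal way to observe which name a search function actually receives, using only `codecs.register` and `codecs.lookup` from the standard library (`recording_search` and `seen` are hypothetical names for this sketch; the exact string recorded depends on the Python version):

```python
import codecs

seen = []

def recording_search(name):
    # Record the name exactly as the codec registry passes it in,
    # then decline so that the lookup continues and eventually fails.
    seen.append(name)
    return None

codecs.register(recording_search)
try:
    codecs.lookup("ASCII//TRANSLIT")
except LookupError:
    pass  # expected: no registered search function knows this name

print(seen)

# codecs.unregister only exists on Python 3.10 and later.
if hasattr(codecs, "unregister"):
    codecs.unregister(recording_search)
```

With the behaviour described above, the recorded name should be `ascii_translit` rather than the original spelling, so a binding like python-iconv never gets to see the `//TRANSLIT` suffix.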

How about looking up the given name first and falling back to the normalized name only if that lookup fails?
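
To make the suggested order concrete, here is a pure-Python sketch. It is not how the registry in CPython is implemented; `lookup_raw_first`, `sketch_normalize` and the two fake search functions are hypothetical, and plain strings stand in for `CodecInfo` objects:

```python
import re

def sketch_normalize(name):
    # Hypothetical stand-in for the normalization discussed in this issue.
    return re.sub(r"[^0-9A-Za-z.]+", "_", name).strip("_").lower()

def lookup_raw_first(name, search_functions):
    """Try the name as given, then fall back to its normalized form."""
    candidates = [name]
    normalized = sketch_normalize(name)
    if normalized != name:
        candidates.append(normalized)
    for candidate in candidates:
        for search in search_functions:
            found = search(candidate)
            if found is not None:
                return found
    raise LookupError("unknown encoding: " + name)

# Hypothetical search functions used only for this illustration.
def fake_iconv_search(name):
    # Knows only the original iconv spelling.
    return "iconv:" + name if name == "ASCII//TRANSLIT" else None

def fake_stdlib_search(name):
    # Knows only the normalized spelling.
    return "stdlib:" + name if name == "utf_8" else None

searchers = [fake_stdlib_search, fake_iconv_search]
print(lookup_raw_first("ASCII//TRANSLIT", searchers))
print(lookup_raw_first("UTF-8", searchers))
```

The cost of this order is that a failed raw lookup calls every search function twice, once per spelling.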
History
Date                 User          Action  Args
2021-07-23 10:18:59  bodograumann  set     recipients: + bodograumann, vstinner, ezio.melotti
2021-07-23 10:18:59  bodograumann  set     messageid: <1627035539.3.0.918716906127.issue44723@roundup.psfhosted.org>
2021-07-23 10:18:59  bodograumann  link    issue44723 messages
2021-07-23 10:18:59  bodograumann  create