This is a follow-up to https://bugs.python.org/issue37751 concerning the normalization of codec names.
First of all, the changes made therein are not documented correctly.
The implementation states:
| Normalization works as follows: all non-alphanumeric
| characters except the dot used for Python package names are
| collapsed and replaced with a single underscore, e.g. ' -;#'
| becomes '_'. Leading and trailing underscores are removed.
Cf. [encodings/__init__.py](https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Lib/encodings/__init__.py#L47-L50)
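To make the quoted rule concrete, here is a minimal sketch of the described normalization. It is not the CPython code; `normalize_codec_name` is a hypothetical helper, and it assumes that only ASCII letters, digits and the dot count as "kept" characters. It does not lowercase the name, since the quoted comment does not mention case.

```python
import re


def normalize_codec_name(name: str) -> str:
    """Approximate the normalization rule quoted above (hypothetical helper)."""
    # Collapse every run of characters other than ASCII letters, digits and
    # the dot into a single underscore.
    collapsed = re.sub(r"[^A-Za-z0-9.]+", "_", name)
    # Remove leading and trailing underscores.
    return collapsed.strip("_")


print(normalize_codec_name("ASCII//TRANSLIT"))  # ASCII_TRANSLIT
print(normalize_codec_name("  utf-8 "))         # utf_8
```

Note that this rule touches far more than the "hyphens and spaces" mentioned in the documentation: slashes, semicolons and any other punctuation are collapsed as well.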
The documentation, however, only states:
| Search functions are expected to take one argument, being the encoding name in all lower case letters with hyphens and spaces converted to underscores
Cf. https://docs.python.org/3/library/codecs.html#codecs.register
Secondly, this change breaks many iconv codecs when they are used through the python-iconv binding. For example, `ASCII//TRANSLIT` is now normalized to `ascii_translit`, which iconv does not understand. Codec names that contain hyphens also break and, if I'm not mistaken, not all of them have hyphen-free aliases in iconv.
Cf. [python-iconv #4](https://github.com/bodograumann/python-iconv/issues/4)
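The name a search function actually receives can be checked with a small script like the following. It is only a sketch: `recording_search` is a hypothetical search function that records its argument and declines every lookup, and the recorded name depends on the Python version in use.

```python
import codecs

seen = []  # names handed to our search function by codecs.lookup()


def recording_search(name):
    """Hypothetical search function: record the name, then decline."""
    seen.append(name)
    return None  # None tells the codec registry to keep searching


codecs.register(recording_search)

try:
    codecs.lookup("ASCII//TRANSLIT")
except LookupError:
    # Expected here: no registered search function knows this name.
    pass

print(seen)
```

On an affected version the recorded name is `ascii_translit`, so a binding such as python-iconv no longer sees the `//TRANSLIT` suffix.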
How about first looking up the given name and only then, if the given name could not be found, looking for the codec by its normalized name?
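A rough sketch of that lookup order, written in pure Python rather than against the real codec registry. All names here (`lookup_exact_first`, `normalize`, and the two toy search functions) are hypothetical, and the normalization is an approximation of the rule quoted above.

```python
import re


def normalize(name):
    """Approximate normalization: collapse punctuation, strip, lowercase."""
    return re.sub(r"[^A-Za-z0-9.]+", "_", name).strip("_").lower()


def lookup_exact_first(name, search_functions):
    """Try the name as given, then fall back to its normalized form."""
    # dict.fromkeys drops the duplicate when the name is already normalized.
    for candidate in dict.fromkeys((name, normalize(name))):
        for search in search_functions:
            found = search(candidate)
            if found is not None:
                return found
    raise LookupError(f"unknown encoding: {name}")


# Toy stand-ins for registered search functions (assumptions for illustration).
def builtin_like_search(name):
    return f"builtin:{name}" if name in {"utf_8", "ascii"} else None


def iconv_like_search(name):
    return f"iconv:{name}" if name in {"ASCII//TRANSLIT", "ISO-8859-15"} else None


searchers = [builtin_like_search, iconv_like_search]
print(lookup_exact_first("ASCII//TRANSLIT", searchers))  # iconv:ASCII//TRANSLIT
print(lookup_exact_first("UTF-8", searchers))            # builtin:utf_8
```

With this order, search functions that understand the original spelling get the first chance, while existing search functions that rely on normalized names keep working through the fallback.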