This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: "utf8-sig" missing from codecs (inconsistency)
Type: behavior
Stage:
Components: Unicode
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Peter Ludemann, ezio.melotti, vstinner
Priority: normal
Keywords:

Created on 2019-12-29 18:14 by Peter Ludemann, last changed 2022-04-11 14:59 by admin.

Messages (1)
msg358996 - Author: Peter Ludemann (Peter Ludemann) Date: 2019-12-29 18:14
In general, 'utf8' and 'utf-8' are interchangeable in the codecs (and in many parts of the Python library). However, 'utf8-sig' is missing (only 'utf-8-sig' is recognized), and it happens to also be generated by lib2to3.tokenize.detect_encoding.

>>> import codecs
>>> codecs.getincrementaldecoder('utf-8-sig')()
<encodings.utf_8_sig.IncrementalDecoder object at 0x7fecbcdbbc10>
>>> codecs.getincrementaldecoder('utf8-sig')()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/codecs.py", line 987, in getincrementaldecoder
    decoder = lookup(encoding).incrementaldecoder
LookupError: unknown encoding: utf8-sig
History
Date | User | Action | Args
2022-04-11 14:59:24 | admin | set | github: 83336
2019-12-29 18:14:13 | Peter Ludemann | create |