Author lemburg
Recipients doerwalter, lemburg, serhiy.storchaka, terry.reedy, vstinner, 王杰
Date 2015-12-26.22:05:17
Message-id <567F0F15.10601@egenix.com>
In-reply-to <CAMpsgwY5i864Ot-itKA+7EqJohHtJkYpvubuKFsoSTcrN_SB+w@mail.gmail.com>
Content
On 26.12.2015 22:46, STINNER Victor wrote:
> 
> In Python, there are multiple implementations of the utf-8 codec with many
> shortcuts. I'm not surprised to see bugs depending on the exact syntax of
> the utf-8 codec name. Maybe we need to share even more code to normalize
> and compare codec names. (I think that py3 is better than py2 on this part.)

There's only one implementation (the one in unicodeobject.c), which is used
either directly or via the wrapper in the encodings package. Because UTF-8
is such a commonly used codec, there are also a few shortcuts scattered
around the code that bypass the codec registry.

In the case in question, the codec registry should trigger decoding
via the encodings package (rather than going directly to the C APIs),
so it will eventually end up using the same code. I wonder why this does
not trigger the exception.
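
To illustrate the point about the registry, here is a minimal sketch (not taken from the issue itself) that checks whether several spellings of the codec name resolve to the same registry entry and fail the same way on malformed input. The list of spellings and the helper names `canonical_names` and `decode_outcomes` are assumptions chosen for illustration; only `codecs.lookup()` and `bytes.decode()` are standard library calls. It exercises runtime decoding only, not the source-encoding declaration path discussed in the issue.

```python
import codecs

# Spellings of the UTF-8 codec name to compare (illustrative choices).
SPELLINGS = ["utf-8", "utf8", "UTF-8", "UTF_8"]


def canonical_names(spellings):
    """Look up each spelling in the codec registry and return its canonical name."""
    return {s: codecs.lookup(s).name for s in spellings}


def decode_outcomes(data, spellings):
    """Decode data under each spelling; record the text or the exception type name."""
    outcomes = {}
    for s in spellings:
        try:
            outcomes[s] = data.decode(s)
        except UnicodeDecodeError as exc:
            outcomes[s] = type(exc).__name__
    return outcomes


# The byte 0xFF can never appear in well-formed UTF-8.
INVALID = b"\xff"

names = canonical_names(SPELLINGS)
outcomes = decode_outcomes(INVALID, SPELLINGS)

print(names)
print(outcomes)
```

If some spelling took a path that skipped validation, it would show up here as a decoded string instead of an exception name.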
History
Date                 User     Action  Args
2015-12-26 22:05:17  lemburg  set     recipients: + lemburg, doerwalter, terry.reedy, vstinner, serhiy.storchaka, 王杰
2015-12-26 22:05:17  lemburg  link    issue25937 messages
2015-12-26 22:05:17  lemburg  create