
Author ncoghlan
Recipients doerwalter, ezio.melotti, lemburg, ncoghlan, python-dev, serhiy.storchaka, vstinner
Date 2013-11-23.02:16:45
Just noting the exact list of codecs that currently bypass the full codec machinery and go directly to the C implementation: the codec name is normalised (which includes forcing it to lowercase) and then checked with strcmp against a specific set of known encodings.

In PyUnicode_Decode (and hence bytes.decode and bytearray.decode):

utf-8
utf8
latin-1
latin1
iso-8859-1
iso8859-1
mbcs (Windows only)
ascii
utf-16
utf-32

In PyUnicode_AsEncodedString (and hence str.encode), the list is mostly the same, but utf-16 and utf-32 are not accelerated (i.e. they're currently still looked up through the codec machinery).

It may be worth opening a separate issue to restore the consistency between the lists by adding utf-16 and utf-32 to the fast path for encoding as well.
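The normalise-then-compare dispatch described above can be sketched in Python. This is a hedged illustration, not the real CPython C code: the helper name, the dictionary, and the exact normalisation rules (lowercasing only) are assumptions for the sketch.

```python
import codecs

# Illustrative stand-in for the C fast-path table (see the lists above);
# the real code uses strcmp against each known name after normalisation.
_FAST_PATH = {
    "utf-8": "utf-8", "utf8": "utf-8",
    "latin-1": "latin-1", "latin1": "latin-1",
    "iso-8859-1": "latin-1", "iso8859-1": "latin-1",
    "ascii": "ascii",
    "utf-16": "utf-16", "utf-32": "utf-32",
}

def decode_fast_path(data: bytes, encoding: str) -> str:
    name = encoding.lower()           # normalisation: force lowercase
    target = _FAST_PATH.get(name)     # known encoding? take the fast path
    if target is not None:
        return data.decode(target)    # straight to the C implementation
    # anything else goes through the full codec machinery lookup
    return codecs.decode(data, encoding)
```

The point of the fast path is that a spelling like "UTF-8" or "Latin-1" never touches the codec registry at all; only unrecognised names pay for the full lookup.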

As far as the wrapping mechanism from issue #17828 itself goes:

- it only triggers if PyEval_CallObject on the encoder or decoder returns NULL
- stateful exceptions (a category that includes UnicodeEncodeError and UnicodeDecodeError) and those with custom __init__ or __new__ implementations don't get wrapped
- the actual wrapping process is just the C equivalent of "raise type(exc)(new_msg) from exc", plus the initial checks to determine if the current exception can be wrapped safely
- it applies to the *general purpose* codec machinery, not just to the text model related convenience methods
Date                 User      Action  Args
2013-11-23 02:16:46  ncoghlan  set     recipients: + ncoghlan, lemburg, doerwalter, vstinner, ezio.melotti, python-dev, serhiy.storchaka
2013-11-23 02:16:46  ncoghlan  set     messageid: <>
2013-11-23 02:16:46  ncoghlan  link    issue19619 messages
2013-11-23 02:16:45  ncoghlan  create