
Author pgimeno
Recipients pgimeno
Date 2019-11-14.14:30:24
Message-id <1573741824.98.0.130776319775.issue38800@roundup.psfhosted.org>
In-reply-to
Content
A codec error handler must return a tuple consisting of a substitution string and the position at which to resume encoding. The UTF-8 codec ignores the resume position and always resumes immediately after the character that caused the error.
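As an illustration of that contract, here is a minimal sketch of an encode error handler. The handler name `mark_and_resume` is made up for this example. It replaces the offending span with question marks and asks the codec to resume at `err.end`, which is the usual, non-skipping case.

```python
import codecs

def mark_and_resume(err):
    # Hypothetical handler for illustration; it only deals with encode errors.
    if not isinstance(err, UnicodeEncodeError):
        raise err
    # err.object[err.start:err.end] is the span the codec could not encode.
    bad = err.object[err.start:err.end]
    # Return (replacement, position at which the codec should resume).
    return ('?' * len(bad), err.end)

codecs.register_error('mark_and_resume', mark_and_resume)

encoded = 'a\u20acb'.encode('ascii', errors='mark_and_resume')
print(encoded)
```

Returning a position other than `err.end` is what lets a handler skip or re-process input, and that is the part this report is about.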

To reproduce, use this code:

import codecs
codecs.register_error('err', lambda err: (b'x', err.end + 1))
assert u'\uDD00yz'.encode('utf8', errors='err') == b'xz'

The above code fails the assertion because the result is b'xyz'.

The resume position is honored by some other codecs. I have not compiled an exhaustive list of which codecs honor it and which do not, so this problem might affect codecs other than UTF-8.
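A quick way to check individual codecs is sketched below. The handler name `skip_next`, the helper `honors_resume_position`, and the choice of codecs to probe are assumptions made for this example, not part of the original report. The result for UTF-8 may differ between Python versions.

```python
import codecs

def skip_next(err):
    # Substitute b'x' and ask the codec to resume one character past err.end,
    # so the 'y' in the test string should be skipped.
    return (b'x', err.end + 1)

codecs.register_error('skip_next', skip_next)

def honors_resume_position(encoding):
    """Return True if the codec skips 'y' as the handler requests."""
    return '\uDD00yz'.encode(encoding, errors='skip_next') == b'xz'

# Probe a few codecs that cannot encode a lone surrogate.
for name in ('ascii', 'latin-1', 'utf-8'):
    print(name, honors_resume_position(name))
```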
History
Date                 User     Action  Args
2019-11-14 14:30:25  pgimeno  set     recipients: + pgimeno
2019-11-14 14:30:24  pgimeno  set     messageid: <1573741824.98.0.130776319775.issue38800@roundup.psfhosted.org>
2019-11-14 14:30:24  pgimeno  link    issue38800 messages
2019-11-14 14:30:24  pgimeno  create