This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: codecs error handler is called with a UnicodeDecodeError with the same args
Type: behavior
Components: Unicode

Process
Status: open
Nosy List: amaury.forgeotdarc, doerwalter, ezio.melotti, lemburg, serhiy.storchaka, vstinner
Priority: normal

Created on 2012-01-19 19:56 by amaury.forgeotdarc, last changed 2022-04-11 14:57 by admin.

Messages (4)
msg151650 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-01-19 19:56
The script below shows that the error handler is always called with the same error object. The 'start', 'end', and 'reason' attributes are correctly updated, but 'args' never changes: it keeps the values from the first call.

It's a bit weird that error.args[2] is not equal to error.start, for example. All versions are affected: 2.7, 3.2, 3.3.
By the way, I could not find where these attributes are documented.



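import codecs

def custom_handler(error):
    # Compare the args tuple with the attributes the codec just updated.
    print(error.args,
          (error.start, error.end, error.reason))
    # b'?'.decode() is a text '?' on both Python 2 and 3;
    # decoding resumes right after the offending bytes.
    return b'?'.decode(), error.end

codecs.register_error('custom', custom_handler)
b'\x80\xd0'.decode('utf-8', 'custom')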
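Until 'args' is kept in sync, a handler can sidestep the problem by never indexing 'args' and relying only on the named attributes. This is a minimal sketch, assuming that 'start', 'end', 'reason' and 'object' are always current (which is what the script above suggests); the handler name recording_handler and the error-handler name 'record' are hypothetical.

```python
import codecs

# Hypothetical handler, for illustration only: it records the positions
# reported through the attributes and never reads error.args, whose
# contents may be stale when the codec reuses the exception object.
seen = []

def recording_handler(error):
    if not isinstance(error, UnicodeDecodeError):
        raise error
    seen.append((error.start, error.end, error.reason))
    # Replace each undecodable chunk with '?' and resume after it.
    return '?', error.end

# 'record' is a hypothetical name for this error handler.
codecs.register_error('record', recording_handler)

result = b'\x80\xd0'.decode('utf-8', 'record')
print(result)
print(seen)
```

Each invalid chunk is reported once with its own start/end, and the replacement text is assembled from those positions alone.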
msg152528 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2012-02-03 15:56
See this ancient posting about this problem:

   http://mail.python.org/pipermail/python-dev/2002-August/027661.html

(see point 4.). So I guess somebody did finally complain! ;)

The error attributes are documented in PEP 293. The existence of the attributes is documented in Doc/c-api/exceptions.rst, but not their meaning.
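Since the C API docs only state that the attributes exist, a small sketch of what they hold may help. It assumes the five-argument constructor order described in PEP 293 (encoding, object, start, end, reason) and only looks at a freshly constructed exception, where 'args' and the attributes still agree; the report above is about what happens after a codec updates start/end/reason on the same object.

```python
# The five positional arguments, in the order described in PEP 293:
# encoding, object (the input bytes), start, end, reason.
err = UnicodeDecodeError('utf-8', b'\x80\xd0', 0, 1, 'invalid start byte')

# Each argument is also exposed as a named attribute.
print(err.encoding, err.object, err.start, err.end, err.reason)

# Freshly constructed, args mirrors the attributes.
print(err.args == (err.encoding, err.object, err.start, err.end, err.reason))
```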
msg152573 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-02-04 00:52
Codec encoders reuse the same exception object for speed, and only update some attributes (start, end and reason). We could recreate the args tuple each time one of these attributes is set. Alternatively, UnicodeEncodeError and UnicodeDecodeError could override the args getter to build a new tuple on each access.
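The second option (an args getter that builds a fresh tuple on each access) can be modelled in pure Python. This is only an illustration under stated assumptions: FreshArgsDecodeError is a hypothetical subclass, real codecs raise plain UnicodeDecodeError so this does not change what handlers receive, and an actual fix would live in CPython's C implementation of the exception types.

```python
class FreshArgsDecodeError(UnicodeDecodeError):
    """Hypothetical subclass: 'args' is derived from the attributes on
    every access, so mutating start/end/reason can't leave it stale."""

    @property
    def args(self):
        return (self.encoding, self.object, self.start, self.end, self.reason)


err = FreshArgsDecodeError('utf-8', b'\x80\xd0', 0, 1, 'invalid start byte')

# Simulate a codec reusing the exception object for the next error.
err.start, err.end, err.reason = 1, 2, 'unexpected end of data'
print(err.args)
```

Deriving args on demand removes the second copy of the data that has to be kept in sync, at the cost of building a tuple on every access. In this sketch, repr() and pickling may still use the tuple stored at construction time.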
msg313062 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-28 18:24
For reference, this behavior has existed from the beginning, since PEP 293 was implemented in issue432401.
History
Date                 User                Action  Args
2022-04-11 14:57:25  admin               set     github: 58038
2018-02-28 18:24:13  serhiy.storchaka    set     messages: + msg313062
2018-02-28 11:32:44  serhiy.storchaka    set     nosy: + serhiy.storchaka
2012-02-04 00:52:46  vstinner            set     messages: + msg152573
2012-02-03 15:56:39  doerwalter          set     nosy: + doerwalter
                                                 messages: + msg152528
2012-02-03 14:37:07  eric.araujo         set     nosy: + lemburg, vstinner
2012-01-19 19:56:36  amaury.forgeotdarc  create