classification
Title: UnicodeError becomes unpicklable if data is appended to args
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: João Eiras, serhiy.storchaka
Priority: normal Keywords:

Created on 2020-02-25 16:15 by João Eiras, last changed 2020-03-01 17:55 by João Eiras.

Files
File name Uploaded Description
test_unicode_error_unpickle.py João Eiras, 2020-02-25 16:15 Test case
Messages (4)
msg362648 - Author: João Eiras (João Eiras) Date: 2020-02-25 16:15
Given some exception `ex`, you can append data like
  ex.args += (value1, value2, ...)
and then re-raise.

This is something I do in my projects to sometimes propagate context when errors are raised, e.g., stacktraces across process boundaries or blobs of text with pickling or unicode errors.

When this is done with UnicodeError, the exception can still be pickled but can no longer be unpickled:

  TypeError: function takes exactly 5 arguments (6 given)

Example:
    import pickle

    def test_unicode_error_unpickle():
        ex0 = UnicodeEncodeError('ascii', 'message', 1, 2, 'e')
        ex0.args += ("extra context",)
        # pickle.loads() is where the TypeError above is raised.
        ex1 = pickle.loads(pickle.dumps(ex0))
        assert type(ex0) is type(ex1)
        assert ex0.args == ex1.args

The issue seems to be UnicodeEncodeError_init() at https://github.com/python/cpython/blob/v3.8.1/Objects/exceptions.c#L1895 and also UnicodeDecodeError_init().

The BaseException is initialized, but then Unicode*Error_init() tries to reparse the arguments and does not tolerate extra values.

This is because BaseException.__reduce__ returns a tuple (class, args), so unpickling calls the class again with the modified args.
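The mechanism described above can be seen without pickle at all. The following is a minimal sketch, assuming the behavior reported for Python 3.8; the variable names are only illustrative.

```python
ex0 = UnicodeEncodeError('ascii', 'message', 1, 2, 'e')
ex0.args += ("extra context",)

# __reduce__ hands pickle the class and the (now six-item) args tuple.
# A third element (the instance dict) may also be present, so slice it off.
cls, args = ex0.__reduce__()[:2]

# Unpickling effectively calls cls(*args); per the report this is where
# UnicodeEncodeError's initializer rejects the extra value.
try:
    cls(*args)
except TypeError as err:
    rebuild_error = err
else:
    rebuild_error = None

print(cls.__name__, len(args), rebuild_error)
```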
msg362655 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-02-25 18:28
Yes, this is how exceptions are pickled. You save the information which allows you to recreate the exception. If you fake args, you are not able to recreate it.

Just do not do this.
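
One way to carry extra context without touching args, sketched here as an assumption and not something proposed in this thread, is to store it as an ordinary instance attribute. BaseException includes the instance dict in its pickled state when it is non-empty, so the attribute should survive the round trip. The attribute name `extra_context` is hypothetical.

```python
import pickle

ex0 = UnicodeEncodeError('ascii', 'message', 1, 2, 'e')

# Hypothetical attribute name; it lives in the instance __dict__,
# so the five-item args tuple is left as the constructor expects it.
ex0.extra_context = "extra context"

ex1 = pickle.loads(pickle.dumps(ex0))

print(ex1.args == ex0.args, ex1.extra_context)
```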
msg362695 - Author: João Eiras (João Eiras) Date: 2020-02-26 15:07
Hi.
It's perfectly fine for classes to have their public APIs and intended uses.

But then unpickling would be the worst place to complain, especially when running 40 parallel processes while an unhandled stacktrace appears between a couple hundred thousand lines of logging data :)
msg363071 - Author: João Eiras (João Eiras) Date: 2020-03-01 17:55
On a related note, after inspecting the UnicodeError C code: the exception object keeps explicit references to 'encoding', 'object', 'start', 'end' and 'reason'. That means that if those properties are set (the C code does have setters), the properties stored in UnicodeError go out of sync with the args tuple in BaseException. Pickling and unpickling will then restore the original values, not those that were set after the exception was created, unless args is modified too.
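
The drift between the stored properties and args can be reproduced in a few lines. This is a small sketch assuming the behavior described above for Python 3.8; only `reason` is changed here, but the same reasoning applies to the other four properties.

```python
import pickle

ex0 = UnicodeEncodeError('ascii', 'message', 1, 2, 'e')

# Setting the property updates the field held by the exception object,
# but the args tuple kept by BaseException is left untouched.
ex0.reason = 'changed'

# Pickling saves args, so the copy is rebuilt from the original values.
ex1 = pickle.loads(pickle.dumps(ex0))

print(ex0.reason, ex0.args[4], ex1.reason)
```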

My suggestion would be to just fetch all 5 properties from the args tuple inside the getters and setters, and in the setters, recreate the tuple with a modified value. The constructor could do argument validation but would not set any properties, because that would be delegated to BaseException.
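
The suggestion can be illustrated in pure Python. The sketch below is not CPython's implementation: `ArgsBackedError`, `_args_property` and `rebuild` are hypothetical names, and accepting extra trailing arguments in the constructor is an assumption added so that appended context survives. `rebuild` imitates unpickling by calling the class with the saved args, and it ignores any instance dict state.

```python
class ArgsBackedError(Exception):
    """Hypothetical exception whose five fields live only in self.args."""

    FIELDS = ("encoding", "object", "start", "end", "reason")

    def __init__(self, *args):
        # Validate the argument count, then let BaseException store args.
        if len(args) < len(self.FIELDS):
            raise TypeError("expected at least %d arguments" % len(self.FIELDS))
        super().__init__(*args)


def _args_property(index):
    def getter(self):
        return self.args[index]

    def setter(self, value):
        # Recreate the tuple with one value replaced.
        self.args = self.args[:index] + (value,) + self.args[index + 1:]

    return property(getter, setter)


for _index, _name in enumerate(ArgsBackedError.FIELDS):
    setattr(ArgsBackedError, _name, _args_property(_index))


def rebuild(ex):
    """Recreate ex the way unpickling does: call the class with the saved args."""
    cls, args = ex.__reduce__()[:2]
    return cls(*args)


ex0 = ArgsBackedError('ascii', 'message', 1, 2, 'e')
ex0.reason = 'changed'
ex0.args += ("extra context",)
ex1 = rebuild(ex0)

print(ex1.reason, ex1.args)
```

Because the getters read from args, a property change and an appended value both end up in the tuple that gets saved, so the rebuilt exception sees them.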
History
Date User Action Args
2020-03-01 17:55:50 João Eiras set messages: + msg363071
2020-02-26 15:07:25 João Eiras set messages: + msg362695
2020-02-25 18:28:44 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg362655
2020-02-25 16:15:36 João Eiras create