
Author libcthorne
Recipients libcthorne, martin.panter, methane
Date 2018-06-05.12:32:32
Message-id <1528201952.42.0.592728768989.issue33578@psf.upfronthosting.co.za>
In-reply-to
Content
Ah, good find. I suppose that means `MultibyteCodec_State` and `pending` are both needed to fully capture the state, as is done in `decoder.getstate`/`setstate`, which return and accept a tuple of both. Unfortunately, `encoder.getstate` is defined to return an integer. `MultibyteCodec_State` can occupy 8 bytes and `pending` can occupy 2 bytes (MAXENCPENDING), for a total of 10 bytes, which is more than a 64-bit C integer can hold. (A PyLong itself is arbitrary-precision, so this limit only applies if the value has to pass through a fixed-width C type.)
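To make the size arithmetic concrete, here is a minimal sketch in pure Python. The field sizes are the ones quoted above; the byte layout and the helper names `pack_state` and `unpack_state` are hypothetical, chosen only for illustration, and say nothing about how the C implementation lays out its data.

```python
STATE_SIZE = 8     # bytes for MultibyteCodec_State, as quoted above
PENDING_SIZE = 2   # bytes for `pending` (MAXENCPENDING), as quoted above


def pack_state(pending: bytes, state: bytes) -> int:
    """Pack both fields into one int (hypothetical layout: pending, then state)."""
    if len(pending) != PENDING_SIZE or len(state) != STATE_SIZE:
        raise ValueError("unexpected field size")
    return int.from_bytes(pending + state, "little")


def unpack_state(value: int) -> tuple:
    """Inverse of pack_state: split the int back into (pending, state)."""
    raw = value.to_bytes(PENDING_SIZE + STATE_SIZE, "little")
    return raw[:PENDING_SIZE], raw[PENDING_SIZE:]


pending = b"\x30\x42"
state = bytes(range(1, 9))
packed = pack_state(pending, state)

# 10 bytes of payload need more than 64 bits once the top byte is non-zero.
needs_more_than_64_bits = packed.bit_length() > 64
round_trips = unpack_state(packed) == (pending, state)
```

At the Python level the 10-byte value round-trips through a single int; whether the encoder's C code can conveniently do the same is a separate question.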

Returning only one of `pending` or `MultibyteCodec_State` seems infeasible: `setstate` would not know which of the two it had been given, and both may be needed together.
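For comparison, the difference between the two interfaces can be seen with any stdlib incremental codec. The sketch below uses UTF-8 purely because it is always available; it is not one of the multibyte CJK codecs under discussion, so it only illustrates the return types, not the bug.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()
# Feed the first two bytes of a three-byte character; nothing can be emitted yet.
text = decoder.decode(b"\xe3\x81")
decoder_state = decoder.getstate()   # tuple: (buffered input bytes, extra state)

# A fresh decoder restored from that tuple can finish the character.
restored = codecs.getincrementaldecoder("utf-8")()
restored.setstate(decoder_state)
finished = restored.decode(b"\x82")

encoder = codecs.getincrementalencoder("utf-8")()
encoder.encode("a")
encoder_state = encoder.getstate()   # a plain int, per the encoder interface
```

The decoder has room for both buffered input and an extra state value, while the encoder has only the single integer.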

Some alternatives could be:

1. If we are restricted to returning an integer, perhaps this integer could be an index that references a state in a pool of encoder states stored internally (effectively a pointer). Managing this state pool seems quite complex.

2. `encoder.getstate` could be redefined to return a tuple, but this would be a breaking change. Backwards compatibility could be partly preserved by allowing `setstate` to accept either an integer or a tuple.

3. Remove `PyObject *pending` from `MultibyteStatefulEncoderContext` and change encoders to only use `MultibyteCodec_State`. Not sure how feasible this is.
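As a rough illustration of why alternative 1 looks complex, here is a toy state pool in pure Python. The class `EncoderStatePool` and its methods are hypothetical and exist only for this sketch.

```python
import itertools


class EncoderStatePool:
    """Hypothetical pool mapping integer handles to saved (pending, state) pairs."""

    def __init__(self):
        self._next_handle = itertools.count(1)
        self._saved = {}

    def save(self, pending, state):
        """Store a snapshot and return the integer that getstate would hand out."""
        handle = next(self._next_handle)
        self._saved[handle] = (pending, state)
        return handle

    def load(self, handle):
        """Look up the snapshot that setstate would restore."""
        return self._saved[handle]

    def discard(self, handle):
        """Someone has to call this, or the pool grows without bound."""
        self._saved.pop(handle, None)


pool = EncoderStatePool()
handle = pool.save("\u3042", b"\x01" * 8)
snapshot = pool.load(handle)
```

Even in this toy form, nothing tells the pool when a handle is no longer needed, and a handle is meaningless outside the process that issued it.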

I think approach 2 would be simplest and matches the decoder interface. 
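A minimal sketch of what approach 2 might look like at the Python level follows. `SketchEncoder` is a toy stand-in, not the real multibyte encoder object, and the way it interprets a legacy integer (codec state only, empty `pending`) is an assumption made for the example.

```python
class SketchEncoder:
    """Toy stand-in for a stateful encoder; not the real multibytecodec object."""

    def __init__(self):
        self.pending = ""   # stands in for `PyObject *pending`
        self.state = 0      # stands in for MultibyteCodec_State

    def getstate(self):
        # New-style return value: both pieces together, like the decoder.
        return (self.pending, self.state)

    def setstate(self, saved):
        if isinstance(saved, int):
            # Legacy form (assumed): the int carries only the codec state.
            self.pending, self.state = "", saved
        elif isinstance(saved, tuple) and len(saved) == 2:
            self.pending, self.state = saved
        else:
            raise TypeError("setstate() expects an int or a (pending, state) tuple")


enc = SketchEncoder()
enc.pending, enc.state = "\u3042", 7
saved = enc.getstate()

other = SketchEncoder()
other.setstate(saved)    # tuple form restores both fields
legacy = SketchEncoder()
legacy.setstate(0)       # integer form is still accepted
```

Callers that only ever pass the result of `getstate` back to `setstate` would keep working; callers that inspect the returned value as an integer would not.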

Does anyone have any opinions or further alternatives?
History
Date User Action Args
2018-06-05 12:32:32  libcthorne  set  recipients: + libcthorne, methane, martin.panter
2018-06-05 12:32:32  libcthorne  set  messageid: <1528201952.42.0.592728768989.issue33578@psf.upfronthosting.co.za>
2018-06-05 12:32:32  libcthorne  link  issue33578 messages
2018-06-05 12:32:32  libcthorne  create