This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Infinite Recursion during Unpickling a codecs Object
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: ThomasH, alexandre.vassalotti, lemburg, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2009-07-01 15:46 by ThomasH, last changed 2022-04-11 14:56 by admin.

Files
File name Uploaded Description Edit
codecs_stream_delegating.patch serhiy.storchaka, 2015-11-28 13:48 review
codecs_stream_delegating_2.patch serhiy.storchaka, 2015-12-02 19:59 review
codecs_stream_forbid_pickling.patch serhiy.storchaka, 2015-12-02 20:29 Forbid pickling and copying review
Messages (10)
msg89985 - (view) Author: (ThomasH) Date: 2009-07-01 15:46
I recently ran into an infinite recursion trying to unpickle a
codecs.StreamWriter object (I presume the issue would be the same for a
StreamReader).

Here is the end of the stack trace:

  File "/sw/lib/python2.5/codecs.py", line 330, in __getattr__    
    return getattr(self.stream, name)
  File "/sw/lib/python2.5/codecs.py", line 330, in __getattr__  
    return getattr(self.stream, name) 
  File "/sw/lib/python2.5/codecs.py", line 330, in __getattr__ 
    return getattr(self.stream, name) 
 RuntimeError: maximum recursion depth exceeded 

The issue is the same under Python2.6 but the error output has changed
(see http://bugs.python.org/issue5508).

The problem is that the codecs module tries to delegate member lookup to
the underlying stream. But after unpickling, "self.stream" is not
defined, so the reference to self.stream in __getattr__ itself leads to
an invocation of __getattr__ - hence the recursion loop.

Using tools from the Pickle protocol, like __getstate__/__setstate__,
could help degrade codecs objects gracefully during pickle/unpickle
cycles. E.g. it might be enough to provide a dummy self.stream through
__setstate__.
msg90180 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-07-06 14:26
I don't understand the use case for this. 

If the StreamWriter/Reader cannot pickle the underlying stream (which is
probably always the case), why should the object itself be pickleable ?

What we could do is implement .__get/setstate__() and have it raise an
exception to inform the user of the problem.
msg90258 - (view) Author: (ThomasH) Date: 2009-07-08 08:12
Sounds good to me. The main intention of this bug is not to make codecs
objects pickleable by all means (I don't know if this would be sensible
and/or possible), but at least to have them degrade gracefully during
pickling, particularly without infinite recursion. I've changed the bug
title to reflect this.
msg255540 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-28 13:48
Implementing only .__get/setstate__() doesn't fix all problem. We have implement also __getnewargs_ex__(). But implemented __getnewargs_ex__() has priority over __getnewargs__() implemented in subclasses.

And may be there are problems with other optional special methods that are incorrectly delegated to the stream in codecs IO classes.

I think more reliable way is to disallow delegating all special (or may be even private) methods. Here is a patch.
msg255756 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-02 19:16
> If the StreamWriter/Reader cannot pickle the underlying stream (which is
probably always the case), why should the object itself be pickleable ?

io.BytesIO() and io.StringIO() are pickleable.
msg255761 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-12-02 19:55
On 02.12.2015 20:16, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
>> If the StreamWriter/Reader cannot pickle the underlying stream (which is
>> probably always the case), why should the object itself be pickleable ?
> 
> io.BytesIO() and io.StringIO() are pickleable.

Ok, but I still don't see the use case :-)

I think all we need to do is add a .__reduce__()
method to StreamWriter and StreamReader, which then
raises a PickleError.

Example:

>>> import sys, codecs, pickle
>>> r = codecs.getreader('latin-1')
>>> class MyReader(r):
...    def __reduce__(self, *args):
...       raise pickle.PickleError
...
>>> s = MyReader(sys.stdin)
>>> pickle.dumps(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __reduce__
_pickle.PickleError
> <stdin>(3)__reduce__()
msg255763 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-02 19:59
Added tests for pickling and deepcopying.
msg255774 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-02 20:29
> I think all we need to do is add a .__reduce__()
> method to StreamWriter and StreamReader, which then
> raises a PickleError.

Rather TypeError. Yes, it is the least that we should to do in maintained releases. If codecs_stream_delegating_2.patch is considered too drastic for bugfix. But this can be only a part of problem. May be there are issues with other optional special methods. And adding __reduce_ex__ breaks subclass pickleability if it was implemented with __getstate__ and __getnewargs__.

Here is a patch for this way.
msg255785 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-12-02 21:53
On 02.12.2015 21:29, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
>> I think all we need to do is add a .__reduce__()
>> method to StreamWriter and StreamReader, which then
>> raises a PickleError.
> 
> Rather TypeError. Yes, it is the least that we should to do in maintained releases. If codecs_stream_delegating_2.patch is considered too drastic for bugfix. But this can be only a part of problem. May be there are issues with other optional special methods. And adding __reduce_ex__ breaks subclass pickleability if it was implemented with __getstate__ and __getnewargs__.
> 
> Here is a patch for this way.

Thanks. I think using __reduce__ instead of __reduce_ex__ is
safer, since subclasses will more likely override __reduce__
than __reduce_ex__.

The other approach will have too many backwards incompatible side
effects, e.g. the repr() over StreamReader instances would change.
msg256011 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-06 12:10
There is also a problem with deepcopying:

>>> class MemReader:
...     def __init__(self, data):
...         self.buf = memoryview(data).cast('B')
...     def read(self, size=-1):
...         if size < 0:
...             size = len(self.buf)
...         res = bytes(self.buf[:size])
...         self.buf = self.buf[size:]
...         return res
...     def close():
...         pass
...     def __deepcopy__(self, memo):
...         return MemReader(self.buf)
...     def __reduce__(self):
...         return MemReader, (bytes(self.buf),)
... 
>>> import codecs, copy
>>> s1 = codecs.getreader('utf-8')(MemReader(b'abcd'))
>>> s2 = copy.deepcopy(s1)
>>> s1.read()
'abcd'
>>> s2.read()
b'abcd'
>>> s2
<__main__.MemReader object at 0xb701988c>
History
Date User Action Args
2022-04-11 14:56:50adminsetgithub: 50644
2015-12-06 12:10:19serhiy.storchakasetmessages: + msg256011
2015-12-02 21:53:15lemburgsetmessages: + msg255785
2015-12-02 20:29:08serhiy.storchakasetfiles: + codecs_stream_forbid_pickling.patch

messages: + msg255774
2015-12-02 19:59:49serhiy.storchakasetfiles: + codecs_stream_delegating_2.patch

messages: + msg255763
2015-12-02 19:55:53lemburgsetmessages: + msg255761
2015-12-02 19:16:44serhiy.storchakasetmessages: + msg255756
2015-11-28 13:48:27serhiy.storchakasetfiles: + codecs_stream_delegating.patch
keywords: + patch
2015-11-28 13:48:09serhiy.storchakasetassignee: serhiy.storchaka
type: enhancement -> behavior
versions: + Python 3.5, Python 3.6, - Python 3.3
nosy: + serhiy.storchaka

messages: + msg255540
stage: patch review
2013-03-28 10:29:37georg.brandlsetversions: + Python 3.3, Python 3.4, - Python 3.2
2009-07-08 08:12:19ThomasHsetmessages: + msg90258
title: Add Pickle Support to the codecs Module -> Infinite Recursion during Unpickling a codecs Object
2009-07-06 14:26:40lemburgsetnosy: + lemburg
messages: + msg90180
2009-07-03 00:03:29alexandre.vassalottisetnosy: + alexandre.vassalotti
2009-07-01 16:02:16benjamin.petersonsettype: crash -> enhancement
versions: + Python 2.7, Python 3.2, - Python 2.6, Python 2.5
2009-07-01 15:46:26ThomasHcreate