classification
Title: Proto 2 pickle vs dict subclass
Type: behavior Stage:
Components: Extension Modules Versions: Python 2.4, Python 2.3, Python 2.7, Python 2.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Andres.Riancho, Tim.Graham, akuchling, alexandre.vassalotti, pitrou, terry.reedy, tim.peters
Priority: normal Keywords:

Created on 2003-10-20 14:28 by tim.peters, last changed 2014-11-04 02:11 by Tim.Graham. This issue is now closed.

Messages (8)
msg60413 - Author: Tim Peters (tim.peters) (Python committer) Date: 2003-10-20 14:28
From c.l.py:

"""
From: Jimmy Retzlaff
Sent: Thursday, October 16, 2003 1:56 AM
To: python-list@python.org
Subject: Pickle dict subclass instances using new protocol in PEP 307


I have a subclass of dict that acts kind of like Windows' 
file systems - keys are case insensitive but case 
preserving (keys are assumed to be strings, or at least 
they have to support .lower()). It's worked well for quite 
a while - it used to inherit from UserDict and it has 
inherited from dict since that became possible.

I just tried to pickle an instance of this class for the first 
time using Python 2.3.2 on Windows. If I use protocols 0 
(text) or 1 (binary) everything works great. If I use 
protocol 2 (PEP 307) then I have a problem when loading 
my pickle. Here is a small sample to illustrate:

######

import pickle

class myDict(dict):
    def __init__(self, *args, **kwargs):
        self.x = 1
        dict.__init__(self, *args, **kwargs)

    def __getstate__(self):
        print '__getstate__ returning', (self.copy(), self.x)
        return (self.copy(), self.x)

    def __setstate__(self, (d, x)):
        print '__setstate__'
        print '    object already in state:', self
        print '    x already in self:', 'x' in dir(self)
        self.x = x
        self.update(d)

    def __setitem__(self, key, value):
        print '__setitem__', (key, value), '- self.x exists:', hasattr(self, 'x')
        dict.__setitem__(self, key, value)


d = myDict()
d['key'] = 'value'

protocols = [(0, 'Text'), (1, 'Binary'), (2, 'PEP 307')]
for protocol, description in protocols:
    print '--------------------------------------'
    print 'Pickling with Protocol %s (%s)' % (protocol, description)
    pickle.dump(d, file('test.pickle', 'wb'), protocol)
    del d
    print 'Unpickling'
    d = pickle.load(file('test.pickle', 'rb'))

######

When run it prints:

__setitem__ ('key', 'value') - self.x exists: True
--------------------------------------
Pickling with Protocol 0 (Text)
__getstate__ returning ({'key': 'value'}, 1)
Unpickling
__setstate__
    object already in state: {'key': 'value'}
    x already in self: False
--------------------------------------
Pickling with Protocol 1 (Binary)
__getstate__ returning ({'key': 'value'}, 1)
Unpickling
__setstate__
    object already in state: {'key': 'value'}
    x already in self: False
--------------------------------------
Pickling with Protocol 2 (PEP 307)
__getstate__ returning ({'key': 'value'}, 1)
Unpickling
__setitem__ ('key', 'value') - self.x exists: False
__setstate__
    object already in state: {'key': 'value'}
    x already in self: False


The problem I'm having stems from the fact that the 
subclass' __setitem__ is called before __setstate__ 
when loading a protocol 2 pickle (the subclass' 
__setitem__ is not called at all with protocols 0 or 1). If 
I don't define __get/setstate__ then I have the same 
problem in that the subclass' __setitem__ is called 
before the subclass' instance variables are created by 
the pickle mechanism. I need to access one of those 
instance variables in my __setitem__.

I suppose my question is one of practicality. I'd like my 
class instances to work with all pickle protocols. Am I 
getting too fancy trying to inherit from dict? Should I go 
back to UserDict or maybe to DictMixin? Should I submit 
a bug report on this, or am I getting too close to 
internals to expect a certain behavior across pickle 
protocols?
"""
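The ordering described in the message above can be observed with a small sketch. It is written for Python 3, which is an assumption (the report concerns Python 2.3), and `OrderProbe` and its `events` list are hypothetical names introduced only for this illustration. It logs whether `self.x` exists each time `__setitem__` or `__setstate__` runs while a protocol 2 pickle is loaded.

```python
import pickle


class OrderProbe(dict):
    """Hypothetical dict subclass that logs the calls made while unpickling."""

    events = []  # shared log of (method name, "did self.x exist?") pairs

    def __init__(self, *args, **kwargs):
        self.x = 1
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        OrderProbe.events.append(("__setitem__", hasattr(self, "x")))
        super().__setitem__(key, value)

    def __setstate__(self, state):
        OrderProbe.events.append(("__setstate__", hasattr(self, "x")))
        self.__dict__.update(state)


d = OrderProbe()
d["key"] = "value"
OrderProbe.events.clear()  # keep only what happens during the load

loaded = pickle.loads(pickle.dumps(d, protocol=2))
for name, had_x in OrderProbe.events:
    print(name, "- self.x exists:", had_x)
```

If the log shows `__setitem__` ahead of `__setstate__`, the items were restored before the instance attributes, which is the situation the poster ran into.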
msg60414 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2004-06-05 19:45
Bug #964868 is a duplicate of this one.
msg77047 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2008-12-05 17:55
James Stroud ran into this same issue with 2.5.  Here is his 'ugly fix'
for working with protocol 2 only.

class DictPlus(dict):
  def __init__(self, *args, **kwargs):
    self.extra_thing = ExtraThingClass()
    dict.__init__(self, *args, **kwargs)
  def __setitem__(self, k, v):
    try:
      do_something_with(self.extra_thing, k, v)
    except AttributeError:
      self.extra_thing = ExtraThingClass()
      do_something_with(self.extra_thing, k, v)
    dict.__setitem__(self, k, v)
  def __setstate__(self, adict):
    pass

Can this be closed as "won't fix", since there seems nothing to fix?
This issue of working with all protocols would seem dead by now, and for
protocol 2, it is a 'gotcha' that can be avoided with knowledge.
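One way to avoid the gotcha for every protocol is to take over reduction, so that the items travel as a constructor argument instead of being set one by one after the object is created. The following is a minimal sketch and not James Stroud's fix. It is written for Python 3 (an assumption), and `TrackedDict` and `seen_keys` are hypothetical names.

```python
import pickle


class TrackedDict(dict):
    """Hypothetical dict subclass whose __setitem__ needs instance state."""

    def __init__(self, *args, **kwargs):
        self.seen_keys = []  # state that __setitem__ depends on
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        self.seen_keys.append(key)  # AttributeError if seen_keys is missing
        super().__setitem__(key, value)

    def __reduce__(self):
        # Rebuild by calling TrackedDict(plain_dict), so __init__ runs first,
        # then restore the instance attributes from the saved __dict__.
        return (self.__class__, (dict(self),), self.__dict__.copy())


d = TrackedDict()
d["key"] = "value"

# Round-trip through every protocol the running interpreter supports.
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    clone = pickle.loads(pickle.dumps(d, protocol))
    assert clone == d and clone.seen_keys == d.seen_keys
```

The sketch relies on CPython's `dict.__init__` copying a mapping argument without calling an overridden `__setitem__`, so the subclass hook never runs on a half-built instance.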
msg226588 - Author: Andres Riancho (Andres.Riancho) Date: 2014-09-08 16:53
Well, closing this as wont-fix is far from ideal. More than four years have passed since the last activity on this issue, but people are still being hit by it.

In my case I'm not creating any special sub-class, I just use one of Python's built-in libs:

```python
import cPickle
import Cookie
 
c = Cookie.SimpleCookie()
c['abc'] = 'def'
 
unpickled_highest = cPickle.loads(cPickle.dumps(c, cPickle.HIGHEST_PROTOCOL))
unpickled_default = cPickle.loads(cPickle.dumps(c))
 
print "c['abc'].value                ", c['abc'].value
print "unpickled_default['abc'].value", unpickled_default['abc'].value
print "unpickled_highest['abc'].value", unpickled_highest['abc'].value
 
assert unpickled_default['abc'].value == c['abc'].value
assert unpickled_highest['abc'].value == c['abc'].value
```

I know there is a workaround (subclass SimpleCookie, override methods, etc.), but it is still something that others will have to implement on their own, and they will spend time debugging the problem until they reach this bug report.

Batteries included should focus on cutting down development time, and this issue increases dev time by introducing strange/hidden limitations to pickle.

Is there any plan to actually fix this in the long term?
msg226591 - Author: Andres Riancho (Andres.Riancho) Date: 2014-09-08 16:58
Django's issue [0] shows the ugly code people write to work around this Python bug.

[0] https://code.djangoproject.com/ticket/15863
msg226602 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2014-09-08 19:36
Alexandre or Antoine, does either of you want to reopen this, or to verify that this dict subclass pickle issue was properly closed as won't fix?
msg226605 - Author: Andres Riancho (Andres.Riancho) Date: 2014-09-08 19:44
FYI, I'm using Python 2.7.6
msg230570 - Author: Tim Graham (Tim.Graham) Date: 2014-11-04 02:11
Cookie pickling issue should be fixed in #22775.
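
A quick way to check whether a given interpreter round-trips cookies correctly is to repeat the earlier test across all protocols. This sketch assumes Python 3, where the `Cookie` module is named `http.cookies`. The `results` dictionary is a name introduced for illustration.

```python
import pickle
from http.cookies import SimpleCookie  # Python 3 name of the Cookie module

c = SimpleCookie()
c["abc"] = "def"

# Round-trip through every protocol the running interpreter supports.
results = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    clone = pickle.loads(pickle.dumps(c, protocol))
    results[protocol] = clone["abc"].value

for protocol, value in sorted(results.items()):
    print("protocol", protocol, "->", repr(value))
```

On an interpreter that includes the fix, every protocol should report the original value `'def'`.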
History
Date                 User               Action  Args
2014-11-04 02:11:09  Tim.Graham         set     nosy: + Tim.Graham; messages: + msg230570
2014-09-08 19:44:13  Andres.Riancho     set     messages: + msg226605
2014-09-08 19:36:17  terry.reedy        set     nosy: + pitrou, alexandre.vassalotti; messages: + msg226602
2014-09-08 17:03:43  Andres.Riancho     set     type: behavior
2014-09-08 17:03:16  Andres.Riancho     set     versions: + Python 2.7
2014-09-08 16:58:42  Andres.Riancho     set     messages: + msg226591
2014-09-08 16:53:25  Andres.Riancho     set     nosy: + Andres.Riancho; messages: + msg226588
2008-12-05 20:35:56  benjamin.peterson  set     status: open -> closed; resolution: wont fix
2008-12-05 17:55:27  terry.reedy        set     nosy: + terry.reedy; messages: + msg77047; versions: + Python 2.5, Python 2.4
2003-10-20 14:28:10  tim.peters         create