Message80670
Instance attribute names are normally interned; this is done in
PyObject_SetAttr (among other places). Unpickling (in both pickle and
cPickle) updates __dict__ on the instance object directly. This
bypasses the interning, so you end up with many copies of the strings
representing your attribute names. That wastes a lot of space, both in
RAM and in pickles of sequences of objects that were themselves created
from pickles. Note that the native Python memcached client uses pickle
to serialize objects.
>>> import pickle
>>> class C(object):
...     def __init__(self, x):
...         self.long_attribute_name = x
...
>>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None),
...     pickle.HIGHEST_PROTOCOL)) for i in range(100)],
...     pickle.HIGHEST_PROTOCOL))
3658
>>> len(pickle.dumps([C(None) for i in range(100)],
...     pickle.HIGHEST_PROTOCOL))
1441
>>>
Interning the strings on unpickling makes the pickles smaller, and at
least for cPickle actually makes unpickling sequences of many objects
slightly faster. I have included proposed patches to cPickle.c and
pickle.py, and would appreciate any feedback.
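
For illustration only, here is a minimal sketch of the same idea done at
the class level rather than inside the unpickler. It is not the attached
patch. The helper names intern_str and intern_keys are hypothetical, and
the sketch assumes that a per-class __setstate__ is an acceptable
workaround for code that cannot wait for a fix in pickle itself.

```python
import sys

try:
    intern_str = sys.intern      # Python 3 location
except AttributeError:
    intern_str = intern          # Python 2 builtin


def intern_keys(state):
    """Return a copy of a state dict with its plain-str keys interned.

    Hypothetical helper: keys that are not exactly of type str are left
    alone, because interning only accepts real str objects.
    """
    return {
        (intern_str(k) if type(k) is str else k): v
        for k, v in state.items()
    }


class C(object):
    def __init__(self, x):
        self.long_attribute_name = x

    def __setstate__(self, state):
        # pickle calls this with the saved __dict__; interning the keys
        # here lets every restored instance share one name string.
        self.__dict__.update(intern_keys(state))
```

With this in place, instances restored by pickle.loads should share a
single string object for each attribute name, so re-pickling a list of
them can reference the name through the memo instead of writing it out
once per object.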
Date | User | Action | Args
2009-01-27 21:52:20 | jakemcguire | set | recipients: + jakemcguire
2009-01-27 21:52:20 | jakemcguire | set | messageid: <1233093140.55.0.0122548653801.issue5084@psf.upfronthosting.co.za>
2009-01-27 21:52:19 | jakemcguire | link | issue5084 messages
2009-01-27 21:52:17 | jakemcguire | create |