msg80670 - (view) |
Author: Jake McGuire (jakemcguire) |
Date: 2009-01-27 21:52 |
Instance attribute names are normally interned - this is done in
PyObject_SetAttr (among other places). Unpickling (in pickle and
cPickle) directly updates __dict__ on the instance object. This
bypasses the interning so you end up with many copies of the strings
representing your attribute names, which wastes a lot of space, both in
RAM and in pickles of sequences of objects created from pickles. Note
that the native python memcached client uses pickle to serialize
objects.
>>> import pickle
>>> class C(object):
... def __init__(self, x):
... self.long_attribute_name = x
...
>>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None),
pickle.HIGHEST_PROTOCOL)) for i in range(100)],
pickle.HIGHEST_PROTOCOL))
3658
>>> len(pickle.dumps([C(None) for i in range(100)],
pickle.HIGHEST_PROTOCOL))
1441
>>>
Interning the strings on unpickling makes the pickles smaller, and at
least for cPickle actually makes unpickling sequences of many objects
slightly faster. I have included proposed patches to cPickle.c and
pickle.py, and would appreciate any feedback.
|
msg80678 - (view) |
Author: Gabriel Genellina (ggenellina) |
Date: 2009-01-27 23:30 |
Either my browser got crazy, or you uploaded the same patch (.py) twice.
|
msg80688 - (view) |
Author: Alexandre Vassalotti (alexandre.vassalotti) * |
Date: 2009-01-28 02:54 |
The patch for cPickle doesn't do the same thing as the pickle one. In
the cPickle one, you are only interning slot attributes, which, I
believe, is not what you intended. :-)
|
msg80689 - (view) |
Author: Jake McGuire (jakemcguire) |
Date: 2009-01-28 03:00 |
Are you sure? I may not have enough context in my diff, which I
should have done against an anonymous svn checkout, but it looks like
the slot attributes get set several lines after my diff. "while
(PyDict_Next(slotstate, ...))" as opposed to the "while
(PyDict_Next(state, ...))" in my change...
-jake
On Jan 27, 2009, at 6:54 PM, Alexandre Vassalotti wrote:
>
> Alexandre Vassalotti <alexandre@peadrop.com> added the comment:
>
> The patch for cPickle doesn't do the same thing as the pickle one. In
> the cPickle one, you are only interning slot attributes, which, I
> believe, is not what you intended. :-)
>
> ----------
> assignee: -> alexandre.vassalotti
> nosy: +alexandre.vassalotti
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue5084>
> _______________________________________
|
msg80690 - (view) |
Author: Alexandre Vassalotti (alexandre.vassalotti) * |
Date: 2009-01-28 03:03 |
Oh, you are right. I was looking at py3k's version of pickle, which uses
PyDict_Update instead of a while loop; that is why I got confused.
|
msg80918 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-02-01 20:53 |
Why do you call PyString_AsString() followed by PyString_FromString()?
Strings are immutable so you shouldn't neek to take a copy.
Besides, it would be nice to add a test.
|
msg82686 - (view) |
Author: Jake McGuire (jakemcguire) |
Date: 2009-02-24 23:32 |
The fromstring/asstring dance was due to my incomplete understanding of
refcounting. PyDict_Next returns a borrowed reference but
PyString_InternInPlace expects an owned reference. Thanks to Kirk
McDonald, I have a new patch that does the refcounting correctly.
What sort of test did you have in mind?
|
msg82690 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-02-25 02:03 |
The test should check that the unpickled strings have been interned.
|
msg82691 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-02-25 02:07 |
In your patch, I'm not sure where the `name` variable is coming from.
Have you checked it works?
|
msg82692 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-02-25 02:11 |
To give an example of what the test could check:
>>> class C(object):
... def __init__(self):
... self.some_long_attribute_name = 5
...
>>> c = C()
>>> c.__dict__
{'some_long_attribute_name': 5}
>>> sorted(map(id, c.__dict__))
[140371243499696]
>>> import pickle
>>> d = pickle.loads(pickle.dumps(c, -1))
>>> d
<__main__.C object at 0x7faaba1b0390>
>>> d.__dict__
{'some_long_attribute_name': 5}
>>> sorted(map(id, d.__dict__))
[140371243501232]
The `sorted(map(id, d.__dict__))` should have been the same before and
after pickling.
|
msg82720 - (view) |
Author: Jake McGuire (jakemcguire) |
Date: 2009-02-25 21:17 |
Ugh. Clearly I didn't check to see if it worked or not, as it didn't even
compile. A new diff, tested and verified to work, is attached. I'll work
on creating a test.
|
msg86919 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-05-01 21:54 |
Jake, are you string working on a test?
|
msg86920 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-05-01 21:55 |
(s/string/still/, of course...)
|
msg86930 - (view) |
Author: Jake McGuire (jakemcguire) |
Date: 2009-05-02 02:38 |
This fell through the cracks. But the following unit test seems like it
does the right thing (fails with the old module, works with the new ones).
|
msg86939 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-05-02 11:41 |
Thanks, I'll take a look very soon.
|
msg86982 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-05-02 21:42 |
Committed in r72223, r72224. Thanks!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:44 | admin | set | github: 49334 |
2009-05-02 21:42:49 | pitrou | set | status: open -> closed resolution: fixed messages:
+ msg86982
|
2009-05-02 11:41:39 | pitrou | set | assignee: alexandre.vassalotti -> pitrou messages:
+ msg86939 |
2009-05-02 11:40:40 | pitrou | set | stage: needs patch -> patch review |
2009-05-02 02:38:05 | jakemcguire | set | files:
+ pickletester.diff
messages:
+ msg86930 |
2009-05-01 21:55:32 | pitrou | set | messages:
+ msg86920 |
2009-05-01 21:54:58 | pitrou | set | priority: normal
stage: needs patch messages:
+ msg86919 versions:
+ Python 3.1, Python 2.7 |
2009-02-25 21:19:13 | jakemcguire | set | files:
- cPickle.c.diff |
2009-02-25 21:19:08 | jakemcguire | set | files:
- cPickle.c.diff |
2009-02-25 21:17:28 | jakemcguire | set | files:
+ cPickle.c.diff messages:
+ msg82720 |
2009-02-25 02:11:36 | pitrou | set | messages:
+ msg82692 |
2009-02-25 02:07:01 | pitrou | set | messages:
+ msg82691 |
2009-02-25 02:03:09 | pitrou | set | messages:
+ msg82690 |
2009-02-24 23:32:52 | jakemcguire | set | files:
+ cPickle.c.diff messages:
+ msg82686 |
2009-02-03 02:20:50 | collinwinter | set | nosy:
+ collinwinter, jyasskin |
2009-02-01 20:53:38 | pitrou | set | nosy:
+ pitrou messages:
+ msg80918 |
2009-01-29 10:56:14 | jcea | set | nosy:
+ jcea |
2009-01-28 03:03:10 | alexandre.vassalotti | set | messages:
+ msg80690 |
2009-01-28 03:00:02 | jakemcguire | set | messages:
+ msg80689 |
2009-01-28 02:54:53 | alexandre.vassalotti | set | assignee: alexandre.vassalotti messages:
+ msg80688 nosy:
+ alexandre.vassalotti |
2009-01-27 23:39:02 | jakemcguire | set | files:
+ cPickle.c.diff |
2009-01-27 23:38:46 | jakemcguire | set | files:
- cPickle.c.diff |
2009-01-27 23:30:05 | ggenellina | set | nosy:
+ ggenellina messages:
+ msg80678 |
2009-01-27 21:52:58 | jakemcguire | set | files:
+ pickle.py.diff |
2009-01-27 21:52:19 | jakemcguire | create | |