msg155626 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-03-13 16:10 |
A common theme in many talks last year about cloud computing was the need to suspend execution, pickle state, and resume it on a different node. This patch is the result of last year's stackless sprint at pycon, finally completed and submitted for review.
Python does not currently support pickling of many run-time structures, but pickling for things like iterators is trivial.
A large piece of Stackless' branch is to make sure that various run-time constructs are pickleable, including function objects. While this patch does not do that, it does add pickling for dictiter, and the lot.
This makes it possible to have compilcated data sets, iterate through them, and pickle them in a semi-consumed state.
Please note that a slight hack is needed to pickle some iterators. Many of these classes are namely "hidden" and there is no way to access their constructors by name. instead, an unpickling trick is to invoke "iter" on an object of the target type instead. Not the most elegant solution but I didn't want to complicate matters by adding iterator classes into namespaces. Where should stringiter live for example? Be a builtin like str?
We also didn't aim to make all iterators copy.copy-able using the __reduce__ protocol. Some iterators actually use internal iterators themselves, and if a (non-deep) copy were to happen, we would have to shallow copy those internal objects. Instead, we just return the internal iterator object directly from __reduce__ and allow recursive pickling to proceed.
|
msg155669 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2012-03-13 21:24 |
ISTM that a discussion on python-dev would be of value here. Iterators are a protocol, not a class; hence, the effort to make them all picklable is potentially endless. The effort would also always be incomplete because some iterators are difficult or inconvenient to pickle (esp. those with inputs from local or temporary resources).
Experience with SQL hasn't shown a need to save partially consumed cursors and my experience with iterators indicates that the need for pickling would be rare or that is would distract from better solutions (perhaps message based or somesuch).
The size of this patch is a hint that the idea is not a minor change and that it would add a maintenance burden. What is far from clear is whether it would have value for any real-world problems.
|
msg155683 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-03-13 22:27 |
Sure, I'll start a discussion there, but at least I've gotten the patch in. The patch is smaller than it looks, most of it is tests.
|
msg155685 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2012-03-13 22:38 |
The patch looks fine. If python-dev thinks that making-iterators-picklable is worth doing, I would support this going into Python 3.3 (no extra benefit will come from waiting).
|
msg155704 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-03-14 01:19 |
Btw, is there some way I can make this patch easier to review? I haven't contributed much since the Hg switchover, can I make it so that people can do visual diff?
|
msg155735 - (view) |
Author: Georg Brandl (georg.brandl) * |
Date: 2012-03-14 07:27 |
The "review" link next to the the patch file entry should already work and provide a nice visual diff + commenting interface.
|
msg155775 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-03-14 17:25 |
I think
PyAPI_FUNC(PyObject *) _PyIter_GetIter(const char *iter);
has a confusing name for a convenience function which retrieves an attribute from the builtin module by name.
Not sure what would be better. Maybe _PyIter_GetBuiltin().
|
msg155776 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-03-14 17:30 |
another trick has been suggested. For hidden iterator objects, such as stringiter, to actually put them in the "types" module.
in there, we could do something like:
#types.py
stringiter = iter('').__class__
and we would then change the name of the iterator in c to be "types.stringiter".
How does that sound? It _does_ make it necessary for the types module to be there to help with pickling. The _proper_ fix would be for e.g. stringiter to live in builtins, next to 'str' that it is iterating over. Any thoughts?
|
msg156688 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-03-24 10:19 |
sbt, I will fix the api name. Any other objections then? Leave it as it is with the "iter()" trick?
|
msg156720 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2012-03-24 22:40 |
Has python-dev discussion been launched? It is far from clear that this is worth doing. Pickling runtime structures may be a normal use case for Stackless but isn't a normal use case for regular Python.
Also, it seems pointless to start down this path because it will always be incomplete (i.e. pickling running generators, or socket streams, etc).
It also seems to be at odds with the normal use case for passing around partially consumed iterators -- the laziness and memory friendliness is a desired feature; however, the patch pickles some iterators (such as dicts) by running them to completion and storing *all* of the results that would have been returned.
|
msg156722 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-03-24 23:51 |
I think the "worth doing" argument doesn't really hold, given that it's done. The question at hand really is
a) is the patch correct?
b) can we commit to maintaining it, even as things around it may change?
I'm not bothered with the patch being potentially incomplete: anybody wishing to pickle more things should contribute patches for that.
As for a), I think we should give people some time to review, and then wait for beta releases to discover issues. If this is a rarely-used feature, the world won't end if it has bugs.
As for b), I think the main issue is forward compatibility: will pickles created by 3.3 still be readable by future versions?
|
msg156723 - (view) |
Author: Michael Foord (michael.foord) * |
Date: 2012-03-24 23:55 |
Yes there was a discussion on python-dev. Various people spoke in favour, no-one against:
http://mail.python.org/pipermail/python-dev/2012-March/117566.html
|
msg156731 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2012-03-25 02:38 |
Michael, thanks for the link. The email was clearer about its rationale than was listed here.
When this patch gets applied, any discussion of it in the docs should be clear that generators aren't included and that pickling things like dict iterators entail running the iterator to completion and storing all of the results in a list.
|
msg156749 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-03-25 11:40 |
> ... and that pickling things like dict iterators entail running the
> iterator to completion and storing all of the results in a list.
The thing to emphasise here is that pickling an iterator is "destructive": afterwards the original iterator will be "empty".
I can't think of any other examples where pickling an object causes non-trivial mutation of that object.
Come to think of it, doesn't copy.copy() delegate to __reduce__()/__reduce_ex__(). It would be a bit surprising if copy.copy(myiterator) were to consume myiterator. I expect copy.copy() to return an independent copy without mutating the original object.
|
msg156751 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-03-25 12:02 |
> The thing to emphasise here is that pickling an iterator is
> "destructive": afterwards the original iterator will be "empty".
If you look at the patch it isn't (or shouldn't be).
I agree with Raymond that accumulating dict and set iterators in a list is a bit weird. That said, with hash randomization, perhaps we can't do any better (the order of elements in the internal table depends on the process-wide hash seed).
|
msg156756 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2012-03-25 12:38 |
> If you look at the patch it isn't (or shouldn't be).
Sorry. I misunderstood when Raymond said "running the iterator to completion".
|
msg156902 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-03-27 10:57 |
Okay, I'll go ahead, fix the 'iter()' trick api name and apply the patch. Then we'll see what happens :).
Any suggestion towards what documentation changes are needed? I don't think the list of pickleable objects is made explicit anywhere, it is largely a trial and error thing. But obviously Misc/News is the prime candidate.
Thanks
|
msg156903 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-03-27 11:03 |
> Okay, I'll go ahead, fix the 'iter()' trick api name and apply the
> patch. Then we'll see what happens :).
Please wait for reviews.
|
msg156904 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-03-27 11:13 |
Raymond had already reviewed it, and sbt. I wasn't aware of any more pending reviews, but I'll wait for yours, of course.
|
msg156906 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-03-27 11:15 |
Well, please take a look at the review link. There are already some comments there.
|
msg156915 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-03-27 12:09 |
Btw, regarding compatibility: The docs say " The pickle serialization format is guaranteed to be backwards compatible across Python releases."
I take this to mean the serialization format itself. I don't think there is a broader guarantee that pickles generated by one version can be read by another, since objects can change their internal representation, types can even disappear or change.
In that sense, pickles made by 2.7 can be read by 3.2, in the sense that they will correctly return an error when they can't construct a 'str' object.
If I misunderstand things, then at least I think that the pickle documentation should be made clearer, that not only is the protocol supposed to be read, but also that entire pickles should always be readable _and_ instantiatable, by future versions
|
msg156928 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-03-27 14:28 |
I've incorporated antoine's comments and the proposed internal function name into a new patch.
A lot of the changes concerned thecking the type() of the unpickled iterator. Now, it wasn't a specific design goal to get the exact same objects back, only _equivalent_ objects. In particular, dicts and sets have problems, so a dictiter to a partially consumed dict cannot be pickled as it is.
so, I've added type cases everywhere, but for those cases.
Now, how important do you think type consistency is? when using iterators, does one ever look at it and test its type? if this is important, I _could_ take another look at dicts and seta and create fresh iterators to the dicts and sets made out of the remainder of the items, rather than iterators to lists.
Any thoughs?
|
msg156933 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-03-27 15:10 |
> Now, how important do you think type consistency is? when using
> iterators, does one ever look at it and test its type? if this is
> important, I _could_ take another look at dicts and seta and create
> fresh iterators to the dicts and sets made out of the remainder of the
> items, rather than iterators to lists.
I think type consistency is important if it can be achieved reasonably
simply.
In the dict and set case, I'm not sure you can recreate the internal
table in the same order (even accross interpreter restarts). In this
case, you should just check that the unpickled data is a subclass of
collections.abc.Iterator.
|
msg157405 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-04-03 09:56 |
Good idea Antoine.
So, I'll with your suggested fix to the unittests I'll commit this and then look on while Rome burns.
|
msg157408 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-04-03 11:12 |
New changeset 4ff234337e24 by Kristján Valur Jónsson in branch 'default':
Issue #14288: Serialization support for builtin iterators.
http://hg.python.org/cpython/rev/4ff234337e24
New changeset 51c88d51aa4a by Kristján Valur Jónsson in branch 'default':
Issue #14288: Modify Misc/NEWS
http://hg.python.org/cpython/rev/51c88d51aa4a
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:27 | admin | set | github: 58496 |
2012-04-08 17:07:43 | kristjan.jonsson | set | status: open -> closed resolution: fixed |
2012-04-03 11:12:04 | python-dev | set | nosy:
+ python-dev messages:
+ msg157408
|
2012-04-03 09:56:32 | kristjan.jonsson | set | messages:
+ msg157405 |
2012-03-27 15:10:06 | pitrou | set | messages:
+ msg156933 |
2012-03-27 14:29:08 | kristjan.jonsson | set | files:
+ pickling2.patch
messages:
+ msg156928 |
2012-03-27 12:09:01 | kristjan.jonsson | set | messages:
+ msg156915 |
2012-03-27 11:15:29 | pitrou | set | messages:
+ msg156906 |
2012-03-27 11:13:29 | kristjan.jonsson | set | messages:
+ msg156904 |
2012-03-27 11:03:41 | pitrou | set | messages:
+ msg156903 |
2012-03-27 10:57:31 | kristjan.jonsson | set | messages:
+ msg156902 |
2012-03-25 21:31:46 | rhettinger | set | assignee: rhettinger -> |
2012-03-25 12:38:21 | sbt | set | messages:
+ msg156756 |
2012-03-25 12:02:37 | pitrou | set | messages:
+ msg156751 |
2012-03-25 11:40:22 | sbt | set | messages:
+ msg156749 |
2012-03-25 02:38:28 | rhettinger | set | messages:
+ msg156731 |
2012-03-24 23:55:28 | michael.foord | set | messages:
+ msg156723 |
2012-03-24 23:51:39 | loewis | set | messages:
+ msg156722 |
2012-03-24 22:40:37 | rhettinger | set | messages:
+ msg156720 |
2012-03-24 13:53:49 | pitrou | set | nosy:
+ pitrou
|
2012-03-24 10:19:21 | kristjan.jonsson | set | messages:
+ msg156688 versions:
+ Python 3.3, - Python 3.4 |
2012-03-14 19:37:07 | rhettinger | set | assignee: rhettinger |
2012-03-14 17:30:59 | kristjan.jonsson | set | messages:
+ msg155776 |
2012-03-14 17:25:37 | sbt | set | nosy:
+ sbt messages:
+ msg155775
|
2012-03-14 16:17:10 | rhettinger | set | messages:
- msg155759 |
2012-03-14 15:44:39 | rhettinger | set | messages:
+ msg155759 |
2012-03-14 07:27:41 | georg.brandl | set | nosy:
+ georg.brandl messages:
+ msg155735
|
2012-03-14 01:19:59 | kristjan.jonsson | set | messages:
+ msg155704 |
2012-03-13 22:38:05 | rhettinger | set | messages:
+ msg155685 |
2012-03-13 22:27:29 | kristjan.jonsson | set | messages:
+ msg155683 |
2012-03-13 21:24:31 | rhettinger | set | nosy:
+ rhettinger messages:
+ msg155669
|
2012-03-13 18:30:40 | jcea | set | nosy:
+ jcea
|
2012-03-13 16:10:19 | kristjan.jonsson | create | |