classification
Title: Make Generators Pickle-able
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 2.6
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: alexandre.vassalotti, christian.heimes, georg.brandl, jaysonvantuyl, josiahcarlson, loewis
Priority: normal
Keywords:

Created on 2004-12-29 21:52 by jaysonvantuyl, last changed 2008-07-20 11:25 by georg.brandl. This issue is now closed.

Messages (7)
msg54345 - Author: Jayson Vantuyl (jaysonvantuyl) Date: 2004-12-29 21:52
Would it be possible to make generators pickle-able?  I
mean, currently the internal state is saved in some
way.  Would it be possible to make pickle handle them?

Put another way, if generators had a __getnewargs__
function that returned some data (say a tuple of module
name, function name, locals/globals dicts and some code
dependent location data) and then allow:

  generator.__new__(statedata)

to reconstruct it (or something more elegant).
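
For reference, a minimal sketch of the behaviour being discussed, assuming a modern Python 3 interpreter. The `countdown` generator is a hypothetical example, not something from this report. It shows that pickle refuses a suspended generator outright:

```python
import pickle

def countdown(n):
    # Hypothetical example generator; any generator behaves the same way.
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first = next(gen)  # advance once so the generator holds live state

try:
    pickle.dumps(gen)
    pickled_ok = True
except TypeError as exc:
    # CPython raises TypeError when asked to pickle a generator object.
    pickled_ok = False
    error_text = str(exc)
```

The exact wording of the error message differs between Python versions, so the sketch only relies on the exception type.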
msg54346 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-12-30 14:16
Moving into the feature requests tracker.

I don't think this is possible (or should be done if it were
possible). Pickle has traditionally abstained from pickling
functions, so IMO it should not pickle generators, either. If
this was enabled by default, it would open up yet another
security hole.
msg54347 - Author: Jayson Vantuyl (jaysonvantuyl) Date: 2004-12-30 18:37
I'm not talking about pickling functions.  Which, by the
way, is not entirely accurate.  See the following:

>>> from pickle import dumps,loads
>>> loads(dumps(dumps))
<function dumps at 0x40376dbc>
>>>

As you see, we don't pickle the function code, just a
reference to it.  I propose doing the same with generators,
but including the function namespace and frame info in the
pickle to allow it to continue execution after unpickling. 
Furthermore, this is nothing more than what the interpreter
already does internally.  When a generator yields, all of
its state is neatly stashed away.  I just would like pickle
to be able to get at it, store it, and then later recover
it--without dealing with any actual code objects, just the
state.
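
A small sketch of the "reference, not code" point, assuming Python 3 and using `textwrap.dedent` purely as a convenient standard-library function (it is not mentioned in this thread). The pickle payload names the module and the function, and unpickling looks that name up again:

```python
import pickle
import textwrap

payload = pickle.dumps(textwrap.dedent)

# The payload names the module and function; it carries no bytecode.
has_module_name = b"textwrap" in payload
has_function_name = b"dedent" in payload

# Unpickling imports the module and looks the name up again,
# so the result is the very same function object.
restored = pickle.loads(payload)
same_object = restored is textwrap.dedent
```

This is why the receiving side must already have the same module available: only the name travels.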

As for security, what I am talking about is nothing more or
less secure than pickling classes.

When we pickle a class, we don't pickle the methods on it. 
Rather, we pickle the information to reconstruct the class
(__getstate__, __getnewargs__, __reduce__).  There is a
security concern in that modified pickles could be used to
put bogus data into the unpickled classes (i.e. a password
stored as an attribute on a class could be replaced).

What I'm asking for is nothing more than pickling a form of
the frozen frame object (or something akin to it) for the
generator.  Put another way, when the generator isn't
running, something stores the entire state of its execution.
I'm not sure what it is, but I'd be willing to bet it
consists of little more than a few dicts (namespaces), some
scoping info, and some sort of instruction pointer.

By pickling the generator, I propose nothing more radical
than pickling a class.  The generator is still instantiated
from the same code as before (just as a class or function
reference is) and it still can/will act on that information
(just as a class does).  No actual code is pickled.  What
this does allow is the use of generators for efficient
handling of a class.

Specifically, I'm writing an application that uses a seeded
pseudo-random number generator.  The idea is to transmit the
state of the generator over the network so that the
client/server can deterministically make the same random
choices without communicating the complex state that results
from that.

I have a choice of the following:

def randfunc(seed,num,otherstate):
  # Costly Setup
  # Iteration to appropriate number

otherside.sendPickle( (seed,num,otherstate,) )

l = [ randfunc(123,i,...) for i in range(5) ]

Versus

def randGen(seed):
  # Costly Setup
  # yield in a simple loop

r = randGen(123)

otherside.sendPickle(r)

l = [ r.next() for i in range(5) ]

I think you can see which one is more efficient in terms of
both simplicity of expression and ease of coding.
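
The first alternative (send the seed and a count, then replay) can be sketched as follows. This assumes Python 3; `rand_gen` and `resume` are hypothetical names, and `random.Random` merely stands in for the "costly setup" in the post:

```python
import itertools
import pickle
import random

def rand_gen(seed):
    # Hypothetical stand-in for the randGen in the post.
    rng = random.Random(seed)   # placeholder for the costly setup
    while True:
        yield rng.randrange(1000)

def resume(seed, consumed):
    # Rebuild the generator and skip the values already consumed.
    gen = rand_gen(seed)
    # islice(gen, consumed, consumed) yields nothing but advances gen.
    next(itertools.islice(gen, consumed, consumed), None)
    return gen

sender = rand_gen(123)
already_used = [next(sender) for _ in range(5)]

# Only plain data crosses the wire: the seed and a count.
wire = pickle.dumps((123, 5))

receiver = resume(*pickle.loads(wire))
in_sync = next(sender) == next(receiver)
```

The replay cost grows with the number of values already consumed, which is the inefficiency the post is complaining about.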

Of course, the standard answer to this is to implement a
randgen class that contains all of the state.

This complicates the code a great deal, since a generator
can't be used.
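
For the seeded-PRNG case specifically, the standard library's `random.Random` is already a class of this kind, and its instances can be pickled. A minimal sketch, assuming Python 3 (the variable names are illustrative only):

```python
import pickle
import random

server_rng = random.Random(123)
warmup = [server_rng.random() for _ in range(5)]  # advance the state

# random.Random instances are picklable; the payload carries the
# generator's internal state, not any code.
wire = pickle.dumps(server_rng)
client_rng = pickle.loads(wire)

server_next = [server_rng.randrange(100) for _ in range(5)]
client_next = [client_rng.randrange(100) for _ in range(5)]
in_sync = server_next == client_next

# getstate()/setstate() expose the same state without pickle.
other = random.Random()
other.setstate(server_rng.getstate())
also_in_sync = other.random() == server_rng.random()
```

This covers only the random-number case; it does not help with arbitrary generators.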

What is problematic is that generators are like black holes.
 Once information goes into them, it won't come back out. 
There's not a good way to use them for anything more than
runtime.  This rules out things like process migration,
pickling of any structure involving generator data, and
generally using them like any other language component. 
Using them for counters, prngs, prime number generators, OGR
sieves, and anything that needs to be used to durably
generate a sequence is impossible if that data needs to be
persisted.

No one would think about implementing a new pickle without
the ability to represent classes and function references;
generators shouldn't be second-class control structures.

FYI, Stackless Python already does this, but it may be
easier for them due to the way they've modified frame handling.
msg54348 - Author: Josiah Carlson (josiahcarlson) * Date: 2005-01-02 18:34
In a practical sense I believe this kind of thing is
possible (doing a little spelunking in some generators
reveals gen.gi_frame, which looks to be the stackframe for
the generator).  Why it isn't done could have something to
do with executing on arbitrary stack frames.
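
A quick sketch of that spelunking, assuming Python 3 and a hypothetical `counter` generator. The attributes `gi_frame`, `gi_code`, `f_locals` and `f_lasti` exist on CPython generators and frames, though how much of a frame could safely be serialized is exactly the open question here:

```python
def counter(start):
    # Hypothetical example generator.
    n = start
    while True:
        yield n
        n += 1

gen = counter(10)
next(gen)
next(gen)

frame = gen.gi_frame                  # the suspended frame
local_state = dict(frame.f_locals)    # snapshot of the local variables
resume_offset = frame.f_lasti         # bytecode offset of the last instruction run
code_name = gen.gi_code.co_name       # name of the generator function
```

After two calls to `next`, the snapshot holds `start` as 10 and `n` as 11; once the generator finishes, `gi_frame` becomes `None`.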

In your random-number-generator case, using old-style
iterators via classes can do what you want them to do now
(that is, won't have to wait until Python 2.5 to make it in,
which is at least 1.5 years away).

class g:
    def __init__(self, state=None):
        if state is None:
            #generate some default state
        else:
            #process the state to validate and internalize it.
    def __iter__(self):
        return self
    def next(self):
        state = self.state
        ret = #some function on the current internal state
        self.state = #some function on the current internal state
        return ret


Heck, you don't even need pickle hooks, because pickle will
pickle your iterator's __dict__ attribute.
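
A runnable sketch of this approach, assuming Python 3 (where the method is spelled `__next__`) and a hypothetical `Counter` iterator. For an ordinary class, pickle's default is to save the instance `__dict__`; the sketch pickles that dictionary explicitly so it does not depend on how the module is imported:

```python
import pickle

class Counter:
    # Hypothetical class-based iterator; all state lives in self.__dict__.
    def __init__(self, n=0):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):          # spelled `next` in Python 2
        value = self.n
        self.n += 1
        return value

it = Counter()
consumed = [next(it) for _ in range(3)]      # [0, 1, 2]

# Pickle only the state dictionary, which is what pickle would
# save by default for an instance of an ordinary class.
wire = pickle.dumps(vars(it))

clone = Counter()
clone.__dict__.update(pickle.loads(wire))
resumed = [next(clone) for _ in range(3)]    # [3, 4, 5]
```

The restored iterator continues where the original stopped, at the cost of writing the state machine by hand instead of using `yield`.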

Alternatively, you can use Christian Tismer's Stackless
Python, which has generator pickling.
msg59303 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-01-05 19:12
Discuss this for 2.6

I think it's neither possible to find a generic solution nor a good idea
to make generators picklable.
msg67893 - Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-06-10 04:18
I think this is a bad idea too. Unless I see a patch that implements this
feature cleanly, I will have to reject this feature request.
msg70080 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-20 11:25
A patch can open a new issue, then.
History
Date                 User                  Action  Args
2008-07-20 11:25:10  georg.brandl          set     status: pending -> closed; nosy: + georg.brandl; messages: + msg70080
2008-06-10 04:18:32  alexandre.vassalotti  set     status: open -> pending; nosy: + alexandre.vassalotti; resolution: rejected; messages: + msg67893
2008-01-05 19:12:56  christian.heimes      set     nosy: + christian.heimes; messages: + msg59303; versions: + Python 2.6
2004-12-29 21:52:55  jaysonvantuyl         create