Message 54347 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	jaysonvantuyl
Recipients
Date	2004-12-30.18:37:26
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to

Content
Logged In: YES user_id=626872 I'm not talking about pickling functions. Which, by the way, is not entirely accurate. See the following: >>> from pickle import dumps,loads >>> loads(dumps(dumps)) <function dumps at 0x40376dbc> >>> As you see, we don't pickle the function code, just a reference to it. I propose doing the same with generators, but including the function namespace and frame info in the pickle to allow it to continue execution after unpickling. Furthermore, this is nothing more than what the interpreter already does internally. When a generator yields, all of its state is neatly stashed away. I just would like pickle to be able to get at it, store it, and then later recover it--without dealing with any actual code objects, just the state. As for security, what I am talking about is nothing more or less secure than pickling classes. When we pickle a class, we don't pickle the methods on it. Rather, we pickle the information to reconstruct the class (__getstate__, __getnewargs__, __reduce__). There is a security concern in that modified pickles could be used to put bogus data into the unpickled classes (i.e. a password stored as an attribute on a class could be replaced). What I'm asking for is nothing more than pickling a form of the frozen frame object (or something akin to it) for the generator. Put another way, when the generator isn't running, something stores the entire state of its execution. I'm not sure what it is, but I'd be willing to be it consists of little more than a few dicts (namespaces), some scoping info, and some sort of instruction pointer. By pickling the generator, I propose nothing more radical than pickling a class. The generator is still instantiated from the some code as before (just as a class or function reference is) and it still can/will act on that information (just as a class does). No actual code is pickled. What this does allow is the use of generators for efficent handling of a class. Specifically, I'm writing an application that uses a seeded pseudo-random number generator. The idea is to transmit the state of the generator over the network so that the client/server can deterministically make the same random choices without communicating the complex state that results from that. I have a choice of the following: def randfunc(seed,num,otherstate): # Costly Setup # Iteration to appropriate number otherside.sendPickle( (seed,num,otherstate,) ) l = [ randfunc(123,i,...) for i in range(5) ] Versus def randGen(seed): # Costly Setup # yield in a simple loop r = randGen(123) otherside.sendPickle(r) l = [ r() for i in range(5) ] I think you can see which one is more efficient in terms of both simplicity of expression and ease of coding. Of course, the standard answer to this is to implement a randgen class that contains all of the state. This complicates the code a great deal, since a generator can't be used. What is problematic is that generators are like black holes. Once information goes into them, it won't come back out. There's not a good way to use them for anything more than runtime. This rules out things like process migration, pickling of any structure involving generator data, and generally using them like any other language component. Using them for counters, prngs, prime number generators, OGR sieves, and anything that needs to be used to durably generate a sequence is impossible if that data needs to be persisted. No one would think about implementing a new pickle without the ability to represent classes and function references, generators shouldn't be second class control structures. FYI, Stackless Python already does this, but it may be easier for them due to the way they've modified frame handling.

Logged In: YES 
user_id=626872

I'm not talking about pickling functions.  Which, by the
way, is not entirely accurate.  See the following:

>>> from pickle import dumps,loads
>>> loads(dumps(dumps))
<function dumps at 0x40376dbc>
>>>

As you see, we don't pickle the function code, just a
reference to it.  I propose doing the same with generators,
but including the function namespace and frame info in the
pickle to allow it to continue execution after unpickling. 
Furthermore, this is nothing more than what the interpreter
already does internally.  When a generator yields, all of
its state is neatly stashed away.  I just would like pickle
to be able to get at it, store it, and then later recover
it--without dealing with any actual code objects, just the
state.

As for security, what I am talking about is nothing more or
less secure than pickling classes.

When we pickle a class, we don't pickle the methods on it. 
Rather, we pickle the information to reconstruct the class
(__getstate__, __getnewargs__, __reduce__).  There is a
security concern in that modified pickles could be used to
put bogus data into the unpickled classes (i.e. a password
stored as an attribute on a class could be replaced).

What I'm asking for is nothing more than pickling a form of
the frozen frame object (or something akin to it) for the
generator.  Put another way, when the generator isn't
running, something stores the entire state of its execution.
 I'm not sure what it is, but I'd be willing to be it
consists of little more than a few dicts (namespaces), some
scoping info, and some sort of instruction pointer.

By pickling the generator, I propose nothing more radical
than pickling a class.  The generator is still instantiated
from the some code as before (just as a class or function
reference is) and it still can/will act on that information
(just as a class does).  No actual code is pickled.  What
this does allow is the use of generators for efficent
handling of a class.

Specifically, I'm writing an application that uses a seeded
pseudo-random number generator.  The idea is to transmit the
state of the generator over the network so that the
client/server can deterministically make the same random
choices without communicating the complex state that results
from that.

I have a choice of the following:

def randfunc(seed,num,otherstate):
  # Costly Setup
  # Iteration to appropriate number

otherside.sendPickle( (seed,num,otherstate,) )

l = [ randfunc(123,i,...) for i in range(5) ]

Versus

def randGen(seed):
  # Costly Setup
  # yield in a simple loop

r = randGen(123)

otherside.sendPickle(r)

l = [ r() for i in range(5) ]

I think you can see which one is more efficient in terms of
both simplicity of expression and ease of coding.

Of course, the standard answer to this is to implement a
randgen class that contains all of the state.

This complicates the code a great deal, since a generator
can't be used.

What is problematic is that generators are like black holes.
 Once information goes into them, it won't come back out. 
There's not a good way to use them for anything more than
runtime.  This rules out things like process migration,
pickling of any structure involving generator data, and
generally using them like any other language component. 
Using them for counters, prngs, prime number generators, OGR
sieves, and anything that needs to be used to durably
generate a sequence is impossible if that data needs to be
persisted.

No one would think about implementing a new pickle without
the ability to represent classes and function references,
generators shouldn't be second class control structures.

FYI, Stackless Python already does this, but it may be
easier for them due to the way they've modified frame handling.

History
Date	User	Action	Args
2007-08-23 16:10:13	admin	link	issue1092962 messages
2007-08-23 16:10:13	admin	create