Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make iterators pickleable #58496

Closed
kristjanvalur mannequin opened this issue Mar 13, 2012 · 25 comments
Closed

Make iterators pickleable #58496

kristjanvalur mannequin opened this issue Mar 13, 2012 · 25 comments
Labels
type-feature A feature request or enhancement

Comments

@kristjanvalur
Copy link
Mannequin

kristjanvalur mannequin commented Mar 13, 2012

BPO 14288
Nosy @loewis, @birkenfeld, @rhettinger, @jcea, @pitrou, @kristjanvalur, @voidspace
Files
  • pickling.patch
  • pickling2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-04-08.17:07:43.530>
    created_at = <Date 2012-03-13.16:10:19.630>
    labels = ['type-feature']
    title = 'Make iterators pickleable'
    updated_at = <Date 2012-04-08.17:07:43.529>
    user = 'https://github.com/kristjanvalur'

    bugs.python.org fields:

    activity = <Date 2012-04-08.17:07:43.529>
    actor = 'kristjan.jonsson'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-04-08.17:07:43.530>
    closer = 'kristjan.jonsson'
    components = []
    creation = <Date 2012-03-13.16:10:19.630>
    creator = 'kristjan.jonsson'
    dependencies = []
    files = ['24822', '25042']
    hgrepos = []
    issue_num = 14288
    keywords = ['patch']
    message_count = 25.0
    messages = ['155626', '155669', '155683', '155685', '155704', '155735', '155775', '155776', '156688', '156720', '156722', '156723', '156731', '156749', '156751', '156756', '156902', '156903', '156904', '156906', '156915', '156928', '156933', '157405', '157408']
    nosy_count = 9.0
    nosy_names = ['loewis', 'georg.brandl', 'rhettinger', 'jcea', 'pitrou', 'kristjan.jonsson', 'michael.foord', 'python-dev', 'sbt']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue14288'
    versions = ['Python 3.3']

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Mar 13, 2012

    A common theme in many talks last year about cloud computing was the need to suspend execution, pickle state, and resume it on a different node. This patch is the result of last year's stackless sprint at pycon, finally completed and submitted for review.

    Python does not currently support pickling of many run-time structures, but pickling for things like iterators is trivial.
    A large piece of Stackless' branch is to make sure that various run-time constructs are pickleable, including function objects. While this patch does not do that, it does add pickling for dictiter, and the lot.
    This makes it possible to have compilcated data sets, iterate through them, and pickle them in a semi-consumed state.

    Please note that a slight hack is needed to pickle some iterators. Many of these classes are namely "hidden" and there is no way to access their constructors by name. instead, an unpickling trick is to invoke "iter" on an object of the target type instead. Not the most elegant solution but I didn't want to complicate matters by adding iterator classes into namespaces. Where should stringiter live for example? Be a builtin like str?

    We also didn't aim to make all iterators copy.copy-able using the __reduce__ protocol. Some iterators actually use internal iterators themselves, and if a (non-deep) copy were to happen, we would have to shallow copy those internal objects. Instead, we just return the internal iterator object directly from __reduce__ and allow recursive pickling to proceed.

    @kristjanvalur kristjanvalur mannequin added the type-feature A feature request or enhancement label Mar 13, 2012
    @rhettinger
    Copy link
    Contributor

    ISTM that a discussion on python-dev would be of value here. Iterators are a protocol, not a class; hence, the effort to make them all picklable is potentially endless. The effort would also always be incomplete because some iterators are difficult or inconvenient to pickle (esp. those with inputs from local or temporary resources).

    Experience with SQL hasn't shown a need to save partially consumed cursors and my experience with iterators indicates that the need for pickling would be rare or that is would distract from better solutions (perhaps message based or somesuch).

    The size of this patch is a hint that the idea is not a minor change and that it would add a maintenance burden. What is far from clear is whether it would have value for any real-world problems.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Mar 13, 2012

    Sure, I'll start a discussion there, but at least I've gotten the patch in. The patch is smaller than it looks, most of it is tests.

    @rhettinger
    Copy link
    Contributor

    The patch looks fine. If python-dev thinks that making-iterators-picklable is worth doing, I would support this going into Python 3.3 (no extra benefit will come from waiting).

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Mar 14, 2012

    Btw, is there some way I can make this patch easier to review? I haven't contributed much since the Hg switchover, can I make it so that people can do visual diff?

    @birkenfeld
    Copy link
    Member

    The "review" link next to the the patch file entry should already work and provide a nice visual diff + commenting interface.

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Mar 14, 2012

    I think

        PyAPI_FUNC(PyObject *) _PyIter_GetIter(const char *iter);

    has a confusing name for a convenience function which retrieves an attribute from the builtin module by name.

    Not sure what would be better. Maybe _PyIter_GetBuiltin().

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Mar 14, 2012

    another trick has been suggested. For hidden iterator objects, such as stringiter, to actually put them in the "types" module.
    in there, we could do something like:
    #types.py
    stringiter = iter('').__class__

    and we would then change the name of the iterator in c to be "types.stringiter".

    How does that sound? It _does_ make it necessary for the types module to be there to help with pickling. The _proper_ fix would be for e.g. stringiter to live in builtins, next to 'str' that it is iterating over. Any thoughts?

    @rhettinger rhettinger self-assigned this Mar 14, 2012
    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Mar 24, 2012

    sbt, I will fix the api name. Any other objections then? Leave it as it is with the "iter()" trick?

    @rhettinger
    Copy link
    Contributor

    Has python-dev discussion been launched? It is far from clear that this is worth doing. Pickling runtime structures may be a normal use case for Stackless but isn't a normal use case for regular Python.

    Also, it seems pointless to start down this path because it will always be incomplete (i.e. pickling running generators, or socket streams, etc).

    It also seems to be at odds with the normal use case for passing around partially consumed iterators -- the laziness and memory friendliness is a desired feature; however, the patch pickles some iterators (such as dicts) by running them to completion and storing *all* of the results that would have been returned.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 24, 2012

    I think the "worth doing" argument doesn't really hold, given that it's done. The question at hand really is
    a) is the patch correct?
    b) can we commit to maintaining it, even as things around it may change?

    I'm not bothered with the patch being potentially incomplete: anybody wishing to pickle more things should contribute patches for that.

    As for a), I think we should give people some time to review, and then wait for beta releases to discover issues. If this is a rarely-used feature, the world won't end if it has bugs.

    As for b), I think the main issue is forward compatibility: will pickles created by 3.3 still be readable by future versions?

    @voidspace
    Copy link
    Contributor

    Yes there was a discussion on python-dev. Various people spoke in favour, no-one against:

    http://mail.python.org/pipermail/python-dev/2012-March/117566.html

    @rhettinger
    Copy link
    Contributor

    Michael, thanks for the link. The email was clearer about its rationale than was listed here.

    When this patch gets applied, any discussion of it in the docs should be clear that generators aren't included and that pickling things like dict iterators entail running the iterator to completion and storing all of the results in a list.

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Mar 25, 2012

    ... and that pickling things like dict iterators entail running the
    iterator to completion and storing all of the results in a list.

    The thing to emphasise here is that pickling an iterator is "destructive": afterwards the original iterator will be "empty".

    I can't think of any other examples where pickling an object causes non-trivial mutation of that object.

    Come to think of it, doesn't copy.copy() delegate to __reduce__()/reduce_ex(). It would be a bit surprising if copy.copy(myiterator) were to consume myiterator. I expect copy.copy() to return an independent copy without mutating the original object.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 25, 2012

    The thing to emphasise here is that pickling an iterator is
    "destructive": afterwards the original iterator will be "empty".

    If you look at the patch it isn't (or shouldn't be).

    I agree with Raymond that accumulating dict and set iterators in a list is a bit weird. That said, with hash randomization, perhaps we can't do any better (the order of elements in the internal table depends on the process-wide hash seed).

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Mar 25, 2012

    If you look at the patch it isn't (or shouldn't be).

    Sorry. I misunderstood when Raymond said "running the iterator to completion".

    @rhettinger rhettinger removed their assignment Mar 25, 2012
    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Mar 27, 2012

    Okay, I'll go ahead, fix the 'iter()' trick api name and apply the patch. Then we'll see what happens :).
    Any suggestion towards what documentation changes are needed? I don't think the list of pickleable objects is made explicit anywhere, it is largely a trial and error thing. But obviously Misc/News is the prime candidate.
    Thanks

    @pitrou
    Copy link
    Member

    pitrou commented Mar 27, 2012

    Okay, I'll go ahead, fix the 'iter()' trick api name and apply the
    patch. Then we'll see what happens :).

    Please wait for reviews.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Mar 27, 2012

    Raymond had already reviewed it, and sbt. I wasn't aware of any more pending reviews, but I'll wait for yours, of course.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 27, 2012

    Well, please take a look at the review link. There are already some comments there.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Mar 27, 2012

    Btw, regarding compatibility: The docs say " The pickle serialization format is guaranteed to be backwards compatible across Python releases."

    I take this to mean the serialization format itself. I don't think there is a broader guarantee that pickles generated by one version can be read by another, since objects can change their internal representation, types can even disappear or change.

    In that sense, pickles made by 2.7 can be read by 3.2, in the sense that they will correctly return an error when they can't construct a 'str' object.

    If I misunderstand things, then at least I think that the pickle documentation should be made clearer, that not only is the protocol supposed to be read, but also that entire pickles should always be readable _and_ instantiatable, by future versions

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Mar 27, 2012

    I've incorporated antoine's comments and the proposed internal function name into a new patch.

    A lot of the changes concerned thecking the type() of the unpickled iterator. Now, it wasn't a specific design goal to get the exact same objects back, only _equivalent_ objects. In particular, dicts and sets have problems, so a dictiter to a partially consumed dict cannot be pickled as it is.

    so, I've added type cases everywhere, but for those cases.
    Now, how important do you think type consistency is? when using iterators, does one ever look at it and test its type? if this is important, I _could_ take another look at dicts and seta and create fresh iterators to the dicts and sets made out of the remainder of the items, rather than iterators to lists.

    Any thoughs?

    @pitrou
    Copy link
    Member

    pitrou commented Mar 27, 2012

    Now, how important do you think type consistency is? when using
    iterators, does one ever look at it and test its type? if this is
    important, I _could_ take another look at dicts and seta and create
    fresh iterators to the dicts and sets made out of the remainder of the
    items, rather than iterators to lists.

    I think type consistency is important if it can be achieved reasonably
    simply.
    In the dict and set case, I'm not sure you can recreate the internal
    table in the same order (even accross interpreter restarts). In this
    case, you should just check that the unpickled data is a subclass of
    collections.abc.Iterator.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Apr 3, 2012

    Good idea Antoine.
    So, I'll with your suggested fix to the unittests I'll commit this and then look on while Rome burns.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 3, 2012

    New changeset 4ff234337e24 by Kristján Valur Jónsson in branch 'default':
    Issue bpo-14288: Serialization support for builtin iterators.
    http://hg.python.org/cpython/rev/4ff234337e24

    New changeset 51c88d51aa4a by Kristján Valur Jónsson in branch 'default':
    Issue bpo-14288: Modify Misc/NEWS
    http://hg.python.org/cpython/rev/51c88d51aa4a

    @kristjanvalur kristjanvalur mannequin closed this as completed Apr 8, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants