classification
Title: itertools.chain behaves strangely when copied with copy.copy
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: MSeifert, kristjan.jonsson, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-03-24 19:14 by MSeifert, last changed 2017-04-03 18:56 by rhettinger.

Files
File name Uploaded Description Edit
itertools-chain-copy.diff serhiy.storchaka, 2017-03-31 16:03 review
Messages (10)
msg290106 - (view) Author: Michael Seifert (MSeifert) * Date: 2017-03-24 19:14
When using `copy.copy` to copy an `itertools.chain` instance the results can be weird. For example

>>> from itertools import chain
>>> from copy import copy
>>> a = chain([1,2,3], [4,5,6])
>>> b = copy(a)
>>> next(a)  # looks okay
1
>>> next(b)  # jumps to the second iterable, not okay?
4
>>> tuple(a)
(2, 3)
>>> tuple(b)
(5, 6)

I don't really want to `copy.copy` such an iterator (I would either use `a, b = itertools.tee(a, 2)` or `b = a`, depending on the use case). This came up because I was investigating how Python's iterators behave when copied, deepcopied, or pickled, since I want the iterators in my extension module to behave similarly.
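For reference, `itertools.tee` is the supported way to get two independent views of a single iterator; a minimal sketch of the alternative mentioned above:

```python
from itertools import chain, tee

a = chain([1, 2, 3], [4, 5, 6])
# tee buffers pending items internally; the original name should
# not be used directly after this point
a, b = tee(a)
first_a, first_b = next(a), next(b)   # both yield 1: independent views
rest_a, rest_b = list(a), list(b)     # both yield [2, 3, 4, 5, 6]
```

Unlike `copy.copy`, the second branch starts from the same position as the first rather than jumping to the second iterable.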
msg290143 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-24 21:19
Humph, that is definitely not the expected result.  The itertools copy/reduce support has been a never-ending source of bugs and headaches.

It looks like the problem is that __reduce__ is returning the existing tuple iterator rather than a new one:

>>> a = chain([1,2,3], [4,5,6])
>>> b = copy(a)
>>> next(a)
1
>>> a.__reduce__()
(<class 'itertools.chain'>, (), (<tuple_iterator object at 0x104ee78d0>, <list_iterator object at 0x104f81b70>))
>>> b.__reduce__()
(<class 'itertools.chain'>, (), (<tuple_iterator object at 0x104ee78d0>,))
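The sharing can be confirmed directly by comparing the identity of the source iterator in each state tuple (a CPython implementation detail, shown here only for illustration):

```python
from copy import copy
from itertools import chain

a = chain([1, 2, 3], [4, 5, 6])
b = copy(a)
# both chain objects carry the very same source iterator object,
# so consuming one advances the other
shared = a.__reduce__()[2][0] is b.__reduce__()[2][0]
```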
msg290146 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-24 21:30
chain(x) is a shortcut for chain.from_iterable(iter(x)).

Neither copy.copy() nor __reduce__ has any particular relation to this. Consider the following example:

>>> from itertools import chain
>>> i = iter([[1, 2, 3], [4, 5, 6]])
>>> a = chain.from_iterable(i)
>>> b = chain.from_iterable(i)
>>> next(a)
1
>>> next(b)
4
>>> tuple(a)
(2, 3)
>>> tuple(b)
(5, 6)
msg290916 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-31 15:43
This issue is related to the behavior of other composite iterators.

>>> from copy import copy
>>> it = map(ord, 'abc')
>>> list(copy(it))
[97, 98, 99]
>>> list(copy(it))
[]
>>> it = filter(None, 'abc')
>>> list(copy(it))
['a', 'b', 'c']
>>> list(copy(it))
[]

The copy is too shallow. If you consume an item from one copy, it disappears from the original.

Compare with the behavior of iterators of builtin sequences:

>>> it = iter('abc')
>>> list(copy(it))
['a', 'b', 'c']
>>> list(copy(it))
['a', 'b', 'c']
>>> it = iter(list('abc'))
>>> list(copy(it))
['a', 'b', 'c']
>>> list(copy(it))
['a', 'b', 'c']
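Sequence iterators copy cleanly because their __reduce__ rebuilds the iterator from the still-intact underlying sequence plus an index (again a CPython detail, shown for illustration):

```python
from copy import copy

it = iter('abc')
next(it)                        # advance past 'a'
func, args, state = it.__reduce__()
# func is the builtin iter, args holds the untouched sequence, and
# state is the current position, so a copy can be rebuilt exactly
copied = list(copy(it))         # the copy continues from 'b'
original = list(it)             # the original is unaffected by the copy
```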
msg290917 - (view) Author: Michael Seifert (MSeifert) * Date: 2017-03-31 15:59
Just an update on what doesn't work: simply overriding the `__copy__` method.

I tried it, but it somewhat breaks `itertools.tee`: if the passed iterable has a `__copy__` method, `tee` copies the iterator (resulting in a lot of unnecessary memory overhead, or breakage if a generator is involved) instead of using its memory-efficient internals.
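The `tee` behavior described above can be demonstrated with a hypothetical iterator class that counts `__copy__` calls (the class and its names are made up for illustration):

```python
from itertools import tee

class CountingIter:
    """Hypothetical iterator that advertises __copy__."""
    def __init__(self, data):
        self._data = list(data)
        self._pos = 0
        self.copy_calls = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self._pos >= len(self._data):
            raise StopIteration
        value = self._data[self._pos]
        self._pos += 1
        return value
    def __copy__(self):
        # tee prefers this hook over its internal buffering
        self.copy_calls += 1
        clone = CountingIter(self._data)
        clone._pos = self._pos
        return clone

src = CountingIter('abc')
a, b = tee(src)
out_a, out_b = list(a), list(b)   # both see the full stream
```

Because `__copy__` was called, no shared buffer was set up, which is exactly the memory-overhead path described above.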
msg290918 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-31 16:03
Just as an example, here is a patch that implements deeper copying for itertools.chain objects in Python. I don't mean to push it; it is too complicated. I have also written a slightly simpler implementation, but it doesn't work due to the behavior of copied map objects.
msg290995 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-04-01 16:12
Serhiy, feel free to take this in whatever direction you think is best.
msg291027 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2017-04-02 09:12
It is a tricky issue. How deep do you go? What if you are chaining several of the itertools? It seems like we're entering a semantic sinkhole here.

Deepcopy would be too deep...
The original copy support in these objects stems from the desire to support
pickling.
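Pickling, in contrast, does produce an independent object, because serializing the state tuple recursively serializes the subiterators; a quick check:

```python
import pickle
from itertools import chain

a = chain([1, 2, 3], [4, 5, 6])
next(a)                              # consume the 1
b = pickle.loads(pickle.dumps(a))    # round-trip serializes the subiterators
restored = list(b)                   # continues from where a left off
original = list(a)                   # a is unaffected by the round-trip
```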

msg291035 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-02 12:40
Yes, this issue is tricky, and I don't have a good solution.

If we implemented __copy__ for builtin compound iterators, I would implement filter.__copy__ and map.__copy__ as something like:

def __copy__(self):
    cls, args = self.__reduce__()
    return cls(*map(copy, args))

If the underlying iterators properly support copying, copying filter and map iterators will succeed. If they don't support copying, copying filter and map iterators should fail, and not accumulate elements in the tee() object.
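Applied by hand to a map object, the sketch above behaves as intended when the inner iterator supports copying (`copy_compound` is an illustrative name, not a proposed API; map and filter return a 2-tuple from __reduce__ in CPython):

```python
from copy import copy

def copy_compound(it):
    # same idea as the proposed __copy__: rebuild from __reduce__,
    # copying each constructor argument
    cls, args = it.__reduce__()
    return cls(*map(copy, args))

m = map(ord, 'abc')
next(m)                        # consume ord('a')
m2 = copy_compound(m)          # copies the inner str iterator too
copied_rest = list(m2)         # continues from 'b'
original_rest = list(m)        # original no longer affected by m2
```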

But there are open questions.

1. This is a behavior change. What if any code depends on the current behavior? That would be silly; if the current behavior were desirable, copy(filter) and copy(map) could just return the original iterator.

2. Depending on the copy module in a method of a builtin type looks doubtful. Should we implement copy.copy() in C and provide a public C API?

3. If we make a copy of limited depth, shouldn't we use a memo, as deepcopy() does, to prevent unwanted duplication? Otherwise the copy of `map(func, it, it)` would behave differently from the original. This example is not as silly as it looks.
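Point 3 can be made concrete: the original map draws both arguments from one shared iterator, so a per-argument copy that ignores object identity would hand the copy two independent iterators and change its output (illustrative sketch):

```python
it = iter([1, 2, 3, 4])
m = map(lambda x, y: (x, y), it, it)
# both argument slots advance the same iterator, pairing 1 with 2
# and 3 with 4; a naive copy that duplicated `it` once per slot
# would instead yield (1, 1), (2, 2), (3, 3), (4, 4)
pairs = list(m)
```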

4. Is it possible to implement copying for all compound iterators? For example, copying a chain() should change the state of the original object (by using __setstate__), so that it makes copies of subiterators before using them.
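For point 4, a deeper copy of chain() can be sketched in Python by copying each element of the state tuple before restoring it. This is only a sketch of the idea; `copy_chain` is a hypothetical helper relying on CPython's __reduce__/__setstate__ protocol, and it only works when the subiterators themselves support copying:

```python
from copy import copy
from itertools import chain

def copy_chain(ch):
    cls, args, state = ch.__reduce__()
    clone = cls(*args)
    # copy the source iterator and, if present, the active subiterator
    clone.__setstate__(tuple(copy(s) for s in state))
    return clone

a = chain([1, 2, 3], [4, 5, 6])
next(a)                       # consume the 1
b = copy_chain(a)
copied_items = list(b)        # continues from 2, then the second iterable
original_items = list(a)      # original is unaffected by the copy
```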

Perhaps all this deserves a PEP.
msg291091 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-04-03 18:56
> Perhaps all this deserves a PEP.

If Serhiy and Kristján are on a course of action, that will suffice.  Copying iterators is an esoteric endeavor of interest to very few users (no one has even noticed until now).
History
Date                 User              Action  Args
2017-04-03 18:56:13  rhettinger        set     messages: + msg291091
2017-04-02 12:40:24  serhiy.storchaka  set     messages: + msg291035
2017-04-02 09:12:12  kristjan.jonsson  set     messages: + msg291027
2017-04-01 16:12:12  rhettinger        set     assignee: serhiy.storchaka
                                               messages: + msg290995
2017-03-31 16:03:43  serhiy.storchaka  set     files: + itertools-chain-copy.diff
                                               keywords: + patch
                                               messages: + msg290918
2017-03-31 15:59:13  MSeifert          set     messages: + msg290917
2017-03-31 15:43:45  serhiy.storchaka  set     messages: + msg290916
2017-03-25 03:13:55  rhettinger        set     assignee: rhettinger -> (no value)
2017-03-24 21:30:17  serhiy.storchaka  set     messages: + msg290146
2017-03-24 21:19:37  rhettinger        set     nosy: + kristjan.jonsson, serhiy.storchaka
                                               messages: + msg290143
2017-03-24 20:49:57  rhettinger        set     assignee: rhettinger
                                               nosy: + rhettinger
2017-03-24 19:14:26  MSeifert          create