Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

itertools.islice() doesn't release reference to the source iterator when the slice is exhausted #65520

Closed
AntonAfanasyev mannequin opened this issue Apr 21, 2014 · 17 comments
Assignees
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage

Comments

@AntonAfanasyev
Copy link
Mannequin

AntonAfanasyev mannequin commented Apr 21, 2014

BPO 21321
Nosy @rhettinger, @pitrou
Files
  • issue21321_3.4_8c8315bac6a8.diff
  • issue21321_2.7_e3217efa6edd.diff
  • issue21321_3.4_8c8315bac6a8_2.diff
  • issue21321_2.7_e3217efa6edd_3.diff
  • issue21321_3.4_8c8315bac6a8_3.diff
  • issue21321_2.7_e3217efa6edd_4.diff
  • issue21321_3.4_8c8315bac6a8_4.diff
  • issue21321_3.4_8c8315bac6a8_5.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2014-04-29.10:27:27.284>
    created_at = <Date 2014-04-21.12:13:50.306>
    labels = ['extension-modules', 'performance']
    title = "itertools.islice() doesn't release reference to the source iterator when the slice is exhausted"
    updated_at = <Date 2014-04-29.10:27:27.284>
    user = 'https://bugs.python.org/AntonAfanasyev'

    bugs.python.org fields:

    activity = <Date 2014-04-29.10:27:27.284>
    actor = 'pitrou'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2014-04-29.10:27:27.284>
    closer = 'pitrou'
    components = ['Extension Modules']
    creation = <Date 2014-04-21.12:13:50.306>
    creator = 'Anton.Afanasyev'
    dependencies = []
    files = ['34990', '34991', '35003', '35076', '35077', '35086', '35087', '35091']
    hgrepos = []
    issue_num = 21321
    keywords = ['patch']
    message_count = 17.0
    messages = ['216939', '216940', '216992', '217014', '217180', '217407', '217410', '217411', '217474', '217495', '217503', '217504', '217505', '217506', '217508', '217509', '217510']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'pitrou', 'python-dev', 'Anton.Afanasyev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue21321'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @AntonAfanasyev
    Copy link
    Mannequin Author

    AntonAfanasyev mannequin commented Apr 21, 2014

    This issue results in redundant memory consumption for e.g. in this case:

    ================================================

    from itertools import *
    
    def test_islice():
    items, lookahead = tee(repeat(1, int(1e9)))
    lookahead = islice(lookahead, 10)
    
        for item in lookahead:
            pass
    
        for item in items:
            pass
    
    if __name__ == "__main__":
        test_islice()

    ================================================
    This demo is taken from real case where one needs to look ahead input stream before processing it. For my PC this demo stops with 'Segmentation fault' message after exhausting all PC memory, while running it with patched python consumes only 0.1% of memory till the end.

    When one uses pure pythonic implementation of itertools.islice() (taken from docs), the issue goes away as well, since this implementation doesn't store redundant reference to source iterator.

    ================================================

    def islice(iterable, *args):
        s = slice(*args)
        it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
        nexti = next(it)
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)

    ================================================

    Attaching patch for this issue. Have to change '__reduce__()' method since now unpickling of exhausted 'islice()' object cannot return old source iterator.

    @AntonAfanasyev AntonAfanasyev mannequin added extension-modules C modules in the Modules dir performance Performance or resource usage labels Apr 21, 2014
    @AntonAfanasyev
    Copy link
    Mannequin Author

    AntonAfanasyev mannequin commented Apr 21, 2014

    Added patch for 2.7 version (no need to change '__reduce__()' method since it's not implemented).

    @rhettinger
    Copy link
    Contributor

    The ref-counts in the islice_reduce code don't look to be correct at first glance.

    @rhettinger rhettinger self-assigned this Apr 22, 2014
    @AntonAfanasyev
    Copy link
    Mannequin Author

    AntonAfanasyev mannequin commented Apr 22, 2014

    Hi Raymond,
    do you mean allocation exceptions handling should be more accurate?
    Attaching fixed version for 3.4 branch.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 26, 2014

    Haven't reviewed the patch, but you should definitely add a unit test for the bugfix.

    @AntonAfanasyev
    Copy link
    Mannequin Author

    AntonAfanasyev mannequin commented Apr 28, 2014

    Hi Antoine,
    I have no found a way to check resource usage in test infrastructure and I don't think it could be done carefully. The only method I found to test issue is straightforward: just to check source iterator is not referenced from itertools.islice() after the latter has been exhausted:

    ================================================

    a = [random.random() for i in range(10)]
    before = sys.getrefcount(a)
    b = islice(a, 5)
    for i in b: pass
    after = sys.getrefcount(a)
    self.assertEqual(before, after)

    ================================================

    Attaching "issue21321_2.7_e3217efa6edd_3.diff" and "issue21321_3.4_8c8315bac6a8_3.diff" patches with this test included in "Lib/test/test_itertools.py".

    @pitrou
    Copy link
    Member

    pitrou commented Apr 28, 2014

    Anton, the test is wrong: it is taking a reference to the iterable object (the list), not the iterator.

    To check the reference to the original iterator is released, something like this would work:

    >>> import itertools, weakref
    >>> it = (x for x in (1, 2))
    >>> wr = weakref.ref(it)
    >>> it = itertools.islice(it, 1)
    >>> wr() is None
    False
    >>> list(it)
    [1]
    >>> wr() is None  # returns True with the patch, False without
    True

    @pitrou
    Copy link
    Member

    pitrou commented Apr 28, 2014

    (note I haven't looked at the C part of the patch)

    @AntonAfanasyev
    Copy link
    Mannequin Author

    AntonAfanasyev mannequin commented Apr 29, 2014

    Hi Antoine,
    my test works for me. It can be either
    >>> a = [1, 2, 3]
    or
    >>> a = iter([1, 2, 3])
    , no matter: both objects will be +1 referenced after taking
    >>> b = islice(a, 1)
    . 
    My test failed without patch and passed with one.

    But your test is more straightforward, thanks.
    Attaching patches with your test.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 29, 2014

    Thanks. Could you also add a test for the islice_reduce additions? Or is it already tested?
    I suspect there's a reference leak there: after calling PyObject_GetIter, you should always Py_DECREF(empty_list).
    Also, with the "O" code, Py_BuildValue will take a new reference to empty_it, so you should use the "N" code instead.

    @AntonAfanasyev
    Copy link
    Mannequin Author

    AntonAfanasyev mannequin commented Apr 29, 2014

    Hi Antoine,
    oops you are right about leaks: fixed them in new attached patch.
    As for testing changes in "reduce()": they are already covered by "self.pickletest(islice(range(100), *args))". Function "pickletest()" covers case for pickle dumping/loading of exhausted iterator.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 29, 2014

    For the record, checks such as:

        self.assertEqual(wr() is None, False)
    

    are better written:

            self.assertIsNotNone(wr())

    No need to upload a new patch, I'm gonna make the change while committing :-)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 29, 2014

    New changeset b795105db23a by Antoine Pitrou in branch '3.4':
    Issue bpo-21321: itertools.islice() now releases the reference to the source iterator when the slice is exhausted.
    http://hg.python.org/cpython/rev/b795105db23a

    New changeset a627b3e3c9c8 by Antoine Pitrou in branch 'default':
    Issue bpo-21321: itertools.islice() now releases the reference to the source iterator when the slice is exhausted.
    http://hg.python.org/cpython/rev/a627b3e3c9c8

    @pitrou
    Copy link
    Member

    pitrou commented Apr 29, 2014

    Patch committed, thank you!
    If you want to provide a patch for 2.7, please say so, otherwise I'll close the issue.

    @AntonAfanasyev
    Copy link
    Mannequin Author

    AntonAfanasyev mannequin commented Apr 29, 2014

    Antoine, not sure about 2.7. The issue first arose for me at Python 2.7, so I would prefer "issue21321_2.7_e3217efa6edd_4.diff" patch be applied.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 29, 2014

    New changeset 8ee76e1b5aa6 by Antoine Pitrou in branch '2.7':
    Issue bpo-21321: itertools.islice() now releases the reference to the source iterator when the slice is exhausted.
    http://hg.python.org/cpython/rev/8ee76e1b5aa6

    @pitrou
    Copy link
    Member

    pitrou commented Apr 29, 2014

    Ok, then I've committed to 2.7 too. Thank you very much for contributing!

    @pitrou pitrou closed this as completed Apr 29, 2014
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants