call_at/call_later with Timer cancellation can result in (practically) unbounded memory usage. #66638

Closed
chatgris mannequin opened this issue Sep 20, 2014 · 25 comments
Labels
performance (Performance or resource usage), topic-asyncio

Comments

@chatgris
Mannequin

chatgris mannequin commented Sep 20, 2014

BPO 22448
Nosy @gvanrossum, @vstinner, @1st1
Files
  • wait_for.py: Unbounded memory use example
  • asyncio_remove_delayed.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2014-09-30.18:58:08.288>
    created_at = <Date 2014-09-20.00:14:00.358>
    labels = ['expert-asyncio', 'performance']
    title = 'call_at/call_later with Timer cancellation can result in (practically) unbounded memory usage.'
    updated_at = <Date 2014-10-05.09:45:14.730>
    user = 'https://bugs.python.org/chatgris'

    bugs.python.org fields:

    activity = <Date 2014-10-05.09:45:14.730>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-09-30.18:58:08.288>
    closer = 'yselivanov'
    components = ['asyncio']
    creation = <Date 2014-09-20.00:14:00.358>
    creator = 'chatgris'
    dependencies = []
    files = ['36666', '36758']
    hgrepos = []
    issue_num = 22448
    keywords = ['patch']
    message_count = 25.0
    messages = ['227136', '227139', '227157', '227160', '227178', '227185', '227189', '227190', '227200', '227547', '227571', '227572', '227573', '227874', '227919', '227920', '227921', '227940', '227943', '227950', '227952', '227968', '227975', '227976', '228553']
    nosy_count = 5.0
    nosy_names = ['gvanrossum', 'vstinner', 'python-dev', 'yselivanov', 'chatgris']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue22448'
    versions = ['Python 3.4']

    @chatgris
    Mannequin Author

    chatgris mannequin commented Sep 20, 2014

    The core issue stems from the implementation of Timer cancellation (which features like asyncio.wait_for build upon). BaseEventLoop stores scheduled events in an array-backed heapq named _scheduled. Once an event has been scheduled with call_at, cancelling it only marks the event as cancelled; it does not remove it from the array-backed heap. The event is only removed once it becomes the next scheduled event for the loop.

    In a system where many events with long timeout periods are scheduled and then cancelled, and there always exists at least one event scheduled for an earlier time, memory use is practically unbounded. The attached program wait_for.py demonstrates a trivial example where memory use grows practically without bound for an hour, even though the program only ever has two "uncancelled" events and two coroutines at any given point in its execution.
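    The attached wait_for.py is not reproduced here, but a minimal sketch of the pattern it demonstrates might look like this (modern async/await syntax for brevity; the original used Python 3.4-era generator coroutines, and do_work/main are illustrative names):

    import asyncio

    async def do_work():
        # Finishes almost immediately, so the one-hour timeout below never fires.
        await asyncio.sleep(0.001)

    async def main():
        while True:
            # wait_for() schedules a timer via call_later(); when do_work()
            # completes first, the timer handle is cancelled but stays in the
            # loop's _scheduled heap until its deadline reaches the top.
            await asyncio.wait_for(do_work(), timeout=3600)

    asyncio.run(main())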

    This could be fixed in a variety of ways:

    a) Timer cancellation could result in the object being removed from the heap, as in the sched module. This would be at least O(N) where N is the number of scheduled events.
    b) Timer cancellation could trigger a callback that tracks the number of cancelled events in the _scheduled list. Once this number exceeds a threshold (50%?), the list could be cleared of all cancelled events and then re-heapified.
    c) A balanced tree structure could be used to store the scheduled events, giving O(log N) cancellation (the current module is O(log N) for heappop anyway).

    Given python's lack of a balanced tree structure in the standard library, I assume option c) is a non-starter.

    I would prefer option b) over option a): when there are a lot of scheduled events in the system (upwards of 50,000 - 100,000 in some of my use cases) the amortized cost of cancelling an event trends towards O(1) (N/2 cancellations are handled by a single O(N) cleanup), at the cost of slightly more memory, bounded relative to the number of events.
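    A rough sketch of what option b) could look like (the class, attribute names, and the 50% threshold here are illustrative assumptions, not the actual asyncio internals):

    import heapq

    _CANCELLED_TIMER_THRESHOLD = 0.5  # illustrative: rebuild once half the heap is dead weight

    class SketchLoop:
        """Option b) in miniature: count cancellations, rebuild the heap in bulk."""

        def __init__(self):
            self._scheduled = []             # heap of timer handles exposing ._cancelled
            self._timer_cancelled_count = 0

        def _timer_handle_cancelled(self, handle):
            # Called from the handle's cancel(): O(1), no heap surgery.
            self._timer_cancelled_count += 1

        def _maybe_cleanup(self):
            # Once cancelled handles dominate, filter them out and re-heapify.
            # The O(N) rebuild pays for ~N/2 cancellations, i.e. amortized O(1).
            if (self._scheduled and
                    self._timer_cancelled_count / len(self._scheduled)
                        > _CANCELLED_TIMER_THRESHOLD):
                self._scheduled = [h for h in self._scheduled if not h._cancelled]
                heapq.heapify(self._scheduled)
                self._timer_cancelled_count = 0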

    I would be willing to take a shot at implementing this patch with the most agreeable option. Please let me know if that would be appreciated, or if someone else would rather tackle this issue. (First time bug report for python, not sure of the politics/protocols involved).

    Disclaimer: I am by no means an asyncio expert; my understanding of the code base is based on reading it while debugging this memory leak.

    @chatgris (mannequin) added the topic-asyncio and performance (Performance or resource usage) labels Sep 20, 2014
    @gvanrossum
    Member

    Hi Joshua,

    This is indeed a problem -- I just never expected that you would be having that many events and canceling the majority. I am sorry you had to debug this. :-(

    This was anticipated by the author of pyftpdlib (Giampaolo Rodola'), who proposed an elegant solution: keep track of the number of cancelled events, and when the number gets too high (according to some measure) the heapq is simply rebuilt by filtering out cancelled events. I think this is similar to your proposal (b).

    I would love it if you could implement this! Just make sure to add some tests and follow the PEP-8 style guide. You can contribute upstream to the Tulip project first. https://code.google.com/p/tulip/

    @chatgris
    Mannequin Author

    chatgris mannequin commented Sep 20, 2014

    You can contribute upstream to the Tulip project first.

    Will I be writing a patch and tests for tulip, and then separate a patch and tests for python 3.4? Or will I submit to tulip, and then the changes will get merged from tulip into python by some other process?

    If possible, I would like to get this into python 3.4.2 (assuming all goes well).

    @gvanrossum
    Member

    We can merge the changes into 3.4 and 3.5 for you, it's just a simple copy
    (the codebases are identical). However, the 3.4.2 release candidate is
    apparently in 2 days, so I think you've missed that train already.


    @gvanrossum
    Member

    By the way I just looked at wait_for.py; it has a bug where do_work() isn't using yield-from with the sleep() call. But that may well be the issue you were trying to debug, and this does not change my opinion about the issue -- I am still looking forward to your patch.

    @chatgris
    Mannequin Author

    chatgris mannequin commented Sep 20, 2014

    My patch is ready for review. If I followed the process correctly, I think you should have received an email:

    https://codereview.appspot.com/145220043

    By the way I just looked at wait_for.py; it has a bug where do_work() isn't using yield-from with the sleep() call. But that may well be the issue you were trying to debug, and this does not change my opinion about the issue

    That was not intended, it was just a mistake.

    (A quick aside on yield from, feel free to ignore, I don't expect to change anyone's opinion on this)
    I don't use "yield from" much - my first use of asyncio was porting an application from gevent (I made a small custom wrapper with fibers (https://pypi.python.org/pypi/fibers) that can internally yield on coroutines). I have read https://glyph.twistedmatrix.com/2014/02/unyielding.html but in my cases, I tend to write my code with the thought that any non standard library function can yield (I initially tried porting to vanilla asyncio, but I ended up having yield from almost everywhere). In the rare cases I want to ensure no yielding takes place across function calls, I like the way gruvi (https://github.com/geertj/gruvi) handles it with a construct to assert no yielding takes place.

    with assert_no_switchpoints():
        do_something()
        do_something_else()

    I also find that it is less error prone (missing a yield from), but that is a minor point as I could write a static analyzer (on top of test cases ofc) to check for that.

    But that's just my opinion and opinions evolve :)

    @gvanrossum
    Member

    I will try to review later tonight. One thing though:

    I tend to write my code with the thought that any non standard library
    function can yield

    That makes sense when using gevent, but not when using asyncio or Trollius.
    Nothing will make events run if you don't use yield [from].
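    As a quick illustration of that point (3.4-era generator coroutines to match the thread; the function names are made up, and this style needs Python 3.10 or earlier since asyncio.coroutine was later removed):

    import asyncio

    @asyncio.coroutine
    def missing_yield_from():
        asyncio.sleep(1)              # bug: creates a coroutine object that never runs
        return "returned immediately"

    @asyncio.coroutine
    def with_yield_from():
        yield from asyncio.sleep(1)   # actually suspends and lets the event loop run
        return "returned after ~1 second"

    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(missing_yield_from()))
    print(loop.run_until_complete(with_yield_from()))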

    @chatgris
    Mannequin Author

    chatgris mannequin commented Sep 20, 2014

    I will try to review later tonight.

    Thanks!

    That makes sense when using gevent, but not when using asyncio or Trollius. Nothing will make events run if you don't use yield [from].

    Yes, I am aware of that. I have written a small custom library using fibers (a greenlet-like library) on top of asyncio so that I don't need to use yield from in my application(s).

    @gvanrossum
    Member

    Hm. That sounds like you won't actually be interoperable with other
    asyncio-using code.

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 25, 2014

    New changeset 2a868c9f8f15 by Yury Selivanov in branch '3.4':
    asyncio: Improve canceled timer handles cleanup. Closes issue bpo-22448.
    https://hg.python.org/cpython/rev/2a868c9f8f15

    New changeset a6aaacb2b807 by Yury Selivanov in branch 'default':
    asyncio: Improve canceled timer handles cleanup. Closes issue bpo-22448.
    https://hg.python.org/cpython/rev/a6aaacb2b807

    @chatgris
    Mannequin Author

    chatgris mannequin commented Sep 25, 2014

    Hm. That sounds like you won't actually be interoperable with other asyncio-using code.

    My code can interoperate with asyncio code by spinning off an asyncio coroutine whose completion callback reschedules the non-asyncio coroutine.
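    For illustration only, the general shape of that interop pattern using plain asyncio primitives (resume_fiber and asyncio_part are made-up names; in the real wrapper the callback would switch back into the suspended fiber instead of printing):

    import asyncio

    async def asyncio_part():
        # Stand-in for whatever asyncio work the wrapped coroutine performs.
        await asyncio.sleep(0.1)
        return 42

    def resume_fiber(task):
        # The real wrapper would reschedule the fiber that was suspended
        # waiting on this result; here we just report it.
        print("fiber resumed with", task.result())

    async def main():
        task = asyncio.ensure_future(asyncio_part())
        task.add_done_callback(resume_fiber)
        await task  # wait for the asyncio side; the done callback fires when it finishes

    asyncio.run(main())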

    I assume we shouldn't be spamming an issue with unrelated chatter, I'd be happy to discuss more via email if you would like.

    @chatgris
    Mannequin Author

    chatgris mannequin commented Sep 25, 2014

    Also - should I close this issue now that a patch has been committed?

    @1st1
    Member

    1st1 commented Sep 25, 2014

    Hm, strange, usually roundup robot closes issues. Anyways, closed now. Thanks again, Joshua.

    @1st1 closed this as completed Sep 25, 2014
    @vstinner
    Member

    Sorry, I'm coming in late, after the review and the commit, but I worry about the performance of _run_once() since it's the core of asyncio. Yury proposed to only iterate once over self._scheduled when removing delayed calls, and I have the same concern.

    Here is a patch which changes _run_once() to iterate only once.

    IMO the change is obvious: the current code iterates twice and makes the same check twice (checking the _cancelled attribute of the handles).
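    To make the comparison concrete, here is an illustrative standalone sketch of the two shapes being discussed (not the actual BaseEventLoop._run_once code; Handle and its cancelled flag are stand-ins for asyncio's TimerHandle internals):

    import heapq

    class Handle:
        def __init__(self, when, cancelled=False):
            self.when = when
            self.cancelled = cancelled
        def __lt__(self, other):
            return self.when < other.when

    def cleanup_loop_plus_comprehension(scheduled):
        # Two traversals: one loop for per-handle bookkeeping, then a
        # comprehension to rebuild the heap without cancelled entries.
        for h in scheduled:
            if h.cancelled:
                pass  # per-handle bookkeeping would happen here
        alive = [h for h in scheduled if not h.cancelled]
        heapq.heapify(alive)
        return alive

    def cleanup_single_loop(scheduled):
        # One traversal doing the bookkeeping and the filtering together.
        alive = []
        for h in scheduled:
            if not h.cancelled:
                alive.append(h)
        heapq.heapify(alive)
        return alive

    heap = [Handle(i, cancelled=(i % 2 == 0)) for i in range(10)]
    assert ([h.when for h in cleanup_single_loop(heap)] ==
            [h.when for h in cleanup_loop_plus_comprehension(heap)])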

    @vstinner reopened this Sep 30, 2014
    @1st1
    Member

    1st1 commented Sep 30, 2014

    Victor,

    During the code review we tried the single-loop approach. In the end Joshua wrote a small benchmark to test whether it's really faster to do it in one loop or not. It turned out that the single-loop approach is not faster than loop+comprehension (but it's not much slower either; I'd say they are about the same in terms of speed).

    The one-loop approach might work faster on PyPy, but unfortunately they still don't support 3.3, so we can't test that.

    @vstinner
    Member

    It turned out that the single-loop approach is not faster than loop+comprehension

    IMO it makes the code simpler and easier to understand.

    @1st1
    Member

    1st1 commented Sep 30, 2014

    IMO it makes the code simpler and easier to understand.

    But it's a tad slower, like 2-3% ;) You can test it yourself; we only tested it on a huge task list of 1M items.

    FWIW, I'm not opposed to your patch.

    @1st1
    Member

    1st1 commented Sep 30, 2014

    Victor,

    I've done some additional testing. Here's a test that Joshua wrote for the code review: https://gist.github.com/1st1/b38ac6785cb01a679722

    It appears that the single-loop approach works a bit faster for smaller collections of tasks. On a list of 10000 tasks it's on average 2-3% faster; on a list of 1000000 tasks it's 2-3% slower.

    I'm not sure what the average number of tasks for an "average" asyncio application is, but something tells me it's not going to be in the range of millions. I think you can fix the code to use a single loop.

    @vstinner
    Member

    "... on average faster 2-3% ... slower for 2-3%"

    3% is meaningless. On a microbenchmark, you can say that it's faster or slower if the difference is at least 10%.

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 30, 2014

    New changeset b85ed8bb7523 by Victor Stinner in branch '3.4':
    Issue bpo-22448: asyncio, cleanup _run_once(), only iterate once to remove delayed
    https://hg.python.org/cpython/rev/b85ed8bb7523

    New changeset 8e9df3414185 by Victor Stinner in branch 'default':
    (Merge 3.4) Issue bpo-22448: asyncio, cleanup _run_once(), only iterate once to
    https://hg.python.org/cpython/rev/8e9df3414185

    @1st1
    Member

    1st1 commented Sep 30, 2014

    Victor,

    Here are the updated benchmark results:

    NUMBER_OF_TASKS 10000
    ITERATIONS -> 2000 out of 2000
    2 loops:  0.004267875499863294
    1 loop:   0.007916624497738667
    TOTAL_BENCH_TIME 15.975227117538452
    
    NUMBER_OF_TASKS 100000
    ITERATIONS -> 2000 out of 2000
    2 loops:  0.04882345050100412
    1 loop:   0.09141454550081107
    TOTAL_BENCH_TIME 180.64003109931946
    

    2 loops is always about 30-40% slower. I've updated the benchmark I used: https://gist.github.com/1st1/b38ac6785cb01a679722

    Now it incorporates a call to heapify, and should yield more stable results. Please check it out, as I may be doing something wrong there, but if it's all right, I think that you need to revert your commits.

    @1st1
    Member

    1st1 commented Sep 30, 2014

    typo:

    2 loops is always about 30-40% slower.

    2 loops is always about 30-40% faster.

    @vstinner
    Member

    Hum, you don't reset start between the two benchmarks.
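    For what it's worth, the fix amounts to giving each measurement its own start timestamp; a minimal standalone illustration (the lambdas are stand-ins for the real benchmark bodies):

    import time

    def timed(label, fn):
        start = time.monotonic()   # reset for every measurement
        fn()
        print(label, time.monotonic() - start)

    # Without the per-measurement reset, the second number would also
    # include the time spent in the first run.
    timed("2 loops:", lambda: sum(i * i for i in range(10**6)))
    timed("1 loop: ", lambda: sum(i * i for i in range(10**6)))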

    @1st1
    Member

    1st1 commented Sep 30, 2014

    Eh, I knew something was wrong. Thanks.

    NUMBER_OF_TASKS 100000
    ITERATIONS -> 2000 out of 2000
    2 loops: 0.045037232999675325
    1 loop: 0.045182990999819594
    TOTAL_BENCH_TIME 91.36706805229187

    Please commit your change to the tulip repo too.

    @1st1 closed this as completed Sep 30, 2014
    @vstinner
    Member

    vstinner commented Oct 5, 2014

    Please commit your change to the tulip repo too.

    Oh sorry. In fact, I made the commit but I forgot to push my change :-p

    @ezio-melotti transferred this issue from another repository Apr 10, 2022