This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author chatgris
Recipients chatgris, gvanrossum, vstinner, yselivanov
Date 2014-09-20.00:13:58
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1411172040.41.0.974691283228.issue22448@psf.upfronthosting.co.za>
In-reply-to
Content
The core issue stems from the implementation of Timer cancellation. (which features like asyncio.wait_for build upon). BaseEventLoop stores scheduled events in an array backed heapq named _scheduled.  Once an event has been scheduled with call_at, cancelling the event only marks the event as cancelled, it does not remove it from the array backed heap.  It is only removed once the cancelled event is at the next scheduled event for the loop.

In a system where many events are run (and then cancelled) that may have long timeout periods, and there always exists at least one event that is scheduled for an earlier time, memory use is practically unbounded.  The attached program wait_for.py demonstrates a trivial example where memory use is practically unbounded for an hour of time. This is the case even though the program only ever has two "uncancelled" events and two coroutines at any given time in its execution.

This could be fixed in a variety of ways:

a) Timer cancellation could result in the object being removed from the heap like in the sched module.  This would be at least O(N) where N is the number of scheduled events.
b) Timer cancellation could trigger a callback that tracks the number of cancelled events in the _scheduled list. Once this number exceeds a threshold ( 50% ? ), the list could be cleared of all cancelled events and then be re-heapified. 
c) A balanced tree structure could be used to implement the scheduled events O(log N) time complexity (current module is O(log N) for heappop anyways).

Given python's lack of a balanced tree structure in the standard library, I assume option c) is a non-starter.

I would prefer option b) over option a) as when there are a lot of scheduled events in the system (upwards of 50,000 - 100,000 in some of my use cases) the amortized complexity for cancelling an event trends towards O(1) (N/2 cancellations are handled by a single O(N) event) at the cost of slightly more, but bounded relative to the amount of events, memory.

I would be willing to take a shot at implementing this patch with the most agreeable option. Please let me know if that would be appreciated, or if someone else would rather tackle this issue. (First time bug report for python, not sure of the politics/protocols involved). 

Disclaimer that I by no means an asyncio expert, my understanding of the code base is based on my reading of it debugging this memory leak.
History
Date User Action Args
2014-09-20 00:14:01chatgrissetrecipients: + chatgris, gvanrossum, vstinner, yselivanov
2014-09-20 00:14:00chatgrissetmessageid: <1411172040.41.0.974691283228.issue22448@psf.upfronthosting.co.za>
2014-09-20 00:14:00chatgrislinkissue22448 messages
2014-09-20 00:13:59chatgriscreate