Issue26259
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2016-02-01 16:06 by Brunsgaard, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files

File name | Uploaded | Description
---|---|---
poc.py | Brunsgaard, 2016-02-01 16:06 | Example program to reproduce
poc.py | Bartek Biernacki, 2017-03-08 16:21 | Custom Queue without the described behaviour
Messages (10)
msg259327 - Author: Jonas Brunsgaard (Brunsgaard) | Date: 2016-02-01 16:06

When making repeated calls to queue.get, memory builds up and is not freed until queue.put is called. I wrote this little program to show my findings. The program performs many calls to queue.get, and once every 60 seconds a queue.put is performed. Every 15 seconds the memory usage of dictionaries is printed to the console. You can find the output below the program.

```
import asyncio
from pympler import muppy
from pympler import summary

q = asyncio.Queue()
loop = asyncio.get_event_loop()
closing = False


async def get_with_timeout():
    # Each q.get() that times out leaves a cancelled future behind.
    while not closing:
        try:
            task = asyncio.ensure_future(q.get())
            await asyncio.wait_for(task, 0.2)
        except asyncio.TimeoutError:
            pass


def mem_profiling():
    if not closing:
        types_ = muppy.filter(muppy.get_objects(), Type=dict)
        summary.print_(summary.summarize(types_))
        loop.call_later(15, mem_profiling)


def put():
    q.put_nowait(None)
    loop.call_later(60, put)


put()
# 10,000 concurrent pollers, each timing out every 0.2 seconds.
tasks = [asyncio.ensure_future(get_with_timeout()) for _ in range(10000)]
mem_profiling()

try:
    loop.run_forever()
except KeyboardInterrupt:
    closing = True
    loop.run_until_complete(
        asyncio.ensure_future(asyncio.wait(tasks)))
finally:
    loop.close()
```

Output:

```
types | # objects | total size
======================================== | =========== | ============
<class 'dict | 11886 | 3.20 MB
<class 'collections.OrderedDict | 11 | 16.84 KB
<class 'email._encoded_words._QByteMap | 1 | 288 B
types | # objects | total size
======================================== | =========== | ============
<class 'dict | 224835 | 62.44 MB
<class 'collections.OrderedDict | 11 | 16.84 KB
<class 'email._encoded_words._QByteMap | 1 | 288 B
types | # objects | total size
======================================== | =========== | ============
<class 'dict | 401487 | 111.00 MB
<class 'collections.OrderedDict | 11 | 16.84 KB
<class 'email._encoded_words._QByteMap | 1 | 288 B
types | # objects | total size
======================================== | =========== | ============
<class 'dict | 576621 | 158.86 MB
<class 'collections.OrderedDict | 11 | 16.84 KB
<class 'email._encoded_words._QByteMap | 1 | 288 B
types | # objects | total size
======================================== | =========== | ============
<class 'dict | 219745 | 58.99 MB
<class 'collections.OrderedDict | 11 | 16.84 KB
<class 'email._encoded_words._QByteMap | 1 | 288 B
types | # objects | total size
======================================== | =========== | ============
<class 'dict | 390061 | 105.72 MB
<class 'collections.OrderedDict | 11 | 16.84 KB
<class 'email._encoded_words._QByteMap | 1 | 288 B
types | # objects | total size
======================================== | =========== | ============
<class 'dict | 560623 | 152.65 MB
<class 'collections.OrderedDict | 11 | 16.84 KB
<class 'email._encoded_words._QByteMap | 1 | 288 B
types | # objects | total size
======================================== | =========== | ============
<class 'dict | 220372 | 59.11 MB
<class 'collections.OrderedDict | 11 | 16.84 KB
<class 'email._encoded_words._QByteMap | 1 | 288 B
```

What we see is that memory builds up at roughly 3 MB/s, and when the put method is called the memory usage returns to normal.

Is this the expected behavior or is this a bug? If it is expected, I think we should update the documentation to let people know about this behavior.

-- Jonas Brunsgaard
msg259328 - Author: Guido van Rossum (gvanrossum) | Date: 2016-02-01 16:35

I think this is expected behavior when you do a lot of get() calls that are cancelled by a timeout. Each call leaves a cancelled Future in the deque self._getters; these are removed only when _wakeup_next() is called (by put or put_nowait). Why is your app doing this?

I suppose we could band-aid this by removing cancelled Futures when next calling get(), but that won't help if you start a load of get() calls and then cancel them all.

Note that I don't want to complicate this code -- in the past it has been very difficult to prove it was correct under all circumstances, and now I am sure it is correct (barring memory leaks, as you've shown). I want to make sure any fix for this issue doesn't complicate the code, making it harder to understand or prove its correctness.
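For illustration, a minimal sketch of the band-aid described above, written as a hypothetical subclass (PruningQueue is a made-up name, and it relies on asyncio.Queue's private _getters deque, so this is for understanding the mechanism only, not a proposed fix):

```
import asyncio
import collections


class PruningQueue(asyncio.Queue):
    """Hypothetical subclass: drop cancelled getter futures on each
    get() call instead of waiting for _wakeup_next() to flush them."""

    async def get(self):
        # Prune futures left behind by timed-out get() calls before
        # parking a new one.  As noted above, this does not help if
        # many get() calls are started and then all cancelled, with
        # no further get() ever arriving.
        self._getters = collections.deque(
            g for g in self._getters if not g.cancelled())
        return await super().get()
```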
msg259335 - Author: Jonas Brunsgaard (Brunsgaard) | Date: 2016-02-01 17:28

In my particular case, I developed an application similar to beanstalkd, but with Redis as the "engine". I created a callback reader class for users to subclass; the callback reader checks every second, on every tube (queue object), whether new data has arrived for processing (this subroutine uses queue.get with wait_for). Maybe asyncio.Condition would have been the better choice. The easy solution was to check whether the queue was empty and skip the read (get call) if there was nothing in it, as sketched below.

Before my fix, the program would accumulate 10 GB of memory over a week in our staging environment if nothing was touched, so I was assigned to investigate the cause. I think the current behavior is undesirable and hard to see through; if it is not changed, there should be some kind of note in the documentation, so other good Python folks have a better chance of understanding the behavior without reading the CPython asyncio queue implementation.
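A sketch of that workaround, with a hypothetical function name: because the event loop is single-threaded and there is no await between the check and the call, get() on a non-empty queue completes immediately without parking a getter future in the internal deque.

```
import asyncio


async def poll_once(q: asyncio.Queue):
    # Skip the get() entirely when nothing is queued, so no getter
    # future is ever appended to the queue's internal _getters deque.
    if q.empty():
        return None
    return await q.get()  # non-empty: completes without waiting
```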
msg259340 - Author: Guido van Rossum (gvanrossum) | Date: 2016-02-01 18:38

> The easy solution was to check if the queue was empty and skip the read (get call) if there was nothing in the queue.

There's an API for that -- get_nowait().

I'm happy to review a fix if you come up with one. But I currently don't have time to devise a fix myself. (Not even a docfix, if you think that's sufficient, but if you submit one it will swiftly be applied.)
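For reference, the get_nowait() pattern pointed to here looks roughly like this (the function name is made up for the example):

```
import asyncio


def poll_tube(q: asyncio.Queue):
    # get_nowait() raises QueueEmpty instead of parking a getter
    # future, so frequent polling leaves nothing behind in the queue.
    try:
        return q.get_nowait()
    except asyncio.QueueEmpty:
        return None
```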
msg259374 - Author: Jonas Brunsgaard (Brunsgaard) | Date: 2016-02-02 10:25

You are right that get_nowait() is the correct API for my use case; with get_nowait(), nothing is pushed to the internal _getters deque. The reason for my use of get() is that job futures are created in one place in the code and then thrown into a processing function that yields the job future. This design was chosen to handle all exceptions in processing(), but I agree that get_nowait() would have been the superior solution. I do not have time on my hands right now to take on the challenge of writing a patch, but I might take it up later; sounds fun ;)

Good day to you, sir, and thank you for the feedback.
msg259448 - Author: Jonas Brunsgaard (Brunsgaard) | Date: 2016-02-03 00:13

Okay, I thoroughly read the code again. Can you describe the architectural changes you have in mind for a patch? I will write a proposal, but I have to know we are on the same page, so I do not waste my time and yours :)
msg259449 - Author: Guido van Rossum (gvanrossum) | Date: 2016-02-03 00:21

First, you wrote "The easy solution was to check if the queue was empty and skip the read (get call) if there was nothing in the queue" -- which is exactly what get_nowait() does.

As to a patch, I think you should just describe how you would change the code; then we can talk details.
msg289244 - Author: Bartek Biernacki (Bartek Biernacki) | Date: 2017-03-08 16:21

I ran into a similar problem with many getters timing out and staying in memory until some put is called. I think this may be solved by using a callback on the Future which would clean it up if it was cancelled. I am presenting the idea in the attached poc; a sketch of the same approach follows below. If you think this is a good idea, I can provide a patch.
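The attached poc.py is not reproduced in the thread; the following sketches the done-callback idea under the assumption that get() mirrors the stdlib loop over the private _getters deque (CallbackCleanupQueue is a hypothetical name, not the attached file):

```
import asyncio


class CallbackCleanupQueue(asyncio.Queue):
    """Hypothetical subclass: each getter future removes itself from
    the internal deque via a done-callback if it ends up cancelled."""

    def _discard_getter(self, fut):
        if fut.cancelled():
            try:
                self._getters.remove(fut)
            except ValueError:
                pass  # already consumed by _wakeup_next()

    async def get(self):
        while self.empty():
            getter = asyncio.get_running_loop().create_future()
            getter.add_done_callback(self._discard_getter)
            self._getters.append(getter)
            await getter  # a cancelled future triggers the cleanup
        return self.get_nowait()
```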
msg303389 - Author: Suren Nihalani (snihalani) | Date: 2017-09-30 03:48

@cjrh brought up that this issue is the same as https://bugs.python.org/issue31620. I put up a PR there. Can people review that, and then we can close this?
msg308235 - Author: Andrew Svetlov (asvetlov) | Date: 2017-12-13 19:18

Duplicate of https://bugs.python.org/issue31620
History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:58:27 | admin | set | github: 70447
2017-12-13 19:18:26 | asvetlov | set | status: open -> closed; superseder: asyncio.Queue leaks memory if the queue is empty and consumers poll it frequently; nosy: + asvetlov; messages: + msg308235; resolution: duplicate; stage: resolved
2017-09-30 03:48:43 | snihalani | set | nosy: + snihalani; messages: + msg303389
2017-03-08 17:04:33 | gvanrossum | set | nosy: - gvanrossum
2017-03-08 16:22:53 | vstinner | set | nosy: - vstinner
2017-03-08 16:21:47 | Bartek Biernacki | set | files: + poc.py; nosy: + Bartek Biernacki; messages: + msg289244
2016-02-03 00:21:34 | gvanrossum | set | messages: + msg259449
2016-02-03 00:13:07 | Brunsgaard | set | messages: + msg259448
2016-02-02 10:25:50 | Brunsgaard | set | messages: + msg259374
2016-02-01 18:38:50 | gvanrossum | set | messages: + msg259340
2016-02-01 17:28:00 | Brunsgaard | set | messages: + msg259335
2016-02-01 16:35:23 | gvanrossum | set | messages: + msg259328
2016-02-01 16:06:20 | Brunsgaard | create |