Issue33565
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2018-05-18 02:40 by thehesiod, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
tracemalloc_test.py | thehesiod, 2018-05-18 02:40 | |||
tracemalloc_test2.py | thehesiod, 2018-05-18 19:38 |
Messages (32) | |||
---|---|---|---|
msg317002 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-18 02:40 | |
while investigating https://github.com/boto/botocore/issues/1464 I used tracemalloc (like I've done before in 3.5.2) to try to figure out where the leak was. To my surprise tracemalloc listed stacks that didn't make any sense. Strangest example is the top result when running the attached script against python 3.6.5 in the following manner: PYTHONMALLOC=malloc /valgrind/bin/python3 /tmp/test.py head_object The top hit is listed as: 21 memory blocks: 4.7 KiB File "/tmp/test.py", line 28 raise File "/tmp/test.py", line 47 test(s3_client) File "/tmp/test.py", line 65 main() how is it that the "raise" is a leak? It doesn't make any sense to me specially given that no contexts are used in that call. Further that line is never hit because the exception is not re-thrown. Further a bunch of regular expression allocs don't make any sense either given that I've cleared the cache before doing snapshots. if someone could shed some light on why this is/isn't a bug that would be great. It seems to me that the callstacks are not related at all to the leak. |
|||
msg317009 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2018-05-18 07:42 | |
Victor, could you take a look? |
|||
msg317023 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2018-05-18 11:09 | |
Can you post a reproducer that doesn't involve S3? |
|||
msg317052 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-18 19:38 | |
here's a version that tries to do something similar but does not reproduce the issue |
|||
msg317082 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-05-19 00:48 | |
tracemalloc doesn't track memory leaks. It tracks memory allocations. I'm not sure that you are using tracemalloc properly. I suggest you to enable tracemalloc as soon as possible, take snapshot frequently (eg. at each iteration) and write them on disk, and later compute differences between snapshots. You may want to try my tracemallocqt UI to analyze snapshots: http://pytracemalloc.readthedocs.io/tracemallocqt.html What matters to track a memory leak is to find the largest positive difference. I don't know what allocated "21 memory blocks: 4.7 KiB" at your "raise" line. tracemalloc tracks *all* memory allocated by Python, including very private internal structures, like the dictionary used for "interned" strings. Sometimes, results can be surprising, but it should be simpler if you focus on differences between two snapshots. If you have a leak per iteration, after 10 or 25 iterations, the difference should become obvious. |
|||
msg317083 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-05-19 00:49 | |
Once you found and fixed your memory leak, any documentation enhancement is welcome :-) |
|||
msg317105 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-19 06:53 | |
I realize it doesn't track leaks, it's a tool to help find leaks when used correctly :) This example should be similar to using the compare snapshots mechanism as I start tracking from a stable point (several iterations in after a gc), and then compare to another stable point several iterations later. I have a much more complicated set-up at our company but wanted to keep the example as short as people complain here about large examples. Further I realize how tracemalloc works, I have a lot of experience in leak hunting from my c++ days, I've even written my own native version of tracemalloc before (it's not hard). The top stat is what bubbles up as the largest leak after a number of runs, that's why the results are so peculiar. I've used tracemalloc before to find https://bugs.python.org/issue29870 in 3.5.2 and there the results made sense, here it makes no sense. To my understanding there should not be any interned strings or other items that would cause this particular callstack to be the top hit of unreleased blocks of memory (leaks). I still don't see any credible reason from why that callstack would be returned. I still believe there's a real bug here, perhaps there's a leak inside the python interpreter implementation it's trying to point out? I think it's worth investigating. |
|||
msg317195 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-20 18:19 | |
so based on my last comment I just realized we could easily see if something was amiss by comparing results from python 3.5.2 to 3.6.5 and low and behold the callstack in question does not appear in the tracemalloc results from 3.5.2. |
|||
msg317198 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-20 19:05 | |
actually it does show in 3.5.2, it doesn't show when using get_object, but it does when using head_object, and they both throw the same exception, so weird. |
|||
msg317200 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-20 20:16 | |
here's another problem, if I change that function signature to: def test(s3_client): try: method = getattr(s3_client, sys.argv[1]) method(Bucket='archpi.dabase.com', Key='doesnotexist') except botocore.exceptions.ClientError as e: if e.response['ResponseMetadata']['HTTPStatusCode'] != 404: raise print('.', end='', flush=True) the print happens every time and the raise does not ever get hit, yet the raise line is still marked as the top "leak" (unfreed blocks). It's really strange. Furthermore the second top "leak" is this stack: File "/tmp/Python-3.6.5/Lib/sre_compile.py", line 439 return prefix, prefix_skip, False File "/tmp/Python-3.6.5/Lib/sre_compile.py", line 498 prefix, prefix_skip, got_all = _get_literal_prefix(pattern) File "/tmp/Python-3.6.5/Lib/sre_compile.py", line 548 _compile_info(code, p, flags) which doesn't make sense either, why would the return line be marked as having allocated anything? and btw I did change the implementation to start early, and then compare differences between a snapshot at iter 20 and iter 100 and the same holds true as I stated, there should logically be no difference between starting it later, or starting in the beginning and comparing two snapshots as all what you're doing in either cases is comparing two |
|||
msg317288 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-05-22 13:59 | |
A memory leak is when each iteration adds a fixed number of bytes of the memory: I'm talking about tracemalloc.get_traced_memory()[0] value. For example, if you compare the total traced memory between your iteration 30 and iteration 20, you get a value 10N. If you compare iteration 100 and 20, I would expect the value 80N. Is it the case? You can get the number of allocated bytes from a snapshot using: >>> sum(trace.size for trace in snapshot.traces) 2620350 |
|||
msg317305 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-22 16:19 | |
I believe your method is flawed, when enabling tracemalloc it's going to be using memory as well to store the traces. I still believe you need to use the method I mentioned and further even if we don't take into account the total memory the traces I mentioned need to be explained. |
|||
msg317308 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-05-22 16:29 | |
Oh. Usually, I strip traces allocated by tracemalloc using filters: http://pytracemalloc.readthedocs.io/examples.html#pretty-top snapshot = snapshot.filter_traces(( tracemalloc.Filter(False, "<frozen importlib._bootstrap>"), tracemalloc.Filter(False, "<unknown>"), )) |
|||
msg317309 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-22 16:38 | |
that's not going to affect http://pytracemalloc.readthedocs.io/api.html#get_traced_memory. There is no filter for that :) as to your sum that's exactly what my original callstack lists: 21 memory blocks: 4.7 KiB this means 21 blocks were not released, and in this case leaked because nothing should be held onto after the first iteration (creating the initial connector in the connection pool. In the head object case that's going to be a new connector per iteration, however the old one should go away. |
|||
msg317338 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-05-22 20:12 | |
> this means 21 blocks were not released, and in this case leaked because nothing should be held onto after the first iteration (creating the initial connector in the connection pool. In the head object case that's going to be a new connector per iteration, however the old one should go away. I'm not sure that I understand properly. If you call the function many times, does the memory usage increase? I'm not sure that this issue is a bug in Python. |
|||
msg317341 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-22 20:21 | |
yes, memory does go up. If you click the botocore bug link you'll see a graph of memory usage over time. |
|||
msg317351 - (view) | Author: Inada Naoki (methane) * | Date: 2018-05-23 02:34 | |
I can't reproduce it: File "test1.py", line 19, in do_get_obj response = s3_client.get_object(Bucket='archpi.dabase.com', Key='style.css') File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/client.py", line 314, in _api_call return self._make_api_call(operation_name, kwargs) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/client.py", line 599, in _make_api_call operation_model, request_dict) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/endpoint.py", line 148, in make_request return self._send_request(request_dict, operation_model) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/endpoint.py", line 173, in _send_request request = self.create_request(request_dict, operation_model) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/endpoint.py", line 157, in create_request operation_name=operation_model.name) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/hooks.py", line 227, in emit return self._emit(event_name, kwargs) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/hooks.py", line 210, in _emit response = handler(**kwargs) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler return self.sign(operation_name, request) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/signers.py", line 156, in sign auth.add_auth(request) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/auth.py", line 420, in add_auth super(S3SigV4Auth, self).add_auth(request) File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/auth.py", line 352, in add_auth raise NoCredentialsError botocore.exceptions.NoCredentialsError: Unable to locate credentials |
|||
msg317362 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-23 07:13 | |
INADA Naoki: Unfortunately you'll need to use credentials from a free AWS account: https://aws.amazon.com/free/. Then create a credentials file in ~/.aws/credentials: https://docs.aws.amazon.com/cli/latest/userguide/cli-config-files.html |
|||
msg317393 - (view) | Author: Inada Naoki (methane) * | Date: 2018-05-23 12:22 | |
Alexander Mohr: Thanks, I can reproduce. I found ResourceWarning is happened when head_object. re may be used for filtering warning. You can use `-W always` to see the warnings. I don't know why the leak happened yet. |
|||
msg317396 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-05-23 12:33 | |
> Thanks, I can reproduce. Are you testing the master branch? I fixed a memory leak in the _warnings module for ignored ResourceWarning warnings. |
|||
msg317397 - (view) | Author: Inada Naoki (methane) * | Date: 2018-05-23 12:45 | |
I confirmed the leak on 3.6.5, and I can't reproduce it on 3.7.0b4. |
|||
msg317400 - (view) | Author: Inada Naoki (methane) * | Date: 2018-05-23 12:55 | |
Hmm, GH-4587 is merged in 3.6 branch in last November, but 3.6.5 doesn't include it? |
|||
msg317401 - (view) | Author: Inada Naoki (methane) * | Date: 2018-05-23 13:00 | |
I was wrong. It is not merged into 3.6 branch. The pull request is just closed. Now nothing strange in tracemalloc. |
|||
msg317410 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-05-23 14:31 | |
> Hmm, GH-4587 is merged in 3.6 branch in last November, but 3.6.5 doesn't include it? We decided to not fix the memory leak in Python 3.6: https://github.com/python/cpython/pull/4587#issuecomment-347659163 """ This change makes warning.warn() 1.4x slower on Python 3.6 for the ignore action: https://bugs.python.org/issue27535#msg307165 I don't think that the memory leak issue is important enough to justify such slowdown in a stable branch. So I close my PR. """ So yes, this issue is a bug in Python: it's bpo-27535. It's a bug in the _warnings module. tracemalloc only tells you the truth, it's just that it's hard to understand the output :-) The bug occurs when you get ResourceWarning warnings. Such warning means that the code has a bug: the bug should be fixed. If you fix the ResourceWarning, the bug should disappear. As Naoki INADA wrote: use python3 -Walways (I prefer python3 -Wdefault, but it's almost the same ;-)) to see such warnings. Python 3.7 comes with a new -X dev option which enables these warnings but also other runtime checks to ease development. I suggest to close this issue as a duplicate of bpo-27535. Thanks Alexander Mohr for your bug report, and good luck to fix your warning ;-) If you are motivated, you can propose changes to the tracemalloc documentation! |
|||
msg317419 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-23 16:07 | |
I'll try with that fix and see if the leak is fixed, is the reasoning that if the warning happens after the try/except scope and that's why the callstack has it? |
|||
msg317428 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-23 16:57 | |
Ok I've verified that the patch does indeed fix the leak detected. Thank you very much INADA for knowing that there was a leak in the warnings module, I would have never guessed, especially given the tracemalloc stack given. Had it showed a callstack where the warning was created it would have made a lot more sense. I agree this can be closed, however can the leak fix PLEASE be put into 3.6 (any any other version that needs it)? Who cares if warnings are 1.4x slower with the fix? Are you going to rationally tell me that keeping warnings fast is more important than fixing leaks? In most applications there should be no warnings so it doesn't really matter. This particular leak was causing our application to fail after running for a few days which makes it unusable in production. It's caused me a lot of days wasted in investigation. If speed was really a problem that would have been a much worthier thing to spend time on than finding leaks. leaks should be highest priority, then speed. No rational developer would have complained that warnings got slower, that's when you fix warnings, not because of leaks! :) |
|||
msg317439 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-23 19:25 | |
actually, thinking about this more on my way to work, this should NOT be closed. The callstack I initially mentioned still has no explanation and we now know is not correct. It should either have listed something related to warnings, or nothing at all (<unknown>) or something like that. There is no documentation to describe this behavior, think about it, it would have to be something like: "tracemalloc may give completely irrelevant callstacks". So I think this callstack still needs to be explained, and either: 1) the module should be fixed so it would give something more relevant (to give developers some foothold to realize this was related to warnings 2) For this scenario the callstack should be removed, to inform the developer that they should manually track the allocations in gdb or with some other mechanism. It would be really nice to know what c-callstacks (with parameters) trigger this tracemalloc stack. |
|||
msg317498 - (view) | Author: Inada Naoki (methane) * | Date: 2018-05-24 01:47 | |
> The callstack I initially mentioned still has no explanation and we now know is not correct. I don't think so. The callstack is Python's callstack where caused leak in C. (Python doesn't call malloc directly) In this case, local variables are freed "at end of the function", and it freed unclosed socket which cause warning. (*) So the callstack is completely valid. While only CPython expert can understand. (*) or "at the end of the except clause" which frees `e` and `e` have reference to the unclosed socket. |
|||
msg317501 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-24 01:57 | |
ok fair enough, however a caveat is that it's not necessarily end of function as I was able to expand the function by a few more lines and the callstack stayed the same, however I'm guessing the interpreter was smart enough to realize the calls below the try/except had no references. it would be nice if there was a virtual "socket.__del__` or something at the end of the stack. Basically if it could plug into the extension callbacks. That way we could have a little more visibility. closing, thanks guys. This fixes the issue in botocore, on to the next related leak found via aiobotocore in aiohttp, where it now has no tracemalloc entries, so I'm guessing a leak via the ssl module to openssl :( thanks again guys for the help, I really appreciate it, I hope in the future, my some mechanism, scenarios like these will be a lot easier to decipher. |
|||
msg317513 - (view) | Author: Mariatta (Mariatta) * | Date: 2018-05-24 02:34 | |
Alexander, in the future please use gender-neutral words such as "folks" and "y'all" instead of "guys". Thanks. |
|||
msg317515 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-24 02:35 | |
of course, sorry, I meant in a gender neutral way |
|||
msg317516 - (view) | Author: Alexander Mohr (thehesiod) * | Date: 2018-05-24 02:36 | |
this is how my friends talk here, see: https://english.stackexchange.com/questions/11816/is-guy-gender-neutral |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:00 | admin | set | github: 77746 |
2018-05-24 02:36:05 | thehesiod | set | messages: + msg317516 |
2018-05-24 02:35:03 | thehesiod | set | messages: + msg317515 |
2018-05-24 02:34:08 | Mariatta | set | nosy:
+ Mariatta messages: + msg317513 |
2018-05-24 01:57:59 | thehesiod | set | status: open -> closed resolution: not a bug messages: + msg317501 stage: resolved |
2018-05-24 01:47:03 | methane | set | messages: + msg317498 |
2018-05-23 19:25:15 | thehesiod | set | messages: + msg317439 |
2018-05-23 16:57:04 | thehesiod | set | messages: + msg317428 |
2018-05-23 16:07:29 | thehesiod | set | messages: + msg317419 |
2018-05-23 14:31:54 | vstinner | set | messages: + msg317410 |
2018-05-23 13:00:37 | methane | set | messages: + msg317401 |
2018-05-23 12:56:33 | pitrou | set | nosy:
- pitrou |
2018-05-23 12:55:23 | methane | set | messages: + msg317400 |
2018-05-23 12:45:08 | methane | set | messages: + msg317397 |
2018-05-23 12:33:43 | vstinner | set | messages: + msg317396 |
2018-05-23 12:22:44 | methane | set | messages: + msg317393 |
2018-05-23 07:13:16 | thehesiod | set | messages: + msg317362 |
2018-05-23 02:34:35 | methane | set | messages: + msg317351 |
2018-05-22 20:21:18 | thehesiod | set | messages: + msg317341 |
2018-05-22 20:12:27 | vstinner | set | messages: + msg317338 |
2018-05-22 16:38:47 | thehesiod | set | messages: + msg317309 |
2018-05-22 16:29:26 | vstinner | set | messages: + msg317308 |
2018-05-22 16:19:53 | thehesiod | set | messages: + msg317305 |
2018-05-22 13:59:03 | vstinner | set | messages: + msg317288 |
2018-05-22 05:33:39 | methane | set | nosy:
+ methane |
2018-05-20 20:16:58 | thehesiod | set | messages: + msg317200 |
2018-05-20 19:05:20 | thehesiod | set | messages: + msg317198 |
2018-05-20 18:19:01 | thehesiod | set | messages: + msg317195 |
2018-05-19 06:53:29 | thehesiod | set | messages: + msg317105 |
2018-05-19 00:49:04 | vstinner | set | messages: + msg317083 |
2018-05-19 00:48:38 | vstinner | set | messages: + msg317082 |
2018-05-18 19:38:51 | thehesiod | set | files:
+ tracemalloc_test2.py messages: + msg317052 |
2018-05-18 11:09:27 | pitrou | set | nosy:
+ pitrou messages: + msg317023 |
2018-05-18 07:42:29 | asvetlov | set | nosy:
+ vstinner, asvetlov messages: + msg317009 |
2018-05-18 02:41:00 | thehesiod | set | type: behavior |
2018-05-18 02:40:46 | thehesiod | create |