classification
Title: strange tracemalloc results
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Mariatta, asvetlov, inada.naoki, thehesiod, vstinner
Priority: normal Keywords:

Created on 2018-05-18 02:40 by thehesiod, last changed 2018-05-24 02:36 by thehesiod. This issue is now closed.

Files
File name Uploaded Description Edit
tracemalloc_test.py thehesiod, 2018-05-18 02:40
tracemalloc_test2.py thehesiod, 2018-05-18 19:38
Messages (32)
msg317002 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-18 02:40
while investigating https://github.com/boto/botocore/issues/1464 I used tracemalloc (like I've done before in 3.5.2) to try to figure out where the leak was.  To my surprise tracemalloc listed stacks that didn't make any sense.


Strangest example is the top result when running the attached script against python 3.6.5 in the following manner: PYTHONMALLOC=malloc /valgrind/bin/python3 /tmp/test.py head_object

The top hit is listed as:

21 memory blocks: 4.7 KiB
  File "/tmp/test.py", line 28
    raise
  File "/tmp/test.py", line 47
    test(s3_client)
  File "/tmp/test.py", line 65
    main()

how is it that the "raise" is a leak?  It doesn't make any sense to me specially given that no contexts are used in that call.  Further that line is never hit because the exception is not re-thrown.


Further a bunch of regular expression allocs don't make any sense either given that I've cleared the cache before doing snapshots.

if someone could shed some light on why this is/isn't a bug that would be great.  It seems to me that the callstacks are not related at all to the leak.
msg317009 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2018-05-18 07:42
Victor, could you take a look?
msg317023 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-05-18 11:09
Can you post a reproducer that doesn't involve S3?
msg317052 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-18 19:38
here's a version that tries to do something similar but does not reproduce the issue
msg317082 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-19 00:48
tracemalloc doesn't track memory leaks. It tracks memory allocations.

I'm not sure that you are using tracemalloc properly. I suggest you to enable tracemalloc as soon as possible, take snapshot frequently (eg. at each iteration) and write them on disk, and later compute differences between snapshots.

You may want to try my tracemallocqt UI to analyze snapshots:
http://pytracemalloc.readthedocs.io/tracemallocqt.html

What matters to track a memory leak is to find the largest positive difference.

I don't know what allocated "21 memory blocks: 4.7 KiB" at your "raise" line. tracemalloc tracks *all* memory allocated by Python, including very private internal structures, like the dictionary used for "interned" strings. Sometimes, results can be surprising, but it should be simpler if you focus on differences between two snapshots.

If you have a leak per iteration, after 10 or 25 iterations, the difference should become obvious.
msg317083 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-19 00:49
Once you found and fixed your memory leak, any documentation enhancement is welcome :-)
msg317105 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-19 06:53
I realize it doesn't track leaks, it's a tool to help find leaks when used correctly :)  This example should be similar to using the compare snapshots mechanism as I start tracking from a stable point (several iterations in after a gc), and then compare to another stable point several iterations later.  I have a much more complicated set-up at our company but wanted to keep the example as short as people complain here about large examples.

Further I realize how tracemalloc works, I have a lot of experience in leak hunting from my c++ days, I've even written my own native version of tracemalloc before (it's not hard).

The top stat is what bubbles up as the largest leak after a number of runs, that's why the results are so peculiar.  I've used tracemalloc before to find https://bugs.python.org/issue29870 in 3.5.2 and there the results made sense, here it makes no sense.  To my understanding there should not be any interned strings or other items that would cause this particular callstack to be the top hit of unreleased blocks of memory (leaks).

I still don't see any credible reason from why that callstack would be returned.  I still believe there's a real bug here, perhaps there's a leak inside the python interpreter implementation it's trying to point out?  I think it's worth investigating.
msg317195 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-20 18:19
so based on my last comment I just realized we could easily see if something was amiss by comparing results from python 3.5.2 to 3.6.5 and low and behold the callstack in question does not appear in the tracemalloc results from 3.5.2.
msg317198 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-20 19:05
actually it does show in 3.5.2, it doesn't show when using get_object, but it does when using head_object, and they both throw the same exception, so weird.
msg317200 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-20 20:16
here's another problem, if I change that function signature to:
def test(s3_client):
    try:
        method = getattr(s3_client, sys.argv[1])
        method(Bucket='archpi.dabase.com', Key='doesnotexist')
    except botocore.exceptions.ClientError as e:
        if e.response['ResponseMetadata']['HTTPStatusCode'] != 404:
            raise 

    print('.', end='', flush=True)

the print happens every time and the raise does not ever get hit, yet the raise line is still marked as the top "leak" (unfreed blocks).  It's really strange.  Furthermore the second top "leak" is this stack:
  File "/tmp/Python-3.6.5/Lib/sre_compile.py", line 439
    return prefix, prefix_skip, False
  File "/tmp/Python-3.6.5/Lib/sre_compile.py", line 498
    prefix, prefix_skip, got_all = _get_literal_prefix(pattern)
  File "/tmp/Python-3.6.5/Lib/sre_compile.py", line 548
    _compile_info(code, p, flags)


which doesn't make sense either, why would the return line be marked as having allocated anything?

and btw I did change the implementation to start early, and then compare differences between a snapshot at iter 20 and iter 100 and the same holds true as I stated, there should logically be no difference between starting it later, or starting in the beginning and comparing two snapshots as all what you're doing in either cases is comparing two
msg317288 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-22 13:59
A memory leak is when each iteration adds a fixed number of bytes of the memory: I'm talking about tracemalloc.get_traced_memory()[0] value. For example, if you compare the total traced memory between your iteration 30 and iteration 20, you get a value 10N. If you compare iteration 100 and 20, I would expect the value 80N. Is it the case?

You can get the number of allocated bytes from a snapshot using:

>>> sum(trace.size for trace in snapshot.traces)
2620350
msg317305 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-22 16:19
I believe your method is flawed, when enabling tracemalloc it's going to be using memory as well to store the traces.  I still believe you need to use the method I mentioned and further even if we don't take into account the total memory the traces I mentioned need to be explained.
msg317308 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-22 16:29
Oh. Usually, I strip traces allocated by tracemalloc using filters:

http://pytracemalloc.readthedocs.io/examples.html#pretty-top

    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
msg317309 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-22 16:38
that's not going to affect http://pytracemalloc.readthedocs.io/api.html#get_traced_memory.  There is no filter for that :)

as to your sum that's exactly what my original callstack lists:
21 memory blocks: 4.7 KiB

this means 21 blocks were not released, and in this case leaked because nothing should be held onto after the first iteration (creating the initial connector in the connection pool.  In the head object case that's going to be a new connector per iteration, however the old one should go away.
msg317338 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-22 20:12
> this means 21 blocks were not released, and in this case leaked because nothing should be held onto after the first iteration (creating the initial connector in the connection pool.  In the head object case that's going to be a new connector per iteration, however the old one should go away.

I'm not sure that I understand properly. If you call the function many times, does the memory usage increase?

I'm not sure that this issue is a bug in Python.
msg317341 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-22 20:21
yes, memory does go up.  If you click the botocore bug link you'll see a graph of memory usage over time.
msg317351 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-05-23 02:34
I can't reproduce it:

  File "test1.py", line 19, in do_get_obj
    response = s3_client.get_object(Bucket='archpi.dabase.com', Key='style.css')
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/client.py", line 599, in _make_api_call
    operation_model, request_dict)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/endpoint.py", line 148, in make_request
    return self._send_request(request_dict, operation_model)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/endpoint.py", line 173, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/endpoint.py", line 157, in create_request
    operation_name=operation_model.name)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/hooks.py", line 227, in emit
    return self._emit(event_name, kwargs)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/hooks.py", line 210, in _emit
    response = handler(**kwargs)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/signers.py", line 156, in sign
    auth.add_auth(request)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/auth.py", line 420, in add_auth
    super(S3SigV4Auth, self).add_auth(request)
  File "/Users/inada-n/tmp/boto-leak/venv/lib/python3.6/site-packages/botocore/auth.py", line 352, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
msg317362 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-23 07:13
INADA Naoki: Unfortunately you'll need to use credentials from a free AWS account: https://aws.amazon.com/free/.  Then create a credentials file in ~/.aws/credentials: https://docs.aws.amazon.com/cli/latest/userguide/cli-config-files.html
msg317393 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-05-23 12:22
Alexander Mohr:

Thanks, I can reproduce.

I found ResourceWarning is happened when head_object.
re may be used for filtering warning.

You can use `-W always` to see the warnings.

I don't know why the leak happened yet.
msg317396 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-23 12:33
> Thanks, I can reproduce.

Are you testing the master branch? I fixed a memory leak in the _warnings module for ignored ResourceWarning warnings.
msg317397 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-05-23 12:45
I confirmed the leak on 3.6.5, and I can't reproduce it on 3.7.0b4.
msg317400 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-05-23 12:55
Hmm, GH-4587 is merged in 3.6 branch in last November, but 3.6.5 doesn't include it?
msg317401 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-05-23 13:00
I was wrong.  It is not merged into 3.6 branch.
The pull request is just closed.

Now nothing strange in tracemalloc.
msg317410 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-23 14:31
> Hmm, GH-4587 is merged in 3.6 branch in last November, but 3.6.5 doesn't include it?

We decided to not fix the memory leak in Python 3.6:
https://github.com/python/cpython/pull/4587#issuecomment-347659163

"""
This change makes warning.warn() 1.4x slower on Python 3.6 for the ignore action: https://bugs.python.org/issue27535#msg307165

I don't think that the memory leak issue is important enough to justify such slowdown in a stable branch. So I close my PR.
"""

So yes, this issue is a bug in Python: it's bpo-27535. It's a bug in the _warnings module. tracemalloc only tells you the truth, it's just that it's hard to understand the output :-)

The bug occurs when you get ResourceWarning warnings. Such warning means that the code has a bug: the bug should be fixed. If you fix the ResourceWarning, the bug should disappear.

As Naoki INADA wrote: use python3 -Walways (I prefer python3 -Wdefault, but it's almost the same ;-)) to see such warnings.

Python 3.7 comes with a new -X dev option which enables these warnings but also other runtime checks to ease development.

I suggest to close this issue as a duplicate of bpo-27535.

Thanks Alexander Mohr for your bug report, and good luck to fix your warning ;-) If you are motivated, you can propose changes to the tracemalloc documentation!
msg317419 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-23 16:07
I'll try with that fix and see if the leak is fixed, is the reasoning that if the warning happens after the try/except scope and that's why the callstack has it?
msg317428 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-23 16:57
Ok I've verified that the patch does indeed fix the leak detected.  Thank you very much INADA for knowing that there was a leak in the warnings module, I would have never guessed, especially given the tracemalloc stack given.  Had it showed a callstack where the warning was created it would have made a lot more sense.

I agree this can be closed, however can the leak fix PLEASE be put into 3.6 (any any other version that needs it)?  Who cares if warnings are 1.4x slower with the fix?  Are you going to rationally tell me that keeping warnings fast is more important than fixing leaks? In most applications there should be no warnings so it doesn't really matter. This particular leak was causing our application to fail after running for a few days which makes it unusable in production.  It's caused me a lot of days wasted in investigation.  If speed was really a problem that would have been a much worthier thing to spend time on than finding leaks.

leaks should be highest priority, then speed.  No rational developer would have complained that warnings got slower, that's when you fix warnings, not because of leaks! :)
msg317439 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-23 19:25
actually, thinking about this more on my way to work, this should NOT be closed.  The callstack I initially mentioned still has no explanation and we now know is not correct.  It should either have listed something related to warnings, or nothing at all (<unknown>) or something like that.

There is no documentation to describe this behavior, think about it, it would have to be something like:

"tracemalloc may give completely irrelevant callstacks".  

So I think this callstack still needs to be explained, and either:

1) the module should be fixed so it would give something more relevant (to give developers some foothold to realize this was related to warnings

2) For this scenario the callstack should be removed, to inform the developer that they should manually track the allocations in gdb or with some other mechanism.

It would be really nice to know what c-callstacks (with parameters) trigger this tracemalloc stack.
msg317498 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-05-24 01:47
> The callstack I initially mentioned still has no explanation and we now know is not correct.

I don't think so.
The callstack is Python's callstack where caused leak in C.  (Python doesn't call malloc directly)

In this case, local variables are freed "at end of the function", and it freed unclosed socket which cause warning. (*)
So the callstack is completely valid.  While only CPython expert can understand.

(*) or "at the end of the except clause" which frees `e` and `e` have reference to the unclosed socket.
msg317501 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-24 01:57
ok fair enough, however a caveat is that it's not necessarily end of function as I was able to expand the function by a few more lines and the callstack stayed the same, however I'm guessing the interpreter was smart enough to realize the calls below the try/except had no references.  

it would be nice if there was a virtual "socket.__del__` or something at the end of the stack.  Basically if it could plug into the extension callbacks.  That way we could have a little more visibility.

closing, thanks guys. This fixes the issue in botocore, on to the next related leak found via aiobotocore in aiohttp, where it now has no tracemalloc entries, so I'm guessing a leak via the ssl module to openssl :(

thanks again guys for the help, I really appreciate it, I hope in the future, my some mechanism, scenarios like these will be a lot easier to decipher.
msg317513 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2018-05-24 02:34
Alexander, in the future please use gender-neutral words such as "folks" and "y'all" instead of "guys". Thanks.
msg317515 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-24 02:35
of course, sorry, I meant in a gender neutral way
msg317516 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-05-24 02:36
this is how my friends talk here, see: https://english.stackexchange.com/questions/11816/is-guy-gender-neutral
History
Date User Action Args
2018-05-24 02:36:05thehesiodsetmessages: + msg317516
2018-05-24 02:35:03thehesiodsetmessages: + msg317515
2018-05-24 02:34:08Mariattasetnosy: + Mariatta
messages: + msg317513
2018-05-24 01:57:59thehesiodsetstatus: open -> closed
resolution: not a bug
messages: + msg317501

stage: resolved
2018-05-24 01:47:03inada.naokisetmessages: + msg317498
2018-05-23 19:25:15thehesiodsetmessages: + msg317439
2018-05-23 16:57:04thehesiodsetmessages: + msg317428
2018-05-23 16:07:29thehesiodsetmessages: + msg317419
2018-05-23 14:31:54vstinnersetmessages: + msg317410
2018-05-23 13:00:37inada.naokisetmessages: + msg317401
2018-05-23 12:56:33pitrousetnosy: - pitrou
2018-05-23 12:55:23inada.naokisetmessages: + msg317400
2018-05-23 12:45:08inada.naokisetmessages: + msg317397
2018-05-23 12:33:43vstinnersetmessages: + msg317396
2018-05-23 12:22:44inada.naokisetmessages: + msg317393
2018-05-23 07:13:16thehesiodsetmessages: + msg317362
2018-05-23 02:34:35inada.naokisetmessages: + msg317351
2018-05-22 20:21:18thehesiodsetmessages: + msg317341
2018-05-22 20:12:27vstinnersetmessages: + msg317338
2018-05-22 16:38:47thehesiodsetmessages: + msg317309
2018-05-22 16:29:26vstinnersetmessages: + msg317308
2018-05-22 16:19:53thehesiodsetmessages: + msg317305
2018-05-22 13:59:03vstinnersetmessages: + msg317288
2018-05-22 05:33:39inada.naokisetnosy: + inada.naoki
2018-05-20 20:16:58thehesiodsetmessages: + msg317200
2018-05-20 19:05:20thehesiodsetmessages: + msg317198
2018-05-20 18:19:01thehesiodsetmessages: + msg317195
2018-05-19 06:53:29thehesiodsetmessages: + msg317105
2018-05-19 00:49:04vstinnersetmessages: + msg317083
2018-05-19 00:48:38vstinnersetmessages: + msg317082
2018-05-18 19:38:51thehesiodsetfiles: + tracemalloc_test2.py

messages: + msg317052
2018-05-18 11:09:27pitrousetnosy: + pitrou
messages: + msg317023
2018-05-18 07:42:29asvetlovsetnosy: + vstinner, asvetlov
messages: + msg317009
2018-05-18 02:41:00thehesiodsettype: behavior
2018-05-18 02:40:46thehesiodcreate