classification
Title: ResourceWarning: suggest to enable tracemalloc in the message
Type: resource usage Stage: test needed
Components: Versions: Python 3.6
process
Status: open Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: David Ford (FirefighterBlu3), haypo, martin.panter
Priority: normal Keywords:

Created on 2017-02-15 03:11 by David Ford (FirefighterBlu3), last changed 2017-02-15 17:32 by haypo.

Messages (11)
msg287810 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2017-02-15 03:11
the following warning is incorrectly reported as initiated at psycopg2/extras.py:314, it actually occurs inside the Twilio python module.

/usr/lib/python3.6/site-packages/psycopg2/extras.py:314: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('173.12.76.132', 39350), raddr=('52.72.8.76', 443)>
  ts = super(NamedTupleCursor, self).fetchall()
namedtuple_Record:19: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('173.12.76.132', 57692), raddr=('107.23.182.60', 443)>



This is the snippet of my code which makes the Twilio API call over https (noted by the >>> line).

            # get failed/undelivered
            with (yield from self.aiopg_pool.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor)) as c:
                yield from c.execute(stmt_bad, (id,))
                rows = yield from c.fetchall()

            # make calls to twilio and fetch failure reasons
            for r in rows:
>>>             msg = self._twilio.messages.get(r.delivery_id)
                errors[r.delivery_id] = r.recipient, msg.error_code, msg.error_message


my tool uses crossbar.io's python-autobahn asyncio classes which do not use threading to my knowledge. psycopg2/pool.py does use threading. python-twilio does not use threading either.
msg287824 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-02-15 07:40
Without more information about what the relevant code is and why you think the line number is wrong, my best guess is you may not realize how the ResourceWarning is emitted. It happens when the garbage collector runs and destroys the socket object, which happens sometime after the last reference to the socket has been deleted. So the line number can be rather arbitrary, especially if it was a reference cycle that triggered the garbage collector.

If you are having trouble tracking down what socket is involved, the new “python -X tracemalloc=<N>” option <https://docs.python.org/3/whatsnew/3.6.html#warnings> may help. It saves stack traces when objects are created.
msg287830 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2017-02-15 09:48
i do know what's causing it, it's the twilio module not cleanly finishing up. i pointed out my call to this in the initial description.

/usr/lib/python3.6/site-packages/httplib2/__init__.py:857: DeprecationWarning: key_file, cert_file and check_hostname are deprecated, use a custom context instead.
  check_hostname=disable_ssl_certificate_validation ^ True)
/usr/lib/python3.6/site-packages/psycopg2/extras.py:314: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('173.12.76.132', 42016), raddr=('52.7.124.13', 443)>
  ts = super(NamedTupleCursor, self).fetchall()
Object allocated at (most recent call first):
  File "/usr/lib/python3.6/ssl.py", lineno 401
    _context=self, _session=session)
  File "/usr/lib/python3.6/http/client.py", lineno 1400
    server_hostname=server_hostname)
  File "/usr/lib/python3.6/site-packages/httplib2/__init__.py", lineno 994
    conn.connect()
  File "/usr/lib/python3.6/site-packages/httplib2/__init__.py", lineno 1071
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/lib/python3.6/site-packages/httplib2/__init__.py", lineno 1321
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/lib/python3.6/site-packages/twilio/rest/resources/base.py", lineno 117
    resp, content = http.request(url, method, headers=headers, body=data)
  File "/usr/lib/python3.6/site-packages/twilio/rest/resources/base.py", lineno 152
    resp = make_request(method, uri, **kwargs)
  File "/usr/lib/python3.6/site-packages/twilio/rest/resources/base.py", lineno 200
    resp = make_twilio_request(method, uri, auth=self.auth, **kwargs)
  File "/usr/lib/python3.6/site-packages/twilio/rest/resources/base.py", lineno 333
    resp, item = self.request("GET", uri)
  File "/usr/lib/python3.6/site-packages/twilio/rest/resources/base.py", lineno 328
    return self.get_instance(sid)
  File "provider.py", lineno 524
    msg = self._twilio.messages.get(r.delivery_id)
  File "/usr/lib/python3.6/asyncio/events.py", lineno 126
    self._callback(*self._args)
  File "/usr/lib/python3.6/asyncio/base_events.py", lineno 1425
    handle._run()
  File "/usr/lib/python3.6/asyncio/base_events.py", lineno 421
    self._run_once()
  File "/usr/lib/python3.6/site-packages/autobahn/asyncio/wamp.py", lineno 168
    loop.run_forever()
  File "provider.py", lineno 595
    runner.run(make)

the socket is referenced by ip:port correctly and is coherent with the stack trace. nothing about the IP, socket, or stack trace, is related to the referenced file:position that is reported:

/usr/lib/python3.6/site-packages/psycopg2/extras.py:314: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('173.12.76.132', 42016), raddr=('52.7.124.13', 443)>
  ts = super(NamedTupleCursor, self).fetchall()

the psycopg2 connection is against a completely different IP and on the standard pgsql port of 5432.

here's the snippet of psycopg2/extras.py:

    313     def fetchall(self):
    314         ts = super(NamedTupleCursor, self).fetchall()
    315         nt = self.Record
    316         if nt is None:
    317             nt = self.Record = self._make_nt()
    318         return list(map(nt._make, ts))

nothing at all between my pgsql database and twilio share any form of connection or socket and there are no similarly named variables, complex code paths, or global variables. the only object data which has any shared scope is the twilio message fetch using the ID fetched from the pgsql database. if this section of code is commented out, the resource warning vanishes.

msg = self._twilio.messages.get(r.delivery_id)

if i patch the involved calls in twilio so i can touch the socket object and close it such as:

class Response(object):
    """
    Take a httplib2 response and turn it into a requests response
    """
    def __init__(self, httplib_resp, content, url, http_obj):
        self.content = content
        self.cached = False
        self.status_code = int(httplib_resp.status)
        self.ok = self.status_code < 400
        self.url = url
        self.connections = http_obj.connections

(http_obj added which is an instance of httplib2.Http which has a property of .connections wherein sockets are cached)

then after modifying the twilio module to pass the Response object to me as well as the message resource, i can eventually in my code:

            for r in rows:
                resp, msg = self._twilio.messages.get(r.delivery_id)
                errors[r.delivery_id] = r.recipient, msg.error_code, msg.error_message

            for k in resp.connections:
                resp.connections[k].close()

this closes all the cached sockets and ensures the ResourceWarning is never emitted. this messiness is another conversation however, the incorrect file:pos annotation for the ResourceWarning is the purpose of this bug.
========

the only way i can see this inaccurate message being presented as above, is if the resource warning code is not correctly tracking state inside GC runs. supposition:

* psycopg2 opened the socket to the PG database (threading is involved when the pool is used)
* psycopg2 fetches the rows
* psycopg2 closed socket, releasing object
* twilio opens a socket which coincidently uses the same object resource just closed but not purged in GC
* twilio doesn't close it the SSL socket so it is left hanging
* method exits, GC runs but RW state wasn't updated somewhere to say that the twilio/httplib2/http/ssl series of calls now owns that object memory
* RW code emits an unclosed socket warning against the outdated psycopg2/extras.py:314 information

i'm definitely not intimate with GC code so the above is purely speculation. i've seen people post some vague references to confusing resource warnings when threading is involved. as nothing appears to be affected with long running instances, the actual object handling seems solid, it's just the state tracking for the resource warning which is not correct
msg287841 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-02-15 10:33
You say that the line number is incorrect. What would you consider a correct line number to be?

Let me try again to explain my theory about garbage collection and reference cycles. Perhaps you have two objects a and b, linked such that a.x = b and b.y = a. This is a reference cycle. Suppose that the socket is also attached e.g. as the attribute a.socket. Even if all functions have deleted their references to a and b and the socket, the interpreter does not immediately realize that a and b (and therefore the socket) can be cleaned up and released from memory.

At some arbitrary point after everything has released its reference to a, b, and the socket, the interpreter will run the garbage collecter. It will pull apart the reference cycle, call the socket’s destructor, and trigger the warning. So I would expect that the line number that happened to trigger the garbage collector is indeed as reported. It is just not very useful in your case.

Please try calling gc.collect() at some point before just before the line indicated in the warning. That should force the warning to be emitted as part of the garbage collector run, and indicate that was indeed the indicated line that was triggering the warning.
msg287842 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-15 10:37
The report issue is not a bug. By design, Python cannot emits the ResourceWarning warning where you expect that it should be emitted. Martin explained you the thing with reference counter, reference cycles and the garbage collector.

It is exactly *why* I implemented the glue with tracemalloc to provide the traceback where the socket was allocated. I'm very happy to see that this feature (the traceback using tracemalloc) works as expected!

By the way, the warning module might log a message suggesting to enable tracemalloc to get the traceback if tracemalloc is available but disabled? What do you think?

The root issue is that the code doesn't close the SSLSocket: fix the code to close the socket, no?
msg287843 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-15 10:39
FYI the implementation of the ResourceWarning traceback was not as simple as I expected :-) I just wrote a blog post about, search for "ResourceWarning warnings now come with a traceback" in:
https://haypo.github.io/contrib-cpython-2016q1.html
msg287844 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2017-02-15 11:40
i quite understand the way objects can be associated with reference counts by using a, a.b, c=a.b, d=a, etc etc, and how they increase/decrease and when the GC can prune the objects. yes, the root issue is the unclosed web socket and i've already filed a bug with Twilio.

@Martin,

/usr/lib/python3.6/site-packages/psycopg2/extras.py:314: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('173.12.76.132', 42016), raddr=('52.7.124.13', 443)>

/usr/lib/python3.6/site-packages/psycopg2/extras.py:314 is unrelated to the resource in question. there is nothing in psycopg2 that can have any direct or indirect reference to the tcp/443 socket used by the twilio by any path in the code all the way back to glibc under python. nothing in psycopg2 and twilio down to _ssl/glibc have any associated references.

the reference link between the two is at best as vague as connecting the water in the garden hose to the condensation on the hot shower mirror.

the pg socket is created and used which will fetch a result that has no reference to the socket used by that query method. regardless, the socket is a completely different IP/port pair. there is nothing in the pool, connection, cursor, or query that will ever be used by twilio. psycopg2 has nothing to do with any of the http or httplib2 modules.

after the pgsql query is completed, the twilio module gets used. a new https socket is created. the only data every seen by the twilio module that has a reference to any objected created by the psycopg2 module is the r.delivery_id which is a UUID value derived from the pgsql database. it has no references to its parent chain of objects and the twilio module doesn't derive any objects from it.

to be more obtuse about it, if i replace the r.delivery_id with a quoted string i manually insert, the ResourceWarning is now logged against asyncio/events.py. the traceback is still the same. 

/usr/lib/python3.6/asyncio/events.py:126: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('173.12.76.132', 40668), raddr=('54.164.135.121', 443)>
  self._callback(*self._args)

this one happens to exist in the stack trace albeit even further up the call chain.

if the ResourceWarning is unable to emit an accurate file:pos, then it shouldn't print it at all and instead, use the stack print as mentioned by @STINNER. at least at this point, the developer has a legitimate stack to wander through and isn't going on a wild goose chase in an unrelated file.

i argue that printing a completely unrelated file:pos is definitely a bug. if it is likely to emit a wholly unrelated file:pos, it shouldn't be printed at all. unreliable information is just useless.
msg287846 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2017-02-15 11:45
@stinner your traceback suggestion was highly useful and much appreciated. i have spent several dozen hours researching ResourceWarnings in the past and tearing my hair out because the emitted warning was very much useless. i couldn't even make educated guesses about the code as socket pairs were used randomly.

i strongly approve of mentioning the tracemalloc / traceback in the warning, and i would also very much encourage saying "this filename:position could be completely wrong" or do away with it all together.
msg287850 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-02-15 13:02
The file position is often useful when the cleanup is deterministic. Example:

def f1():
    file1 = open("/dev/null")

def f2():
    file2 = open("/dev/null")
    del file2  # ResourceWarning

f1()  # ResourceWarning at function exit
f2()

In the these cases, the line number can identify the affected code:

demo.py:8: ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='r' encoding='UTF-8'>
  f1()  # ResourceWarning at function exit
demo.py:6: ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='r' encoding='UTF-8'>
  del file2  # ResourceWarning
msg287869 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-15 17:31
David Ford (FirefighterBlu3) added the comment:
> @stinner your traceback suggestion was highly useful and much appreciated. i have spent several dozen hours researching ResourceWarnings in the past and tearing my hair out because the emitted warning was very much useless. i couldn't even make educated guesses about the code as socket pairs were used randomly.

So did I. Why do you think that I implemented this feature? :-D

> i strongly approve of mentioning the tracemalloc / traceback in the warning, and i would also very much encourage saying "this filename:position could be completely wrong" or do away with it all together.

I dislike removing filename:line information: in some cases, the
information is valid and helps to understand a warning.

Ok, let's go to add a line suggestion to enable tracemalloc. Is there
someone here interested to propose a pull request? It should only be
done in Python 3.7 and later 3.6, it's a new feature in Python 3.6.
msg287870 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-15 17:32
To be clear: warnings should only suggest to enable tracemalloc if tracemalloc.is_tracing() is False.

It is possible to tracemalloc is tracing, but failed (for different reasons) to track where the memory of the resource was allocated.
History
Date User Action Args
2017-02-15 17:32:43hayposetmessages: + msg287870
title: ResourceWarning tracking is attaching incorrect file:position to unclosed socket -> ResourceWarning: suggest to enable tracemalloc in the message
2017-02-15 17:31:04hayposetmessages: + msg287869
2017-02-15 13:02:05martin.pantersetmessages: + msg287850
2017-02-15 11:45:58David Ford (FirefighterBlu3)setmessages: + msg287846
2017-02-15 11:40:37David Ford (FirefighterBlu3)setmessages: + msg287844
2017-02-15 10:39:07hayposetmessages: + msg287843
2017-02-15 10:37:52hayposetnosy: + haypo
messages: + msg287842
2017-02-15 10:33:18martin.pantersetresolution: not a bug
messages: + msg287841
2017-02-15 09:48:06David Ford (FirefighterBlu3)setmessages: + msg287830
2017-02-15 07:40:56martin.pantersetnosy: + martin.panter

messages: + msg287824
stage: test needed
2017-02-15 03:11:36David Ford (FirefighterBlu3)create