New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ResourceWarning when urlopen() forgets the HTTPConnection object #63723
Comments
The AbstractHTTPHandler.do_open() method creates a HTTPConnection object but does not save it anywhere. This means a ResourceWarning is eventually triggered, at least when the HTTP server leaves the the connection open. Demonstration code: from urllib.request import urlopen
with urlopen("http://localhost") as response: response.read() When I used the above code, the warning did not trigger until I forced a garbage collection: import gc; gc.collect() Output: __main__:1: ResourceWarning: unclosed <socket.socket object, fd=3, family=2, type=1, proto=6> Alternatively, you can add a line to the bottom of the do_open() method: r.msg = r.reason
del h; import gc; gc.collect() # Add this to force warning
return r When the warning happens without forced garbage collection, it tends to happen here: /usr/lib/python3.3/socket.py:370: ResourceWarning: unclosed <socket.socket object, fd=3, family=2, type=1, proto=6> I tested by using the “socat” CLI program and supplying a chunked HTTP response, without immediately closing the connection at the server end. Using the Content-length header also seems to trigger the issue. $ sudo socat -d TCP-LISTEN:www,reuseaddr,crnl READLINE
GET / HTTP/1.1
Accept-Encoding: identity
Host: localhost
Connection: close
User-Agent: Python-urllib/3.3 <HTTP response start> 0 <HTTP response end> If the server leaves the connection open, it only seems to get closed when Python garbage-collects the socket and closes it. Perhaps the connection should be explicitly closed when the urlopen() response object is closed. But I guess that would require wrapping the HTTPResponse object to add to the close behaviour. |
Here are two patches: a test case, and a fix for my issue. They were done against an installed version of Python 3.3.3. I’m not entirely happy with the fix because it is accessing the private HTTPConnection.sock attribute from the urllib.request module, which seems a bit hacky. Other ideas would be to let HTTPConnection grow a new public method (or just a flag?) for cleaning itself up while leaving the response object still open. Or some kind of wrapper to augment the HTTPResponse.close() method to explicitly close the connection as well, like I originally mentioned. Also my test case borrows some classes from the “test_urllib” module into the “test_urllib2”. This is the first time I’ve looked closely at the Python test suite; maybe there is a better to do that as well. A bonus from my fix: it looks like it resolves a couple warnings running “test_urllib2net”. |
I can't reproduce issue. Python just hangs on read() when socat is used as test server. Perhaps I do something wrong. |
Thanks for looking at this. Perhaps you weren’t pasting the HTTP response into “socat”. After the six request lines are printed out, I enter the five lines between <HTTP response start> and <HTTP response end>; I didn’t really make this obvious. Otherwise, urlopen() hangs waiting for the response and read() never even gets called. Here’s a Python-only HTTP server to demonstrate without needing “socat”. Run this in one terminal: from http.server import HTTPServer, BaseHTTPRequestHandler
class Handler(BaseHTTPRequestHandler):
protocol_version = "HTTP/1.1"
def do_GET(self):
self.send_response(200)
self.send_header("Transfer-Encoding", "chunked")
self.end_headers()
self.wfile.write(b"0\r\n\r\n")
HTTPServer(("", 8000), Handler).serve_forever() . . . and in another terminal: from urllib.request import urlopen
with urlopen("http://localhost:8000") as response:
response.read()
import gc; gc.collect() |
Thank you Martin for clarification. Now I see the problem. Here is regenerated for Rietveld patch. Perhaps more safe will be not close socket, but only set the _closed attribute so that it will be closed just after closing SocketIO. if h.sock:
h.sock._closed = True
h.sock = None But this is even more tricky. Yet one approach is to implement __del__() method in the socket.socket class. But this breaks existing tests. |
How is it safer to manually set “h.sock._closed”? Playing with the internals of HTTPConnection is one thing, but playing with the internals of the socket object as well does not seem necessary. Also the ResourceWarning is warning that the socket and connection were closed by the garbage collector at some arbitrary point. I don’t think a new __del__() method is going to help. Sorry to be so negative :) Related issues: |
Just discovered the same fix of manually closing the socket object was already made independently of my patch in the “default” branch! See bpo-12692. http://hg.python.org/cpython/rev/92656b5df2f2 The main difference is my patch should also close the connection if HTTPConnection.getresponse() fails, which could be of some value. And the regression test could still be useful. |
The bpo-12692 fix has been backported to the 3.3 branch, and it fixes this bug. However here is an updated patch (against revision 28337a8fb502 in the “3.3” branch) consisting of two left over bits you might still want to use:
|
LGTM. Do you want to write a test for the case when an invalid response is received? |
Added a new test for the invalid response case. Now both tests are included in test2.patch. I separated the actual fix into a separate close3.4.patch (refreshed for the 3.4 branch). This way it is easier for me to make sure the tests work before applying the fix, but it should be easy to combine the patches again if you prefer it that way. |
New changeset c1fb19907cc4 by Serhiy Storchaka in branch '3.4': New changeset 43bf95480c3c by Serhiy Storchaka in branch 'default': |
Thank you for your contribution Martin. |
New changeset 19e4e0b7f1bd by Serhiy Storchaka in branch '2.7': |
Oh, Roundup Robot reports about a commit made almost 2 years ago! |
Sorry that was actually made by me earlier today. I grafted it from your commit, but I always forget to reset the author and date. |
Interesting. I didn't know that "hg graft" keeps the author and data of the original commit. Maybe this is a Mercurial bug. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: