Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResourceWarning when urlopen() forgets the HTTPConnection object #63723

Closed
vadmium opened this issue Nov 8, 2013 · 16 comments
Closed

ResourceWarning when urlopen() forgets the HTTPConnection object #63723

vadmium opened this issue Nov 8, 2013 · 16 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@vadmium
Copy link
Member

vadmium commented Nov 8, 2013

BPO 19524
Nosy @warsaw, @orsenthil, @pitrou, @vadmium, @serhiy-storchaka
Files
  • urllib-close-test.diff
  • urlopen-sock-close.diff
  • issue19524.diff
  • test+except-close.patch: Tests + close if exception, on top of existing fix test_urllib2net is triggering a ResourceWarning #56901
  • test2.patch: Test for close after valid & invalid responses
  • close3.4.patch: Close if exception, for 3.4
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-09-06.18:57:39.795>
    created_at = <Date 2013-11-08.03:45:17.035>
    labels = ['type-bug', 'library']
    title = 'ResourceWarning when urlopen() forgets the HTTPConnection object'
    updated_at = <Date 2016-05-16.18:26:27.710>
    user = 'https://github.com/vadmium'

    bugs.python.org fields:

    activity = <Date 2016-05-16.18:26:27.710>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-09-06.18:57:39.795>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2013-11-08.03:45:17.035>
    creator = 'martin.panter'
    dependencies = []
    files = ['33004', '33005', '33187', '33319', '35980', '35981']
    hgrepos = []
    issue_num = 19524
    keywords = ['patch']
    message_count = 16.0
    messages = ['202404', '205366', '206481', '206491', '206508', '206662', '207056', '207373', '223257', '223316', '226502', '226503', '265681', '265687', '265689', '265715']
    nosy_count = 6.0
    nosy_names = ['barry', 'orsenthil', 'pitrou', 'python-dev', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue19524'
    versions = ['Python 3.4', 'Python 3.5']

    @vadmium
    Copy link
    Member Author

    vadmium commented Nov 8, 2013

    The AbstractHTTPHandler.do_open() method creates a HTTPConnection object but does not save it anywhere. This means a ResourceWarning is eventually triggered, at least when the HTTP server leaves the the connection open.

    Demonstration code:

    from urllib.request import urlopen
    with urlopen("http://localhost") as response: response.read()

    When I used the above code, the warning did not trigger until I forced a garbage collection:

    import gc; gc.collect()

    Output:

    __main__:1: ResourceWarning: unclosed <socket.socket object, fd=3, family=2, type=1, proto=6>

    Alternatively, you can add a line to the bottom of the do_open() method:

            r.msg = r.reason
            del h; import gc; gc.collect()  # Add this to force warning
            return r

    When the warning happens without forced garbage collection, it tends to happen here:

    /usr/lib/python3.3/socket.py:370: ResourceWarning: unclosed <socket.socket object, fd=3, family=2, type=1, proto=6>
    self._sock = None

    I tested by using the “socat” CLI program and supplying a chunked HTTP response, without immediately closing the connection at the server end. Using the Content-length header also seems to trigger the issue.

    $ sudo socat -d TCP-LISTEN:www,reuseaddr,crnl READLINE
    GET / HTTP/1.1
    Accept-Encoding: identity
    Host: localhost
    Connection: close
    User-Agent: Python-urllib/3.3

    <HTTP response start>
    HTTP/1.1 200 OK
    Transfer-encoding: chunked

    0

    <HTTP response end>

    If the server leaves the connection open, it only seems to get closed when Python garbage-collects the socket and closes it. Perhaps the connection should be explicitly closed when the urlopen() response object is closed. But I guess that would require wrapping the HTTPResponse object to add to the close behaviour.

    @vadmium vadmium added the stdlib Python modules in the Lib dir label Nov 8, 2013
    @vadmium
    Copy link
    Member Author

    vadmium commented Dec 6, 2013

    Here are two patches: a test case, and a fix for my issue. They were done against an installed version of Python 3.3.3.

    I’m not entirely happy with the fix because it is accessing the private HTTPConnection.sock attribute from the urllib.request module, which seems a bit hacky. Other ideas would be to let HTTPConnection grow a new public method (or just a flag?) for cleaning itself up while leaving the response object still open. Or some kind of wrapper to augment the HTTPResponse.close() method to explicitly close the connection as well, like I originally mentioned.

    Also my test case borrows some classes from the “test_urllib” module into the “test_urllib2”. This is the first time I’ve looked closely at the Python test suite; maybe there is a better to do that as well.

    A bonus from my fix: it looks like it resolves a couple warnings running “test_urllib2net”.

    @serhiy-storchaka
    Copy link
    Member

    I can't reproduce issue. Python just hangs on read() when socat is used as test server. Perhaps I do something wrong.

    @vadmium
    Copy link
    Member Author

    vadmium commented Dec 18, 2013

    Thanks for looking at this. Perhaps you weren’t pasting the HTTP response into “socat”. After the six request lines are printed out, I enter the five lines between <HTTP response start> and <HTTP response end>; I didn’t really make this obvious. Otherwise, urlopen() hangs waiting for the response and read() never even gets called.

    Here’s a Python-only HTTP server to demonstrate without needing “socat”. Run this in one terminal:

    from http.server import HTTPServer, BaseHTTPRequestHandler
    class Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"
        def do_GET(self):
            self.send_response(200)
            self.send_header("Transfer-Encoding", "chunked")
            self.end_headers()
            self.wfile.write(b"0\r\n\r\n")
    HTTPServer(("", 8000), Handler).serve_forever()

    . . . and in another terminal:

    from urllib.request import urlopen
    with urlopen("http://localhost:8000") as response:
        response.read()
    import gc; gc.collect()

    @serhiy-storchaka
    Copy link
    Member

    Thank you Martin for clarification. Now I see the problem.

    Here is regenerated for Rietveld patch.

    Perhaps more safe will be not close socket, but only set the _closed attribute so that it will be closed just after closing SocketIO.

            if h.sock:
                h.sock._closed = True
                h.sock = None

    But this is even more tricky.

    Yet one approach is to implement __del__() method in the socket.socket class. But this breaks existing tests.

    @vadmium
    Copy link
    Member Author

    vadmium commented Dec 20, 2013

    How is it safer to manually set “h.sock._closed”? Playing with the internals of HTTPConnection is one thing, but playing with the internals of the socket object as well does not seem necessary.

    Also the ResourceWarning is warning that the socket and connection were closed by the garbage collector at some arbitrary point. I don’t think a new __del__() method is going to help. Sorry to be so negative :)

    Related issues:
    bpo-18144: FD leak in urllib2 (probably an exact dupe)
    bpo-11563: test_urllibnet is triggering a ResourceWarning (bug closed, but real issue was only side-stepped IMO)

    @vadmium
    Copy link
    Member Author

    vadmium commented Dec 28, 2013

    Just discovered the same fix of manually closing the socket object was already made independently of my patch in the “default” branch! See bpo-12692.

    http://hg.python.org/cpython/rev/92656b5df2f2

    The main difference is my patch should also close the connection if HTTPConnection.getresponse() fails, which could be of some value. And the regression test could still be useful.

    @vadmium
    Copy link
    Member Author

    vadmium commented Jan 5, 2014

    The bpo-12692 fix has been backported to the 3.3 branch, and it fixes this bug. However here is an updated patch (against revision 28337a8fb502 in the “3.3” branch) consisting of two left over bits you might still want to use:

    1. My test case

    2. Explicitly closing the HTTP connection when an invalid response is received. E.g. urlopen("http://google.com:443") would still leak the connection without this.

    @serhiy-storchaka
    Copy link
    Member

    LGTM. Do you want to write a test for the case when an invalid response is received?

    @serhiy-storchaka serhiy-storchaka self-assigned this Jul 16, 2014
    @vadmium
    Copy link
    Member Author

    vadmium commented Jul 17, 2014

    Added a new test for the invalid response case. Now both tests are included in test2.patch.

    I separated the actual fix into a separate close3.4.patch (refreshed for the 3.4 branch). This way it is easier for me to make sure the tests work before applying the fix, but it should be easy to combine the patches again if you prefer it that way.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 6, 2014

    New changeset c1fb19907cc4 by Serhiy Storchaka in branch '3.4':
    Issue bpo-19524: Fixed resource leak in the HTTP connection when an invalid
    http://hg.python.org/cpython/rev/c1fb19907cc4

    New changeset 43bf95480c3c by Serhiy Storchaka in branch 'default':
    Issue bpo-19524: Fixed resource leak in the HTTP connection when an invalid
    http://hg.python.org/cpython/rev/43bf95480c3c

    @serhiy-storchaka
    Copy link
    Member

    Thank you for your contribution Martin.

    @serhiy-storchaka serhiy-storchaka added the type-bug An unexpected behavior, bug, or error label Sep 6, 2014
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 16, 2016

    New changeset 19e4e0b7f1bd by Serhiy Storchaka in branch '2.7':
    Issue bpo-19524: Port fakehttp() from Py3 c1fb19907cc4 for use in test_urllib2
    https://hg.python.org/cpython/rev/19e4e0b7f1bd

    @serhiy-storchaka
    Copy link
    Member

    Oh, Roundup Robot reports about a commit made almost 2 years ago!

    @vadmium
    Copy link
    Member Author

    vadmium commented May 16, 2016

    Sorry that was actually made by me earlier today. I grafted it from your commit, but I always forget to reset the author and date.

    @serhiy-storchaka
    Copy link
    Member

    Interesting. I didn't know that "hg graft" keeps the author and data of the original commit. Maybe this is a Mercurial bug.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants