This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: urllib2 does not close sockets properly
Components: Library (Lib)
Versions: Python 2.5

process
Status: closed
Resolution: fixed
Nosy List: direvus, georg.brandl, jjlee
Priority: normal

Created on 2006-11-22 21:04 by direvus, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg30652 - (view) Author: Brendan Jurd (direvus) Date: 2006-11-22 21:04
Python 2.5 (release25-maint, Oct 29 2006, 12:44:11)
[GCC 4.1.2 20061026 (prerelease) (Debian 4.1.1-18)] on linux2

I first noticed this when a program of mine (which makes a brief HTTPS connection every 20 seconds) started having some weird crashes.  It turned out that the process had a massive number of file descriptors open.  I did some debugging, and it became clear that the program was opening two file descriptors for every HTTPS connection it made with urllib2, and it wasn't closing them, even though I was reading all data from the response objects and then explicitly calling close() on them.

I found I could easily reproduce the behaviour using the interactive console.  Try this while keeping an eye on the file descriptors held open by the python process:

To begin with, the process will have the usual FDs 0, 1 and 2 open for std(in|out|err), plus one other.
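
One way to keep an eye on the descriptors (a sketch, assuming a Linux-style /proc filesystem; the exact listing will vary):

>>> import os
>>> os.listdir('/proc/self/fd')  # lists this process's open file descriptors
['0', '1', '2', '3']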

>>> import urllib2
>>> f = urllib2.urlopen("http://www.google.com")

At this point, the process has opened two more sockets.

>>> f.read()
[... HTML ensues ...]
>>> f.close()

The two extra sockets are still open.

>>> del f

The two extra sockets are STILL open.

>>> f = urllib2.urlopen("http://www.python.org")
>>> f.read()
[...]
>>> f.close()

And now we have a total of four abandoned sockets open.

The sockets are not closed until you terminate the process entirely, or the OS (eventually) closes them on idle timeout.

Note that if you do the same thing with httplib, the sockets are properly closed:

>>> import httplib
>>> c = httplib.HTTPConnection("www.google.com", 80)
>>> c.connect()

A socket has been opened.

>>> c.putrequest("GET", "/")
>>> c.endheaders()
>>> r = c.getresponse()
>>> r.read()
[...]
>>> r.close()

And the socket has been closed.
msg30653 - (view) Author: John J Lee (jjlee) Date: 2007-01-03 23:54
Confirmed.  The cause is the (ab)use of socket._fileobject by urllib2.AbstractHTTPHandler to provide .readline() and .readlines() methods.  _fileobject simply does not close the socket on _fileobject.close() (since in the original intended use of _fileobject, _socketobject "owns" the socket, and _fileobject only has a reference to it).  The bug was introduced with the upgrade to HTTP/1.1 in revision 36871.
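
The same ownership split can be reproduced with a bare socket; this is a sketch, assuming a reachable host, to illustrate the intended use of _fileobject:

>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(('www.python.org', 80))
>>> f = s.makefile('rb')  # f is a socket._fileobject holding a reference to the socket
>>> f.close()             # closes only the wrapper; the socket stays open
>>> s.close()             # the owning _socketobject must close the socket itself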

The patch here fixes it:

http://python.org/sf/1627441
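
Purely as an illustration of the shape of such a fix (hypothetical sketch, not the committed patch; _sock is a private attribute of socket._fileobject), a wrapper can take ownership and close the wrapped object when it is itself closed:

import socket

class _ClosingFileobject(socket._fileobject):
    # Hypothetical sketch: a _fileobject whose close() also closes
    # the object it wraps, so the wrapper owns the connection
    # instead of merely referencing it.
    def close(self):
        sock = self._sock
        try:
            socket._fileobject.close(self)
        finally:
            if sock is not None:
                sock.close()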
msg30654 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-01-21 10:36
Committed patch in rev. 53511, 53512 (2.5).
History
Date                 User     Action  Args
2022-04-11 14:56:21  admin    set     github: 44265
2006-11-22 21:04:15  direvus  create