classification
Title: FD leak in urllib2
Type: resource usage
Stage: test needed
Components: Library (Lib)
Versions: Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: gregory.p.smith
Nosy List: ajaksu2, bohdan, dsm001, gregory.p.smith, jjlee, nevyn, orsenthil, sharmila
Priority: normal
Keywords:

Created on 2008-06-09 11:02 by bohdan, last changed 2009-05-03 22:06 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description
unnamed bohdan, 2008-06-12 19:40
Messages (7)
msg67860 - Author: Bohdan Vlasyuk (bohdan) Date: 2008-06-09 11:02
In urllib2.AbstractHTTPHandler.do_open, the following line creates a
circular reference:

        r.recv = r.read

[r.read is a bound method, so it contains a reference to 'r'. Therefore,
r now refers to itself.]

If the GC is disabled or doesn't run often, the response and the socket
it holds are not freed promptly, which leaks FDs.
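
The cycle can be shown without any network access. This is a minimal sketch, assuming CPython's reference counting; `Response` and `make_and_drop` are hypothetical names, not the real httplib or urllib2 code:

```python
import gc
import weakref


class Response(object):
    """Hypothetical stand-in for the response object `r`."""

    def read(self):
        return b""


def make_and_drop(alias_recv):
    """Create a Response, drop the only name for it, return a weak reference."""
    r = Response()
    if alias_recv:
        # The bound method holds a reference to r, and r's attributes
        # now hold the bound method: r refers to itself.
        r.recv = r.read
    return weakref.ref(r)


gc.disable()  # simulate "the GC is disabled or doesn't run often"
try:
    plain = make_and_drop(alias_recv=False)
    cyclic = make_and_drop(alias_recv=True)
    plain_freed = plain() is None    # freed by reference counting alone
    cyclic_freed = cyclic() is None  # still alive: the cycle keeps it
    gc.collect()                     # only the cycle collector frees it
    cyclic_freed_after_gc = cyclic() is None
finally:
    gc.enable()
```

On CPython the object without the alias is freed immediately, while the aliased one survives until `gc.collect()` runs.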

How to reproduce:

import gc
import urllib2
u = urllib2.urlopen("http://google.com")
s = [ u.fp._sock.fp._sock ]
u.close()
del u
print gc.get_referrers(s[0])
[<socket._fileobject object at 0xf7d42c34>, [<socket object, fd=4,
family=2, type=1, protocol=6>]]

I would expect that only one reference to the socket would exist (the
"s" list itself).
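
The same check can be sketched offline with `gc.get_referrers`. `FakeSocket`, `FakeFileObject` and `wrapper_still_refers` are hypothetical stand-ins chosen for illustration; the sketch assumes CPython and only mirrors the shape of the real objects:

```python
import gc


class FakeSocket(object):
    """Hypothetical stand-in for the underlying socket."""


class FakeFileObject(object):
    """Hypothetical stand-in for the wrapper that holds the socket."""
    __slots__ = ("_sock", "recv")

    def __init__(self, sock):
        self._sock = sock

    def read(self):
        return b""


def wrapper_still_refers(alias_recv):
    """Return True if a FakeFileObject still refers to the socket after del."""
    gc.disable()  # keep an automatic collection from hiding the effect
    try:
        sock = FakeSocket()
        s = [sock]
        f = FakeFileObject(sock)
        if alias_recv:
            f.recv = f.read  # same kind of self-reference as in do_open
        del f, sock
        found = any(isinstance(x, FakeFileObject)
                    for x in gc.get_referrers(s[0]))
        del s
        gc.collect()  # clean up the cycle created for the demonstration
        return found
    finally:
        gc.enable()


leaked = wrapper_still_refers(alias_recv=True)
clean = wrapper_still_refers(alias_recv=False)
```

With the alias in place the wrapper outlives `del f` and still shows up as a referrer of the socket; without it, the list is the only container left.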

I can reproduce with 2.4; the problem seems to still exist in SVN HEAD.
msg67998 - Author: Sharmila Sivakumar (sharmila) Date: 2008-06-11 17:13
Since the socket object is added to a list, a reference to the object
always exists, right? That would mean that it would not be garbage
collected as long as the reference exists.

On the other hand, it should also be noted that in the close method, the
socket is not explicitly closed, and for a single urlopen, at least 3
sockets are opened.
msg68074 - Author: Bohdan Vlasyuk (bohdan) Date: 2008-06-12 19:40
The list is not the problem. The problem is the other reference, from
"<socket._fileobject object at 0xf7d42c34>".

Also note that the workaround (u.fp.recv = None) removes the second
reference.

This is fine (at least in CPython), because the socket is destroyed when the
refcount reaches zero, thus calling the finalizer.
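
The effect of the workaround can be sketched the same way. `AliasedResponse` is a hypothetical class that sets up the self-reference in its constructor; the sketch assumes CPython's reference counting:

```python
import gc
import weakref


class AliasedResponse(object):
    """Hypothetical stand-in for a response that aliases recv to read."""

    def __init__(self):
        self.recv = self.read  # the self-reference from do_open

    def read(self):
        return b""


gc.disable()  # rely on reference counting only
try:
    r = AliasedResponse()
    ref = weakref.ref(r)
    r.recv = None  # the workaround: drop the bound method, breaking the cycle
    del r
    freed_without_gc = ref() is None
finally:
    gc.enable()
```

Once the alias is cleared, deleting the last name frees the object immediately, with no help from the cycle collector.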
msg72147 - Author: James Antill (nevyn) Date: 2008-08-29 18:28
So if I add a:

class _WrapForRecv:
    def __init__(self, obj):
        self.__obj = obj

    def __getattr__(self, name):
        if name == "recv": name = "read"
        return getattr(self.__obj, name)

...and then change:

        r.recv = r.read

...into:

        r = _WrapForRecv(r)

...it stops the leak, and afaics nothing bad happens.
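
A self-contained sketch of why the wrapper avoids the cycle, assuming CPython. The wrapper is adapted here to a new-style class, and `FakeResponse` is a hypothetical stand-in for the real response object:

```python
import gc
import weakref


class _WrapForRecv(object):
    """Proxy that answers `recv` with the wrapped object's `read`."""

    def __init__(self, obj):
        self.__obj = obj

    def __getattr__(self, name):
        # Only called when normal lookup fails, so the name-mangled
        # attribute set in __init__ never reaches this method.
        if name == "recv":
            name = "read"
        return getattr(self.__obj, name)


class FakeResponse(object):
    """Hypothetical stand-in for the response object."""

    def read(self, amt=None):
        return b"payload"


gc.disable()  # rely on reference counting only
try:
    inner = FakeResponse()
    inner_ref = weakref.ref(inner)
    r = _WrapForRecv(inner)
    del inner
    data = r.recv(7)  # forwarded to FakeResponse.read
    del r             # wrapper -> response only, so no cycle to collect
    freed_without_gc = inner_ref() is None
finally:
    gc.enable()
```

The reference only points one way, from the wrapper to the response, so both go away as soon as the wrapper does. One caveat with the new-style adaptation: implicit special-method lookups bypass `__getattr__`, so a caller relying on them would need those methods defined on the wrapper explicitly.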
msg81787 - Author: Daniel Diniz (ajaksu2) Date: 2009-02-12 17:50
Has (non-unittest) test and proposed (non-diff) patch inline.
msg86591 - (view) Author: DSM (dsm001) Date: 2009-04-26 02:17
I can't reproduce in python 2.5.4, 2.6.2, or 2.7 trunk (though I can
with 2.4.6 and 2.5) on mac & linux.  

Quick bisection suggests that it was fixed in r53511 while solving
related bug http://bugs.python.org/issue1601399, and the explanation
given there is consistent with the symptom here: the _fileobject doesn't
close itself, and r53511 makes sure that it does.

Suggest closing as fixed.
msg87077 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-05-03 22:06
Not reproducible in head as stated.
History
Date User Action Args
2009-05-03 22:06:10 gregory.p.smith set status: open -> closed
resolution: fixed
messages: + msg87077
2009-04-26 02:17:35 dsm001 set nosy: + dsm001
messages: + msg86591
2009-02-13 01:19:21 ajaksu2 set nosy: + jjlee
2009-02-12 17:50:35 ajaksu2 set nosy: + ajaksu2, orsenthil
stage: test needed
messages: + msg81787
versions: + Python 2.6, - Python 2.4
2008-09-22 01:18:50 gregory.p.smith set assignee: gregory.p.smith
nosy: + gregory.p.smith
2008-08-29 18:28:33 nevyn set nosy: + nevyn
messages: + msg72147
2008-06-12 19:40:26 bohdan set files: + unnamed
messages: + msg68074
2008-06-11 17:13:23 sharmila set nosy: + sharmila
messages: + msg67998
2008-06-09 11:02:32 bohdan create