Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FD leak in urllib2 #47316

Closed
bohdan mannequin opened this issue Jun 9, 2008 · 7 comments
Closed

FD leak in urllib2 #47316

bohdan mannequin opened this issue Jun 9, 2008 · 7 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@bohdan
Copy link
Mannequin

bohdan mannequin commented Jun 9, 2008

BPO 3066
Nosy @gpshead, @orsenthil, @devdanzin
Files
  • unnamed
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2009-05-03.22:06:10.948>
    created_at = <Date 2008-06-09.11:02:32.034>
    labels = ['library', 'performance']
    title = 'FD leak in urllib2'
    updated_at = <Date 2009-05-03.22:06:10.947>
    user = 'https://bugs.python.org/bohdan'

    bugs.python.org fields:

    activity = <Date 2009-05-03.22:06:10.947>
    actor = 'gregory.p.smith'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2009-05-03.22:06:10.948>
    closer = 'gregory.p.smith'
    components = ['Library (Lib)']
    creation = <Date 2008-06-09.11:02:32.034>
    creator = 'bohdan'
    dependencies = []
    files = ['10602']
    hgrepos = []
    issue_num = 3066
    keywords = []
    message_count = 7.0
    messages = ['67860', '67998', '68074', '72147', '81787', '86591', '87077']
    nosy_count = 8.0
    nosy_names = ['gregory.p.smith', 'jjlee', 'orsenthil', 'dsm001', 'ajaksu2', 'nevyn', 'sharmila', 'bohdan']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'test needed'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue3066'
    versions = ['Python 2.6']

    @bohdan
    Copy link
    Mannequin Author

    bohdan mannequin commented Jun 9, 2008

    In urllib2.AbstractHTTPHandler.do_open, the following like creates a
    circular link:

            r.recv = r.read

    [r.read is a bound method, so it contains a reference to 'r'. Therefore,
    r now refers to itself.]

    If the GC is disabled or doesn't run often, this creates a FD leak.

    How to reproduce:

    import gc
    import urllib2
    u = urllib2.urlopen("http://google.com")
    s = [ u.fp._sock.fp._sock ]
    u.close()
    del u
    print gc.get_referrers(s[0])
    [<socket._fileobject object at 0xf7d42c34>, [<socket object, fd=4,
    family=2, type=1, protocol=6>]]

    I would expect that only one reference to the socket would exist (the
    "s" list itself).

    I can reproduce with 2.4; the problems seems to still exist in SVN HEAD.

    @bohdan bohdan mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Jun 9, 2008
    @sharmila
    Copy link
    Mannequin

    sharmila mannequin commented Jun 11, 2008

    Since the socket object is added to a list, a reference to the object
    always exists right? That would mean that it would not be garbage
    collected as long as the reference exists.

    On the other hand, it should also be noted that in close method, the
    socket is not explicitly closed and for a single urlopen, atleast 3
    sockets are opened.

    @bohdan
    Copy link
    Mannequin Author

    bohdan mannequin commented Jun 12, 2008

    The list is not the problem. The problem is the other reference, from
    "<socket._fileobject object at 0xf7d42c34>".

    Also note that the workaround (u.fp.recv = None) removes the second
    reference.

    This is fine (at least in CPython), because the socket is destroyed when the
    refcount reaches zero, thus calling the finalizer.

    @nevyn
    Copy link
    Mannequin

    nevyn mannequin commented Aug 29, 2008

    So if I add a:

    class _WrapForRecv:
        def __init__(self, obj):
            self.__obj = obj
    
        def __getattr__(self, name):
            if name == "recv": name = "read"
            return getattr(self.__obj, name)

    ...and then change:

            r.recv = r.read

    ...into:

            r = _WrapForRecv(r)

    ...it stops the leak, and afaics nothing bad happens.

    @gpshead gpshead self-assigned this Sep 22, 2008
    @devdanzin
    Copy link
    Mannequin

    devdanzin mannequin commented Feb 12, 2009

    Has (non-unittest) test and proposed (non-diff) patch inline.

    @dsm001
    Copy link
    Mannequin

    dsm001 mannequin commented Apr 26, 2009

    I can't reproduce in python 2.5.4, 2.6.2, or 2.7 trunk (though I can
    with 2.4.6 and 2.5) on mac & linux.

    Quick bisection suggests that it was fixed in r53511 while solving
    related bug http://bugs.python.org/issue1601399, and the explanation
    given there is consistent with the symptom here: the _fileobject doesn't
    close itself, and r53511 makes sure that it does.

    Suggest closing as fixed.

    @gpshead
    Copy link
    Member

    gpshead commented May 3, 2009

    not reproducable in head as stated.

    @gpshead gpshead closed this as completed May 3, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant