classification
Title: circular reference in HTTPResponse by urllib2
Type: resource usage Stage: test needed
Components: Library (Lib) Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: amaury.forgeotdarc, dstanek, kristjan.jonsson, martin.panter, orsenthil, pitrou, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2009-12-09 17:27 by kristjan.jonsson, last changed 2013-12-30 11:29 by kristjan.jonsson.

Files
File name Uploaded Description Edit
httpleak.py kristjan.jonsson, 2013-12-19 10:43 file that demonstrates HTTPResponse leak
Messages (14)
msg96175 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-12-09 17:27
in urllib2, you will find these lines:
        # Wrap the HTTPResponse object in socket's file object adapter
        # for Windows.  That adapter calls recv(), so delegate recv()
        # to read().  This weird wrapping allows the returned object to
        # have readline() and readlines() methods.

        # XXX It might be better to extract the read buffering code
        # out of socket._fileobject() and into a base class.

        r.recv = r.read
        fp = socket._fileobject(r, close=True)

This, storing a bound method in the instance, will cause a reference 
cycle that the user knows nothing about.

I propose creating a wrapper instance with a recv() method instead.  Or, 
is there a standard way of storing bound methods on instances?  A 
'weakmethod', perhaps?
msg96224 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-12-10 23:16
I have two solutions for this problem.  The first is a mundane one, and 
what I employed in our production environment:
class RecvAdapter(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def recv(self, amt):
        return self.wrapped.read(amt)
    def close(self):
        self.wrapped.close()

...
    fp = socket._fileobject(RecvAdapter(r), close=True)


The second solution is a bit more interesting.  It involves applying 
what I call a weakmethod: A bound method that holds a weak ref to the 
object instance:

import weakref
class WeakMethod(object):
    def __init__(self, bound):
        self.weakself = weakref.proxy(bound.im_self)
        self.methodname = bound.im_func.func_name
    def __call__(self, *args, **kw):
        return getattr(self.weakself, self.methodname)(*args, **kw)

We then do:
    r.recv = WeakMethod(r.read)
    fp = socket._fileobject(r, close=True)

I've had many uses for a WeakMethod through the years.  I wonder if such 
a class might be considered useful enough to be put into the weakref 
module.
msg96229 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-12-11 00:00
The WeakMethod idea is not new:

http://code.activestate.com/recipes/81253/

http://mindtrove.info/articles/python-weak-references/#toc-extending-weak-
references
msg96235 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-12-11 07:37
weak method idea seems interesting. I have not used it anytime yet, and
in this case,it seems okay.
msg96239 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-12-11 09:40
cool.  The mindtrove one in particular seems nice.  I didn't realize 
that one could build boundobjects oneself.
Is there any chance of getting a weakmethod class into the weakref 
module?
msg96241 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-12-11 09:56
On Fri, Dec 11, 2009 at 09:40:40AM +0000, Kristján Valur Jónsson wrote:
> Is there any chance of getting a weakmethod class into the weakref 
> module?

This is a separate feature request on weakref module. It may opened
and discussion carried out there?
msg110930 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-07-20 17:34
Lets just investigate the circular reference part here for this ticket.
msg112737 - (view) Author: David Stanek (dstanek) Date: 2010-08-04 02:27
Does this issue still exist? I did a little poking around at could not find the quoted code.
msg113112 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-06 17:00
in python/trunk/Lib/urllib2.py, line 1161

It doesn't appear to be an issue in py3k.
msg206502 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-12-18 09:41
This is still a horrible, horrible, cludge.
I've recently done some work in this area and will suggest a different approach.
msg206509 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-18 12:31
Could you please provide some tests?
msg206596 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-12-19 10:43
Here it is.
Notice the incredible nesting depth in python 2.7.
The socket itself is found at
response.fp._sock.fp._sock
There are two socket._fileobjects in use!
msg207020 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2013-12-28 02:47
Sounds like urlopen() is relying on garbage collection to close the socket and connection. Maybe it would be better to explicitly close the socket, even if you do eliminate all the garbage reference cycles.

My test code for Issue 19524 might be useful here. It verifies close() has been called on the HTTP socket.
msg207093 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-12-30 11:29
No, the socket is actually closed when response's close() method is called.  The problem is that the HTTPResponse object, buried deep within the nested classes returned from do_open(), has a circular reference, and _it_ will not go away.

No one is _relying_ on garbage collection in the sense that this is not, I think, designed behaviour, merely an unintentional effect of storing a bound method in the object inance.

As always, circular reference should be avoided when possible since relying on gc is not something to be done lightly.

Now, I think that changing the complicated wrapping at this stage is not possible, but merely replacing the bound method with a weak method might just do the trick.
History
Date User Action Args
2013-12-30 11:29:52kristjan.jonssonsetmessages: + msg207093
2013-12-28 02:47:37martin.pantersetnosy: + martin.panter
messages: + msg207020
2013-12-21 21:29:54serhiy.storchakasetnosy: + pitrou
2013-12-19 10:43:38kristjan.jonssonsetfiles: + httpleak.py

messages: + msg206596
2013-12-18 12:31:08serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg206509
2013-12-18 09:41:36kristjan.jonssonsetmessages: + msg206502
2013-12-17 20:12:52serhiy.storchakasetstage: needs patch -> test needed
versions: - Python 3.1, Python 3.2
2010-08-06 17:00:39kristjan.jonssonsetmessages: + msg113112
2010-08-04 02:27:54dstaneksetnosy: + dstanek
messages: + msg112737
2010-07-20 17:34:40orsenthilsetassignee: orsenthil
messages: + msg110930
versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6, Python 2.5
2009-12-11 09:56:19orsenthilsetmessages: + msg96241
2009-12-11 09:40:38kristjan.jonssonsetmessages: + msg96239
2009-12-11 07:37:37orsenthilsetnosy: + orsenthil
messages: + msg96235
2009-12-11 00:00:03amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg96229
2009-12-10 23:16:28kristjan.jonssonsetkeywords: + patch

messages: + msg96224
2009-12-09 17:27:12kristjan.jonssoncreate