
classification
Title: FD leak in urllib2
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: test_urllib2net is triggering a ResourceWarning (issue 12692)
Assigned To:
Nosy List: Claudio.Freire, javawizard, martin.panter, pitrou, serhiy.storchaka
Priority: normal
Keywords:

Created on 2013-06-05 19:10 by Claudio.Freire, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
bogus.py Claudio.Freire, 2013-06-05 19:10 Tornado server associated with the snippet in the description
Messages (5)
msg190687 - (view) Author: Claudio Freire (Claudio.Freire) Date: 2013-06-05 19:10
While other issues already exist about this problem, this particular case is unlike other issues, and I didn't think it a good idea to merge with those.

Under some very specific circumstances (sending a POST request with more data than an unknown threshold), at least one socket remains unclosed after calling close() on urllib2.urlopen's returned file object.

While I marked the only versions I could confirm exhibit the issue, I believe this is an issue on all versions.

This started in pypy[0], although it applies to CPython as well (albeit the reference counting GC is less likely to delay closing of the FD as much as in pypy).

I'm attaching the same server used to trigger this issue in pypy; it works the same with CPython.

To trigger the leak, open an interpreter and do this (copy-pasted from pypy; CPython does not show the leak because the decref immediately closes the socket, but it will issue a warning if run with -Wall). See pypy's issue tracker[0] for details.

>>>> import os, urllib2
>>>> req = """{"imp": [{"h": 50, "battr": ["9", "10", "12"], "api": 3, "w": 320,
"instl": 0, "impid": "5d6dedf3-17bb-11e2-b5c0-1040f38b83e0"}]""" * 10
>>>> r = urllib2.Request("http://localhost:8000/bogus?src=1", req)
>>>> u = urllib2.urlopen(r)
>>>> v = u.read()
>>>> u.close()
>>>> os.system("ls -alh /proc/%d/fd/*" % os.getpid())
lrwx------ 1 claudiofreire users 64 Jun  4 15:08 /proc/26203/fd/0 -> /dev/pts/5
lrwx------ 1 claudiofreire users 64 Jun  4 15:08 /proc/26203/fd/1 -> /dev/pts/5
lrwx------ 1 claudiofreire users 64 Jun  4 15:08 /proc/26203/fd/2 -> /dev/pts/5
lrwx------ 1 claudiofreire users 64 Jun  4 15:08 /proc/26203/fd/3 -> socket:[2086998]
lrwx------ 1 claudiofreire users 64 Jun  4 15:08 /proc/26203/fd/5 -> /dev/pts/5
lrwx------ 1 claudiofreire users 64 Jun  4 15:08 /proc/26203/fd/6 -> /dev/pts/5
0
>>>> 


[0] https://bugs.pypy.org/issue867
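
For reference, the warning mentioned above can be observed in isolation on Python 3, where an unclosed socket that gets finalized raises a ResourceWarning. The following is only a minimal sketch that drops a bare socket; it does not go through urllib at all, and the helper name unclosed_socket_warnings is made up for illustration.

```python
import gc
import socket
import warnings


def unclosed_socket_warnings():
    """Drop a socket without closing it; return the ResourceWarnings seen."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default, so enable it explicitly.
        warnings.simplefilter("always", ResourceWarning)
        sock = socket.socket()
        del sock      # CPython's reference counting finalizes the socket here
        gc.collect()  # other implementations may need an explicit collection
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


if __name__ == "__main__":
    for w in unclosed_socket_warnings():
        print(w.category.__name__, w.message)
```

On an implementation without reference counting, such as pypy, the finalizer may run much later, which is why the descriptor stays visible in /proc for longer there.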
msg205370 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2013-12-06 13:38
Confirmed that this happens when the server sends a chunked response, or sends a Content-Length header, but not when the server just sends “Connection: close”. So this looks the same as Issue 19524, and my patch for that seems to fix the issue here.

Python 3 version of the demo code:

import os, urllib.request
data = b"""{"imp": [{"h": 50, "battr": ["9", "10", "12"], "api": 3, "w": 320,
"instl": 0, "impid": "5d6dedf3-17bb-11e2-b5c0-1040f38b83e0"}]""" * 10
req = urllib.request.Request("http://localhost:8000/bogus?src=1", data)
resp = urllib.request.urlopen(req)
v = resp.read()
resp.close()
os.system("ls -alh /proc/%d/fd/*" % os.getpid())
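
A self-contained variant of the demo that does not need the attached Tornado server might look like the sketch below. It is an approximation under stated assumptions: FixedLengthHandler is a hypothetical stand-in for bogus.py that always answers with a Content-Length header, socket_fds and sockets_left_after_post are illustrative helper names, and the descriptor scan relies on os.fstat, so it is POSIX-only. On an interpreter that includes the fix, the returned list should be empty.

```python
import http.server
import os
import socketserver
import stat
import threading
import urllib.request


class FixedLengthHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for bogus.py: replies with a Content-Length."""

    def do_POST(self):
        # Drain the request body, then send a small fixed-length reply.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def socket_fds(max_fd=1024):
    """Return the set of open descriptors below max_fd that are sockets."""
    found = set()
    for fd in range(max_fd):
        try:
            if stat.S_ISSOCK(os.fstat(fd).st_mode):
                found.add(fd)
        except OSError:
            pass  # descriptor is not open
    return found


def sockets_left_after_post(payload):
    """POST payload to a one-shot local server; return socket fds still open."""
    server = socketserver.TCPServer(("127.0.0.1", 0), FixedLengthHandler)
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    before = socket_fds()  # already includes the listening socket
    # An empty ProxyHandler keeps environment proxy settings out of the way.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = "http://127.0.0.1:%d/bogus?src=1" % server.server_address[1]
    resp = opener.open(urllib.request.Request(url, payload))
    resp.read()
    resp.close()
    worker.join()  # the server has closed its side once this returns
    leaked = socket_fds() - before
    server.server_close()
    return sorted(leaked)


if __name__ == "__main__":
    print(sockets_left_after_post(b"x" * 4000))
```

Because the server runs in the same process, the check compares the set of socket descriptors before and after the request instead of listing /proc directly.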
msg214001 - (view) Author: Claudio Freire (Claudio.Freire) Date: 2014-03-18 18:23
I can confirm the issue is in urllib's open: it fails to close() the HTTP connection, leaving it to the GC to do it.

If addinfourl (and friends) is altered to carry a reference to the HTTP connection and close it on close(), the leak is fixed.

I have a patch, but it is incomplete (just a POC): it only handles the common case I use.
msg214021 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-03-18 21:30
Does the fix for Issue 12692 work for you? Namely this revision <http://hg.python.org/cpython/rev/92656b5df2f2>. It was backported to CPython 3.3.4, as I understand.
msg214028 - (view) Author: Claudio Freire (Claudio.Freire) Date: 2014-03-18 21:43
Yes, seems it does.
History
Date                 User              Action  Args
2022-04-11 14:57:46  admin             set     github: 62344
2014-07-16 19:30:16  serhiy.storchaka  set     status: open -> closed; superseder: test_urllib2net is triggering a ResourceWarning; resolution: duplicate
2014-03-18 21:43:36  Claudio.Freire    set     messages: + msg214028
2014-03-18 21:30:28  martin.panter     set     messages: + msg214021
2014-03-18 18:23:13  Claudio.Freire    set     messages: + msg214001
2013-12-21 21:30:04  serhiy.storchaka  set     nosy: + pitrou, serhiy.storchaka
2013-12-06 13:38:31  martin.panter     set     nosy: + martin.panter; messages: + msg205370
2013-12-06 00:18:13  javawizard        set     nosy: + javawizard
2013-06-05 19:10:23  Claudio.Freire    create