
Author stephbul
Recipients a.badger, akuchling, amaury.forgeotdarc, bwelling, holdenweb, jafo, jhylton, manekcz, nswinton, orsenthil, stephbul
Date 2009-06-03.13:31:24
SpamBayes Score 4.8742277e-05
Marked as misclassified No
Message-id <1244035891.65.0.310866535474.issue1208304@psf.upfronthosting.co.za>
In-reply-to
Content
Hello, 

I'm facing a urllib2 memory leak in one of my scripts, which is not
threaded. I ran a few tests to check what was going on and found this
existing (though old) bug thread.

I have not been able to figure out the cause yet, but here is some
information:
Platform: Debian
Python version: 2.5.4

I wrote a script (2 attached files) that fetches a web page
(http://www.google.com) every second and monitors the number of open
file descriptors and the memory footprint.
I also used the gc module (garbage collector) to count the objects that
are not freed (as already proposed in this thread, but focused on the
gc.DEBUG_LEAK flag).
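
For reference, here is a minimal sketch of what the test loop does; the
names (including the monitor.sh helper, shown in the PS below) are
illustrative and the actual attached files may differ:

import gc
import os
import time
import urllib2

# DEBUG_LEAK implies DEBUG_SAVEALL: objects found collectable are
# reported and kept in gc.garbage instead of being freed.
gc.set_debug(gc.DEBUG_LEAK)

while True:
    response = urllib2.urlopen('http://www.google.com')
    response.read()
    response.close()
    gc.collect()                                    # force a full collection pass
    print 'unreachable objects: ', len(gc.garbage)  # grows by 11 per request
    os.system('./monitor.sh')                       # fd count and memory footprint
    time.sleep(1)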

Here are my results:
First access output:
gc: collectable <dict 0xb793c604>
gc: collectable <HTTPResponse instance at 0xb7938f6c>
gc: collectable <dict 0xb793c4f4>
gc: collectable <HTTPMessage instance at 0xb793d0ec>
gc: collectable <dict 0xb793c02c>
gc: collectable <list 0xb7938e8c>
gc: collectable <list 0xb7938ecc>
gc: collectable <instancemethod 0xb79cf824>
gc: collectable <dict 0xb793c79c>
gc: collectable <HTTPResponse instance at 0xb793d2cc>
gc: collectable <instancemethod 0xb79cf874>
unreachable objects:  11
File descriptors number: 32
Memory: 4612

Tenth access:
gc: collectable <dict 0xb78f14f4>
gc: collectable <HTTPResponse instance at 0xb78f404c>
gc: collectable <dict 0xb78f13e4>
gc: collectable <HTTPMessage instance at 0xb78f462c>
gc: collectable <dict 0xb78e5f0c>
gc: collectable <list 0xb78eeb4c>
gc: collectable <list 0xb78ee2ac>
gc: collectable <instancemethod 0xb797b7fc>
gc: collectable <dict 0xb78f168c>
gc: collectable <HTTPResponse instance at 0xb78f442c>
gc: collectable <instancemethod 0xb78eaa7c>
unreachable objects:  110
File descriptors number: 32
Memory: 4680

After one hundred accesses:
gc: collectable <dict 0x89e2e84>
gc: collectable <HTTPResponse instance at 0x89e3e2c>
gc: collectable <dict 0x89e2d74>
gc: collectable <HTTPMessage instance at 0x89e3ccc>
gc: collectable <dict 0x89db0b4>
gc: collectable <list 0x89e3cac>
gc: collectable <list 0x89e32ec>
gc: collectable <instancemethod 0x89d8964>
gc: collectable <dict 0x89e60b4>
gc: collectable <HTTPResponse instance at 0x89e50ac>
gc: collectable <instancemethod 0x89ddb1c>
unreachable objects:  1100
File descriptors number: 32
Memory: 5284

Each call to urllib2.urlopen() leaves 11 new unreachable objects and
increases the memory footprint, without opening any new files
(11 objects after one access, 110 after ten, 1100 after one hundred).
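
To see what those 11 objects per call are, the saved garbage can be
grouped by type (a sketch; it relies on gc.DEBUG_LEAK keeping collectable
objects in gc.garbage, as in the loop above):

import gc
from collections import defaultdict  # available since Python 2.5

counts = defaultdict(int)
for obj in gc.garbage:               # populated because DEBUG_LEAK saves collectables
    counts[type(obj).__name__] += 1
for name, count in sorted(counts.items()):
    print name, count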

Do you have any idea?
With the hack proposed in message
http://bugs.python.org/issue1208304#msg60751, the count goes down to 8
unreachable objects per call, but memory still increases.

Regards.

stephbul

PS
My urllib2leak.py test calls the following monitor script (I was not
able to attach it):
#! /bin/sh
# Report the open-file-descriptor count and resident memory of the
# running urllib2leak.py process.

PROCS='urllib2leak.py'

# Find the test script's PID, excluding the grep process itself.
RUNPID=`ps aux | grep "$PROCS" | grep -v "grep" | awk '{printf $2}'`
# Count the process's open file descriptors (includes lsof's header line).
FDESC=`lsof -p $RUNPID | wc -l`
# Column 6 of ps aux is the resident set size (RSS) in kilobytes.
MEM=`ps aux | grep "$PROCS" | grep -v "grep" | awk '{printf $6}'`

echo "File descriptors number: "$FDESC
echo "Memory: "$MEM