classification
Title: itertools.groupby() leaks memory with circular reference
Type: resource usage
Stage:
Components:
Versions: Python 2.5.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: _doublep, aronacher, asmodai, belopolsky, loewis, rhettinger
Priority: normal
Keywords: patch

Created on 2008-03-06 19:39 by asmodai, last changed 2008-10-20 21:43 by loewis. This issue is now closed.

Files
File name Uploaded Description
testcase.py asmodai, 2008-03-06 19:39 Testcase code
groupby-leak.diff belopolsky, 2008-03-06 21:21
Messages (9)
msg63332 - Author: Jeroen Ruigrok van der Werven (asmodai) (Python committer) Date: 2008-03-06 19:39
Quoting from my email to Raymond:

In the Trac/Genshi community we've been tracking a rather obscure memory
leak that causes us a lot of problems.

Please see http://trac.edgewall.org/ticket/6614 and then
http://genshi.edgewall.org/ticket/190 for background.

We reduced the case to the following pure-Python code and believe it is
a bug within itertools.groupby. As Armin Ronacher explains in Genshi
ticket 190:

"Looks like genshi is not to blame. itertools.groupby has a grouper 
with a reference to the groupby type but no traverse func. As soon as a 
circular reference ends up in the groupby (which happens thanks to the 
func_globals in our lambda) genshi leaks."

This can be demonstrated with the following code (testcase attachment 
present with this issue):

import gc
from itertools import groupby

def run():
    keyfunc = lambda x: x
    for i, j in groupby(range(100), key=keyfunc):
        keyfunc.x = j

for x in xrange(20):
    gc.collect()
    run()
    print len(gc.get_objects())

Executing this prints the number of objects tracked by the garbage
collector, but every iteration is +4 from the previous one, as Armin
specifies:

  "a frame, a grouper, a keyfunc and a groupby object"

We have been unable to come up with a decent patch and thus I am 
logging this issue now.
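
For readers on a current interpreter, here is a minimal sketch of the same cycle in Python 3 syntax (the original testcase is Python 2). It assumes a CPython build that already contains the fix. The weak reference to the key function is not part of the original testcase; it is used here only to check whether the cycle gets collected, instead of comparing len(gc.get_objects()) between runs.

```python
import gc
import weakref
from itertools import groupby

def run():
    keyfunc = lambda x: x
    for _, group in groupby(range(100), key=keyfunc):
        # Storing the grouper on the key function closes the cycle:
        # keyfunc -> grouper -> groupby -> keyfunc
        keyfunc.x = group
    # Functions support weak references, so this lets us observe
    # whether keyfunc (and with it the whole cycle) is ever freed.
    return weakref.ref(keyfunc)

gc.disable()                        # rule out an automatic collection
ref = run()
alive_before = ref() is not None    # only the cycle keeps keyfunc alive
gc.collect()
alive_after = ref() is not None     # stays True if the cycle leaks
gc.enable()

print(alive_before, alive_after)
```

On an interpreter with the fix, the second value should be False. On an affected interpreter the cycle is never reclaimed and it would presumably stay True.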
msg63335 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2008-03-06 20:48
With the following patch:

===================================================================
--- Lib/test/test_itertools.py  (revision 61284)
+++ Lib/test/test_itertools.py  (working copy)
@@ -707,6 +707,12 @@
         a = []
         self.makecycle(takewhile(bool, [1, 0, a, a]), a)
 
+    def test_issue2246(self):
+        n = 10
+        keyfunc = lambda x: x
+        for i, j in groupby(xrange(n), key=keyfunc):
+            keyfunc.__dict__.setdefault('x',[]).append(j)
+                    
 def R(seqn):
     'Regular generator'
     for i in seqn:

$ ./python Lib/test/regrtest.py -R :: test_itertools

reports n*3 + 13 reference leaks.  This should give a clue ...
msg63336 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2008-03-06 21:05
It looks like the problem is that the internal grouper object becomes a
part of a cycle: keyfunc -> grouper(x) -> keyfunc(tgtkey), but its type
does not support GC.  I will try to come up with a patch.
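
One way to check from Python whether an object takes part in cyclic garbage collection is gc.is_tracked(). That function exists in Python 2.7 and 3.1 onward, so it is not available on the 2.5 series this issue was filed against. A small sketch, assuming a CPython build with the fix applied:

```python
import gc
from itertools import groupby

groups = groupby([1, 1, 2])
key, group = next(groups)   # group is the internal grouper object

# An object whose type supports GC is tracked by the collector,
# which is what allows cycles running through it to be found.
groupby_tracked = gc.is_tracked(groups)
grouper_tracked = gc.is_tracked(group)

print(groupby_tracked, grouper_tracked)
```

If the grouper were reported as untracked, the collector could not traverse it, and a cycle passing through it could not be reclaimed.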
msg63337 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2008-03-06 21:07
No need.  I'm already working on adding GC to the grouper.
msg63338 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2008-03-06 21:21
Oops.  Here is my patch anyways.
msg63339 - Author: Paul Pogonyshev (_doublep) Date: 2008-03-06 21:32
Damn, I wrote a patch too ;)
msg63340 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2008-03-06 22:53
r61286.  Applied a patch substantially similar to Alexander's.  Thanks
for the test case and the report.
msg75009 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2008-10-20 21:28
Backport candidate
msg75011 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2008-10-20 21:43
Already backported in r61287.
History
Date  User  Action  Args
2008-10-20 21:43:09  loewis  set  status: open -> closed; messages: + msg75011
2008-10-20 21:28:28  loewis  set  status: closed -> open
2008-10-20 21:28:14  loewis  set  nosy: + loewis; messages: + msg75009; versions: + Python 2.5.3, - Python 2.6, Python 2.5, Python 2.4, Python 3.0
2008-03-06 22:53:11  rhettinger  set  status: open -> closed; resolution: fixed; messages: + msg63340
2008-03-06 21:32:20  _doublep  set  nosy: + _doublep; messages: + msg63339
2008-03-06 21:21:47  belopolsky  set  files: + groupby-leak.diff; keywords: + patch; messages: + msg63338
2008-03-06 21:07:18  rhettinger  set  assignee: rhettinger; messages: + msg63337
2008-03-06 21:05:44  belopolsky  set  messages: + msg63336
2008-03-06 20:53:49  aronacher  set  nosy: + aronacher
2008-03-06 20:48:09  belopolsky  set  nosy: + belopolsky; messages: + msg63335
2008-03-06 19:39:28  asmodai  create