classification
Title: memory leak in random number generation
Type: resource usage Stage:
Components: None Versions: Python 2.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, facundobatista, gtang, tim.peters
Priority: normal Keywords:

Created on 2008-06-08 15:46 by gtang, last changed 2010-11-25 00:25 by amaury.forgeotdarc. This issue is now closed.

Files
File name Uploaded Description Edit
unnamed gtang, 2008-06-08 17:13
Messages (11)
msg67833 - (view) Author: Grant Tang (gtang) Date: 2008-06-08 15:46
#the following code consumes about 800M of memory, which is normal
n = 100000000
data = [0.0 for i in xrange(n)]

#however, if I assign random numbers to the data list, it consumes an
#extra 2.5G of memory
from random import random
for i in xrange(n):
    data[i] = random()

#even if I delete data, only 800M memory released
del data

#call gc.collect() does not help, the extra 2.5G memory not released
import gc
gc.collect()

Only when I quit Python is the memory released. I see the same effect if
I use the random number generator from numpy, and even if I just say
data[i] = atof("1.26").
I tried it in both Python 2.4 and 2.5, on 64-bit and 32-bit Linux.
msg67834 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-06-08 16:01
Confirmed the issue in the trunk right now:

(the numbers between square brackets point to the 'top' information below)

facundo@pomcat:~/devel/reps/python/trunk$ ./python 
Python 2.6a3+ (trunk:64009, Jun  7 2008, 09:51:56) 
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
[1]
>>> data = [0.0 for i in xrange(100000000)]
[2]
>>> from random import random
>>> for i in xrange(100000000):
...     data[i] = random()
... 
>>> 
[3]


The memory consumption:

     PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND    
[1] 4054 facundo   20   0  5032 3264 1796 S  0.0  0.2   0:00.02 python
[2] 4054 facundo   20   0  414m 384m 1888 S  0.0 19.1   0:17.72 python
[3] 4054 facundo   20   0 1953m 1.4g 1952 S  0.0 70.7   1:01.40 python
msg67835 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2008-06-08 16:18
Strongly doubt this has anything to do with random number generation. 
Python maintains a freelist for float objects, which is both unbounded
and immortal.  Instead of doing "data[i] = random()", do, e.g., "data[i]
= float(s)", and I bet you'll see the same behavior.  That is, whenever
you create a number of distinct float objects simultaneously alive, the
space they occupy is never released (although it is available to be
reused for other float objects).  The use of random() here simply
creates a large number of distinct float objects simultaneously alive.
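A rough back-of-the-envelope check of Tim's explanation (a sketch using sys.getsizeof on a modern 64-bit CPython; the exact per-object figures on the Python 2.x builds in this thread differ slightly):

```python
import sys

# Every CPython float is a full heap object: reference count + type
# pointer + the 8-byte C double.  sys.getsizeof reports that footprint.
per_float = sys.getsizeof(0.0)   # typically 24 bytes on 64-bit CPython
per_ref = 8                      # one list slot = one pointer on 64-bit
n = 100000000

# A list of n references to the *same* 0.0 costs roughly n pointers...
zeros_list = n * per_ref         # ~800 MB: matches the "normal" case
# ...but n *distinct* floats from random() each need their own object.
distinct_floats = n * per_float  # gigabytes on top of the list itself

print(zeros_list // 2**20, "MiB for the list of references")
print(distinct_floats // 2**20, "MiB for 100M distinct float objects")
```

This matches the numbers reported above: ~800 MB for the reference-filled list, and a further couple of gigabytes once every slot points at a distinct float object.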
msg67836 - (view) Author: Grant Tang (gtang) Date: 2008-06-08 16:36
I agree with Tim's comment. The question is why these floats stay alive
even after the random() call returns. Does this then become a garbage
collection issue?
msg67837 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2008-06-08 16:46
They stayed alive simultaneously because you stored 100 million of them
simultaneously in a list (data[]).  If instead you did, e.g.,

for i in xrange(100000000):
    x = random()

the problem would go away -- then only two float objects are
simultaneously alive at any given time (the "old" float in `x` stays
alive until the "new" float created by random() replaces it).
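For illustration, here is a streaming version of the computation (a hedged sketch in modern Python; the thread's original code used Python 2's xrange) that consumes each value instead of storing it:

```python
import random

n = 1000000
# Consume each value as it is produced instead of storing all of them:
# at any moment only a couple of float objects are alive, so the float
# free-list never needs to grow to millions of entries.
total = sum(random.random() for _ in range(n))

mean = total / n   # should land very close to 0.5 for uniform [0, 1)
```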
msg67838 - (view) Author: Grant Tang (gtang) Date: 2008-06-08 16:55
Here I am confused. 100 million floats in a list take about 800 MB of 
memory, which is acceptable.

for i in xrange(100000000):
    data[i] = random()

so it should be 800M plus the single float returned by random(). But the
problem is that after this loop, besides the 800M for the list, another
>2G of memory is occupied, and deleting the data list and calling
gc.collect() does not release it. I think you mean there are lots of
floats used inside the random() call; those should be released after
random() returns.
msg67839 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-06-08 16:59
So, 0.0 would be cached, and the 414m+384m would be from the list
itself, right? I tried,

>>> data = [(1.0/i) for i in xrange(1,100000000)]

And the memory consumption was again the big one.

Grant, the 800 MB is taken by ONE 0.0, and a list of zillion positions.

Furthermore, I did:

>>> for x in xrange(100000000):
...     i = random()

And the memory didn't increase.

Grant, take note that there's no gc issue, the numbers stay alive
because the list itself is pointing to them.

Closing this as invalid.
msg67841 - (view) Author: Grant Tang (gtang) Date: 2008-06-08 17:13
Facundo:

I understand now. You mean every unique float number used becomes an
object in memory, and it is never released until Python quits. Is there
any way to reclaim this memory? We need 3G of memory to create a list of
100 million random numbers.

Thank you very much,
Grant

msg67843 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-06-08 17:25
Grant,

A float takes 64 bits. 100 million floats take 800 MB, *just* the
floats. You're also building a list of 100 million places.

Maybe you shouldn't be building this structure in memory?

In any case, you should raise this issue in comp.lang.python, to get advice.

Regards,
msg67844 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2008-06-08 17:29
Float objects also require, as do all Python objects, space to hold a
type pointer and a reference count.  So each float object requires at
least 16 bytes (on most 32-bit boxes, 4 bytes for the type pointer, 4
bytes for the refcount, + 8 bytes for the float).  So 100 million float
objects requires at least 1.6 billion bytes.

It is a gc issue in the sense that the float-object free-list is both
unbounded and immortal.  For that matter, so is the int-object
free-list.  This has been discussed many times over the years on
python-dev, but nobody yet has a thoroughly attractive alternative.
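As a practical workaround for Grant's question about reclaiming the memory: if only the numeric values are needed, rather than Python float objects, the stdlib array module avoids the per-object overhead entirely. A hedged sketch, not from the original thread:

```python
import random
from array import array

n = 1000000
# array('d') stores raw C doubles: 8 bytes per value, with no per-object
# refcount/type-pointer overhead and no float free-list involvement.
# Feeding it from a generator means no intermediate list of floats either.
data = array('d', (random.random() for _ in range(n)))

print(data.itemsize)        # 8 bytes per double
print(data.itemsize * n)    # ~8 MB for a million values
```

At 8 bytes per entry, 100 million values fit in ~800 MB instead of the several gigabytes that 100 million boxed floats require, and the buffer is returned to the allocator as soon as the array is deleted.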
msg122321 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-11-25 00:25
For the record, this was finally fixed with issue2862: gc.collect() now clears the free-lists during the collection of the highest generation.
History
Date                 User                Action  Args
2010-11-25 00:25:22  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg122321
2008-06-08 17:29:34  tim.peters          set     messages: + msg67844
2008-06-08 17:25:11  facundobatista      set     messages: + msg67843
2008-06-08 17:13:52  gtang               set     files: + unnamed; messages: + msg67841
2008-06-08 16:59:25  facundobatista      set     status: open -> closed; resolution: not a bug; messages: + msg67839
2008-06-08 16:55:44  gtang               set     messages: + msg67838
2008-06-08 16:46:40  tim.peters          set     messages: + msg67837
2008-06-08 16:36:41  gtang               set     messages: + msg67836
2008-06-08 16:18:26  tim.peters          set     nosy: + tim.peters; messages: + msg67835
2008-06-08 16:01:38  facundobatista      set     nosy: + facundobatista; messages: + msg67834; versions: + Python 2.6, - Python 2.5, Python 2.4
2008-06-08 15:46:06  gtang               create