msg67833 - (view) |
Author: Grant Tang (gtang) |
Date: 2008-06-08 15:46 |
# the following code consumes about 800 MB of memory, which is normal
n = 100000000
data = [0.0 for i in xrange(n)]
# however, if I assign random numbers to the data list, it consumes an
# extra 2.5 GB of memory
from random import random
for i in xrange(n):
    data[i] = random()
# even if I delete data, only 800 MB of memory is released
del data
# calling gc.collect() does not help; the extra 2.5 GB is not released
import gc
gc.collect()
Only when I quit Python is the memory released. The effect is the same
if I use the random number generator from numpy, and even if I just do
data[i] = atof("1.26").
I tried it in both Python 2.4 and 2.5, on Linux 64-bit and 32-bit.
|
msg67834 - (view) |
Author: Facundo Batista (facundobatista) * |
Date: 2008-06-08 16:01 |
Confirmed the issue in the trunk right now:
(the numbers in square brackets refer to the 'top' output below)
facundo@pomcat:~/devel/reps/python/trunk$ ./python
Python 2.6a3+ (trunk:64009, Jun 7 2008, 09:51:56)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
[1]
>>> data = [0.0 for i in xrange(100000000)]
[2]
>>> from random import random
>>> for i in xrange(100000000):
... data[i] = random()
...
>>>
[3]
The memory consumption:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
[1] 4054 facundo 20 0 5032 3264 1796 S 0.0 0.2 0:00.02 python
[2] 4054 facundo 20 0 414m 384m 1888 S 0.0 19.1 0:17.72 python
[3] 4054 facundo 20 0 1953m 1.4g 1952 S 0.0 70.7 1:01.40 python
|
msg67835 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2008-06-08 16:18 |
Strongly doubt this has anything to do with random number generation.
Python maintains a freelist for float objects, which is both unbounded
and immortal. Instead of doing "data[i] = random()", do, e.g., "data[i]
= float(i)", and I bet you'll see the same behavior. That is, whenever
you create a number of distinct float objects simultaneously alive, the
space they occupy is never released (although it is available to be
reused for other float objects). The use of random() here simply
creates a large number of distinct float objects simultaneously alive.
|
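Tim's point can be checked without random() at all: any loop that fills a list with distinct floats keeps them all simultaneously alive. A minimal sketch, not from the thread, in Python 3 syntax (the original report used Python 2's xrange):

```python
n = 100_000
data = [0.0] * n
for i in range(n):
    data[i] = float(i)  # a distinct float object per slot, no random() involved

# While the list is alive, every slot holds its own object:
print(len(set(map(id, data[:1000]))))  # 1000 distinct object identities
```

The same pattern with `random()` differs only in which values the objects hold; the memory behavior comes from the number of distinct floats alive at once.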
msg67836 - (view) |
Author: Grant Tang (gtang) |
Date: 2008-06-08 16:36 |
I agree with Tim's comment. The question is why these floats stay alive
even after the random() call returns. Does this then become a garbage
collection issue?
|
msg67837 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2008-06-08 16:46 |
They stayed alive simultaneously because you stored 100 million of them
simultaneously in a list (data[]). If instead you did, e.g.,
for i in xrange(100000000):
    x = random()
the problem would go away -- then only two float objects are
simultaneously alive at any given time (the "old" float in `x` stays
alive until the "new" float created by random() replaces it).
|
msg67838 - (view) |
Author: Grant Tang (gtang) |
Date: 2008-06-08 16:55 |
Here I am confused. 100 million floats in a list take about 800 MB of
memory, which is acceptable.
for i in xrange(100000000):
    data[i] = random()
So it should be 800 MB plus one float returned by random() at a time.
But the problem is that after this loop, besides the 800 MB list,
another >2 GB of memory is occupied, and deleting the data list and
calling gc.collect() does not release it. I think you mean there are
lots of floats used inside the random() call; they should be released
after random() returns.
|
msg67839 - (view) |
Author: Facundo Batista (facundobatista) * |
Date: 2008-06-08 16:59 |
So, 0.0 would be cached, and the 414m/384m would be from the list
itself, right? I tried:
>>> data = [(1.0/i) for i in xrange(1,100000000)]
and the memory consumption was the big one.
Grant, the 800 MB is taken by ONE 0.0 and a list with a zillion positions.
Furthermore, I did:
>>> for x in xrange(100000000):
...     i = random()
and the memory didn't increase.
Grant, take note that there's no gc issue; the numbers stay alive
because the list itself is pointing at them.
Closing this as invalid.
|
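Facundo's point that the list itself keeps the floats alive can be seen directly with sys.getrefcount (an illustrative sketch, not from the thread):

```python
import sys
from random import random

x = random()
data = [x]   # the list slot is a second reference to the same float object

# In CPython, getrefcount reports at least 3 here: the name x, the list
# slot, and the temporary reference held by the getrefcount call itself.
print(sys.getrefcount(x) >= 3)

del data     # dropping the list removes its reference; the object can now
             # be reclaimed once no other references remain
```

This is plain reference counting, not a garbage-collection cycle: as long as the list exists, every float it points at is reachable.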
msg67841 - (view) |
Author: Grant Tang (gtang) |
Date: 2008-06-08 17:13 |
Facundo:
I understand now. You mean every unique float number used becomes an
object in memory, and is never released until Python quits. Is there
any way to reclaim this memory? We need 3 GB of memory to create a list
of 100 million random numbers.
Thank you very much,
Grant
|
msg67843 - (view) |
Author: Facundo Batista (facundobatista) * |
Date: 2008-06-08 17:25 |
Grant,
A float value takes 64 bits, so 100 million floats take 800 MB for
*just* the floats. You're also building a list with 100 million slots.
Maybe you shouldn't be building this structure in memory?
In any case, you should raise this issue on comp.lang.python to get advice.
Regards,
|
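One way around the per-object overhead, following Facundo's suggestion not to build this structure as a list of objects (this example is not from the thread), is the standard-library array module, which stores raw 8-byte C doubles contiguously instead of boxed float objects:

```python
from array import array
from random import random

n = 1_000_000
# array('d') holds unboxed doubles: no per-object refcount/type-pointer
# overhead and no separate list of object pointers
data = array('d', (random() for _ in range(n)))

print(data.itemsize)              # 8 bytes per value on CPython
print(data.itemsize * len(data))  # 8000000 bytes (~8 MB) of payload
```

At the original report's scale of 100 million values, this is roughly 800 MB total, versus several gigabytes for a list of float objects.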
msg67844 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2008-06-08 17:29 |
Float objects also require, as do all Python objects, space to hold a
type pointer and a reference count. So each float object requires at
least 16 bytes (on most 32-bit boxes: 4 bytes for the type pointer, 4
bytes for the refcount, plus 8 bytes for the float itself). So 100
million float objects require at least 1.6 billion bytes.
It is a gc issue in the sense that the float-object free-list is both
unbounded and immortal. For that matter, so is the int-object
free-list. This has been discussed many times over the years on
python-dev, but nobody yet has a thoroughly attractive alternative.
|
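Tim's per-object arithmetic can be checked with sys.getsizeof (a sketch, not part of the thread). His 16-byte figure is for 32-bit builds; on a typical 64-bit CPython build a float object is 24 bytes (8-byte refcount, 8-byte type pointer, 8-byte double):

```python
import sys

per_float = sys.getsizeof(1.5)   # 24 on a typical 64-bit CPython build
print(per_float)

# 100 million boxed floats, ignoring the list that points at them:
print(per_float * 100_000_000)   # about 2.4 GB on a 64-bit build
```

Adding the list's array of pointers (8 bytes per slot on 64-bit) brings the total close to the ~3 GB observed in the original report.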
msg122321 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2010-11-25 00:25 |
For the record, this was finally fixed with issue2862: gc.collect() now clears the free-lists during the collection of the highest generation.
|
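Since that fix (issue2862), a full gc.collect() also clears the per-type free-lists, so the pattern from the original report can hand the float memory back. A sketch in modern Python (names and sizes here are illustrative, not from the thread):

```python
import gc
from random import random

n = 500_000
data = [random() for _ in range(n)]  # many distinct floats alive at once

del data       # drop the only references to the floats and the list
gc.collect()   # a full collection now also clears the float free-list,
               # making that memory eligible to return to the allocator
```

On the Python versions discussed in this thread, the same `del` plus `gc.collect()` left the free-list memory held until interpreter exit.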
|
Date | User | Action | Args |
2022-04-11 14:56:35 | admin | set | github: 47313 |
2010-11-25 00:25:22 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg122321
|
2008-06-08 17:29:34 | tim.peters | set | messages:
+ msg67844 |
2008-06-08 17:25:11 | facundobatista | set | messages:
+ msg67843 |
2008-06-08 17:13:52 | gtang | set | files:
+ unnamed messages:
+ msg67841 |
2008-06-08 16:59:25 | facundobatista | set | status: open -> closed resolution: not a bug messages:
+ msg67839 |
2008-06-08 16:55:44 | gtang | set | messages:
+ msg67838 |
2008-06-08 16:46:40 | tim.peters | set | messages:
+ msg67837 |
2008-06-08 16:36:41 | gtang | set | messages:
+ msg67836 |
2008-06-08 16:18:26 | tim.peters | set | nosy:
+ tim.peters messages:
+ msg67835 |
2008-06-08 16:01:38 | facundobatista | set | nosy:
+ facundobatista messages:
+ msg67834 versions:
+ Python 2.6, - Python 2.5, Python 2.4 |
2008-06-08 15:46:06 | gtang | create | |