Classification
Title: MemoryError with a lot of available memory, gc not called
Type: resource usage Stage: needs patch
Components: Interpreter Core Versions:
Process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Itai.i, brian.curtin, illume, jimjjewett, loewis, markmat, pitrou, swapnil, ysj.ray
Priority: low Keywords:

Created on 2006-07-19 02:46 by markmat, last changed 2010-08-26 21:21 by loewis. This issue is now closed.

Files
File name: unnamed    Uploaded: Itai.i, 2010-08-20 00:58
Messages (20)
msg29202 - (view) Author: Mark Matusevich (markmat) Date: 2006-07-19 02:46
Although the gc behavior is consistent with the
documentation, I believe it is wrong. I think that gc
should be called automatically before any MemoryError
is raised.

Example 1:
for i in range(700): 
   a = [range(5000000)]
   a.append(a)
   print i

This example will crash on any PC with less than
20 GB of RAM. On my PC (Windows 2000, 256 MB) it crashes at
i==7.
Although this example can be fixed by adding a call
to gc.collect() in the loop, in real cases that may be
unreasonable.
msg29203 - (view) Author: Rene Dudfield (illume) Date: 2006-07-19 23:20
Logged In: YES 
user_id=2042

Perhaps better than checking before every memory allocation,
would be to check once a memory error happens in an allocation.

That way the gc cost is only incurred when memory is low.

So, in C-style pseudocode:

res = malloc(...);
if (!res) {
    PyGC_Collect();      /* try to reclaim cyclic garbage first */
    res = malloc(...);   /* retry the allocation once */
    if (!res) {
        /* still failing: raise MemoryError */
    }
}



msg29204 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-07-23 20:00

This is very difficult to implement. The best way might be
to introduce yet another allocation function, one that
invokes gc before failing, and call that function in all
interesting places (of which there are many).

Contributions are welcome and should probably start with a
PEP first.
msg29205 - (view) Author: Mark Matusevich (markmat) Date: 2006-07-23 20:11

This is exactly what I meant.
As far as I recall, this is the policy of the Java GC. I never
had to handle a memory error in Java, because I knew that I
really did not have any more memory.
msg29206 - (view) Author: Mark Matusevich (markmat) Date: 2006-07-23 20:19

Sorry, my last comment was addressed to illume (I am a slow typer :( )
msg29207 - (view) Author: Jim Jewett (jimjjewett) Date: 2006-08-02 21:52

Doing it everywhere would be a lot of painful changes.

Adding the "oops, failed, call gc and try again" logic to
PyMem_* (currently PyMem_Malloc, PyMem_Realloc, PyMem_New,
and PyMem_Resize, but Brett may be changing that) is far
more reasonable.

Whether it is safe to call gc from there is a different 
question.
msg29208 - (view) Author: Mark Matusevich (markmat) Date: 2006-08-03 10:02

Another problem related to the above example: time is wasted
on memory swapping before the MemoryError is raised.
A possible solution is to use a dynamic memory limit: GC is
called when the limit is reached, and the limit is then
adjusted according to the memory left.
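A rough sketch of that dynamic-limit idea in pure Python (the GcThreshold class and its tuning factor are invented for illustration; a real implementation would live inside the interpreter and track bytes rather than object counts):

```python
import gc

class GcThreshold:
    """Collect when the live-object count crosses a soft limit,
    then raise the limit based on what is left after collection."""

    def __init__(self, limit_objects=100000):
        self.limit = limit_objects

    def maybe_collect(self):
        if len(gc.get_objects()) > self.limit:
            gc.collect()
            # adjust: keep a 50% margin above the post-collection count
            self.limit = max(self.limit,
                             int(len(gc.get_objects()) * 1.5))
```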
msg29209 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-08-03 16:43

The example is highly constructed, and it is pointless to
optimize for a boundary case. In the average application,
garbage collection is invoked often enough to reclaim memory
before swapping occurs.
msg86769 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-04-28 21:57
Lowering priority since, as Martin said, it shouldn't be needed in
real-life situations.
msg90581 - (view) Author: Mark Matusevich (markmat) Date: 2009-07-16 20:25
It looks like the severity of this problem is underestimated here.

A programmer who works with a significant amount of data (e.g. a SciPy
user) and uses OOP will face this problem. Most OOP designs result in
some reference cycles (e.g. two-way connections). Some objects in those
cycles will hold huge amounts of data allocated in a single operation
if the program deals with some kind of algorithms (signal processing,
image processing, or even 3D games).

I apologize that my example is artificial. I had a real-life program of
8000 lines which was going into swap for no apparent reason and then
crashing. But instead of posting those 8000 lines, I posted a simple
example illustrating the problem.
msg91312 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-08-05 11:05
I'm not sure what we should do anyway. Your program will first swap out
and thrash before the MemoryError is raised. Invoking the GC when memory
allocation fails would avoid the MemoryError, but not the massive
slowdown due to swapping.
msg114225 - (view) Author: Itai (Itai.i) Date: 2010-08-18 14:25
Hi all,

I second Mark's assertion - this is a real issue for me too; I've stumbled into this problem as well.
I have a numpy/scipy kind of application (6000+ lines so far) which needs to allocate a lot of memory for statistics derived from "real life data", which are then transformed a few times by different algorithms (allocating more memory, but dropping the previous objects).

Currently I get a MemoryError when I try to use the entire dataset, both on Linux and on Windows, with Python 2.5 on a 64-bit machine with 4 GB of memory. (The Windows Python is a 32-bit build, though, because it needs to be compatible with some DLLs. This is the same reason I use Python 2.5.)
msg114230 - (view) Author: ysj.ray (ysj.ray) Date: 2010-08-18 15:13
How about calling gc.collect() explicitly in the loop?
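For reference, a scaled-down version of the original example with that workaround applied (sizes reduced here so it finishes quickly; the original used range(5000000) and 700 iterations):

```python
import gc

for i in range(100):
    a = [list(range(50000))]   # large payload held by the cycle
    a.append(a)                # self-referencing cycle: refcounting alone can't free it
    gc.collect()               # explicitly reclaim the cycle from the previous pass
```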
msg114237 - (view) Author: Itai (Itai.i) Date: 2010-08-18 16:08
Sure, that's what I'll do for now. It's an OK workaround for me; I was just
posting to support the notion that it's a bug (let's call it a usability
bug) and something that people out there do run into.

There's also a scenario where you couldn't use this workaround - for
example, when the allocation happens inside a library precompiled in a
.pyd.

msg114262 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-08-18 18:27
Anybody *really* interested in this issue: somebody will need to write a PEP, get it accepted, and provide an implementation. Open source is about scratching your own itches: the ones affected by a problem are the ones who are also expected to provide solutions.
msg114416 - (view) Author: Itai (Itai.i) Date: 2010-08-20 00:58
You are right, of course... I haven't got the time to do the right thing,
but I've found another workaround that helped me and might be helpful
to others.

(Not sure it's for this thread, but...) By default, Windows limits the
amount of memory for 32-bit processes to 2 GB. There's a bit in the PE
image, called IMAGE_FILE_LARGE_ADDRESS_AWARE, which tells 64-bit Windows
to give the process 4 GB (on 32-bit Windows, PAE needs to be enabled
too). There's a post-build way to enable it with the editbin.exe utility
which comes with Visual Studio, like this:
editbin.exe /LARGEADDRESSAWARE python.exe

It works for me since it gives me twice the memory on my 64-bit OS.
I have to say it could be dangerous, since it essentially asserts that
nowhere in Python's code are pointers treated as negative numbers. I
figured this should be safe, since there's a 64-bit version of Python...

msg114424 - (view) Author: Swapnil Talekar (swapnil) Date: 2010-08-20 11:11
Mark, are you sure that the above program causes a crash? I had absolutely no problem running it with Python 3.1.2. With Python 2.6.5, the PC went terribly slow, but the program managed to run until i==14 without crashing. I did not wait to see if it reaches 700. I'm running it on XP.
msg114425 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-08-20 13:32
> (not sure its for this thread though but...) Windows on default limits
> the amount of memory for 32 bit processes to 2GB. There's a bit in
> the PE image which tells 64 bit windows to give it 4GB (on 32 bit
> windows PAE needs to be enabled too) which is called
> IMAGE_FILE_LARGE_ADDRESS_AWARE. There's a post-build way to enable
> it with the editbin.exe utility which comes with visual studio like
> this: editbin.exe /LARGEADDRESSAWARE python.exe


See #1449496 if you are interested in that.
msg114987 - (view) Author: Mark Matusevich (markmat) Date: 2010-08-26 15:24
This is what I got on a computer with 512 MB RAM:

Mandriva Linux 2009.1
=============================
Python 2.6.1 (r261:67515, Jul 14 2010, 09:23:11) [GCC 4.3.2]
-----> Python process killed by operating system after 14


Microsoft Windows XP Professional
Version	5.1.2600 Service Pack 2 Build 2600
=============================================
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)]
-----> MemoryError after 10

Python 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)]
-----> MemoryError after 10

Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
-----> MemoryError after 10

Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)]
-----> Successful finish in no time!!!

Unfortunately I cannot test the original program I had the problem with, because since the original post (2006) I have changed employers. Now I use Matlab :(
msg115026 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-08-26 21:21
OK, I'm closing this as "won't fix". The OP doesn't have the issue anymore; anybody else having such an issue, please report it separately (taking into account that you are likely to be asked to provide a patch as well).
History
Date / User / Action / Args
2010-08-26 21:21:29  loewis  set  status: open -> closed; resolution: wont fix; messages: + msg115026
2010-08-26 15:24:58  markmat  set  messages: + msg114987
2010-08-20 13:32:51  brian.curtin  set  nosy: + brian.curtin; messages: + msg114425
2010-08-20 11:11:27  swapnil  set  nosy: + swapnil; messages: + msg114424
2010-08-20 00:58:16  Itai.i  set  files: + unnamed; messages: + msg114416
2010-08-18 18:41:04  belopolsky  set  files: - unnamed
2010-08-18 18:27:06  loewis  set  messages: + msg114262
2010-08-18 16:08:25  Itai.i  set  files: + unnamed; messages: + msg114237
2010-08-18 15:13:13  ysj.ray  set  nosy: + ysj.ray; messages: + msg114230
2010-08-18 14:25:10  Itai.i  set  nosy: + Itai.i; messages: + msg114225; versions: - Python 3.1, Python 2.7
2009-08-05 11:05:50  pitrou  set  messages: + msg91312
2009-07-16 20:25:05  markmat  set  messages: + msg90581
2009-04-28 21:57:57  pitrou  set  priority: high -> low; nosy: + pitrou; messages: + msg86769; type: enhancement -> resource usage; stage: needs patch
2009-04-27 23:48:03  ajaksu2  set  versions: + Python 3.1, Python 2.7, - Python 2.6
2008-01-05 19:47:02  christian.heimes  set  priority: normal -> high; versions: + Python 2.6
2006-07-19 02:46:19  markmat  create