The cyclic GC uses a simple and somewhat naive policy to know when it must run. It is based on counting "+1" for every call to _PyObject_GC_Alloc(). Explicit calls to PyObject_GC_Del() are counted as "-1". The cyclic GC will only be executed after the count reaches 700. There is then a scheme with multiple generations, but the point is that nothing is done at all before _PyObject_GC_Alloc() has been called 700 times.
The problem is that each of these _PyObject_GC_Alloc() can be directly or indirectly responsible for a large quantity of memory. Take this example:
while True:
l = [None] * 10000000 # 80 MB, on 64-bit
l[-1] = l
del l
This loop actually consumes 700 times 80 MB, which is unexpected to say the least, and looks like a very fast memory leak. The same program on 32-bit architectures simply runs out of virtual address space and fails with a MemoryError---even if we lower the length of the list to 10**9/700 = 1428571.
The same problem exists whenever a single object is "large", we allocate and forget many such objects in sequence, and they are kept alive by a cycle. This includes the case where the large object is not part of a cycle, but merely referenced from a cycle. For examples of "large" objects with potentially low lifetimes, maybe more natural than large lists, would include bz2 objects (17MB each) or Numpy arrays.
To fix it, the basic idea would be to have the "large" allocations count for more than "+1" in _PyObject_GC_Alloc(). Maybe they would also need to decrease the count by the same amount in PyObject_GC_Del(), though that may be less important. Still, I am unsure about how it could be implemented. Maybe a new C API is needed, which could then be used by a few built-in types (lists, bz2 objects, numpy arrays...) where the bulk of the memory allocation is not actually done by _PyObject_GC_Alloc() but by a separate call. I am thinking about something like PyMem_AddPressure(size), which would simply increase the count by a number based on 'size'.
|