
classification
Title: Pickle uses O(n) memory overhead
Type: performance
Stage: resolved
Components:
Versions: Python 3.4
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: alexandre.vassalotti, ebfe, eric.smith, martin.panter, pitrou, prinsherbert, serhiy.storchaka, skrah
Priority: normal
Keywords:

Created on 2015-10-23 09:37 by prinsherbert, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (11)
msg253371 - (view) Author: Herbert (prinsherbert) Date: 2015-10-23 09:37
I very often want to use pickle to store huge objects, so that I do not need to recalculate them.

However, I noticed that pickle uses an additional O(n) amount of memory, where n is the size of the object in memory. That is, using Python 3:

    data = {'%06d' % i: i for i in range(30 * 1000 ** 2)}
    # data consumes a lot of my 8GB RAM
    import pickle
    with open('dict-database.p3', 'wb') as f:
        pickle.dump(data, f)
    # I have to kill the process so that it does not run out of memory.
    # If I don't, the OS crashes. IMHO the OS should never crash due to Python.

I don't think pickle should require an O(n) memory overhead.
msg253374 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-10-23 10:12
That is because a pickler keeps track of all pickled objects. This is needed to preserve identity and support recursive objects.

You can disable memoizing by setting the "fast" attribute of the Pickler object.

import pickle

def fastdump(obj, file):
    # Fast mode skips the memo, so no per-object bookkeeping is kept.
    p = pickle.Pickler(file)
    p.fast = True
    p.dump(obj)

But you can't pickle recursive objects in the "fast" mode.
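
A minimal sketch of the trade-off described above, using only the standard library: with the default pickler a shared sub-object is still shared after loading, while in "fast" mode it is written out twice and comes back as two separate copies. The helper name `dumps_with_fast` and the sample data are made up for illustration.

```python
import io
import pickle


def dumps_with_fast(obj, fast):
    """Pickle obj to bytes, optionally with the memo disabled (hypothetical helper)."""
    buf = io.BytesIO()
    p = pickle.Pickler(buf)
    p.fast = fast  # True disables memoization
    p.dump(obj)
    return buf.getvalue()


shared = ["payload"]
data = [shared, shared]  # the same list object referenced twice

with_memo = pickle.loads(dumps_with_fast(data, fast=False))
without_memo = pickle.loads(dumps_with_fast(data, fast=True))

# With the memo, both slots point at one restored list.
identity_kept = with_memo[0] is with_memo[1]
# In fast mode, each slot gets its own independent copy.
identity_lost = without_memo[0] is not without_memo[1]
```

The values are equal either way; only object identity differs, which is why fast mode is only safe for data without shared or self-referencing objects.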
msg253378 - (view) Author: Herbert (prinsherbert) Date: 2015-10-23 11:22
That sounds reasonable as an explanation for the O(n), but it does not explain why Linux crashes (I've seen this on two Ubuntu systems) if pickle runs out of memory.
msg253399 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-10-24 08:25
In what way does the OS crash? Are there any kernel messages? Or is this the python executable crashing? Again, if so, what messages are printed?

In any event, if this really is an OS crash, then it's a Linux bug and should be reported to them.
msg253469 - (view) Author: Herbert (prinsherbert) Date: 2015-10-26 12:18
Hi Eric,

I would assume that for the right range parameter (in my case 30 * 1000 ** 2), one that just fits in memory, your system would also crash after a pickle.dump. I did see this behavior on two of my machines, though both run an Ubuntu setup.

Nevertheless, if you give me some time I'm happy to check my dmesg and any log you wish. I find it strange that sometimes I get a MemoryError when I run out of memory (in particular when using numpy), and sometimes the system crashes (in particular when using other Python code). Therefore I don't think this is pickle-specific, and I am not even sure whether this is a bug rather than a 'feature'.
msg253516 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-27 04:28
Perhaps by OS crash you mean either the Linux out-of-memory (OOM) killer, which takes a heuristic stab at killing the right process, or Linux running almost out of memory, with everything grinding to a halt, presumably because each task switch needs to re-read its program off the hard disk.

If either is the case, I understand this is part of Linux’s design, called “memory overcommit” or something. It is possible to disable it, though I haven’t tried myself, and many programs (probably including Python) are apparently not compatible.
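
As a rough sketch for checking which overcommit policy a machine is using, the snippet below reads the Linux `vm.overcommit_memory` setting from `/proc`. The function name and the wording of the descriptions are my own; the values 0, 1 and 2 are the ones described in the Linux kernel documentation. On systems without this file the function simply returns None.

```python
from pathlib import Path

# Meaning of each value, paraphrased from the Linux kernel documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the overcommit mode as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


mode = read_overcommit_mode()
description = OVERCOMMIT_MODES.get(mode, "unknown or not Linux")
```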
msg254142 - (view) Author: Lukas Lueg (ebfe) Date: 2015-11-05 21:10
I very strongly doubt that it actually crashes your kernel - it basically can't. Your desktop becomes unresponsive for up to several minutes as the kernel has paged out about every single bit of memory to disk, raising access times by several orders of magnitude. Disable your swap and try again; it will just die.
msg254381 - (view) Author: Herbert (prinsherbert) Date: 2015-11-09 11:56
It may be fair to note that I have no swap installed on one of the machines on which the 'crash' happens, just 16GB of RAM. Hence I'm not sure how this affects paging; I would think there is no paging if there is no swap.

I can verify that the machine is 'stuck' for more than just several minutes (at least 30 minutes), but I cannot confirm whether this is due to the desktop environment or actually the kernel. I would agree to verify this first when I have access to the specific machines again.

Thank you for your input so far!
msg254387 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-11-09 13:16
It's a Linux issue. Disable overcommitting of memory (at your own
peril) or set user limits (for example with djb's softlimit), then
the process will be killed instead of freezing the machine.
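
One possible way to set such a limit from inside Python itself, as a stand-in for an external tool like softlimit, is the standard `resource` module (Unix only). This is only a sketch: the function name `limit_address_space` is hypothetical, and it assumes a Linux system where `RLIMIT_AS` is enforced.

```python
import resource


def limit_address_space(max_bytes):
    """Cap this process's virtual address space; return the previous (soft, hard) limits."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process cannot raise the soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY and max_bytes > hard:
        max_bytes = hard
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return old_soft, hard
```

Calling, for example, `limit_address_space(4 * 1024 ** 3)` before building and pickling the data should make allocations beyond roughly 4 GiB fail with a MemoryError in that process, rather than letting it consume all of the machine's memory.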
msg254390 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-09 14:37
There is a workaround for the memory consumption, and Linux freezing is not a Python issue.
msg254424 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-10 00:33
FWIW my usual workaround is to enable Linux’s SysRq handler, then press Ctrl+Alt+(SysRq, F) to manually invoke the OOM killer. It beats waiting between 30 and infinity minutes for it to kick in automatically :)
History
Date                 User              Action  Args
2022-04-11 14:58:23  admin             set     github: 69651
2015-11-10 00:33:14  martin.panter     set     messages: + msg254424
2015-11-09 14:37:24  serhiy.storchaka  set     status: open -> closed; resolution: wont fix; messages: + msg254390; stage: resolved
2015-11-09 13:16:51  skrah             set     nosy: + skrah; messages: + msg254387
2015-11-09 11:56:49  prinsherbert      set     messages: + msg254381
2015-11-05 21:10:24  ebfe              set     nosy: + ebfe; messages: + msg254142
2015-10-27 04:28:35  martin.panter     set     nosy: + martin.panter; messages: + msg253516
2015-10-26 12:18:05  prinsherbert      set     messages: + msg253469
2015-10-24 08:25:07  eric.smith        set     nosy: + eric.smith; messages: + msg253399
2015-10-23 11:22:42  prinsherbert      set     messages: + msg253378
2015-10-23 10:12:14  serhiy.storchaka  set     nosy: + alexandre.vassalotti, serhiy.storchaka, pitrou; messages: + msg253374
2015-10-23 09:38:47  prinsherbert      set     type: performance; versions: + Python 3.4
2015-10-23 09:37:23  prinsherbert      create