msg149027 - (view) |
Author: Philipp Lies (phillies) |
Date: 2011-12-08 12:52 |
When I try to load a large file (>1GB) cPickle crashes with a MemoryError:
$python test.py
Traceback (most recent call last):
File "/tmp/test.py", line 8, in <module>
A2 = cPickle.load(f2)
MemoryError
test.py contains following code:
import numpy as np
import cPickle
A = np.random.randn(196,240000)
f = open('test.pydat', 'w')
cPickle.dump(A,f)
f.close()
f2 = open('test.pydat', 'rb')
A2 = cPickle.load(f2)
System:
cPickle 1.71
python 2.7.2
Ubuntu 11.10 amd64
Memory is not an issue as a) pickle works nicely and b) my computer has 122GB free RAM
|
msg149028 - (view) |
Author: Ramchandra Apte (Ramchandra Apte) * |
Date: 2011-12-08 12:56 |
Maybe Ubuntu doesn't think it is safe to allocate the memory.
|
msg149029 - (view) |
Author: Philipp Lies (phillies) |
Date: 2011-12-08 13:02 |
Well, replace cPickle by pickle and it works. So if there is a memory allocation problem cPickle should be able to handle it, especially since it should be completely compatible to pickle.
|
msg149034 - (view) |
Author: Ramchandra Apte (Ramchandra Apte) * |
Date: 2011-12-08 13:38 |
Have you checked the system monitor after all cPickle can use more memory than 1GB.
|
msg149035 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-12-08 13:40 |
Is there a way to reproduce that doesn't involve numpy?
|
msg149049 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-08 18:57 |
> my computer has 122GB free RAM
122, or 1.22?
> Well, replace cPickle by pickle and it works.
cPickle makes some direct call to malloc()/realloc()/free(), contrarily to pickle which uses pymalloc. This could lead to heap fragmentation.
What does the following return ?
$ /usr/bin/time -v python test.py
|
msg149300 - (view) |
Author: Philipp Lies (phillies) |
Date: 2011-12-12 13:39 |
a) it's 122GB free RAM (out of 128GB total RAM)
b) when I convert the numpy array to a list it works. So seems to be a problem with cPickle and numpy at/from a certain array size
c) $ /usr/bin/time -v python test_np.py
Traceback (most recent call last):
File "test_np.py", line 12, in <module>
A2 = cPickle.load(f2)
MemoryError
Command exited with non-zero status 1
Command being timed: "python test_np.py"
User time (seconds): 73.72
System time (seconds): 4.56
Percent of CPU this job got: 87%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:29.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 7402448
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 726827
Voluntary context switches: 41043
Involuntary context switches: 7793
Swaps: 0
File system inputs: 3368
File system outputs: 2180744
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
hth
|
msg149319 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-12 16:00 |
@Antoine
Couldn't this be linked to #11564 (pickle not 64-bit ready)? AFAICT this wasn't fixed in 2.7. Basically, an integer overflow, and malloc() would bail out when asked a ridiculous size.
@Philipp
I'd be curious to see the last lines of
$ ltrace -e malloc python test_np.py
you'll probably see a call to malloc() return NULL.
|
msg149326 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-12-12 16:20 |
> Couldn't this be linked to #11564 (pickle not 64-bit ready)?
Well, I don't know anything about numpy, but:
>>> 196 * 240000
47040000
>>> 196 * 240000 * 8 # assuming 8 bytes per float
376320000
>>> 2**31
2147483648
So it seems unlikely to be the explanation. After all the report says ">1GB" for the file size, which is still comfortably in the capabilities of a 32-bit module.
Philipp, perhaps you could try to run Python under gdb and try to diagnose where the MemoryError occurs?
Also, posting the disassembly (using pickletools.dis()) of a very small equivalent pickle (e.g. of randn(2,3)) could help us know what is involved.
|
msg149327 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-12-12 16:35 |
Oh, what's more interesting is that it works here (Python 2.7.1 and numpy 1.6.1 under Mageia with 8GB RAM). Looking at a pickle disassembly, the only remarkable thing is the presence of a long binary string (the raw serialization of all IEEE floats), which shouldn't give any problem to cPickle (and indeed doesn't). What is your numpy version, Philipp?
|
msg149673 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-17 16:21 |
> So it seems unlikely to be the explanation.
Victor reproduced in on IRC, and it's indeed an overflow.
The problematic code is in readline_file:
"""
bigger = self->buf_size << 1;
if (bigger <= 0) { /* overflow */
PyErr_NoMemory();
return -1;
}
newbuf = (char *)realloc(self->buf, bigger);
if (!newbuf) {
PyErr_NoMemory();
return -1;
}
"""
self->buf_size is an int, which overflow pretty easily.
>>> 196 * 240000
47040000
>>> 196 * 240000 * 8 # assuming 8 bytes per float
376320000
>>> 2**31
2147483648
Hmmm... A byte is 8 bit, which gives:
>>> 196 * 240000 * 8 * 8
3010560000L
>>> 196 * 240000 * 8 * 8 > 2**31
True
Now, if it works on your box, it's probably due to the compiler optimizing the check away. Since `bigger` is cast to an unsigned 64-bit (size_t) when calling malloc(), it happens to work.
Maybe your distro doesn't build python with -fwrapv.
So, what do you suggest? Should we fix this (Py_ssize_t, overflow check before computation), as in #11564?
|
msg149674 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-12-17 16:24 |
> Should we fix this (Py_ssize_t, overflow check before computation), as in #11564?
Yes. Use Py_ssize_t type for the buf_size attribute, and replace "bigger <= 0" (test if an overflow occurred) by "self->buf_size > (PY_SSIZE_T_MAX >> 1)".
|
msg149676 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-12-17 16:35 |
Ah, I see. It's a bit of a pity not to be able to load files > 1GB, especially on a 64-bit build (!). Perhaps cPickle could be made partly 64-bit compatible? Or at least, indeed, do a proper anti-overflow check.
|
msg149711 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-17 22:15 |
Here's a patch which should fix this. However, I'm unable to test it.
|
msg149713 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-12-17 22:25 |
I think there's a problem here:
+ self->data = realloc(self->data, self->size * sizeof(PyObject *));
+ if (self->data == NULL)
goto nomemory;
If realloc() fails, the old data pointer is lost and therefore will never get free()ed.
Same for:
+ self->buf = (char *)realloc(self->buf, self->buf_size);
Here:
- int *marks;
- s=self->marks_size+20;
- if (s <= self->num_marks) s=self->num_marks + 1;
+ size_t alloc;
+ Py_ssize_t *marks;
+
+ /* Use the size_t type to check for overflow. */
+ alloc = ((size_t)self->num_marks << 1) + 20;
It seems you are changing the overallocation algorithm (from additive to multiplicative). I'm not sure it should be in the scope of the patch, although multiplicative overallocation is generally better.
|
msg149715 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-17 22:46 |
New version.
|
msg149757 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-12-18 14:21 |
> New version.
Looks good to me.
|
msg150117 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-22 22:40 |
Here's a new version with a test (untested).
Note that I'm absolutely not sure that the 'memsize' argument to
bigmemtest is correct.
|
msg150762 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2012-01-06 19:16 |
Antoine, could you test the last version (test_pickle and if possible
with the OP testcase)?
I can't test it myself (32-bit machine with 1 GB).
|
msg150765 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-01-06 19:33 |
Le vendredi 06 janvier 2012 à 19:17 +0000, Charles-François Natali a
écrit :
> Charles-François Natali <neologix@free.fr> added the comment:
>
> Antoine, could you test the last version (test_pickle and if possible
> with the OP testcase)?
> I can't test it myself (32-bit machine with 1 GB).
Well, first, there are compilation warnings on a 64-bit box:
gcc -pthread -fPIC -fno-strict-aliasing -g -O2 -g -O0 -Wall -Wstrict-prototypes -I. -IInclude -I./Include -I/usr/local/include -I/home/antoine/cpython/27/Include -I/home/antoine/cpython/27 -c /home/antoine/cpython/27/Modules/cPickle.c -o build/temp.linux-x86_64-2.7-pydebug/home/antoine/cpython/27/Modules/cPickle.o
/home/antoine/cpython/27/Modules/cPickle.c: In function ‘put2’:
/home/antoine/cpython/27/Modules/cPickle.c:810:9: attention : format ‘%d’ expects type ‘int’, but argument 4 has type ‘Py_ssize_t’
/home/antoine/cpython/27/Modules/cPickle.c: In function ‘load_mark’:
/home/antoine/cpython/27/Modules/cPickle.c:4610:21: attention : assignment from incompatible pointer type
gcc -pthread -shared build/temp.linux-x86_64-2.7-pydebug/home/antoine/cpython/27/Modules/cPickle.o -L/usr/local/lib -o build/lib.linux-x86_64-2.7-pydebug/cPickle.so
Second, I can't seem to get the test to work with 8GB RAM (approximately
6.5GB free according to "free"). The MemoryError is quite expectable for
test_pickle, though, since the code there doesn't try to conserve memory
at all:
test test_pickle failed -- Traceback (most recent call last):
File "/home/antoine/cpython/27/Lib/test/test_support.py", line 983, in wrapper
return f(self, maxsize)
File "/home/antoine/cpython/27/Lib/test/pickletester.py", line 1298, in test_huge_str_32b
pickled = self.dumps(data, proto)
File "/home/antoine/cpython/27/Lib/test/test_pickle.py", line 74, in dumps
return pickle.dumps(arg, proto)
File "/home/antoine/cpython/27/Lib/pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "/home/antoine/cpython/27/Lib/pickle.py", line 224, in dump
self.save(obj)
File "/home/antoine/cpython/27/Lib/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/home/antoine/cpython/27/Lib/pickle.py", line 488, in save_string
self.write(STRING + repr(obj) + '\n')
MemoryError
I would therefore suggest to enable the test only for cPickle.
For test_cpickle the behaviour is different:
- for protocol 0, I get a MemoryError (which may be expected, if the
test truly needs more than 6.5GB, for example because of suboptimal
buffer management)
- for protocol 1 and 2, I get "SystemError: error return without
exception set"
|
msg178042 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2012-12-24 09:14 |
Could someone with a 64-bit box take over this one?
I won't go anywhere with my 32-bit box...
|
msg178559 - (view) |
Author: Ramchandra Apte (Ramchandra Apte) * |
Date: 2012-12-30 08:24 |
Bump.
@neologix
I have a 64-bit laptop with 2 GB memory so I don't think I can do so. (though one could use swap)
|
msg178564 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-12-30 10:42 |
I may tackle this but rare 2.7-only bugs are pretty low on my priorities list.
|
msg178566 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2012-12-30 11:29 |
> I have a 64-bit laptop with 2 GB memory so I don't think I can do so. (though one could use swap)
AFAICT, a binary string a little longer than 1GB should be enough to
reproduce the bug.
Just make sure Python isn't built with '-fwrapv'.
|
msg180763 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-27 12:32 |
Charles-François, are you going to finish this before 2.7.4 RC released? The patch should be updated to address Antoine's comments.
|
msg180777 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-01-27 17:41 |
> Charles-François, are you going to finish this before 2.7.4 RC released? The patch should be updated to address Antoine's comments.
No.
I don't have access to a 64-bit box, which makes it difficult to write
(I don't get compiler warnings) and test.
Feel free to pick up the patch and update it!
|
msg180792 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-27 20:49 |
I have picked up it.
Actually BINSTRING format supports only strings less 2GiB, so test should be changed. But there are other bugs which prohibit pickling a lot of data. For one I open issue17054.
|
msg181750 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-02-09 18:21 |
Here is an updated patch. Fixed two bugs found by Antoine (an inappropriate format and a memory error in bigmemtest), fixed resizing of marks array and one possible integer overflow in write_other(). Used workaround to bypass limitations of cStringIO API.
|
msg181932 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-02-11 21:09 |
Test updated too. Now it doesn't try to write a string larger than 2 GiB (it's impossible), instead writes a lot of shorter strings with total size larger than 2 GiB.
|
msg181973 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-02-12 19:37 |
New changeset 680959a3ae2e by Serhiy Storchaka in branch '2.7':
Issue #13555: cPickle now supports files larger than 2 GiB.
http://hg.python.org/cpython/rev/680959a3ae2e
|
msg182979 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) *  |
Date: 2013-02-25 19:57 |
680959a3ae2e has caused that g-ir-scanner tool from gobject-introspection (which seems to create some pickles and next load them?) fails with MemoryError, e.g. during building of GTK+.
MemoryError occurs only for a subset of pickles.
I attach a small pickle, which allows to reproduce this problem.
Download this pickle and save it in /tmp.
Additional preparation:
$ cd /tmp
$ wget -q http://ftp.gnome.org/pub/gnome/sources/gobject-introspection/1.34/gobject-introspection-1.34.2.tar.xz
$ tar -xJf gobject-introspection-1.34.2.tar.xz
$ sed -e '/^with LibtoolImporter/,/^$/d' -i gobject-introspection-1.34.2/giscanner/xmlwriter.py
Behavior before 680959a3ae2e:
$ PYTHONPATH="gobject-introspection-1.34.2" python2.7 -c 'import cPickle; print(cPickle.load(open("pickle")))'
<giscanner.girparser.GIRParser object at 0x7f506299be90>
Behavior after 680959a3ae2e:
$ PYTHONPATH="gobject-introspection-1.34.2" python2.7 -c 'import cPickle; print(cPickle.load(open("pickle")))'
Traceback (most recent call last):
File "<string>", line 1, in <module>
MemoryError
|
msg183025 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-02-26 08:08 |
New changeset f3f23ecdb1c6 by Serhiy Storchaka in branch '2.7':
Issue #13555: Fix an integer overflow check.
http://hg.python.org/cpython/rev/f3f23ecdb1c6
|
msg183026 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-02-26 08:18 |
Thank you for the report. Standard tests do not cover pickling/unpickling to real files.
|
msg183030 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-02-26 08:42 |
I have opened issue17299 for testing issue.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:24 | admin | set | github: 57764 |
2013-02-26 14:10:33 | jcea | set | nosy:
+ jcea
|
2013-02-26 08:42:24 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages:
+ msg183030
stage: resolved |
2013-02-26 08:18:14 | serhiy.storchaka | set | messages:
+ msg183026 |
2013-02-26 08:08:41 | python-dev | set | messages:
+ msg183025 |
2013-02-25 19:57:14 | Arfrever | set | status: closed -> open files:
+ pickle
nosy:
+ benjamin.peterson stage: resolved -> (no value) messages:
+ msg182979 resolution: fixed -> (no value) priority: normal -> release blocker |
2013-02-15 08:26:40 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2013-02-12 19:37:47 | python-dev | set | nosy:
+ python-dev messages:
+ msg181973
|
2013-02-11 21:09:50 | serhiy.storchaka | set | files:
+ pickle_overflow-4.diff
messages:
+ msg181932 |
2013-02-11 21:07:19 | serhiy.storchaka | set | files:
- pickle_overflow-3.diff |
2013-02-09 18:21:40 | serhiy.storchaka | set | files:
+ pickle_overflow-3.diff assignee: serhiy.storchaka messages:
+ msg181750
stage: needs patch -> patch review |
2013-01-28 09:04:30 | serhiy.storchaka | set | dependencies:
+ cStringIO not 64-bit safe, - cStringIO.StringIO aborted when more then INT_MAX bytes written |
2013-01-27 20:49:24 | serhiy.storchaka | set | dependencies:
+ cStringIO.StringIO aborted when more then INT_MAX bytes written messages:
+ msg180792 |
2013-01-27 17:41:53 | neologix | set | messages:
+ msg180777 |
2013-01-27 12:32:38 | serhiy.storchaka | set | messages:
+ msg180763 |
2012-12-30 11:29:16 | neologix | set | messages:
+ msg178566 |
2012-12-30 10:42:36 | pitrou | set | messages:
+ msg178564 |
2012-12-30 08:24:03 | Ramchandra Apte | set | messages:
+ msg178559 |
2012-12-24 09:14:23 | neologix | set | messages:
+ msg178042 |
2012-12-02 10:23:39 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka
stage: needs patch |
2012-07-30 18:21:26 | Arfrever | set | nosy:
+ Arfrever
|
2012-01-06 19:33:26 | pitrou | set | messages:
+ msg150765 |
2012-01-06 19:16:59 | neologix | set | messages:
+ msg150762 |
2011-12-22 22:40:59 | neologix | set | files:
+ pickle_overflow-2.diff
messages:
+ msg150117 |
2011-12-18 14:21:10 | pitrou | set | messages:
+ msg149757 |
2011-12-18 11:13:33 | flox | set | nosy:
+ flox
|
2011-12-17 22:46:05 | neologix | set | files:
+ pickle_overflow-1.diff
messages:
+ msg149715 |
2011-12-17 22:25:28 | pitrou | set | messages:
+ msg149713 |
2011-12-17 22:15:35 | neologix | set | files:
+ pickle_overflow.diff keywords:
+ patch messages:
+ msg149711
|
2011-12-17 16:35:30 | pitrou | set | messages:
+ msg149676 |
2011-12-17 16:24:08 | vstinner | set | nosy:
+ vstinner messages:
+ msg149674
|
2011-12-17 16:21:29 | neologix | set | messages:
+ msg149673 |
2011-12-12 16:35:35 | pitrou | set | messages:
+ msg149327 |
2011-12-12 16:20:22 | pitrou | set | messages:
+ msg149326 |
2011-12-12 16:03:24 | neologix | set | messages:
- msg149321 |
2011-12-12 16:02:21 | neologix | set | messages:
+ msg149321 |
2011-12-12 16:00:33 | neologix | set | messages:
+ msg149319 |
2011-12-12 13:39:15 | phillies | set | messages:
+ msg149300 |
2011-12-08 18:57:44 | neologix | set | nosy:
+ neologix messages:
+ msg149049
|
2011-12-08 13:40:36 | pitrou | set | nosy:
+ mark.dickinson
|
2011-12-08 13:40:28 | pitrou | set | nosy:
+ pitrou messages:
+ msg149035 components:
+ Extension Modules, - None
|
2011-12-08 13:38:18 | Ramchandra Apte | set | messages:
+ msg149034 |
2011-12-08 13:02:08 | phillies | set | messages:
+ msg149029 |
2011-12-08 12:56:55 | Ramchandra Apte | set | nosy:
+ Ramchandra Apte messages:
+ msg149028
|
2011-12-08 12:52:02 | phillies | create | |