msg95482 - (view) |
Author: Ryan Shaw (rybesh) |
Date: 2009-11-19 15:50 |
def save_object(r, key, m):
r.set(key, cPickle.dumps(m))
[4] >>> save_object(r, 'cluster', cluster)
python: ./Modules/cStringIO.c:419: O_cwrite: Assertion `oself->pos + l <
2147483647' failed.
Aborted
Linux 2.6.30.9-96.fc11.x86_64 #1 SMP x86_64 GNU/Linux
|
msg95484 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2009-11-19 16:05 |
What are the types and values of cluster and r?
Can you reproduce this in a self-contained script?
|
msg95502 - (view) |
Author: Ryan Shaw (rybesh) |
Date: 2009-11-19 17:43 |
r is the Redis python client. cluster is a large cluster tree along the
lines of the cluster_node class found here:
http://jesolem.blogspot.com/2009/04/hierarchical-clustering-in-python.html
|
msg95504 - (view) |
Author: Ryan Shaw (rybesh) |
Date: 2009-11-19 17:47 |
I can't reproduce this in a self-contained script. Pickling a smaller
cluster object and storing it in Redis works fine. The cluster object that
caused the crash was large, a binary tree with >5000 leaves holding the
results of a nine-hour calculation.
|
msg127633 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2011-01-31 18:01 |
Ryan,
Do you still have the setup that can reproduce this error? If so, can you try running it with pickle rather than cPickle? If it works with pickle, please see if you can reproduce the error by unpickling the result of pickle and repickling it with cPickle. This may allow you to create a self-contained script that you can post together with a pickle file.
|
msg127637 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-01-31 18:24 |
The issue looks quite clear: cStringIO.write() asserts that the required storage size is less than INT_MAX. Therefore, in all likelihood, the pickle dump is simply larger than 2GB.
Now, the cStringIO structures seem 64-bit safe, so the comparison to INT_MAX seems useless. Also, in non-debug builds, the assert is skipped but the Py_ssize_t values are first cast to int before being saved into a Py_ssize_t struct member! Which would lead to data corruption and/or crashes.
The INT_MAX asserts were added in r42382 which also made cStringIO internals 64-bit safe. Martin, do you remember why you did this?
The asserts don't even protect against the following crash:
$ ./python -c "from cStringIO import StringIO; b=b'x'*(2**31+1); s=StringIO(); s.write(b)"
Erreur de segmentation
gdb shows the following stack excerpt:
#0 0x00007ffff724e8c2 in memcpy () from /lib64/libc.so.6
#1 0x00007ffff6b47a46 in O_cwrite (self=<unknown at remote 0xa3f790>, c=0x7fff76b44054 'x' <repeats 200 times>...,
l=-2147483647) at /home/antoine/cpython/27/Modules/cStringIO.c:415
Now, onto the problem of reproducing, here's another interesting thing: while some internal structures of cStringIO are 64-bit safe, the Python-facing API isn't:
>>> from cStringIO import StringIO
>>> b = b"x" * (2**32+1)
>>> s = StringIO()
>>> s.write(b)
>>> s.tell()
1
(the module doesn't use Py_SSIZE_T_CLEAN)
|
msg170177 - (view) |
Author: Gilles Louppe (Gilles.Louppe) |
Date: 2012-09-10 12:01 |
Hi,
Any solution regarding that issue? We are currently encountering the exact same bug when pickling too large objects.
Best,
Gilles
|
msg170243 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-09-10 22:48 |
Gilles: it's really puzzling that you run into the issue. This is an assert, so in a regular build of Python, it should never trigger. Instead, it should trigger only in a debug build. Why are you running a debug build?
|
msg170246 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-09-10 23:19 |
Antoine: if you look at
http://svn.python.org/view/python/branches/ssize_t/Modules/cStringIO.c?revision=41731&view=markup&pathrev=42382
you'll notice that IOobject and Iobject have pos and stringsize as int, whereas Oobject has it as Py_ssize_t. This must have been a bug; I think the structures were meant as compatible. If they all should have been defined in terms of int, the assert would make sense.
That said, what actually got merged included all the fields as Py_ssize_t, in which case the assert makes no sense. OTOH, it prevents the cast below (of (int)l) to produce bogus results, so it actually helped to detect the bugs :-)
What is now needed (from people running into the issue) is a patch that resolves all aspects of >2GB handling in these modules. I believe the issue is resolved in Python 3 (by both StringIO and BytesIO properly using Py_ssize_t throughout there), so unless somebody interested in this problem actually contributes a patch, the issue is unlikely to advance.
|
msg180845 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-28 11:11 |
Here is a patch.
|
msg181287 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2013-02-03 19:33 |
Suggested workaround: use io.BytesIO instead of cStringIO.
comments on the patch... in short: your patch is changing an ABI and isn't suitable for a maintenance release.
I'm not sure how much we can usefully do to cStringIO in a maintenance release. I'd be inclined to close as wontfix as io.BytesIO is available.
|
msg181289 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-02-03 19:42 |
Not making cStringIO 64-bit compatible is one thing. But we can still fix the crashes by detecting an overflow before it happens ;)
|
msg181290 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-02-03 19:49 |
> Suggested workaround: use io.BytesIO instead of cStringIO.
Well, maybe we can replace cStringIO by io.BytesIO in cPickle (and other places).
> comments on the patch... in short: your patch is changing an ABI and isn't suitable for a maintenance release.
Isn't this a private API?
> I'm not sure how much we can usefully do to cStringIO in a maintenance release. I'd be inclined to close as wontfix as io.BytesIO is available.
We can't just close this issue. cStringIO crashes Python. We can use another approach. Change cStringIO so that user can't write or read more than INT_MAX bytes at a time (an exception will be raised instead of crash), but can write more than INT_MAX bytes in sum and get all string with getvalue(). This will preserve an ABI.
|
msg181295 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2013-02-03 20:17 |
Include/cStringIO.h is public and the name of the structure "PycStringIO" lacks an _ which implies that it is public.
There appear to be a few references to it in external projects doing some searching. A (very old) version of twisted/protocols/_c_urlarg.c uses it for example.
Doing the latter and limiting writes to INT_MAX but allowing greater to accumulate makes sense.
|
msg181312 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-02-04 09:21 |
Thank you for review and enlightenment Gregory. Here is an updated patch which doesn't change an ABI.
|
msg181733 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-02-09 11:48 |
New changeset a025b04332fe by Serhiy Storchaka in branch '2.7':
Issue #7358: cStringIO.StringIO now supports writing to and reading from
http://hg.python.org/cpython/rev/a025b04332fe
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:54 | admin | set | github: 51607 |
2013-02-26 14:13:40 | jcea | set | nosy:
+ jcea
|
2013-02-09 15:04:06 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2013-02-09 11:48:37 | python-dev | set | nosy:
+ python-dev messages:
+ msg181733
|
2013-02-04 09:21:55 | serhiy.storchaka | set | files:
+ cStringIO64_2.patch
messages:
+ msg181312 |
2013-02-03 20:17:18 | gregory.p.smith | set | messages:
+ msg181295 |
2013-02-03 19:49:20 | serhiy.storchaka | set | messages:
+ msg181290 |
2013-02-03 19:42:06 | pitrou | set | messages:
+ msg181289 |
2013-02-03 19:33:52 | gregory.p.smith | set | nosy:
+ gregory.p.smith messages:
+ msg181287
|
2013-01-28 11:11:34 | serhiy.storchaka | set | files:
+ cStringIO64.patch keywords:
+ patch messages:
+ msg180845
stage: needs patch -> patch review |
2013-01-28 09:06:45 | serhiy.storchaka | set | versions:
- Python 2.6 nosy:
+ serhiy.storchaka
assignee: serhiy.storchaka components:
+ Extension Modules, IO, - Library (Lib) stage: needs patch |
2013-01-28 09:04:30 | serhiy.storchaka | link | issue13555 dependencies |
2013-01-28 09:03:45 | serhiy.storchaka | link | issue17054 superseder |
2012-09-10 23:19:22 | loewis | set | messages:
+ msg170246 |
2012-09-10 22:48:34 | loewis | set | messages:
+ msg170243 |
2012-09-10 12:01:15 | Gilles.Louppe | set | nosy:
+ Gilles.Louppe messages:
+ msg170177
|
2011-01-31 18:24:57 | pitrou | set | nosy:
+ pitrou, loewis title: cPickle crash on failed assertion -> cStringIO not 64-bit safe messages:
+ msg127637
versions:
+ Python 2.7 |
2011-01-31 18:04:42 | vstinner | set | nosy:
+ vstinner
|
2011-01-31 18:01:47 | belopolsky | set | nosy:
+ belopolsky messages:
+ msg127633
|
2009-11-19 17:47:41 | rybesh | set | messages:
+ msg95504 |
2009-11-19 17:43:10 | rybesh | set | messages:
+ msg95502 |
2009-11-19 16:05:11 | eric.smith | set | nosy:
+ eric.smith messages:
+ msg95484
|
2009-11-19 15:50:21 | rybesh | create | |