classification
Title: cStringIO not 64-bit safe
Type: crash Stage: resolved
Components: Extension Modules, IO Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Gilles.Louppe, belopolsky, eric.smith, gregory.p.smith, jcea, loewis, pitrou, python-dev, rybesh, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2009-11-19 15:50 by rybesh, last changed 2013-02-26 14:13 by jcea. This issue is now closed.

Files
File name Uploaded Description Edit
cStringIO64.patch serhiy.storchaka, 2013-01-28 11:11 review
cStringIO64_2.patch serhiy.storchaka, 2013-02-04 09:21 review
Messages (16)
msg95482 - (view) Author: Ryan Shaw (rybesh) Date: 2009-11-19 15:50
def save_object(r, key, m):
    r.set(key, cPickle.dumps(m))

[4] >>> save_object(r, 'cluster', cluster)
python: ./Modules/cStringIO.c:419: O_cwrite: Assertion `oself->pos + l < 
2147483647' failed.
Aborted

Linux 2.6.30.9-96.fc11.x86_64 #1 SMP x86_64 GNU/Linux
msg95484 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-19 16:05
What are the types and values of cluster and r?

Can you reproduce this in a self-contained script?
msg95502 - (view) Author: Ryan Shaw (rybesh) Date: 2009-11-19 17:43
r is the Redis python client. cluster is a large cluster tree along the 
lines of the cluster_node class found here: 
http://jesolem.blogspot.com/2009/04/hierarchical-clustering-in-python.html
msg95504 - (view) Author: Ryan Shaw (rybesh) Date: 2009-11-19 17:47
I can't reproduce this in a self-contained script. Pickling a smaller 
cluster object and storing it in Redis works fine. The cluster object that 
caused the crash was large, a binary tree with >5000 leaves holding the 
results of a nine-hour calculation.
msg127633 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-01-31 18:01
Ryan,

Do you still have the setup that can reproduce this error?  If so, can you try running it with pickle rather than cPickle?  If it works with pickle, please see if you can reproduce the error by unpickling the result of pickle and repickling it with cPickle.  This may allow you to create a self-contained script that you can post together with a pickle file.
msg127637 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-31 18:24
The issue looks quite clear: cStringIO.write() asserts that the required storage size is less than INT_MAX. Therefore, in all likelihood, the pickle dump is simply larger than 2GB.

Now, the cStringIO structures seem 64-bit safe, so the comparison to INT_MAX seems useless. Also, in non-debug builds, the assert is skipped but the Py_ssize_t values are first cast to int before being saved into a Py_ssize_t struct member! Which would lead to data corruption and/or crashes.

The INT_MAX asserts were added in r42382 which also made cStringIO internals 64-bit safe. Martin, do you remember why you did this?

The asserts don't even protect against the following crash:

$ ./python -c "from cStringIO import StringIO; b=b'x'*(2**31+1); s=StringIO(); s.write(b)"
Erreur de segmentation

gdb shows the following stack excerpt:

#0  0x00007ffff724e8c2 in memcpy () from /lib64/libc.so.6
#1  0x00007ffff6b47a46 in O_cwrite (self=<unknown at remote 0xa3f790>, c=0x7fff76b44054 'x' <repeats 200 times>..., 
    l=-2147483647) at /home/antoine/cpython/27/Modules/cStringIO.c:415


Now, onto the problem of reproducing, here's another interesting thing: while some internal structures of cStringIO are 64-bit safe, the Python-facing API isn't:

>>> from cStringIO import StringIO
>>> b = b"x" * (2**32+1)
>>> s = StringIO()
>>> s.write(b)
>>> s.tell()
1

(the module doesn't use Py_SSIZE_T_CLEAN)
msg170177 - (view) Author: Gilles Louppe (Gilles.Louppe) Date: 2012-09-10 12:01
Hi,

Any solution regarding that issue? We are currently encountering the exact same bug when pickling too large objects. 

Best,

Gilles
msg170243 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-10 22:48
Gilles: it's really puzzling that you run into the issue. This is an assert, so in a regular build of Python, it should never trigger. Instead, it should trigger only in a debug build. Why are you running a debug build?
msg170246 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-10 23:19
Antoine: if you look at

http://svn.python.org/view/python/branches/ssize_t/Modules/cStringIO.c?revision=41731&view=markup&pathrev=42382

you'll notice that IOobject and Iobject have pos and stringsize as int, whereas Oobject has it as Py_ssize_t. This must have been a bug; I think the structures were meant as compatible. If they all should have been defined in terms of int, the assert would make sense.

That said, what actually got merged included all the fields as Py_ssize_t, in which case the assert makes no sense. OTOH, it prevents the cast below (of (int)l) to produce bogus results, so it actually helped to detect the bugs :-)

What is now needed (from people running into the issue) is a patch that resolves all aspects of >2GB handling in these modules. I believe the issue is resolved in Python 3 (by both StringIO and BytesIO properly using Py_ssize_t throughout there), so unless somebody interested in this problem actually contributes a patch, the issue is unlikely to advance.
msg180845 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-28 11:11
Here is a patch.
msg181287 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-02-03 19:33
Suggested workaround: use io.BytesIO instead of cStringIO.

comments on the patch... in short: your patch is changing an ABI and isn't suitable for a maintenance release.

I'm not sure how much we can usefully do to cStringIO in a maintenance release.  I'd be inclined to close as wontfix as io.BytesIO is available.
msg181289 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-03 19:42
Not making cStringIO 64-bit compatible is one thing. But we can still fix the crashes by detecting an overflow before it happens ;)
msg181290 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-03 19:49
> Suggested workaround: use io.BytesIO instead of cStringIO.

Well, maybe we can replace cStringIO by io.BytesIO in cPickle (and other places).

> comments on the patch... in short: your patch is changing an ABI and isn't suitable for a maintenance release.

Isn't this a private API?

> I'm not sure how much we can usefully do to cStringIO in a maintenance release.  I'd be inclined to close as wontfix as io.BytesIO is available.

We can't just close this issue. cStringIO crashes Python. We can use another approach. Change cStringIO so that user can't write or read more than INT_MAX bytes at a time (an exception will be raised instead of crash), but can write more than INT_MAX bytes in sum and get all string with getvalue(). This will preserve an ABI.
msg181295 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-02-03 20:17
Include/cStringIO.h is public and the name of the structure "PycStringIO" lacks an _ which implies that it is public.

There appear to be a few references to it in external projects doing some searching.  A (very old) version of twisted/protocols/_c_urlarg.c uses it for example.

Doing the latter and limiting writes to INT_MAX but allowing greater to accumulate makes sense.
msg181312 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-04 09:21
Thank you for review and enlightenment Gregory. Here is an updated patch which doesn't change an ABI.
msg181733 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-09 11:48
New changeset a025b04332fe by Serhiy Storchaka in branch '2.7':
Issue #7358: cStringIO.StringIO now supports writing to and reading from
http://hg.python.org/cpython/rev/a025b04332fe
History
Date User Action Args
2013-02-26 14:13:40jceasetnosy: + jcea
2013-02-09 15:04:06serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2013-02-09 11:48:37python-devsetnosy: + python-dev
messages: + msg181733
2013-02-04 09:21:55serhiy.storchakasetfiles: + cStringIO64_2.patch

messages: + msg181312
2013-02-03 20:17:18gregory.p.smithsetmessages: + msg181295
2013-02-03 19:49:20serhiy.storchakasetmessages: + msg181290
2013-02-03 19:42:06pitrousetmessages: + msg181289
2013-02-03 19:33:52gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg181287
2013-01-28 11:11:34serhiy.storchakasetfiles: + cStringIO64.patch
keywords: + patch
messages: + msg180845

stage: needs patch -> patch review
2013-01-28 09:06:45serhiy.storchakasetversions: - Python 2.6
nosy: + serhiy.storchaka

assignee: serhiy.storchaka
components: + Extension Modules, IO, - Library (Lib)
stage: needs patch
2013-01-28 09:04:30serhiy.storchakalinkissue13555 dependencies
2013-01-28 09:03:45serhiy.storchakalinkissue17054 superseder
2012-09-10 23:19:22loewissetmessages: + msg170246
2012-09-10 22:48:34loewissetmessages: + msg170243
2012-09-10 12:01:15Gilles.Louppesetnosy: + Gilles.Louppe
messages: + msg170177
2011-01-31 18:24:57pitrousetnosy: + pitrou, loewis
title: cPickle crash on failed assertion -> cStringIO not 64-bit safe
messages: + msg127637

versions: + Python 2.7
2011-01-31 18:04:42vstinnersetnosy: + vstinner
2011-01-31 18:01:47belopolskysetnosy: + belopolsky
messages: + msg127633
2009-11-19 17:47:41rybeshsetmessages: + msg95504
2009-11-19 17:43:10rybeshsetmessages: + msg95502
2009-11-19 16:05:11eric.smithsetnosy: + eric.smith
messages: + msg95484
2009-11-19 15:50:21rybeshcreate