classification
Title: Fix error handler of _PyString_Resize() on allocation failure
Type: crash Stage: patch review
Components: Extension Modules Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, eric.snow, haypo, kristjan.jonsson, python-dev, qualab, serhiy.storchaka
Priority: normal Keywords: needs review, patch

Created on 2014-01-29 08:36 by qualab, last changed 2014-11-17 11:55 by kristjan.jonsson. This issue is now closed.

Files
File name Uploaded Description Edit
issue20434.diff asvetlov, 2014-01-29 10:21
string_resize.patch kristjan.jonsson, 2014-04-10 16:43 review
string_resize.patch kristjan.jonsson, 2014-04-17 16:58 review
Messages (25)
msg209624 - (view) Author: Vladimir Kerimov (qualab) Date: 2014-01-29 08:36
In the file _io\fileio.c in function 

static PyObject *
fileio_readall(fileio *self)

this code is incorrect and crashes the process of Python:

 if (_PyBytes_Resize(&result, newsize) < 0) {
                if (total == 0) {
                    Py_DECREF(result);
                    return NULL;
                }
                PyErr_Clear();
                break;
            }

In the call of _PyBytes_Resize there the result variable passed by reference and changes value to NULL-pointer when function fails and return < 0. So on the line Py_DECREF(result); the Python process crashes.
msg209631 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2014-01-29 10:21
Bug can be reproduced if _PyBytes_Resize fails with out-of-memory error than NULL object decrefed.
Buggy modules are _io, binascii and zlib.

3.4 hasn't the problem.
Patch for 3.3 is attached.
Fix goes to mimic 3.4 -- (replace Py_DECREF with Py_CLEAR), while for 
_PyBytes_Resize that does not make sense, object has been set to NULL inside _PyBytes_Resize on fail.

Also 2.7 has the same issue for _PyString_Resize calls.
msg209632 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-29 10:27
Ah yes, the bytes object is set to NULL. In fact, you don't need to call Py_CLEAR(&retval) in case of error, because retval is already NULL. Could you please update your patch to just do nothing on retval in case of error please?

> 3.4 hasn't the problem.

Yes, I played some weeks with pyfailmalloc, a new tool for Python 3.4 to inject random MemoryError errors. I fixed many other different issues.
msg209642 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-29 14:03
There are other places where Py_DECREF is called after failed _PyBytes_Resize(). For example in PyBytes_FromObject():

            if (_PyBytes_Resize(&new, size) < 0)
                goto error;
...
  error:
    /* Error handling when new != NULL */
    Py_XDECREF(it);
    Py_DECREF(new);
    return NULL;
msg209777 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-01-31 12:18
These are very unusual semantics.
The convention in the python api is that functions are refernece-invariant when there are errors.  i.e. if a function fails or not does not change the caller's reference passing assumptions.

For example, Py_BuildValue("N", myobject);
takes care to always steal the reference of myobject, even when Py_BuildValue fails.

Thi tehe case of _PyBytes_Resize(), the caller owns the (single) reference to the operand, and owns the reference to it (or a new one) on success.  It is highly unusual that the case of failure causes it to no longer own this reference.

Python 3 should have taken the opportunity to remove remove this unusual inheritance from _PyString_Resize()
msg209780 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-31 13:00
> Python 3 should have taken the opportunity to remove remove this unusual inheritance from _PyString_Resize()

It's not so unusual. PyUnicode_InternInPlace() replaces also a pointer
to a object in the caller for example.

There are many other functions taking PyObject** parameters:
PyUnicode_Append(), PyDict_Next(), PyEval_EvalCodeEx(), PyErr_Fetch(),
etc.

Anyway, it's too late to change such major API, so it's not very
useful the discuss this theorical change :-) And the function is well
documented:
http://docs.python.org/dev/c-api/bytes.html#_PyBytes_Resize
"If the reallocation fails, the original bytes object at *bytes is
deallocated, *bytes is set to NULL, a memory exception is set, and -1
is returned."
msg209794 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-01-31 15:15
I'm not talking about the PyObject** argument, Victor.
I'm talking about reference counting semantics.  It is a rule that reference counting semantics should be the same over a function call whether that function raised an exception or not.

The this function effectively steals a reference in case of error. The caller owns the reference to the argument (passed by ref) if it succeeds, but if id doesn't, then he doesn't own it anymore.

Reference counting invariance with errors is, as I mentioned, observed with e.g. the 'N' argument to Py_BuildValue(), which is defined to "steal" a reference and does so even if the call fails.  This behaviour is observed by other reference-stealing functions, such as PyTuple_SetItem().  Similarly, functions who don't steal a reference, i.e. take their own, will not change that behaviour if they error.

If you don't want to think about this in terms of reference counting semantics, think about it in terms of the fact that in case of error, most functions leave everything as it was.  PyList_Append(), if it fails, leaves everything as it was.
This function does not.  In case of failure, it will, as a convenience to the caller, release the original object.
It is equivalent to () realloc() freeing its operand if it cannot succeed.

It is precisely these 'unusual' exceptions from established semantics that cause these kind of programming errors.  Originally, this was probably designed as a convenience to the programmer for the handful of places that the function (_PyString_Resize) was used.  But this decision forces every new user of this function (and its descendents) to be acutely aware of its unusual error behaviour.
msg215836 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2014-04-09 19:02
Looks like the issue has gone after 3.4
Anf it's still present for 2.7.
If somebody like to make a patch for 2.7 - you are welcome!
msg215845 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-09 20:33
Here we are.  There were a lot of places where this was being incorrectly done.  And some places where this was being considered a recoverable error, which it isn't because the source is freed.

Which sort of supports my opinion that this is bad general api design, but perhaps a good conveneice function for limited use.
msg215885 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-10 14:13
Could someone please review this patch? I'd like to see it committed asap.
msg215890 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-04-10 16:04
I don't see a review link.  Looks like your patch wasn't against tip (at attach-time) or you used the --git flag in diff.  Having the patch in the review tool would be really would be really helpful.  I'll take a look otherwise, but won't be able to immediately regardless.
msg215891 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-10 16:43
Ok, retrying without the --git flag (I thought that was recommended, it was once...)
msg216083 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-04-14 14:39
I was going to say we should consider changing the API of _PyBytes_Resize() and _PyString_Resize().  However, having looked at the two functions, I guess it makes sense.

Looking at the patch, I'd argue that we still need to set the string to NULL in the error case.  Only in the out-of-memory case do the two functions change it to NULL for you.
msg216087 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-14 14:57
I would also advocate for a better api, that leaves it up to the caller what to do, much like realloc() does.  A convenience macro that frees the block on error could then be provided.  But this is 2.7 and we don't change stuff there :)

Can you elaborate on your second comment?  Is there some place where I forgot to clear the object?
msg216091 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-04-14 15:21
For example, in the patch binascii_b2a_uu() in Modules/binascii.c no longer sets rv to NULL even though in one of the _PyString_Resize() error cases rv is not automatically set to NULL.  And simply setting rv to NULL would be backward-incompatible as well.
msg216095 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-14 15:30
This is _PyString_Resize(). I don't immediatlly see an error case where the string isn't freed:

int
_PyString_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyStringObject *sv;
    v = *pv;
    if (!PyString_Check(v) || Py_REFCNT(v) != 1 || newsize < 0 ||
        PyString_CHECK_INTERNED(v)) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    /* XXX UNREF/NEWREF interface should be more symmetrical */
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyStringObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyStringObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '\0';
    sv->ob_shash = -1;          /* invalidate cached hash value */
    return 0;
}
msg216119 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-04-14 17:19
Yeah, I missed the "*pv = 0;" line in the first error case.
msg216133 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-14 17:46
Ok, are we good to go then?
msg216160 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-04-14 18:29
Okay from me, but Victor has a better idea of the string APIs. :)
msg216166 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-14 18:41
I'm busy right now with Pycon (I organize a sprint to port OpenStack
to Python 3). I will try to review the patch later this week.
msg216167 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-14 18:42
Sure, there was at least one case in the patch, where the string resize was consider optional, and the code tried to recover if it didn't succeed.
But I don't think we should be trying to change apis, even internal ones in python 2.7.
msg216711 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-17 16:58
Add comments and explicit (void) on the ignored value from _PyString_Resize as suggested by Victor
msg217158 - (view) Author: Roundup Robot (python-dev) Date: 2014-04-25 09:53
New changeset 4f79c3827adc by Kristján Valur Jónsson in branch '2.7':
Issue #20434 Correct error handlin of _PyString_Resize and _PyBytes_Resize
http://hg.python.org/cpython/rev/4f79c3827adc
msg231270 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-17 08:12
Is there anything left for this issue?
msg231280 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-11-17 11:55
Nope, closing as fixed :)
History
Date User Action Args
2014-11-17 11:55:45kristjan.jonssonsetstatus: open -> closed
resolution: fixed
messages: + msg231280
2014-11-17 08:12:12serhiy.storchakasetmessages: + msg231270
2014-04-25 09:53:59python-devsetnosy: + python-dev
messages: + msg217158
2014-04-17 16:58:22kristjan.jonssonsetfiles: + string_resize.patch

messages: + msg216711
2014-04-16 07:43:43hayposettitle: Process crashes if not enough memory to import module -> Fix error handler of _PyString_Resize() on allocation failure
2014-04-14 18:42:03kristjan.jonssonsetmessages: + msg216167
2014-04-14 18:41:54hayposetmessages: + msg216166
2014-04-14 18:29:56eric.snowsetmessages: + msg216160
2014-04-14 17:46:50kristjan.jonssonsetmessages: + msg216133
2014-04-14 17:19:39eric.snowsetmessages: + msg216119
2014-04-14 15:30:41kristjan.jonssonsetmessages: + msg216095
2014-04-14 15:21:21eric.snowsetmessages: + msg216091
2014-04-14 14:57:56kristjan.jonssonsetmessages: + msg216087
2014-04-14 14:39:36eric.snowsetmessages: + msg216083
2014-04-10 16:53:17kristjan.jonssonsetfiles: - string_resize.patch
2014-04-10 16:43:35kristjan.jonssonsetfiles: + string_resize.patch

messages: + msg215891
2014-04-10 16:04:37eric.snowsetnosy: + eric.snow
messages: + msg215890
2014-04-10 14:13:38kristjan.jonssonsetmessages: + msg215885
2014-04-09 20:33:23kristjan.jonssonsetfiles: + string_resize.patch

messages: + msg215845
2014-04-09 19:02:15asvetlovsetmessages: + msg215836
versions: - Python 3.3
2014-01-31 15:15:04kristjan.jonssonsetmessages: + msg209794
2014-01-31 13:00:05hayposetmessages: + msg209780
2014-01-31 12:18:53kristjan.jonssonsetnosy: + kristjan.jonsson
messages: + msg209777
2014-01-29 14:03:33serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg209642
2014-01-29 10:27:54hayposetnosy: + haypo
messages: + msg209632
2014-01-29 10:21:46asvetlovsetkeywords: + needs review
stage: patch review
2014-01-29 10:21:01asvetlovsetfiles: + issue20434.diff
versions: + Python 2.7
messages: + msg209631

components: + Extension Modules, - Interpreter Core
keywords: + patch
2014-01-29 08:36:40qualabcreate