This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: _PyObject_VAR_SIZE should avoid arithmetic overflow
Type: Stage: patch review
Components: Interpreter Core Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Greg Price, christian.heimes, sir-sigurd, twouters
Priority: normal Keywords: patch

Created on 2019-09-10 06:03 by Greg Price, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 14838 sir-sigurd, 2019-09-10 06:05
Messages (2)
msg351573 - (view) Author: Greg Price (Greg Price) * Date: 2019-09-10 06:03
Currently `_PyObject_VAR_SIZE` effectively has a precondition that it must not be passed arguments which would overflow in its arithmetic. If that's violated, it overflows... which is undefined behavior already, and in fact the likely next thing that happens is we allocate a short buffer for a giant array of items, then overflow the buffer, bam!

(The arithmetic is on `Py_ssize_t`, a signed integer type; that's why overflow is UB.)

It's quite difficult to maintain that precondition at call sites, because it effectively means writing a check that duplicates the content of `_PyObject_VAR_SIZE` modulo some algebraic rearrangement.  Empirically we don't consistently manage that: we just fixed an 11-year-old bug where such a check was wrong (luckily in the safe/over-conservative direction), and some other call paths don't immediately appear to be checking at all.

So, I think it would be helpful to move that check into `_PyObject_VAR_SIZE` itself.

Cc'ing the three folks on the interest list "coverity scan", as that one seems the closest match for who'd likely be interested in commenting on this.

Details below, largely from GH-14838 which got me looking at this.

---

In GH-14838, @sir-sigurd fixed an overflow check on a call to PyObject_GC_NewVar.  The call site looked like this:

        if ((size_t)size > ((size_t)PY_SSIZE_T_MAX - sizeof(PyTupleObject) -
                    sizeof(PyObject *)) / sizeof(PyObject *)) {
            return (PyTupleObject *)PyErr_NoMemory();
        }
        op = PyObject_GC_NewVar(PyTupleObject, &PyTuple_Type, size);

but the `- sizeof(PyObject *)` had the wrong sign.  Happily the bug made the check too conservative, but that's basically a lucky coin-flip we had -- a similar bug could just as easily have gone the other direction, making the code vulnerable to arithmetic overflow, then memory corruption from overflowing a too-small buffer.

The bug was present for 11 years.

Sergey, I am still very curious how you discovered the bug. :-)

I feel like there's an opportunity for a more complete fix as a followup to this; that's this thread.

---

My next question on reading that fix was, OK, how can we write this so it's hard to get wrong?

I think the point is that `PyObject_GC_NewVar` really has a precondition, that the `nitems` it's passed (`size` in this caller) not be so big that the allocation will overflow. Specifically `_PyObject_VAR_SIZE(tp, nitems)` needs to not overflow. All the trickiness in this overflow check is basically an algebraic rearrangement of what `_PyObject_VAR_SIZE(&PyTuple_Type, size)` would say, plus substituting in the values actually found at `&PyTuple_Type`.

So, a sort of minimal fix would be to make something with a name like `PyObject_GC_NewVar_WouldOverflow` that encodes that. Maybe that in turn would just be a macro to delegate to a `_PyObject_VAR_SIZE_WOULD_OVERFLOW`, so that each layer only has to know what it actually knows.

I think that immediately raises a next question, though, which is: are other call sites of `PyObject_GC_NewVar` checking properly for overflow? At a quick grep, I see some that don't appear to be checking.

And: why should they have to? It seems like the best place for this check is immediately before the possibly-overflowing computation; or if not there, then the closer to it the better.

So, the ideal solution: `_PyObject_VAR_SIZE` itself would be augmented to first do this check, and only if it won't overflow then go and actually multiply.

There are only a handful of call sites of `_PyObject_VAR_SIZE`, so it wouldn't even be much of a migration to then take care of each of them.

Then we wouldn't need these prophylactic checks at callers' callers -- which (a) this bug demonstrates are hard to consistently get right, and (b) the grep mentioned above suggests we may not be consistently doing in the first place.
msg351575 - (view) Author: Greg Price (Greg Price) * Date: 2019-09-10 06:09
(The tracker just linked GH-14838 to this issue because I mentioned it in a comment there, but it's not for this issue -- it's that recent fix for an 11-year-old bug in a callsite's overflow check.)
History
Date User Action Args
2022-04-11 14:59:20adminsetgithub: 82260
2019-11-27 18:46:11brett.cannonsetnosy: - brett.cannon
2019-09-10 06:09:27Greg Pricesetmessages: + msg351575
2019-09-10 06:05:35sir-sigurdsetkeywords: + patch
stage: patch review
pull_requests: + pull_request15471
2019-09-10 06:03:54Greg Pricecreate