This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: bytearray.extend lacks overflow check when increasing buffer
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Decorater, martin.panter, python-dev, serhiy.storchaka, xiang.zhang, ztane
Priority: normal Keywords: patch

Created on 2016-07-13 17:12 by xiang.zhang, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
add_bytearray_extend_overflow_check.patch xiang.zhang, 2016-07-13 17:12 review
Messages (17)
msg270327 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-13 17:12
As the title, bytearray.extend simply use `buf_size = len + (len >> 1) + 1;` to determine next *buf_size* when increasing buffer. But this can overflow in theory. So I suggest adding overflow check.
msg270328 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-13 17:15
Ohh, sorry for the disturb. I made a mistake. This is not needed and now close it.
msg270331 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-13 17:26
Sorry, after checking again, this is still needed but maybe `if (len == PY_SSIZE_T_MAX)` part is not necessary since the iterable's length should not be larger than PY_SSIZE_T_MAX. But keep it to avoid theoretically bug is not a bad idea.
msg270362 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-14 04:14
It is possible to make an infinite iterable, e.g. iter(int, 1), so it is definitely worth checking for overflow. The patch looks okay to me. An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first, but I am okay with either way.
msg270366 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-14 04:45
Nice to see the real example. I don't think of that.
msg270391 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-14 10:17
Totally agreed with Martin.
msg270394 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-14 10:40
So would you like to merge it like this or Martin's alternative way? But honestly speaking I don't get Martin's point, "without trying to allocate Py_SSIZE_T_MAX first", where is this place?
msg270395 - (view) Author: Antti Haapala (ztane) * Date: 2016-07-14 10:41
if (len == PY_SSIZE_T_MAX) is necessary for the case that the iterable is already PY_SSIZE_T_MAX items. However it could be moved inside the *other* if because if (len == PY_SSIZE_T_MAX) should also fail the overflow check.

However, I believe it is theoretical at most with stuff that Python supports that it would even be possible to allocate an array of PY_SSIZE_T_MAX *pointers*. The reason is that the maximum size of object can be only that of `size_t`, and Py_ssize_t should be a signed type of that size; and it would thus be possible only to allocate an array of PY_SSIZE_T_MAX pointers only if they're 16 bits wide.

In any case, this would be another place where my proposed macro "SUM_OVERFLOWS_PY_SSIZE_T" or something would be in order to make it read

    if (SUM_OVERFLOWS_PY_SSIZE_T(len, (len >> 1) + 1)

which would make it easier to spot mistakes in the sign preceding 1.
msg270400 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-14 10:54
Thanks for the analysis Antti. I don't think if (len == PY_SSIZE_T_MAX) can be moved inside the *other* if. Their handlings are different. if (len == PY_SSIZE_T_MAX) is True, it should exit immediately. But in the *other* if, you can still have a try to allocate PY_SSIZE_T_MAX memory. 

As for your overflow macro, I think it's not very useful. First, not only Py_ssize_t can overflow, all signed types can. So a single SUM_OVERFLOWS_PY_SSIZE_T is not enough. Second, current overflow check pattern like a > PY_SSIZE_T_MAX - b is very obvious in my opinion.
msg270564 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-16 15:35
Not particularly related, but the special fast case in Objects/listobject.c:811, listextend(), also seems to lack an overflow check.

“An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first”: what I meant was removing the special case to allocate PY_SSIZE_T_MAX. As soon as it attempts to overallocate 2+ GiB of memory it fails. Something more like

addition = len >> 1;
if (addition > PY_SSIZE_T_MAX - len - 1) {
    /* . . . */
    return PyErr_NoMemory();
}
buf_size = len + addition;

Antti: in this case we are allocating an array of _bytes_, not pointers. So maybe it is possible to reach the limit with a 32-bit address space.
msg270568 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-16 15:51
I can't totally agree the point. The code means every time we increase the buffer by half the current length. So when the length arrives 2/3 * PY_SSIZE_T_MAX it's going to overflow. There is a 1/3 * PY_SSIZE_T_MAX gap between the theoretical upper limit. I think leaving such a gap and exiting is somewhat too rude. So in such case I make it have a try.
msg270602 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-17 02:36
Yeah I see your point. Anyway I think the current patch is fine.
msg270603 - (view) Author: Decorater (Decorater) * Date: 2016-07-17 02:43
Only 1 way to find out, test it till it breaks. :P
msg270622 - (view) Author: Antti Haapala (ztane) * Date: 2016-07-17 09:17
Ah indeed, this is a bytearray and it is indeed possible to theoretically allocate PY_SSIZE_T_MAX bytes, if on an architecture that does segmented memory.

As for 

    if (addition > PY_SSIZE_T_MAX - len - 1) {

it is very clear to *us* but it is not quite self-documenting on why to do it this way to someone who doesn't know undefined behaviours in C (hint: next to no one knows, judging from the amount of complaints that the GCC "bug" received), instead of say

    if (INT_ADD_OVERFLOW(len, addition))

Where the INT_ADD_OVERFLOW would have a comment above explaining why it has to be done that way. But more discussion about it at https://bugs.python.org/issue1621
msg270726 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-07-18 08:22
New changeset 54cc0480904c by Martin Panter in branch '3.5':
Issue #27507: Check for integer overflow in bytearray.extend()
https://hg.python.org/cpython/rev/54cc0480904c

New changeset 6e166b66aa44 by Martin Panter in branch '2.7':
Issue #27507: Check for integer overflow in bytearray.extend()
https://hg.python.org/cpython/rev/6e166b66aa44

New changeset 646ad4894c32 by Martin Panter in branch 'default':
Issue #27507: Merge overflow check from 3.5
https://hg.python.org/cpython/rev/646ad4894c32
msg270742 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-18 09:57
Committed without any macro for the time being
msg270743 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-18 10:00
Thanks for your work.
History
Date User Action Args
2022-04-11 14:58:33adminsetgithub: 71694
2016-07-18 10:00:24xiang.zhangsetmessages: + msg270743
2016-07-18 09:57:33martin.pantersetstatus: open -> closed
resolution: not a bug -> fixed
messages: + msg270742

stage: patch review -> resolved
2016-07-18 08:22:55python-devsetnosy: + python-dev
messages: + msg270726
2016-07-17 09:17:35ztanesetmessages: + msg270622
2016-07-17 02:43:42Decoratersetmessages: + msg270603
2016-07-17 02:36:24martin.pantersetmessages: + msg270602
2016-07-16 15:51:14xiang.zhangsetmessages: + msg270568
2016-07-16 15:35:25martin.pantersetmessages: + msg270564
2016-07-14 10:54:05xiang.zhangsetmessages: + msg270400
2016-07-14 10:41:14ztanesetnosy: + ztane
messages: + msg270395
2016-07-14 10:40:35xiang.zhangsetmessages: + msg270394
2016-07-14 10:21:38Decoratersetnosy: + Decorater
2016-07-14 10:17:53serhiy.storchakasetmessages: + msg270391
2016-07-14 04:45:00xiang.zhangsetmessages: + msg270366
2016-07-14 04:14:14martin.pantersetversions: + Python 2.7
nosy: + martin.panter

messages: + msg270362

stage: patch review
2016-07-13 17:26:23xiang.zhangsetstatus: closed -> open

messages: + msg270331
2016-07-13 17:15:21xiang.zhangsetstatus: open -> closed
resolution: not a bug
messages: + msg270328
2016-07-13 17:12:28xiang.zhangcreate