Issue 27507: bytearray.extend lacks overflow check when increasing buffer

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/71694

classification

Title:	bytearray.extend lacks overflow check when increasing buffer
Type:	enhancement	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.6, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	Decorater, martin.panter, python-dev, serhiy.storchaka, xiang.zhang, ztane
Priority:	normal	Keywords:	patch

Created on 2016-07-13 17:12 by xiang.zhang, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
add_bytearray_extend_overflow_check.patch	xiang.zhang, 2016-07-13 17:12		review

Messages (17)
msg270327 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-07-13 17:12
As the title, bytearray.extend simply use `buf_size = len + (len >> 1) + 1;` to determine next buf_size when increasing buffer. But this can overflow in theory. So I suggest adding overflow check.
msg270328 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-07-13 17:15
Ohh, sorry for the disturb. I made a mistake. This is not needed and now close it.
msg270331 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-07-13 17:26
Sorry, after checking again, this is still needed but maybe `if (len == PY_SSIZE_T_MAX)` part is not necessary since the iterable's length should not be larger than PY_SSIZE_T_MAX. But keep it to avoid theoretically bug is not a bad idea.
msg270362 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-07-14 04:14
It is possible to make an infinite iterable, e.g. iter(int, 1), so it is definitely worth checking for overflow. The patch looks okay to me. An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first, but I am okay with either way.
msg270366 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-07-14 04:45
Nice to see the real example. I don't think of that.
msg270391 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-07-14 10:17
Totally agreed with Martin.
msg270394 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-07-14 10:40
So would you like to merge it like this or Martin's alternative way? But honestly speaking I don't get Martin's point, "without trying to allocate Py_SSIZE_T_MAX first", where is this place?
msg270395 - (view)	Author: Antti Haapala (ztane) *	Date: 2016-07-14 10:41
if (len == PY_SSIZE_T_MAX) is necessary for the case that the iterable is already PY_SSIZE_T_MAX items. However it could be moved inside the other if because if (len == PY_SSIZE_T_MAX) should also fail the overflow check. However, I believe it is theoretical at most with stuff that Python supports that it would even be possible to allocate an array of PY_SSIZE_T_MAX pointers. The reason is that the maximum size of object can be only that of `size_t`, and Py_ssize_t should be a signed type of that size; and it would thus be possible only to allocate an array of PY_SSIZE_T_MAX pointers only if they're 16 bits wide. In any case, this would be another place where my proposed macro "SUM_OVERFLOWS_PY_SSIZE_T" or something would be in order to make it read if (SUM_OVERFLOWS_PY_SSIZE_T(len, (len >> 1) + 1) which would make it easier to spot mistakes in the sign preceding 1.
msg270400 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-07-14 10:54
Thanks for the analysis Antti. I don't think if (len == PY_SSIZE_T_MAX) can be moved inside the other if. Their handlings are different. if (len == PY_SSIZE_T_MAX) is True, it should exit immediately. But in the other if, you can still have a try to allocate PY_SSIZE_T_MAX memory. As for your overflow macro, I think it's not very useful. First, not only Py_ssize_t can overflow, all signed types can. So a single SUM_OVERFLOWS_PY_SSIZE_T is not enough. Second, current overflow check pattern like a > PY_SSIZE_T_MAX - b is very obvious in my opinion.
msg270564 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-07-16 15:35
Not particularly related, but the special fast case in Objects/listobject.c:811, listextend(), also seems to lack an overflow check. “An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first”: what I meant was removing the special case to allocate PY_SSIZE_T_MAX. As soon as it attempts to overallocate 2+ GiB of memory it fails. Something more like addition = len >> 1; if (addition > PY_SSIZE_T_MAX - len - 1) { /* . . . */ return PyErr_NoMemory(); } buf_size = len + addition; Antti: in this case we are allocating an array of _bytes_, not pointers. So maybe it is possible to reach the limit with a 32-bit address space.
msg270568 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-07-16 15:51
I can't totally agree the point. The code means every time we increase the buffer by half the current length. So when the length arrives 2/3 * PY_SSIZE_T_MAX it's going to overflow. There is a 1/3 * PY_SSIZE_T_MAX gap between the theoretical upper limit. I think leaving such a gap and exiting is somewhat too rude. So in such case I make it have a try.
msg270602 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-07-17 02:36
Yeah I see your point. Anyway I think the current patch is fine.
msg270603 - (view)	Author: Decorater (Decorater) *	Date: 2016-07-17 02:43
Only 1 way to find out, test it till it breaks. :P
msg270622 - (view)	Author: Antti Haapala (ztane) *	Date: 2016-07-17 09:17
Ah indeed, this is a bytearray and it is indeed possible to theoretically allocate PY_SSIZE_T_MAX bytes, if on an architecture that does segmented memory. As for if (addition > PY_SSIZE_T_MAX - len - 1) { it is very clear to us but it is not quite self-documenting on why to do it this way to someone who doesn't know undefined behaviours in C (hint: next to no one knows, judging from the amount of complaints that the GCC "bug" received), instead of say if (INT_ADD_OVERFLOW(len, addition)) Where the INT_ADD_OVERFLOW would have a comment above explaining why it has to be done that way. But more discussion about it at https://bugs.python.org/issue1621
msg270726 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-07-18 08:22
New changeset 54cc0480904c by Martin Panter in branch '3.5': Issue #27507: Check for integer overflow in bytearray.extend() https://hg.python.org/cpython/rev/54cc0480904c New changeset 6e166b66aa44 by Martin Panter in branch '2.7': Issue #27507: Check for integer overflow in bytearray.extend() https://hg.python.org/cpython/rev/6e166b66aa44 New changeset 646ad4894c32 by Martin Panter in branch 'default': Issue #27507: Merge overflow check from 3.5 https://hg.python.org/cpython/rev/646ad4894c32
msg270742 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-07-18 09:57
Committed without any macro for the time being
msg270743 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-07-18 10:00
Thanks for your work.

History
Date	User	Action	Args
2022-04-11 14:58:33	admin	set	github: 71694
2016-07-18 10:00:24	xiang.zhang	set	messages: + msg270743
2016-07-18 09:57:33	martin.panter	set	status: open -> closed resolution: not a bug -> fixed messages: + msg270742 stage: patch review -> resolved
2016-07-18 08:22:55	python-dev	set	nosy: + python-dev messages: + msg270726
2016-07-17 09:17:35	ztane	set	messages: + msg270622
2016-07-17 02:43:42	Decorater	set	messages: + msg270603
2016-07-17 02:36:24	martin.panter	set	messages: + msg270602
2016-07-16 15:51:14	xiang.zhang	set	messages: + msg270568
2016-07-16 15:35:25	martin.panter	set	messages: + msg270564
2016-07-14 10:54:05	xiang.zhang	set	messages: + msg270400
2016-07-14 10:41:14	ztane	set	nosy: + ztane messages: + msg270395
2016-07-14 10:40:35	xiang.zhang	set	messages: + msg270394
2016-07-14 10:21:38	Decorater	set	nosy: + Decorater
2016-07-14 10:17:53	serhiy.storchaka	set	messages: + msg270391
2016-07-14 04:45:00	xiang.zhang	set	messages: + msg270366
2016-07-14 04:14:14	martin.panter	set	versions: + Python 2.7 nosy: + martin.panter messages: + msg270362 stage: patch review
2016-07-13 17:26:23	xiang.zhang	set	status: closed -> open messages: + msg270331
2016-07-13 17:15:21	xiang.zhang	set	status: open -> closed resolution: not a bug messages: + msg270328
2016-07-13 17:12:28	xiang.zhang	create