msg270327 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2016-07-13 17:12 |
As the title, bytearray.extend simply use `buf_size = len + (len >> 1) + 1;` to determine next *buf_size* when increasing buffer. But this can overflow in theory. So I suggest adding overflow check.
|
msg270328 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2016-07-13 17:15 |
Ohh, sorry for the disturb. I made a mistake. This is not needed and now close it.
|
msg270331 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2016-07-13 17:26 |
Sorry, after checking again, this is still needed but maybe `if (len == PY_SSIZE_T_MAX)` part is not necessary since the iterable's length should not be larger than PY_SSIZE_T_MAX. But keep it to avoid theoretically bug is not a bad idea.
|
msg270362 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-07-14 04:14 |
It is possible to make an infinite iterable, e.g. iter(int, 1), so it is definitely worth checking for overflow. The patch looks okay to me. An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first, but I am okay with either way.
|
msg270366 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2016-07-14 04:45 |
Nice to see the real example. I don't think of that.
|
msg270391 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-07-14 10:17 |
Totally agreed with Martin.
|
msg270394 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2016-07-14 10:40 |
So would you like to merge it like this or Martin's alternative way? But honestly speaking I don't get Martin's point, "without trying to allocate Py_SSIZE_T_MAX first", where is this place?
|
msg270395 - (view) |
Author: Antti Haapala (ztane) * |
Date: 2016-07-14 10:41 |
if (len == PY_SSIZE_T_MAX) is necessary for the case that the iterable is already PY_SSIZE_T_MAX items. However it could be moved inside the *other* if because if (len == PY_SSIZE_T_MAX) should also fail the overflow check.
However, I believe it is theoretical at most with stuff that Python supports that it would even be possible to allocate an array of PY_SSIZE_T_MAX *pointers*. The reason is that the maximum size of object can be only that of `size_t`, and Py_ssize_t should be a signed type of that size; and it would thus be possible only to allocate an array of PY_SSIZE_T_MAX pointers only if they're 16 bits wide.
In any case, this would be another place where my proposed macro "SUM_OVERFLOWS_PY_SSIZE_T" or something would be in order to make it read
if (SUM_OVERFLOWS_PY_SSIZE_T(len, (len >> 1) + 1)
which would make it easier to spot mistakes in the sign preceding 1.
|
msg270400 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2016-07-14 10:54 |
Thanks for the analysis Antti. I don't think if (len == PY_SSIZE_T_MAX) can be moved inside the *other* if. Their handlings are different. if (len == PY_SSIZE_T_MAX) is True, it should exit immediately. But in the *other* if, you can still have a try to allocate PY_SSIZE_T_MAX memory.
As for your overflow macro, I think it's not very useful. First, not only Py_ssize_t can overflow, all signed types can. So a single SUM_OVERFLOWS_PY_SSIZE_T is not enough. Second, current overflow check pattern like a > PY_SSIZE_T_MAX - b is very obvious in my opinion.
|
msg270564 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-07-16 15:35 |
Not particularly related, but the special fast case in Objects/listobject.c:811, listextend(), also seems to lack an overflow check.
“An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first”: what I meant was removing the special case to allocate PY_SSIZE_T_MAX. As soon as it attempts to overallocate 2+ GiB of memory it fails. Something more like
addition = len >> 1;
if (addition > PY_SSIZE_T_MAX - len - 1) {
/* . . . */
return PyErr_NoMemory();
}
buf_size = len + addition;
Antti: in this case we are allocating an array of _bytes_, not pointers. So maybe it is possible to reach the limit with a 32-bit address space.
|
msg270568 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2016-07-16 15:51 |
I can't totally agree the point. The code means every time we increase the buffer by half the current length. So when the length arrives 2/3 * PY_SSIZE_T_MAX it's going to overflow. There is a 1/3 * PY_SSIZE_T_MAX gap between the theoretical upper limit. I think leaving such a gap and exiting is somewhat too rude. So in such case I make it have a try.
|
msg270602 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-07-17 02:36 |
Yeah I see your point. Anyway I think the current patch is fine.
|
msg270603 - (view) |
Author: Decorater (Decorater) * |
Date: 2016-07-17 02:43 |
Only 1 way to find out, test it till it breaks. :P
|
msg270622 - (view) |
Author: Antti Haapala (ztane) * |
Date: 2016-07-17 09:17 |
Ah indeed, this is a bytearray and it is indeed possible to theoretically allocate PY_SSIZE_T_MAX bytes, if on an architecture that does segmented memory.
As for
if (addition > PY_SSIZE_T_MAX - len - 1) {
it is very clear to *us* but it is not quite self-documenting on why to do it this way to someone who doesn't know undefined behaviours in C (hint: next to no one knows, judging from the amount of complaints that the GCC "bug" received), instead of say
if (INT_ADD_OVERFLOW(len, addition))
Where the INT_ADD_OVERFLOW would have a comment above explaining why it has to be done that way. But more discussion about it at https://bugs.python.org/issue1621
|
msg270726 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-07-18 08:22 |
New changeset 54cc0480904c by Martin Panter in branch '3.5':
Issue #27507: Check for integer overflow in bytearray.extend()
https://hg.python.org/cpython/rev/54cc0480904c
New changeset 6e166b66aa44 by Martin Panter in branch '2.7':
Issue #27507: Check for integer overflow in bytearray.extend()
https://hg.python.org/cpython/rev/6e166b66aa44
New changeset 646ad4894c32 by Martin Panter in branch 'default':
Issue #27507: Merge overflow check from 3.5
https://hg.python.org/cpython/rev/646ad4894c32
|
msg270742 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-07-18 09:57 |
Committed without any macro for the time being
|
msg270743 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2016-07-18 10:00 |
Thanks for your work.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:33 | admin | set | github: 71694 |
2016-07-18 10:00:24 | xiang.zhang | set | messages:
+ msg270743 |
2016-07-18 09:57:33 | martin.panter | set | status: open -> closed resolution: not a bug -> fixed messages:
+ msg270742
stage: patch review -> resolved |
2016-07-18 08:22:55 | python-dev | set | nosy:
+ python-dev messages:
+ msg270726
|
2016-07-17 09:17:35 | ztane | set | messages:
+ msg270622 |
2016-07-17 02:43:42 | Decorater | set | messages:
+ msg270603 |
2016-07-17 02:36:24 | martin.panter | set | messages:
+ msg270602 |
2016-07-16 15:51:14 | xiang.zhang | set | messages:
+ msg270568 |
2016-07-16 15:35:25 | martin.panter | set | messages:
+ msg270564 |
2016-07-14 10:54:05 | xiang.zhang | set | messages:
+ msg270400 |
2016-07-14 10:41:14 | ztane | set | nosy:
+ ztane messages:
+ msg270395
|
2016-07-14 10:40:35 | xiang.zhang | set | messages:
+ msg270394 |
2016-07-14 10:21:38 | Decorater | set | nosy:
+ Decorater
|
2016-07-14 10:17:53 | serhiy.storchaka | set | messages:
+ msg270391 |
2016-07-14 04:45:00 | xiang.zhang | set | messages:
+ msg270366 |
2016-07-14 04:14:14 | martin.panter | set | versions:
+ Python 2.7 nosy:
+ martin.panter
messages:
+ msg270362
stage: patch review |
2016-07-13 17:26:23 | xiang.zhang | set | status: closed -> open
messages:
+ msg270331 |
2016-07-13 17:15:21 | xiang.zhang | set | status: open -> closed resolution: not a bug messages:
+ msg270328
|
2016-07-13 17:12:28 | xiang.zhang | create | |