Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytearray.extend lacks overflow check when increasing buffer #71694

Closed
zhangyangyu opened this issue Jul 13, 2016 · 17 comments
Closed

bytearray.extend lacks overflow check when increasing buffer #71694

zhangyangyu opened this issue Jul 13, 2016 · 17 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@zhangyangyu
Copy link
Member

BPO 27507
Nosy @vadmium, @serhiy-storchaka, @ztane, @zhangyangyu, @AraHaan
Files
  • add_bytearray_extend_overflow_check.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-07-18.09:57:33.670>
    created_at = <Date 2016-07-13.17:12:28.083>
    labels = ['interpreter-core', 'type-feature']
    title = 'bytearray.extend lacks overflow check when increasing buffer'
    updated_at = <Date 2016-07-18.10:00:24.338>
    user = 'https://github.com/zhangyangyu'

    bugs.python.org fields:

    activity = <Date 2016-07-18.10:00:24.338>
    actor = 'xiang.zhang'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-07-18.09:57:33.670>
    closer = 'martin.panter'
    components = ['Interpreter Core']
    creation = <Date 2016-07-13.17:12:28.083>
    creator = 'xiang.zhang'
    dependencies = []
    files = ['43707']
    hgrepos = []
    issue_num = 27507
    keywords = ['patch']
    message_count = 17.0
    messages = ['270327', '270328', '270331', '270362', '270366', '270391', '270394', '270395', '270400', '270564', '270568', '270602', '270603', '270622', '270726', '270742', '270743']
    nosy_count = 6.0
    nosy_names = ['python-dev', 'martin.panter', 'serhiy.storchaka', 'ztane', 'xiang.zhang', 'Decorater']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue27507'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

    @zhangyangyu
    Copy link
    Member Author

    As the title, bytearray.extend simply use buf_size = len + (len >> 1) + 1; to determine next buf_size when increasing buffer. But this can overflow in theory. So I suggest adding overflow check.

    @zhangyangyu zhangyangyu added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jul 13, 2016
    @zhangyangyu
    Copy link
    Member Author

    Ohh, sorry for the disturb. I made a mistake. This is not needed and now close it.

    @zhangyangyu
    Copy link
    Member Author

    Sorry, after checking again, this is still needed but maybe if (len == PY_SSIZE_T_MAX) part is not necessary since the iterable's length should not be larger than PY_SSIZE_T_MAX. But keep it to avoid theoretically bug is not a bad idea.

    @zhangyangyu zhangyangyu reopened this Jul 13, 2016
    @vadmium
    Copy link
    Member

    vadmium commented Jul 14, 2016

    It is possible to make an infinite iterable, e.g. iter(int, 1), so it is definitely worth checking for overflow. The patch looks okay to me. An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first, but I am okay with either way.

    @zhangyangyu
    Copy link
    Member Author

    Nice to see the real example. I don't think of that.

    @serhiy-storchaka
    Copy link
    Member

    Totally agreed with Martin.

    @zhangyangyu
    Copy link
    Member Author

    So would you like to merge it like this or Martin's alternative way? But honestly speaking I don't get Martin's point, "without trying to allocate Py_SSIZE_T_MAX first", where is this place?

    @ztane
    Copy link
    Mannequin

    ztane mannequin commented Jul 14, 2016

    if (len == PY_SSIZE_T_MAX) is necessary for the case that the iterable is already PY_SSIZE_T_MAX items. However it could be moved inside the *other* if because if (len == PY_SSIZE_T_MAX) should also fail the overflow check.

    However, I believe it is theoretical at most with stuff that Python supports that it would even be possible to allocate an array of PY_SSIZE_T_MAX pointers. The reason is that the maximum size of object can be only that of size_t, and Py_ssize_t should be a signed type of that size; and it would thus be possible only to allocate an array of PY_SSIZE_T_MAX pointers only if they're 16 bits wide.

    In any case, this would be another place where my proposed macro "SUM_OVERFLOWS_PY_SSIZE_T" or something would be in order to make it read

    if (SUM_OVERFLOWS_PY_SSIZE_T(len, (len >> 1) + 1)
    

    which would make it easier to spot mistakes in the sign preceding 1.

    @zhangyangyu
    Copy link
    Member Author

    Thanks for the analysis Antti. I don't think if (len == PY_SSIZE_T_MAX) can be moved inside the *other* if. Their handlings are different. if (len == PY_SSIZE_T_MAX) is True, it should exit immediately. But in the *other* if, you can still have a try to allocate PY_SSIZE_T_MAX memory.

    As for your overflow macro, I think it's not very useful. First, not only Py_ssize_t can overflow, all signed types can. So a single SUM_OVERFLOWS_PY_SSIZE_T is not enough. Second, current overflow check pattern like a > PY_SSIZE_T_MAX - b is very obvious in my opinion.

    @vadmium
    Copy link
    Member

    vadmium commented Jul 16, 2016

    Not particularly related, but the special fast case in Objects/listobject.c:811, listextend(), also seems to lack an overflow check.

    “An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first”: what I meant was removing the special case to allocate PY_SSIZE_T_MAX. As soon as it attempts to overallocate 2+ GiB of memory it fails. Something more like

    addition = len >> 1;
    if (addition > PY_SSIZE_T_MAX - len - 1) {
        /* . . . */
        return PyErr_NoMemory();
    }
    buf_size = len + addition;

    Antti: in this case we are allocating an array of _bytes_, not pointers. So maybe it is possible to reach the limit with a 32-bit address space.

    @zhangyangyu
    Copy link
    Member Author

    I can't totally agree the point. The code means every time we increase the buffer by half the current length. So when the length arrives 2/3 * PY_SSIZE_T_MAX it's going to overflow. There is a 1/3 * PY_SSIZE_T_MAX gap between the theoretical upper limit. I think leaving such a gap and exiting is somewhat too rude. So in such case I make it have a try.

    @vadmium
    Copy link
    Member

    vadmium commented Jul 17, 2016

    Yeah I see your point. Anyway I think the current patch is fine.

    @AraHaan
    Copy link
    Mannequin

    AraHaan mannequin commented Jul 17, 2016

    Only 1 way to find out, test it till it breaks. :P

    @ztane
    Copy link
    Mannequin

    ztane mannequin commented Jul 17, 2016

    Ah indeed, this is a bytearray and it is indeed possible to theoretically allocate PY_SSIZE_T_MAX bytes, if on an architecture that does segmented memory.

    As for

        if (addition > PY_SSIZE_T_MAX - len - 1) {

    it is very clear to *us* but it is not quite self-documenting on why to do it this way to someone who doesn't know undefined behaviours in C (hint: next to no one knows, judging from the amount of complaints that the GCC "bug" received), instead of say

    if (INT_ADD_OVERFLOW(len, addition))
    

    Where the INT_ADD_OVERFLOW would have a comment above explaining why it has to be done that way. But more discussion about it at https://bugs.python.org/issue1621

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 18, 2016

    New changeset 54cc0480904c by Martin Panter in branch '3.5':
    Issue bpo-27507: Check for integer overflow in bytearray.extend()
    https://hg.python.org/cpython/rev/54cc0480904c

    New changeset 6e166b66aa44 by Martin Panter in branch '2.7':
    Issue bpo-27507: Check for integer overflow in bytearray.extend()
    https://hg.python.org/cpython/rev/6e166b66aa44

    New changeset 646ad4894c32 by Martin Panter in branch 'default':
    Issue bpo-27507: Merge overflow check from 3.5
    https://hg.python.org/cpython/rev/646ad4894c32

    @vadmium
    Copy link
    Member

    vadmium commented Jul 18, 2016

    Committed without any macro for the time being

    @vadmium vadmium closed this as completed Jul 18, 2016
    @vadmium vadmium removed the invalid label Jul 18, 2016
    @zhangyangyu
    Copy link
    Member Author

    Thanks for your work.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants