Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integer overflow in iterator object #67128

Closed
httpsgithubcomhakril mannequin opened this issue Nov 25, 2014 · 24 comments
Closed

integer overflow in iterator object #67128

httpsgithubcomhakril mannequin opened this issue Nov 25, 2014 · 24 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@httpsgithubcomhakril
Copy link
Mannequin

httpsgithubcomhakril mannequin commented Nov 25, 2014

BPO 22939
Nosy @rhettinger, @ronaldoussoren, @mdickinson, @vstinner, @bitdancer, @serhiy-storchaka, @https://github.com/hakril
Files
  • issue22939.patch: First try for the patch
  • issue22939v2.patch: Second try for the patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-05-21.18:00:05.944>
    created_at = <Date 2014-11-25.10:56:33.086>
    labels = ['interpreter-core', 'type-bug']
    title = 'integer overflow in iterator object'
    updated_at = <Date 2015-05-21.18:00:05.943>
    user = 'https://github.com/httpsgithubcomhakril'

    bugs.python.org fields:

    activity = <Date 2015-05-21.18:00:05.943>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-05-21.18:00:05.944>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2014-11-25.10:56:33.086>
    creator = 'hakril'
    dependencies = []
    files = ['37375', '37382']
    hgrepos = []
    issue_num = 22939
    keywords = ['patch']
    message_count = 24.0
    messages = ['231651', '231659', '231660', '231664', '231669', '231676', '232250', '232267', '232269', '232282', '232284', '232285', '232362', '232365', '232367', '232368', '232385', '233021', '233203', '234233', '242960', '243568', '243769', '243771']
    nosy_count = 8.0
    nosy_names = ['rhettinger', 'ronaldoussoren', 'mark.dickinson', 'vstinner', 'r.david.murray', 'python-dev', 'serhiy.storchaka', 'hakril']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22939'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @httpsgithubcomhakril
    Copy link
    Mannequin Author

    httpsgithubcomhakril mannequin commented Nov 25, 2014

    I found an interger overflow in the standard iterator object (Objects/iterobject.c)

    The it_index attribute used by the iterator is a Py_ssize_t but overflow is never checked. So after the index PY_SSIZE_T_MAX, the iterator object will ask for the index PY_SSIZE_T_MIN.

    Here is an example:

        import sys
    
        class Seq:
            def __getitem__(self, item):
                print("[-] Asked for item at <{0}>".format(item))
                return 0
    
        s = Seq()
        i = iter(s)
        # Manually set `it_index` to PY_SSIZE_T_MAX without a loop
        i.__setstate__(sys.maxsize)
    
        next(i)
        [-] Asked for item at <9223372036854775807>
        next(i)
        [-] Asked for item at <-9223372036854775808>

    I would be really interested in writing a patch but first I wanted to discuss the expected behaviour and fix.

    The iterator could stop after PY_SSIZE_T_MAX but it seems strange as other iterator (like enumobject) handle values bigger than PY_SSIZE_T_MAX.

    Or the same technique used in enumobject could be used: adding a field PyObject* en_longindex (a python long) which would be used for values bigger than PY_SSIZE_T_MAX

    @httpsgithubcomhakril httpsgithubcomhakril mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Nov 25, 2014
    @bitdancer
    Copy link
    Member

    For possibly relevant background information, see bpo-21444 and the issues linked from it, and bpo-14794.

    @vstinner
    Copy link
    Member

    The it_index attribute used by the iterator is a Py_ssize_t but overflow is never checked.

    Yes it is a bug. iter_iternext() must raises an OverflowError if it->it_index is equal to PY_SSIZE_T_MAX.

    @serhiy-storchaka
    Copy link
    Member

    I think OverflowError is good for maintained releases, but for 3.5 Clement's idea with long index looks attractive to me. In any case an exception should be raised for negative argument in __setstate__(). Let split this issue on two parts. First fix the bug by raising exceptions, and then add long index (if anyone will need it).

    @vstinner
    Copy link
    Member

    I think OverflowError is good for maintained releases, but for 3.5 Clement's idea with long index looks attractive to me.

    What is your use case?

    @serhiy-storchaka
    Copy link
    Member

    What is your use case?

    Something like range(). I agree that it is very unlike to encounter this
    problem in real work, and we can live with implementation-related limitation
    (for which OverflowError exists).

    @httpsgithubcomhakril
    Copy link
    Mannequin Author

    httpsgithubcomhakril mannequin commented Dec 6, 2014

    Here is a first try for a patch.

    There are two points I am not sure about:

    1. The message for the OverflowError: is that explicit enough ?

    2. The behaviour of the iterator after the raise of OverflowError.
      With this patch every call to next(it) where it have overflowed will raise OverflowError again.
      Does this behaviour seems correct our should it raise StopIteration after the first OverflowError ?

    @vstinner
    Copy link
    Member

    vstinner commented Dec 7, 2014

    You should not rely on undefined behaviour: check if the index is greater
    or equal than PY_SSIZET_MAX - 1. __setstate__ must implement the same check.

    You must always raise an error, not hide it in a second call to next (raise
    StopIteration). Your unit test must check this behaviour.

    @serhiy-storchaka
    Copy link
    Member

    check if the index is greater or equal than PY_SSIZET_MAX - 1.

    Just PY_SSIZET_MAX.

    Added other comments on Rietveld (look the email in the Spam folder).

    @httpsgithubcomhakril
    Copy link
    Mannequin Author

    httpsgithubcomhakril mannequin commented Dec 7, 2014

    Thanks for the comments.
    I corrected the patch and updated the test. I also added a test that check the behavior of setstate with negative indice.

    Just one question:

    __setstate__ must implement the same check.

    Why should __setstate__ check for PY_SSIZE_T_MAX (throwing OverflowError when unpickling) if the check will be done when calling next on the resulting iterator ?

    @serhiy-storchaka
    Copy link
    Member

    __setstate__ should check that an index is not negative. All values from 0 to PY_SSIZE_T_MAX are acceptable.

    @serhiy-storchaka
    Copy link
    Member

    Ah, __setstate__ already handles negative indices. Then the patch LGTM.

    @rhettinger
    Copy link
    Contributor

    I would think that the PY_SSIZE_T_MAX check belongs inside the:

        if (result != NULL) {
            it->it_index++;
            return result;
        }

    just before the increment which could cause the overflow. Also, PY_SSIZE_T_MAX is a valid value to pass to PySequence_GetItem(), so it shouldn't be blocked unless necessary.

    @serhiy-storchaka
    Copy link
    Member

    This doesn't matter because next() will raise an exception if it_index == PY_SSIZE_T_MAX in any case. The code should be changed much more to allow yielding an item with index PY_SSIZE_T_MAX, use other (negative) signal value and change the behavior of __setstate__ for negative values.

    @httpsgithubcomhakril
    Copy link
    Mannequin Author

    httpsgithubcomhakril mannequin commented Dec 9, 2014

    Also, PY_SSIZE_T_MAX is a valid value to pass to PySequence_GetItem(), so it shouldn't be blocked unless necessary.

    I agree with you, that's why my first path was checking at the next call if it->it_index had overflowed. But then it relies on undefined behaviour.

    I would think that the PY_SSIZE_T_MAX check belongs inside the:

    if (result != NULL) {
        it-\>it_index++;
        return result;
    }
    

    If we raise the OverflowError when it->it_index really overflow (just after the getitem PY_SSIZE_T_MAX).
    Is it really necessary to do the overflow check after the GetItem ? because the value returned by PySequence_GetItem(seq, PY_SSIZE_T_MAX); will be never used.

    @vstinner
    Copy link
    Member

    vstinner commented Dec 9, 2014

    I prefer to raise an OverflowError *before* calling
    PySequence_GetItem(). The call to PySequence_GetItem() may be
    expensive, and we have to drop the result if an OverflowError is
    raised after the call. At the end, the behaviour is the same: an
    OverflowError is raised.

    @rhettinger
    Copy link
    Contributor

    The call to PySequence_GetItem() may be expensive, and we have to drop
    the result if an OverflowError is raised after the call.

    You do realize that this error will be very rare and therefore inconsequential.

    @httpsgithubcomhakril
    Copy link
    Mannequin Author

    httpsgithubcomhakril mannequin commented Dec 22, 2014

    > The call to PySequence_GetItem() may be expensive, and we have to drop
    > the result if an OverflowError is raised after the call.

    You do realize that this error will be very rare and therefore inconsequential.

    The real question is: why would you call the iterator for a new value if it will be discarded anyway ? I think it could be very misleading to see _getitem__ being called and have an OverflowError being raised afterward.

    @ronaldoussoren
    Copy link
    Contributor

    Is it necessary to raise when it_index is PY_SSIZE_T_MAX?

    An alternative is to set it_index to -1 when there would be overflow and raise an exception on the next call to next(). That way a virtual sequence with PY_SSIZE_T_MAX-1 items would still work (instead of failing unexpectedly).

    @serhiy-storchaka
    Copy link
    Member

    That way a virtual sequence with PY_SSIZE_T_MAX-1 items would still work (instead of failing unexpectedly).

    Actually with PY_SSIZE_T_MAX+1 items (indices from 0 to PY_SSIZE_T_MAX inclusive).

    If Raymond insists I can write more complicated patch, but I don't think that we should complicate the code for this pretty hypotetical case. I'm for committing issue22939v2.patch.

    @httpsgithubcomhakril
    Copy link
    Mannequin Author

    httpsgithubcomhakril mannequin commented May 12, 2015

    After a few months, I am back to you on this issue.
    What should be the next step of the process ?

    @serhiy-storchaka
    Copy link
    Member

    If Raymond will not stop me, I'll commit the patch tomorrow.

    @serhiy-storchaka serhiy-storchaka self-assigned this May 19, 2015
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 21, 2015

    New changeset d6179accca20 by Serhiy Storchaka in branch '2.7':
    Fixed issue number for issue bpo-22939.
    https://hg.python.org/cpython/rev/d6179accca20

    New changeset 7fa2f4afcf5a by Serhiy Storchaka in branch '3.4':
    Fixed issue number for issue bpo-22939.
    https://hg.python.org/cpython/rev/7fa2f4afcf5a

    New changeset f23533fa6afa by Serhiy Storchaka in branch 'default':
    Fixed issue number for issue bpo-22939.
    https://hg.python.org/cpython/rev/f23533fa6afa

    @serhiy-storchaka
    Copy link
    Member

    Thank you for your contribution Clement.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants