classification
Title: integer overflow in iterator object
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: hakril, haypo, mark.dickinson, python-dev, r.david.murray, rhettinger, ronaldoussoren, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-11-25 10:56 by hakril, last changed 2015-05-21 18:00 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
issue22939.patch hakril, 2014-12-06 20:08 First try for the patch review
issue22939v2.patch hakril, 2014-12-07 21:26 Second try for the patch review
Messages (24)
msg231651 - (view) Author: Clement Rouault (hakril) * Date: 2014-11-25 10:56
I found an interger overflow in the standard iterator object (Objects/iterobject.c)

The `it_index` attribute used by the iterator is a `Py_ssize_t` but overflow is never checked. So after the index `PY_SSIZE_T_MAX`, the iterator object will ask for the index `PY_SSIZE_T_MIN`.

Here is an example:

    import sys

    class Seq:
        def __getitem__(self, item):
            print("[-] Asked for item at <{0}>".format(item))
            return 0

    s = Seq()
    i = iter(s)
    # Manually set `it_index` to PY_SSIZE_T_MAX without a loop
    i.__setstate__(sys.maxsize)

    next(i)
    [-] Asked for item at <9223372036854775807>
    next(i)
    [-] Asked for item at <-9223372036854775808>


I would be really interested in writing a patch but first I wanted to discuss the expected behaviour and fix.

The iterator could stop after `PY_SSIZE_T_MAX` but it seems strange as other iterator (like `enumobject`) handle values bigger than `PY_SSIZE_T_MAX`.

Or the same technique used in `enumobject` could be used: adding a field `PyObject* en_longindex` (a python long) which would be used for values bigger than `PY_SSIZE_T_MAX`
msg231659 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-25 13:51
For possibly relevant background information, see issue 21444 and the issues linked from it, and issue 14794.
msg231660 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-11-25 14:23
> The `it_index` attribute used by the iterator is a `Py_ssize_t` but overflow is never checked.

Yes it is a bug. iter_iternext() must raises an OverflowError if it->it_index is equal to PY_SSIZE_T_MAX.
msg231664 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-25 15:22
I think OverflowError is good for maintained releases, but for 3.5 Clement's idea with long index looks attractive to me. In any case an exception should be raised for negative argument in __setstate__(). Let split this issue on two parts. First fix the bug by raising exceptions, and then add long index (if anyone will need it).
msg231669 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-11-25 16:40
> I think OverflowError is good for maintained releases, but for 3.5 Clement's idea with long index looks attractive to me.

What is your use case?
msg231676 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-25 18:21
> What is your use case?

Something like range(). I agree that it is very unlike to encounter this 
problem in real work, and we can live with implementation-related limitation 
(for which OverflowError exists).
msg232250 - (view) Author: Clement Rouault (hakril) * Date: 2014-12-06 20:08
Here is a first try for a patch.

There are two points I am not sure about:

1) The message for the OverflowError: is that explicit enough ?

2) The behaviour of the iterator after the raise of OverflowError. 
With this patch every call to `next(it)` where `it` have overflowed will raise `OverflowError` again.
Does this behaviour seems correct our should it raise StopIteration after the first OverflowError ?
msg232267 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-12-07 08:44
You should not rely on undefined behaviour: check if the index is greater
or equal than PY_SSIZET_MAX - 1. __setstate__ must implement the same check.

You must always raise an error, not hide it in a second call to next (raise
StopIteration). Your unit test must check this behaviour.
msg232269 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-07 09:20
> check if the index is greater or equal than PY_SSIZET_MAX - 1.

Just PY_SSIZET_MAX.

Added other comments on Rietveld (look the email in the Spam folder).
msg232282 - (view) Author: Clement Rouault (hakril) * Date: 2014-12-07 21:26
Thanks for the comments.
I corrected the patch and updated the test. I also added a test that check the behavior of setstate with negative indice.

Just one question:

>  __setstate__ must implement the same check.

Why should __setstate__ check for PY_SSIZE_T_MAX (throwing OverflowError when unpickling) if the check will be done when calling next on the resulting iterator ?
msg232284 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-07 21:31
__setstate__ should check that an index is not negative. All values from 0 to PY_SSIZE_T_MAX are acceptable.
msg232285 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-07 21:39
Ah, __setstate__ already handles negative indices. Then the patch LGTM.
msg232362 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-12-09 10:32
I would think that the PY_SSIZE_T_MAX check belongs inside the:

    if (result != NULL) {
        it->it_index++;
        return result;
    }

just before the increment which could cause the overflow.  Also, PY_SSIZE_T_MAX is a valid value to pass to PySequence_GetItem(), so it shouldn't be blocked unless necessary.
msg232365 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-09 10:41
This doesn't matter because next() will raise an exception if it_index == PY_SSIZE_T_MAX in any case. The code should be changed much more to allow yielding an item with index PY_SSIZE_T_MAX, use other (negative) signal value and change the behavior of __setstate__ for negative values.
msg232367 - (view) Author: Clement Rouault (hakril) * Date: 2014-12-09 10:46
> Also, PY_SSIZE_T_MAX is a valid value to pass to PySequence_GetItem(), so it shouldn't be blocked unless necessary.

I agree with you, that's why my first path was checking at the next call if it->it_index had overflowed. But then it relies on undefined behaviour.

> I would think that the PY_SSIZE_T_MAX check belongs inside the:
> 
>     if (result != NULL) {
>         it->it_index++;
>         return result;
>     }

If we raise the OverflowError when it->it_index really overflow (just after the getitem PY_SSIZE_T_MAX).
Is it really necessary to do the overflow check after the GetItem ? because the value returned by `PySequence_GetItem(seq, PY_SSIZE_T_MAX);` will be never used.
msg232368 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-12-09 11:22
I prefer to raise an OverflowError *before* calling
PySequence_GetItem(). The call to PySequence_GetItem() may be
expensive, and we have to drop the result if an OverflowError is
raised after the call. At the end, the behaviour is the same: an
OverflowError is raised.
msg232385 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-12-09 16:03
> The call to PySequence_GetItem() may be expensive, and we have to drop
> the result if an OverflowError is raised after the call.

You do realize that this error will be very rare and therefore inconsequential.
msg233021 - (view) Author: Clement Rouault (hakril) * Date: 2014-12-22 15:36
> > The call to PySequence_GetItem() may be expensive, and we have to drop
> > the result if an OverflowError is raised after the call.

> You do realize that this error will be very rare and therefore inconsequential.

The real question is: why would you call the iterator for a new value if it will be discarded anyway ? I think it could be very misleading to see  _getitem__ being called and have an OverflowError being raised afterward.
msg233203 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-12-30 11:50
Is it necessary to raise when it_index is PY_SSIZE_T_MAX? 

An alternative is to set it_index to -1 when there would be overflow and raise an exception on the next call to next(). That way a virtual sequence with PY_SSIZE_T_MAX-1 items would still work (instead of failing unexpectedly).
msg234233 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-18 09:56
> That way a virtual sequence with PY_SSIZE_T_MAX-1 items would still work (instead of failing unexpectedly).

Actually with PY_SSIZE_T_MAX+1 items (indices from 0 to PY_SSIZE_T_MAX inclusive).

If Raymond insists I can write more complicated patch, but I don't think that we should complicate the code for this pretty hypotetical case. I'm for committing issue22939v2.patch.
msg242960 - (view) Author: Clement Rouault (hakril) * Date: 2015-05-12 11:02
After a few months, I am back to you on this issue.
What should be the next step of the process ?
msg243568 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-19 09:17
If Raymond will not stop me, I'll commit the patch tomorrow.
msg243769 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-21 17:55
New changeset d6179accca20 by Serhiy Storchaka in branch '2.7':
Fixed issue number for issue #22939.
https://hg.python.org/cpython/rev/d6179accca20

New changeset 7fa2f4afcf5a by Serhiy Storchaka in branch '3.4':
Fixed issue number for issue #22939.
https://hg.python.org/cpython/rev/7fa2f4afcf5a

New changeset f23533fa6afa by Serhiy Storchaka in branch 'default':
Fixed issue number for issue #22939.
https://hg.python.org/cpython/rev/f23533fa6afa
msg243771 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-21 18:00
Thank you for your contribution Clement.
History
Date User Action Args
2015-05-21 18:00:05serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg243771

stage: commit review -> resolved
2015-05-21 17:55:28python-devsetnosy: + python-dev
messages: + msg243769
2015-05-19 09:17:32serhiy.storchakasetassignee: serhiy.storchaka
messages: + msg243568
2015-05-16 15:28:30serhiy.storchakasetversions: + Python 2.7, Python 3.4
2015-05-12 11:02:38hakrilsetstatus: pending -> open

messages: + msg242960
2015-01-18 09:56:24serhiy.storchakasetstatus: open -> pending

messages: + msg234233
2014-12-30 11:50:39ronaldoussorensetnosy: + ronaldoussoren
messages: + msg233203
2014-12-22 15:36:13hakrilsetmessages: + msg233021
2014-12-09 16:03:24rhettingersetmessages: + msg232385
2014-12-09 11:22:02hayposetmessages: + msg232368
2014-12-09 10:46:39hakrilsetmessages: + msg232367
2014-12-09 10:41:19serhiy.storchakasetmessages: + msg232365
2014-12-09 10:32:32rhettingersetmessages: + msg232362
2014-12-07 21:39:01serhiy.storchakasetmessages: + msg232285
stage: needs patch -> commit review
2014-12-07 21:31:54serhiy.storchakasetmessages: + msg232284
2014-12-07 21:26:59hakrilsetfiles: + issue22939v2.patch

messages: + msg232282
2014-12-07 09:20:21serhiy.storchakasetmessages: + msg232269
2014-12-07 08:44:12hayposetmessages: + msg232267
2014-12-06 20:08:16hakrilsetfiles: + issue22939.patch
keywords: + patch
messages: + msg232250
2014-11-25 18:21:29serhiy.storchakasetmessages: + msg231676
2014-11-25 16:40:32hayposetmessages: + msg231669
2014-11-25 15:22:08serhiy.storchakasetnosy: + mark.dickinson
messages: + msg231664
2014-11-25 14:23:33hayposetnosy: + haypo
messages: + msg231660
2014-11-25 13:51:23r.david.murraysetnosy: + r.david.murray
messages: + msg231659
2014-11-25 13:14:40serhiy.storchakasetnosy: + rhettinger, serhiy.storchaka

stage: needs patch
2014-11-25 10:56:33hakrilcreate