classification
Title: bisect module: Overflow at index computation
Type: behavior Stage: patch review
Components: Extension Modules Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Voo, akira, amaury.forgeotdarc, benjamin.peterson, jcea, mark.dickinson, pitrou, python-dev, rhettinger, tim.peters
Priority: normal Keywords: patch

Created on 2011-11-28 22:34 by Voo, last changed 2012-04-15 15:44 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
issue13496.patch mark.dickinson, 2012-01-28 10:03 review
issue13496_v2.patch mark.dickinson, 2012-03-10 15:29 review
Messages (18)
msg148517 - (view) Author: Daniel Sturm (Voo) Date: 2011-11-28 22:34
The mid index computation in _bisectmodule.c in both internal_bisect_right and internal_bisect_left is done with:

mid = (lo + hi) / 2; // all three variables Py_ssize_t

which is  susceptible to overflows for large arrays, which would lead to undefined behavior (and in practice almost certainly a crash with a negative index)

The fix is trivial - mid = lo + (hi - lo) / 2; - but since I'm just starting to look into the code base I may be missing some undocumented assertions that guarantee this can't happen.
msg148520 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-11-28 23:04
This looks like a reasonable suggestion.
msg148524 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2011-11-28 23:45
FWIW, I doubt there's a real issue here.  Objects in Python consume a lot more than a byte or two of memory, so the index range of a Python list is generally a lot less than ssize_t allows for.  In other words, quantify "large" in "large arrays".  How large can a Python list actually be, relative to ssize_t?  Similar reasoning accounts for why we never worry about overflow when mucking with refcounts:  the size of a refcount member  exceeds the maximum number of references that could exist.
msg148531 - (view) Author: Daniel Sturm (Voo) Date: 2011-11-29 00:25
TBH I saw this more as an opportunity to get used to the whole system, how to create a patch, etc. :) Should've made it clearer at the start that this is unlikely to ever be a problem, sorry (didn't see a way to set priority to low myself).


If my math isn't completely off, the highest possible value here is 
hi + (hi - 1) so this only occurs if hi > (MAX_SSIZE_T + 1) / 2. So this would only work out if we had such an array containing only a handful different objects. And then that's me assuming that a list is actually a thin wrapper around an array of PyObject*.
msg148567 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-11-29 14:03
Given that we typically need at least 4 bytes just for the PyObject * pointer for each item in a list, I guess real lists are safe.

But how about list-like objects, implementing __len__ and __getitem__?  The following appears to run forever on my machine:



class SquaresList(object):
    def __init__(self, length):
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index <= self._length:
            raise IndexError
        return index**2


import bisect, sys

squareslist = SquaresList(sys.maxsize)
print bisect.bisect(squareslist, (sys.maxsize - 3)**2)
msg148569 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-29 14:05
Very good point, Mark.
msg148574 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-11-29 14:56
[x]range is enough to trigger the bug::
   bisect.bisect(range(sys.maxsize), sys.maxsize-3)
msg148607 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-11-29 20:55
The fix is trivial.  I'll do it soonish.
msg149239 - (view) Author: Akira Li (akira) * Date: 2011-12-11 19:10
Related bug in Java: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

Do not consider any change as trivial: http://www.solipsys.co.uk/new/BinarySearchReconsidered.html (the author ran "binary search" coding challenge about 10 years ago http://www.solipsys.co.uk/b_search/ )
msg152152 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-01-28 10:03
Here's a patch.
msg152392 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-01-31 16:01
I think the patch is *NOT* correct. Does it work with http://bugs.python.org/msg148567, in a 32 bits python?
msg152400 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-01-31 17:50
> I think the patch is *NOT* correct.

Can you elaborate?  It works fine for that example on a 64-bit build;  not sure why 32-bit would be any different.

The idea is just that the single cast forces the addition to be done as an addition of integers of type size_t.  Since the integers being added are (a) nonnegative and (b) both fit into a Py_ssize_t, the sum fits into a size_t without overflow, and the result of dividing the sum by 2 fits into a size_t.

(This is all assuming that 2*Py_SSIZE_T_MAX fits into a size_t, but that's a fairly safe assumption in practice.)
msg152401 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-01-31 17:59
> Does it work with http://bugs.python.org/msg148567, in a 32 bits python?

Yes.
msg152402 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-31 18:11
Perhaps adding a comment before those two lines would make the reason for the cast clearer.
msg155318 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-03-10 15:29
Updated patch, with comment.
msg155328 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-10 17:53
Thanks for the updated patch.
msg155436 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-03-12 03:44
LGTM
msg158340 - (view) Author: Roundup Robot (python-dev) Date: 2012-04-15 15:43
New changeset 35a3a7e0d66d by Mark Dickinson in branch '3.2':
Issue 13496: Fix bisect.bisect overflow bug for large collections.
http://hg.python.org/cpython/rev/35a3a7e0d66d

New changeset 1a9252280f07 by Mark Dickinson in branch 'default':
Issue #13496: Merge from 3.2
http://hg.python.org/cpython/rev/1a9252280f07

New changeset 709af2d0e862 by Mark Dickinson in branch '2.7':
Issue 13496: Fix bisect.bisect overflow bug for large collections.
http://hg.python.org/cpython/rev/709af2d0e862
History
Date User Action Args
2012-04-15 15:44:15mark.dickinsonsetstatus: open -> closed
resolution: fixed
2012-04-15 15:43:46python-devsetnosy: + python-dev
messages: + msg158340
2012-03-12 03:44:06benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg155436
2012-03-10 17:53:52rhettingersetmessages: + msg155328
2012-03-10 15:29:29mark.dickinsonsetfiles: + issue13496_v2.patch

messages: + msg155318
2012-01-31 18:11:49pitrousetmessages: + msg152402
2012-01-31 17:59:14mark.dickinsonsetmessages: + msg152401
2012-01-31 17:50:57mark.dickinsonsetmessages: + msg152400
2012-01-31 16:01:07jceasetmessages: + msg152392
2012-01-28 20:26:50mark.dickinsonsetstage: patch review
2012-01-28 10:03:49mark.dickinsonsetfiles: + issue13496.patch
keywords: + patch
messages: + msg152152
2011-12-11 19:10:15akirasetnosy: + akira
messages: + msg149239
2011-12-02 18:51:07jceasetnosy: + jcea
2011-11-29 20:55:07rhettingersetpriority: low -> normal

messages: + msg148607
2011-11-29 14:56:21amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg148574
2011-11-29 14:05:44pitrousetnosy: + pitrou
messages: + msg148569
2011-11-29 14:03:40mark.dickinsonsetmessages: + msg148567
2011-11-29 00:25:36Voosetmessages: + msg148531
2011-11-28 23:45:41tim.peterssetnosy: + tim.peters
messages: + msg148524
2011-11-28 23:04:30rhettingersetpriority: normal -> low
assignee: rhettinger
messages: + msg148520
2011-11-28 22:37:01pitrousetnosy: + rhettinger, mark.dickinson

versions: + Python 2.7, Python 3.2, Python 3.3, - Python 3.4
2011-11-28 22:34:31Voocreate