Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer overflow in wrap_lenfunc() on 64-bit build of Windows with len > 2**31-1 #70610

Closed
DaveHibbitts mannequin opened this issue Feb 23, 2016 · 24 comments
Closed

Integer overflow in wrap_lenfunc() on 64-bit build of Windows with len > 2**31-1 #70610

DaveHibbitts mannequin opened this issue Feb 23, 2016 · 24 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows type-bug An unexpected behavior, bug, or error

Comments

@DaveHibbitts
Copy link
Mannequin

DaveHibbitts mannequin commented Feb 23, 2016

BPO 26423
Nosy @birkenfeld, @pfmoore, @mdickinson, @pitrou, @vstinner, @tjguk, @zware, @serhiy-storchaka, @eryksun, @zooba, @RazerM, @ZackerySpytz, @miss-islington, @iritkatriel
PRs
  • bpo-26423: Fix possible overflow in wrap_lenfunc() #13606
  • [3.7] bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606) #13622
  • [2.7] bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606) #13625
  • [2.7] bpo-26423: Fix test_descr.test_wrap_lenfunc_bad_cast() on 32-bit Windows #13629
  • Files
  • wrap_lenfunc.patch
  • wrap_lenfunc-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-10-18.09:17:15.471>
    created_at = <Date 2016-02-23.23:04:47.977>
    labels = ['interpreter-core', '3.8', 'type-bug', '3.7', 'OS-windows']
    title = 'Integer overflow in wrap_lenfunc() on 64-bit build of Windows with len > 2**31-1'
    updated_at = <Date 2020-10-18.09:17:15.470>
    user = 'https://bugs.python.org/DaveHibbitts'

    bugs.python.org fields:

    activity = <Date 2020-10-18.09:17:15.470>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-10-18.09:17:15.471>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core', 'Windows']
    creation = <Date 2016-02-23.23:04:47.977>
    creator = 'Dave Hibbitts'
    dependencies = []
    files = ['42017', '42020']
    hgrepos = []
    issue_num = 26423
    keywords = ['patch']
    message_count = 24.0
    messages = ['260749', '260770', '260773', '260774', '260775', '260776', '260777', '260793', '260799', '260809', '260811', '260812', '260813', '260814', '260815', '260817', '292122', '343774', '343777', '343792', '343793', '343795', '343800', '378773']
    nosy_count = 15.0
    nosy_names = ['georg.brandl', 'paul.moore', 'mark.dickinson', 'pitrou', 'vstinner', 'tim.golden', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'RazerM', 'Dave Hibbitts', 'ZackerySpytz', 'miss-islington', 'iritkatriel']
    pr_nums = ['13606', '13622', '13625', '13629']
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue26423'
    versions = ['Python 2.7', 'Python 3.7', 'Python 3.8']

    @DaveHibbitts
    Copy link
    Mannequin Author

    DaveHibbitts mannequin commented Feb 23, 2016

    __len__() always returns an int which on windows machines is tied to the size of a c long and is always 32 bits even if it's compiled for 64 bit. len() however returns an int for values less than sys.maxint and a long above that.

    Returning an int in __len__() causes it to return negative lengths for objects of size greater than sys.maxint, below you can see a quick test on how to reproduce it.

    And here's an explanation from \u\Rhomboid on Reddit of why we believe the issue happens.

    "You'll only see that on Windows. The issue is that, confusingly, the range of the Python int type is tied to the range of the C long type. On Windows long is always 32 bits even on x64 systems, whereas on Unix systems it's the native machine word size. You can confirm this by checking sys.maxint, which will be 2**31 - 1 even with a 64 bit interpreter on Windows.

    The difference in behavior of foo.__len__ vs len(foo) is that the former goes through an attribute lookup which goes through the slot lookup stuff, finally ending in Python/typeobject.c:wrap_lenfunc(). The error is casting Py_ssize_t to long, which truncates on Windows x64 as Py_ssize_t is a proper signed 64 bit integer. And then it compounds the injury by creating a Python int object with PyInt_FromLong(), so this is hopelessly broken. In the case of len(foo), you end up in Python/bltinmodule.c:builtin_len() which skips all the attribute lookup stuff and uses the object protocol directly, calling PyObject_Size() and creating a Python object of the correct type via PyInt_FromSsize_t() which figures out whether a Python int or long is necessary.

    This is definitely a bug that should be reported. In 3.x the int/long distinction is gone and all integers are Python longs, but the bogus cast to a C long still exists in wrap_lenfunc():

    return PyLong_FromLong((long)res);
    

    That means the bug still exists even though the reason for its existence is gone! Oops. That needs to be updated to get rid of the cast and call PyLong_FromSsize_t()."

    Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    Anaconda is brought to you by Continuum Analytics.
    Please check out: http://continuum.io/thanks and https://binstar.org
    >>> a = 'a'*2500000000
    >>> a.__len__()
    -1794967296
    >>> len(a)
    2500000000L
    >>> a = [1]*2500000000
    >>> len(a)
    2500000000L
    >>> a.__len__()
    -1794967296

    @DaveHibbitts DaveHibbitts mannequin added OS-windows type-bug An unexpected behavior, bug, or error labels Feb 23, 2016
    @mdickinson
    Copy link
    Member

    If I understand the report correctly, this affects Python 3, too. Adding 3.5 and 3.6 to the versions.

    @vstinner
    Copy link
    Member

    __len__() always returns an int which on windows machines is tied to the size of a c long and is always 32 bits even if it's compiled for 64 bit.

    Hum, I don't understand this statement. It looks like the code uses Py_ssize_t everywhere and Py_ssize_t is supposed to be able to store a whole pointer, so be 64-bit when Python is compiled in 64-bit mode.

    Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul 2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32

    This is a 32-bit build ("win32"), no? max.size is 2147483647 on 32-bit mode if I recall correctly. On 64-bit, it's 9223372036854775807. By the way, on 64-bit, sys.maxsize == sys.maxint.

    In Python 2:

    len(obj) => builtin_len() => PyObject_Size() which returns a Py_ssize_t

    For string, PyObject_Size() => string_length() => Py_SIZE(obj) => ((PyVarObject *)obj)->ob_size

    PyVarObject.ob_size has the type Py_ssize_t.

    builtin_len() gets a Py_ssize_t which is converted to a Python int or long with PyInt_FromSsize_t().

    PyInt_FromSsize_t() creates an int if the value fits into a C long, or it calls _PyLong_FromSsize_t().

    Difference in Python 3:

    builtin_len() also gets a Py_ssize_t, but it calls PyLong_FromSsize_t() (since Python short integers as gone, long became int in Python 3).

    string_length() is replaced with unicode_length() => PyUnicode_GET_LENGTH() => (PyASCIIObject *)obj)->length and PyASCIIObject.length type is Py_ssize_t.

    @mdickinson
    Copy link
    Member

    This is a 32-bit build ("win32"), no?

    No. "win32" appears for both 32-bit and 64-bit builds on Windows. This is a 64-bit build. sys.maxint is 231 - 1, and sys.maxsize is 263 - 1. They're not equal.

    I can reproduce the issue with a stock Python, on both 2.7 and 3.5.

    @mdickinson mdickinson added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Feb 24, 2016
    @vstinner
    Copy link
    Member

    I tested Python 2.7 on Windows, downloaded from python.org:

    • 32-bit:

      • 'x'*(2**31+5) fails with OverflowError as expected
      • len('x' * (sys.maxsize // 2)) returns 2**30-1 as a Python int
      • for length larger with 'x' * (sys.maxsize // 2), I get a MemoryError, probably because of a integer overflow in the C code, or maybe because malloc() failed, I don't know
    • 64-bit:

      • len('x' * (2**32 + 5)) returns 2**32 + 5 (a positive number) as a Pyhon long
      • len('x' * (2**31 + 5)) returns 2**31 + 5 (a positive number) as a Pyhon long
      • len('x' * (2**31 - 5)) returns 2**31 - 5 (a positive number) as a Pyhon int

    Limits :

    • 32-bit: sys.maxint == 2**31-1, sys.maxsize == 2**31-1 == sys.maxint
    • 64-bit: sys.maxint == 2**31-1, sys.maxsize == 2**63-1

    It looks like the issue is specific to Python compiled by Continuum ("Anaconda" flavor)?

    @birkenfeld
    Copy link
    Member

    You need to call x.__len__() explicitly.

    @vstinner vstinner changed the title __len__() returns 32 bit int on windows leading to overflows Integer overflow in wrap_lenfunc() on 64-bit build of Windows Feb 24, 2016
    @vstinner
    Copy link
    Member

    You need to call x.__len__() explicitly.

    Ah ok :-) I didn't read carefully the initial message.

    Attached patch should fix the issue on Python 2.7. It uses the same function than builtin_len() to cast the C Py_ssize_t to a Python int or long.

    @vstinner vstinner changed the title Integer overflow in wrap_lenfunc() on 64-bit build of Windows Integer overflow in wrap_lenfunc() on 64-bit build of Windows with len > 2**31-1 Feb 24, 2016
    @serhiy-storchaka
    Copy link
    Member

    Could you write a test? May be on 3.x you can use range(sys.maxsize).__len__().

    @vstinner
    Copy link
    Member

    Could you write a test?

    Done.

    @serhiy-storchaka
    Copy link
    Member

    Using a bigmem test for this issue looks excessive. Could we use the xrange() object for testing? Or weakref?

    @vstinner
    Copy link
    Member

    Could we use the xrange() object for testing?

    xrange() constructor is implemented in C and looks to use C long for parameters. This issue is specific to 64-bit Windows with 32-bit C long, so xrange() doesn't seem to work here.

    Or weakref?

    What do you mean? Which object? weakref.ref() has no length.

    Using a bigmem test for this issue looks excessive.

    Well, I was too lazy to develop a C extension (add something to _testcapi) just for this tiny bug.

    I also like tests really testing the final feature (the bug reported in this issue).

    I failed to find an existing Python object with __len__() implemented in C which doesn't need to allocate 2^31 bytes to test this bug.

    @eryksun
    Copy link
    Contributor

    eryksun commented Feb 24, 2016

    mmap.mmap(-1, 2**31 + 5) could be used here. If the pages are never touched it won't increase the working set size. It merely maps the address range with demand-zero pages.

    Unpatched:

        >>> mmap.mmap(-1, 2**31 + 5).__len__()
        -2147483643

    Patched:

        >>> mmap.mmap(-1, 2**31 + 5).__len__()
        2147483653L

    @vstinner
    Copy link
    Member

    mmap.mmap(-1, 2**31 + 5) could be used here.

    Yeah, I hesitated to use mmap, but I'm not really a big fan of using directly a mmap object in test_descr. Unit tests are supposed to only test one thing.

    If the pages are never touched it won't increase the working set size. It merely maps the address range with demand-zero pages.

    It depends on the Kernel configuration. On Solaris, overcommit is disable by default.

    I prefer to use a regular Python str. Or maybe use a custom Python object for which we can control the output of __len__()?

    @serhiy-storchaka
    Copy link
    Member

    xrange() constructor is implemented in C and looks to use C long for
    parameters. This issue is specific to 64-bit Windows with 32-bit C long, so
    xrange() doesn't seem to work here.

    What about xrange(-1, 2**31-1)? In any case the fact that xrange works not
    with Py_ssize_t, but with long, looks as a bug. I'll open a separate issue for
    this.

    > Or weakref?

    What do you mean? Which object? weakref.ref() has no length.

    I see that the sq_length slot in the weakproxy type is set to proxy_length.

    @eryksun
    Copy link
    Contributor

    eryksun commented Feb 24, 2016

    the sq_length slot in the weakproxy type is set to proxy_length.

    Nice. Its tp_getattro gets in the way of using __len__ directly, but this can be side stepped by manually binding the descriptor:

        class Test(object):
            def __len__(self):
                return 2**31 + 5
        >>> t = Test()
        >>> p = weakref.proxy(t)
        >>> p.__len__()
        2147483653L
        >>> type(p).__len__.__get__(p)()
        -2147483643

    @serhiy-storchaka
    Copy link
    Member

    With fixed bpo-26428 you could use xrange() for testing this issue.

    @serhiy-storchaka
    Copy link
    Member

    Could you create a PR for your patch Victor?

    @vstinner
    Copy link
    Member

    New changeset 05f1641 by Victor Stinner (Zackery Spytz) in branch 'master':
    bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606)
    05f1641

    @ZackerySpytz ZackerySpytz mannequin added 3.7 (EOL) end of life 3.8 only security fixes labels May 28, 2019
    @miss-islington
    Copy link
    Contributor

    New changeset e7ddf58 by Miss Islington (bot) in branch '3.7':
    bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606)
    e7ddf58

    @vstinner
    Copy link
    Member

    New changeset 80dfe99 by Victor Stinner in branch '2.7':
    bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606) (GH-13625)
    80dfe99

    @vstinner
    Copy link
    Member

    Thanks Zackery Spytz for the fix. It's now applied to 2.7, 3.7 and master branches.

    @vstinner
    Copy link
    Member

    Oh. The test fails on Python 2.7 on Windows:

    https://buildbot.python.org/all/#/builders/26/builds/287

    ERROR: test_wrap_lenfunc_bad_cast (test.test_descr.OperatorsTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\buildbot.python.org\2.7.kloth-win64vs9\build\lib\test\test_descr.py", line 407, in test_wrap_lenfunc_bad_cast
        self.assertEqual(xrange(sys.maxsize).__len__(), sys.maxsize)
    OverflowError: Python int too large to convert to C long

    @vstinner vstinner reopened this May 28, 2019
    @vstinner
    Copy link
    Member

    New changeset aaed2c3 by Victor Stinner in branch '2.7':
    bpo-26423: Fix test_descr.test_wrap_lenfunc_bad_cast() on 32-bit Windows (GH-13629)
    aaed2c3

    @iritkatriel
    Copy link
    Member

    Can this be closed?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants