classification
Title: Integer overflow in wrap_lenfunc() on 64-bit build of Windows with len > 2**31-1
Type: behavior Stage: patch review
Components: Interpreter Core, Windows Versions: Python 3.8, Python 3.7, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Dave Hibbitts, RazerM, ZackerySpytz, eryksun, georg.brandl, mark.dickinson, miss-islington, paul.moore, pitrou, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Priority: high Keywords: patch

Created on 2016-02-23 23:04 by Dave Hibbitts, last changed 2019-05-28 16:15 by vstinner.

Files
File name Uploaded Description Edit
wrap_lenfunc.patch vstinner, 2016-02-24 09:24 review
wrap_lenfunc-2.patch vstinner, 2016-02-24 13:47 review
Pull Requests
URL Status Linked Edit
PR 13606 merged ZackerySpytz, 2019-05-28 00:01
PR 13622 merged miss-islington, 2019-05-28 12:55
PR 13625 merged vstinner, 2019-05-28 13:58
PR 13629 merged vstinner, 2019-05-28 16:02
Messages (23)
msg260749 - (view) Author: Dave Hibbitts (Dave Hibbitts) Date: 2016-02-23 23:04
__len__() always returns an int which on windows machines is tied to the size of a c long and is always 32 bits even if it's compiled for 64 bit. len() however returns an int for values less than sys.maxint and a long above that. 

Returning an int in __len__() causes it to return negative lengths for objects of size greater than sys.maxint, below you can see a quick test on how to reproduce it.

And here's an explanation from \u\Rhomboid on Reddit of why we believe the issue happens.

"You'll only see that on Windows. The issue is that, confusingly, the range of the Python int type is tied to the range of the C long type. On Windows long is always 32 bits even on x64 systems, whereas on Unix systems it's the native machine word size. You can confirm this by checking sys.maxint, which will be 2**31 - 1 even with a 64 bit interpreter on Windows.

The difference in behavior of foo.__len__ vs len(foo) is that the former goes through an attribute lookup which goes through the slot lookup stuff, finally ending in Python/typeobject.c:wrap_lenfunc(). The error is casting Py_ssize_t to long, which truncates on Windows x64 as Py_ssize_t is a proper signed 64 bit integer. And then it compounds the injury by creating a Python int object with PyInt_FromLong(), so this is hopelessly broken. In the case of len(foo), you end up in Python/bltinmodule.c:builtin_len() which skips all the attribute lookup stuff and uses the object protocol directly, calling PyObject_Size() and creating a Python object of the correct type via PyInt_FromSsize_t() which figures out whether a Python int or long is necessary.

This is definitely a bug that should be reported. In 3.x the int/long distinction is gone and all integers are Python longs, but the bogus cast to a C long still exists in wrap_lenfunc():

    return PyLong_FromLong((long)res);

That means the bug still exists even though the reason for its existence is gone! Oops. That needs to be updated to get rid of the cast and call PyLong_FromSsize_t()."

Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> a = 'a'*2500000000
>>> a.__len__()
-1794967296
>>> len(a)
2500000000L
>>> a = [1]*2500000000
>>> len(a)
2500000000L
>>> a.__len__()
-1794967296
msg260770 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-02-24 08:51
If I understand the report correctly, this affects Python 3, too. Adding 3.5 and 3.6 to the versions.
msg260773 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-24 09:04
> __len__() always returns an int which on windows machines is tied to the size of a c long and is always 32 bits even if it's compiled for 64 bit.

Hum, I don't understand this statement. It looks like the code uses Py_ssize_t everywhere and Py_ssize_t is supposed to be able to store a whole pointer, so be 64-bit when Python is compiled in 64-bit mode.

> Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32

This is a 32-bit build ("win32"), no? max.size is 2147483647 on 32-bit mode if I recall correctly. On 64-bit, it's 9223372036854775807. By the way, on 64-bit, sys.maxsize == sys.maxint.


In Python 2:

len(obj) => builtin_len() => PyObject_Size() which returns a Py_ssize_t

For string, PyObject_Size() => string_length() => Py_SIZE(obj) => ((PyVarObject *)obj)->ob_size

PyVarObject.ob_size has the type Py_ssize_t.

builtin_len() gets a Py_ssize_t which is converted to a Python int or long with PyInt_FromSsize_t().

PyInt_FromSsize_t() creates an int if the value fits into a C long, or it calls _PyLong_FromSsize_t().


Difference in Python 3:

builtin_len() also gets a Py_ssize_t, but it calls PyLong_FromSsize_t() (since Python short integers as gone, long became int in Python 3).

string_length() is replaced with unicode_length() => PyUnicode_GET_LENGTH() => (PyASCIIObject *)obj)->length and PyASCIIObject.length type is Py_ssize_t.
msg260774 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-02-24 09:16
> This is a 32-bit build ("win32"), no?

No. "win32" appears for both 32-bit and 64-bit builds on Windows. This is a 64-bit build. `sys.maxint` is 2**31 - 1, and `sys.maxsize` is 2**63 - 1. They're not equal.

I can reproduce the issue with a stock Python, on both 2.7 and 3.5.
msg260775 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-24 09:21
I tested Python 2.7 on Windows, downloaded from python.org:

* 32-bit:

  - 'x'*(2**31+5) fails with OverflowError as expected
  - len('x' * (sys.maxsize // 2)) returns 2**30-1 as a Python int
  - for length larger with 'x' * (sys.maxsize // 2), I get a MemoryError, probably because of a integer overflow in the C code, or maybe because malloc() failed, I don't know

* 64-bit:

  - len('x' * (2**32 + 5)) returns 2**32 + 5 (a positive number) as a Pyhon long
  - len('x' * (2**31 + 5)) returns 2**31 + 5 (a positive number) as a Pyhon long
  - len('x' * (2**31 - 5)) returns 2**31 - 5 (a positive number) as a Pyhon int

Limits :

* 32-bit: sys.maxint == 2**31-1, sys.maxsize == 2**31-1 == sys.maxint
* 64-bit: sys.maxint == 2**31-1, sys.maxsize == 2**63-1


It looks like the issue is specific to Python compiled by Continuum ("Anaconda" flavor)?
msg260776 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-24 09:22
You need to call `x.__len__()` explicitly.
msg260777 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-24 09:24
> You need to call `x.__len__()` explicitly.

Ah ok :-) I didn't read carefully the initial message.

Attached patch should fix the issue on Python 2.7. It uses the same function than builtin_len() to cast the C Py_ssize_t to a Python int or long.
msg260793 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-24 12:03
Could you write a test? May be on 3.x you can use range(sys.maxsize).__len__().
msg260799 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-24 13:47
> Could you write a test?

Done.
msg260809 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-24 14:54
Using a bigmem test for this issue looks excessive. Could we use the xrange() object for testing? Or weakref?
msg260811 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-24 14:59
> Could we use the xrange() object for testing?

xrange() constructor is implemented in C and looks to use C long for parameters. This issue is specific to 64-bit Windows with 32-bit C long, so xrange() doesn't seem to work here.


> Or weakref?

What do you mean? Which object? weakref.ref() has no length.


> Using a bigmem test for this issue looks excessive.

Well, I was too lazy to develop a C extension (add something to _testcapi) just for this tiny bug.

I also like tests really testing the final feature (the bug reported in this issue).

I failed to find an existing Python object with __len__() implemented in C which doesn't need to allocate 2^31 bytes to test this bug.
msg260812 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-02-24 15:16
mmap.mmap(-1, 2**31 + 5) could be used here. If the pages are never touched it won't increase the working set size. It merely maps the address range with demand-zero pages.

Unpatched:

    >>> mmap.mmap(-1, 2**31 + 5).__len__()
    -2147483643

Patched:

    >>> mmap.mmap(-1, 2**31 + 5).__len__()
    2147483653L
msg260813 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-24 15:30
> mmap.mmap(-1, 2**31 + 5) could be used here.

Yeah, I hesitated to use mmap, but I'm not really a big fan of using directly a mmap object in test_descr. Unit tests are supposed to only test one thing.


> If the pages are never touched it won't increase the working set size. It merely maps the address range with demand-zero pages.

It depends on the Kernel configuration. On Solaris, overcommit is disable by default.

I prefer to use a regular Python str. Or maybe use a custom Python object for which we can control the output of __len__()?
msg260814 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-24 15:42
> xrange() constructor is implemented in C and looks to use C long for
> parameters. This issue is specific to 64-bit Windows with 32-bit C long, so
> xrange() doesn't seem to work here.

What about xrange(-1, 2**31-1)? In any case the fact that xrange works not 
with Py_ssize_t, but with long, looks as a bug. I'll open a separate issue for 
this.

> > Or weakref?
> 
> What do you mean? Which object? weakref.ref() has no length.

I see that the sq_length slot in the weakproxy type is set to proxy_length.
msg260815 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-02-24 15:56
> the sq_length slot in the weakproxy type is set to proxy_length.

Nice. Its tp_getattro gets in the way of using __len__ directly, but this can be side stepped by manually binding the descriptor:

    class Test(object):
        def __len__(self):
            return 2**31 + 5

    >>> t = Test()
    >>> p = weakref.proxy(t)
    >>> p.__len__()
    2147483653L
    >>> type(p).__len__.__get__(p)()
    -2147483643
msg260817 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-24 16:41
With fixed issue26428 you could use xrange() for testing this issue.
msg292122 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-22 15:28
Could you create a PR for your patch Victor?
msg343774 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-28 12:55
New changeset 05f16416d99dc9fc76fef11e56f16593e7a5955e by Victor Stinner (Zackery Spytz) in branch 'master':
bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606)
https://github.com/python/cpython/commit/05f16416d99dc9fc76fef11e56f16593e7a5955e
msg343777 - (view) Author: miss-islington (miss-islington) Date: 2019-05-28 13:17
New changeset e7ddf586ae5b7a3b975103b09c8202226d77f421 by Miss Islington (bot) in branch '3.7':
bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606)
https://github.com/python/cpython/commit/e7ddf586ae5b7a3b975103b09c8202226d77f421
msg343792 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-28 15:23
New changeset 80dfe990162e7a2dd99524829beecd449a973e9e by Victor Stinner in branch '2.7':
bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606) (GH-13625)
https://github.com/python/cpython/commit/80dfe990162e7a2dd99524829beecd449a973e9e
msg343793 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-28 15:24
Thanks Zackery Spytz for the fix. It's now applied to 2.7, 3.7 and master branches.
msg343795 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-28 15:37
Oh. The test fails on Python 2.7 on Windows:

https://buildbot.python.org/all/#/builders/26/builds/287

ERROR: test_wrap_lenfunc_bad_cast (test.test_descr.OperatorsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\2.7.kloth-win64vs9\build\lib\test\test_descr.py", line 407, in test_wrap_lenfunc_bad_cast
    self.assertEqual(xrange(sys.maxsize).__len__(), sys.maxsize)
OverflowError: Python int too large to convert to C long
msg343800 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-28 16:15
New changeset aaed2c332ae8370e5e87d09c43ef7a39c2abf68d by Victor Stinner in branch '2.7':
bpo-26423: Fix test_descr.test_wrap_lenfunc_bad_cast() on 32-bit Windows (GH-13629)
https://github.com/python/cpython/commit/aaed2c332ae8370e5e87d09c43ef7a39c2abf68d
History
Date User Action Args
2019-05-28 16:15:34vstinnersetmessages: + msg343800
2019-05-28 16:02:10vstinnersetstage: resolved -> patch review
pull_requests: + pull_request13529
2019-05-28 15:37:17vstinnersetstatus: closed -> open
resolution: fixed ->
messages: + msg343795
2019-05-28 15:24:03vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg343793

stage: patch review -> resolved
2019-05-28 15:23:12vstinnersetmessages: + msg343792
2019-05-28 13:58:58vstinnersetpull_requests: + pull_request13524
2019-05-28 13:17:50miss-islingtonsetnosy: + miss-islington
messages: + msg343777
2019-05-28 12:57:24ZackerySpytzsetnosy: + ZackerySpytz

versions: + Python 3.7, Python 3.8, - Python 3.5, Python 3.6
2019-05-28 12:55:41miss-islingtonsetpull_requests: + pull_request13522
2019-05-28 12:55:36vstinnersetmessages: + msg343774
2019-05-28 00:01:58ZackerySpytzsetstage: needs patch -> patch review
pull_requests: + pull_request13512
2017-04-22 15:28:13serhiy.storchakasetmessages: + msg292122
2016-02-24 16:41:55serhiy.storchakasetmessages: + msg260817
2016-02-24 15:56:58eryksunsetmessages: + msg260815
2016-02-24 15:42:43serhiy.storchakasetmessages: + msg260814
2016-02-24 15:30:36vstinnersetmessages: + msg260813
2016-02-24 15:16:35eryksunsetnosy: + eryksun
messages: + msg260812
2016-02-24 14:59:48vstinnersetmessages: + msg260811
2016-02-24 14:54:40serhiy.storchakasetmessages: + msg260809
2016-02-24 13:47:09vstinnersetfiles: + wrap_lenfunc-2.patch

messages: + msg260799
2016-02-24 12:03:45serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg260793
2016-02-24 09:25:17vstinnersettitle: Integer overflow in wrap_lenfunc() on 64-bit build of Windows -> Integer overflow in wrap_lenfunc() on 64-bit build of Windows with len > 2**31-1
2016-02-24 09:24:57vstinnersetfiles: + wrap_lenfunc.patch
keywords: + patch
messages: + msg260777
2016-02-24 09:23:50vstinnersettitle: __len__() returns 32 bit int on windows leading to overflows -> Integer overflow in wrap_lenfunc() on 64-bit build of Windows
2016-02-24 09:22:44georg.brandlsetmessages: + msg260776
2016-02-24 09:21:39vstinnersetnosy: + pitrou
messages: + msg260775
2016-02-24 09:16:10mark.dickinsonsetpriority: normal -> high

messages: + msg260774
components: + Interpreter Core
stage: needs patch
2016-02-24 09:04:55vstinnersetnosy: + vstinner
messages: + msg260773
2016-02-24 08:51:11mark.dickinsonsetnosy: + mark.dickinson

messages: + msg260770
versions: + Python 3.5, Python 3.6
2016-02-24 01:53:45RazerMsetnosy: + RazerM
2016-02-23 23:04:48Dave Hibbittscreate