msg260749 - (view) |
Author: Dave Hibbitts (Dave Hibbitts) |
Date: 2016-02-23 23:04 |
__len__() always returns an int which on windows machines is tied to the size of a c long and is always 32 bits even if it's compiled for 64 bit. len() however returns an int for values less than sys.maxint and a long above that.
Returning an int in __len__() causes it to return negative lengths for objects of size greater than sys.maxint, below you can see a quick test on how to reproduce it.
And here's an explanation from \u\Rhomboid on Reddit of why we believe the issue happens.
"You'll only see that on Windows. The issue is that, confusingly, the range of the Python int type is tied to the range of the C long type. On Windows long is always 32 bits even on x64 systems, whereas on Unix systems it's the native machine word size. You can confirm this by checking sys.maxint, which will be 2**31 - 1 even with a 64 bit interpreter on Windows.
The difference in behavior of foo.__len__ vs len(foo) is that the former goes through an attribute lookup which goes through the slot lookup stuff, finally ending in Python/typeobject.c:wrap_lenfunc(). The error is casting Py_ssize_t to long, which truncates on Windows x64 as Py_ssize_t is a proper signed 64 bit integer. And then it compounds the injury by creating a Python int object with PyInt_FromLong(), so this is hopelessly broken. In the case of len(foo), you end up in Python/bltinmodule.c:builtin_len() which skips all the attribute lookup stuff and uses the object protocol directly, calling PyObject_Size() and creating a Python object of the correct type via PyInt_FromSsize_t() which figures out whether a Python int or long is necessary.
This is definitely a bug that should be reported. In 3.x the int/long distinction is gone and all integers are Python longs, but the bogus cast to a C long still exists in wrap_lenfunc():
return PyLong_FromLong((long)res);
That means the bug still exists even though the reason for its existence is gone! Oops. That needs to be updated to get rid of the cast and call PyLong_FromSsize_t()."
Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul 2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> a = 'a'*2500000000
>>> a.__len__()
-1794967296
>>> len(a)
2500000000L
>>> a = [1]*2500000000
>>> len(a)
2500000000L
>>> a.__len__()
-1794967296
|
msg260770 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2016-02-24 08:51 |
If I understand the report correctly, this affects Python 3, too. Adding 3.5 and 3.6 to the versions.
|
msg260773 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-24 09:04 |
> __len__() always returns an int which on windows machines is tied to the size of a c long and is always 32 bits even if it's compiled for 64 bit.
Hum, I don't understand this statement. It looks like the code uses Py_ssize_t everywhere and Py_ssize_t is supposed to be able to store a whole pointer, so be 64-bit when Python is compiled in 64-bit mode.
> Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul 2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32
This is a 32-bit build ("win32"), no? max.size is 2147483647 on 32-bit mode if I recall correctly. On 64-bit, it's 9223372036854775807. By the way, on 64-bit, sys.maxsize == sys.maxint.
In Python 2:
len(obj) => builtin_len() => PyObject_Size() which returns a Py_ssize_t
For string, PyObject_Size() => string_length() => Py_SIZE(obj) => ((PyVarObject *)obj)->ob_size
PyVarObject.ob_size has the type Py_ssize_t.
builtin_len() gets a Py_ssize_t which is converted to a Python int or long with PyInt_FromSsize_t().
PyInt_FromSsize_t() creates an int if the value fits into a C long, or it calls _PyLong_FromSsize_t().
Difference in Python 3:
builtin_len() also gets a Py_ssize_t, but it calls PyLong_FromSsize_t() (since Python short integers as gone, long became int in Python 3).
string_length() is replaced with unicode_length() => PyUnicode_GET_LENGTH() => (PyASCIIObject *)obj)->length and PyASCIIObject.length type is Py_ssize_t.
|
msg260774 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2016-02-24 09:16 |
> This is a 32-bit build ("win32"), no?
No. "win32" appears for both 32-bit and 64-bit builds on Windows. This is a 64-bit build. `sys.maxint` is 2**31 - 1, and `sys.maxsize` is 2**63 - 1. They're not equal.
I can reproduce the issue with a stock Python, on both 2.7 and 3.5.
|
msg260775 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-24 09:21 |
I tested Python 2.7 on Windows, downloaded from python.org:
* 32-bit:
- 'x'*(2**31+5) fails with OverflowError as expected
- len('x' * (sys.maxsize // 2)) returns 2**30-1 as a Python int
- for length larger with 'x' * (sys.maxsize // 2), I get a MemoryError, probably because of a integer overflow in the C code, or maybe because malloc() failed, I don't know
* 64-bit:
- len('x' * (2**32 + 5)) returns 2**32 + 5 (a positive number) as a Pyhon long
- len('x' * (2**31 + 5)) returns 2**31 + 5 (a positive number) as a Pyhon long
- len('x' * (2**31 - 5)) returns 2**31 - 5 (a positive number) as a Pyhon int
Limits :
* 32-bit: sys.maxint == 2**31-1, sys.maxsize == 2**31-1 == sys.maxint
* 64-bit: sys.maxint == 2**31-1, sys.maxsize == 2**63-1
It looks like the issue is specific to Python compiled by Continuum ("Anaconda" flavor)?
|
msg260776 - (view) |
Author: Georg Brandl (georg.brandl) * |
Date: 2016-02-24 09:22 |
You need to call `x.__len__()` explicitly.
|
msg260777 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-24 09:24 |
> You need to call `x.__len__()` explicitly.
Ah ok :-) I didn't read carefully the initial message.
Attached patch should fix the issue on Python 2.7. It uses the same function than builtin_len() to cast the C Py_ssize_t to a Python int or long.
|
msg260793 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-24 12:03 |
Could you write a test? May be on 3.x you can use range(sys.maxsize).__len__().
|
msg260799 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-24 13:47 |
> Could you write a test?
Done.
|
msg260809 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-24 14:54 |
Using a bigmem test for this issue looks excessive. Could we use the xrange() object for testing? Or weakref?
|
msg260811 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-24 14:59 |
> Could we use the xrange() object for testing?
xrange() constructor is implemented in C and looks to use C long for parameters. This issue is specific to 64-bit Windows with 32-bit C long, so xrange() doesn't seem to work here.
> Or weakref?
What do you mean? Which object? weakref.ref() has no length.
> Using a bigmem test for this issue looks excessive.
Well, I was too lazy to develop a C extension (add something to _testcapi) just for this tiny bug.
I also like tests really testing the final feature (the bug reported in this issue).
I failed to find an existing Python object with __len__() implemented in C which doesn't need to allocate 2^31 bytes to test this bug.
|
msg260812 - (view) |
Author: Eryk Sun (eryksun) * |
Date: 2016-02-24 15:16 |
mmap.mmap(-1, 2**31 + 5) could be used here. If the pages are never touched it won't increase the working set size. It merely maps the address range with demand-zero pages.
Unpatched:
>>> mmap.mmap(-1, 2**31 + 5).__len__()
-2147483643
Patched:
>>> mmap.mmap(-1, 2**31 + 5).__len__()
2147483653L
|
msg260813 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-24 15:30 |
> mmap.mmap(-1, 2**31 + 5) could be used here.
Yeah, I hesitated to use mmap, but I'm not really a big fan of using directly a mmap object in test_descr. Unit tests are supposed to only test one thing.
> If the pages are never touched it won't increase the working set size. It merely maps the address range with demand-zero pages.
It depends on the Kernel configuration. On Solaris, overcommit is disable by default.
I prefer to use a regular Python str. Or maybe use a custom Python object for which we can control the output of __len__()?
|
msg260814 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-24 15:42 |
> xrange() constructor is implemented in C and looks to use C long for
> parameters. This issue is specific to 64-bit Windows with 32-bit C long, so
> xrange() doesn't seem to work here.
What about xrange(-1, 2**31-1)? In any case the fact that xrange works not
with Py_ssize_t, but with long, looks as a bug. I'll open a separate issue for
this.
> > Or weakref?
>
> What do you mean? Which object? weakref.ref() has no length.
I see that the sq_length slot in the weakproxy type is set to proxy_length.
|
msg260815 - (view) |
Author: Eryk Sun (eryksun) * |
Date: 2016-02-24 15:56 |
> the sq_length slot in the weakproxy type is set to proxy_length.
Nice. Its tp_getattro gets in the way of using __len__ directly, but this can be side stepped by manually binding the descriptor:
class Test(object):
def __len__(self):
return 2**31 + 5
>>> t = Test()
>>> p = weakref.proxy(t)
>>> p.__len__()
2147483653L
>>> type(p).__len__.__get__(p)()
-2147483643
|
msg260817 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-24 16:41 |
With fixed issue26428 you could use xrange() for testing this issue.
|
msg292122 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-04-22 15:28 |
Could you create a PR for your patch Victor?
|
msg343774 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-05-28 12:55 |
New changeset 05f16416d99dc9fc76fef11e56f16593e7a5955e by Victor Stinner (Zackery Spytz) in branch 'master':
bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606)
https://github.com/python/cpython/commit/05f16416d99dc9fc76fef11e56f16593e7a5955e
|
msg343777 - (view) |
Author: miss-islington (miss-islington) |
Date: 2019-05-28 13:17 |
New changeset e7ddf586ae5b7a3b975103b09c8202226d77f421 by Miss Islington (bot) in branch '3.7':
bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606)
https://github.com/python/cpython/commit/e7ddf586ae5b7a3b975103b09c8202226d77f421
|
msg343792 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-05-28 15:23 |
New changeset 80dfe990162e7a2dd99524829beecd449a973e9e by Victor Stinner in branch '2.7':
bpo-26423: Fix possible overflow in wrap_lenfunc() (GH-13606) (GH-13625)
https://github.com/python/cpython/commit/80dfe990162e7a2dd99524829beecd449a973e9e
|
msg343793 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-05-28 15:24 |
Thanks Zackery Spytz for the fix. It's now applied to 2.7, 3.7 and master branches.
|
msg343795 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-05-28 15:37 |
Oh. The test fails on Python 2.7 on Windows:
https://buildbot.python.org/all/#/builders/26/builds/287
ERROR: test_wrap_lenfunc_bad_cast (test.test_descr.OperatorsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\buildbot.python.org\2.7.kloth-win64vs9\build\lib\test\test_descr.py", line 407, in test_wrap_lenfunc_bad_cast
self.assertEqual(xrange(sys.maxsize).__len__(), sys.maxsize)
OverflowError: Python int too large to convert to C long
|
msg343800 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-05-28 16:15 |
New changeset aaed2c332ae8370e5e87d09c43ef7a39c2abf68d by Victor Stinner in branch '2.7':
bpo-26423: Fix test_descr.test_wrap_lenfunc_bad_cast() on 32-bit Windows (GH-13629)
https://github.com/python/cpython/commit/aaed2c332ae8370e5e87d09c43ef7a39c2abf68d
|
msg378773 - (view) |
Author: Irit Katriel (iritkatriel) * |
Date: 2020-10-16 22:32 |
Can this be closed?
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:27 | admin | set | github: 70610 |
2020-10-18 09:17:15 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2020-10-16 22:32:29 | iritkatriel | set | nosy:
+ iritkatriel messages:
+ msg378773
|
2019-05-28 16:15:34 | vstinner | set | messages:
+ msg343800 |
2019-05-28 16:02:10 | vstinner | set | stage: resolved -> patch review pull_requests:
+ pull_request13529 |
2019-05-28 15:37:17 | vstinner | set | status: closed -> open resolution: fixed -> (no value) messages:
+ msg343795
|
2019-05-28 15:24:03 | vstinner | set | status: open -> closed resolution: fixed messages:
+ msg343793
stage: patch review -> resolved |
2019-05-28 15:23:12 | vstinner | set | messages:
+ msg343792 |
2019-05-28 13:58:58 | vstinner | set | pull_requests:
+ pull_request13524 |
2019-05-28 13:17:50 | miss-islington | set | nosy:
+ miss-islington messages:
+ msg343777
|
2019-05-28 12:57:24 | ZackerySpytz | set | nosy:
+ ZackerySpytz
versions:
+ Python 3.7, Python 3.8, - Python 3.5, Python 3.6 |
2019-05-28 12:55:41 | miss-islington | set | pull_requests:
+ pull_request13522 |
2019-05-28 12:55:36 | vstinner | set | messages:
+ msg343774 |
2019-05-28 00:01:58 | ZackerySpytz | set | stage: needs patch -> patch review pull_requests:
+ pull_request13512 |
2017-04-22 15:28:13 | serhiy.storchaka | set | messages:
+ msg292122 |
2016-02-24 16:41:55 | serhiy.storchaka | set | messages:
+ msg260817 |
2016-02-24 15:56:58 | eryksun | set | messages:
+ msg260815 |
2016-02-24 15:42:43 | serhiy.storchaka | set | messages:
+ msg260814 |
2016-02-24 15:30:36 | vstinner | set | messages:
+ msg260813 |
2016-02-24 15:16:35 | eryksun | set | nosy:
+ eryksun messages:
+ msg260812
|
2016-02-24 14:59:48 | vstinner | set | messages:
+ msg260811 |
2016-02-24 14:54:40 | serhiy.storchaka | set | messages:
+ msg260809 |
2016-02-24 13:47:09 | vstinner | set | files:
+ wrap_lenfunc-2.patch
messages:
+ msg260799 |
2016-02-24 12:03:45 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg260793
|
2016-02-24 09:25:17 | vstinner | set | title: Integer overflow in wrap_lenfunc() on 64-bit build of Windows -> Integer overflow in wrap_lenfunc() on 64-bit build of Windows with len > 2**31-1 |
2016-02-24 09:24:57 | vstinner | set | files:
+ wrap_lenfunc.patch keywords:
+ patch messages:
+ msg260777
|
2016-02-24 09:23:50 | vstinner | set | title: __len__() returns 32 bit int on windows leading to overflows -> Integer overflow in wrap_lenfunc() on 64-bit build of Windows |
2016-02-24 09:22:44 | georg.brandl | set | messages:
+ msg260776 |
2016-02-24 09:21:39 | vstinner | set | nosy:
+ pitrou messages:
+ msg260775
|
2016-02-24 09:16:10 | mark.dickinson | set | priority: normal -> high
messages:
+ msg260774 components:
+ Interpreter Core stage: needs patch |
2016-02-24 09:04:55 | vstinner | set | nosy:
+ vstinner messages:
+ msg260773
|
2016-02-24 08:51:11 | mark.dickinson | set | nosy:
+ mark.dickinson
messages:
+ msg260770 versions:
+ Python 3.5, Python 3.6 |
2016-02-24 01:53:45 | RazerM | set | nosy:
+ RazerM
|
2016-02-23 23:04:48 | Dave Hibbitts | create | |