This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Dave Hibbitts
Recipients Dave Hibbitts, georg.brandl, paul.moore, steve.dower, tim.golden, zach.ware
Date 2016-02-23.23:04:46
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1456268688.08.0.284414420897.issue26423@psf.upfronthosting.co.za>
In-reply-to
Content
__len__() always returns an int which on windows machines is tied to the size of a c long and is always 32 bits even if it's compiled for 64 bit. len() however returns an int for values less than sys.maxint and a long above that. 

Returning an int in __len__() causes it to return negative lengths for objects of size greater than sys.maxint, below you can see a quick test on how to reproduce it.

And here's an explanation from \u\Rhomboid on Reddit of why we believe the issue happens.

"You'll only see that on Windows. The issue is that, confusingly, the range of the Python int type is tied to the range of the C long type. On Windows long is always 32 bits even on x64 systems, whereas on Unix systems it's the native machine word size. You can confirm this by checking sys.maxint, which will be 2**31 - 1 even with a 64 bit interpreter on Windows.

The difference in behavior of foo.__len__ vs len(foo) is that the former goes through an attribute lookup which goes through the slot lookup stuff, finally ending in Python/typeobject.c:wrap_lenfunc(). The error is casting Py_ssize_t to long, which truncates on Windows x64 as Py_ssize_t is a proper signed 64 bit integer. And then it compounds the injury by creating a Python int object with PyInt_FromLong(), so this is hopelessly broken. In the case of len(foo), you end up in Python/bltinmodule.c:builtin_len() which skips all the attribute lookup stuff and uses the object protocol directly, calling PyObject_Size() and creating a Python object of the correct type via PyInt_FromSsize_t() which figures out whether a Python int or long is necessary.

This is definitely a bug that should be reported. In 3.x the int/long distinction is gone and all integers are Python longs, but the bogus cast to a C long still exists in wrap_lenfunc():

    return PyLong_FromLong((long)res);

That means the bug still exists even though the reason for its existence is gone! Oops. That needs to be updated to get rid of the cast and call PyLong_FromSsize_t()."

Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> a = 'a'*2500000000
>>> a.__len__()
-1794967296
>>> len(a)
2500000000L
>>> a = [1]*2500000000
>>> len(a)
2500000000L
>>> a.__len__()
-1794967296
History
Date User Action Args
2016-02-23 23:04:48Dave Hibbittssetrecipients: + Dave Hibbitts, georg.brandl, paul.moore, tim.golden, zach.ware, steve.dower
2016-02-23 23:04:48Dave Hibbittssetmessageid: <1456268688.08.0.284414420897.issue26423@psf.upfronthosting.co.za>
2016-02-23 23:04:47Dave Hibbittslinkissue26423 messages
2016-02-23 23:04:46Dave Hibbittscreate