classification
Title: Truncate __len__() at sys.maxsize
Type: feature request
Components: Interpreter Core
Versions: Python 3.0
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: belopolsky, benjamin.peterson, gregory.p.smith, pitrou, rbp, rhettinger
Priority:
Keywords: patch

Created on 2008-04-30 04:36 by belopolsky, last changed 2008-05-11 22:48 by belopolsky.

Files
File name | Uploaded | Description
len.diff | belopolsky, 2008-04-30 04:36 | patch against py3k revision 62564
len_message.patch | rbp, 2008-05-10 19:46 | Change OverflowError message when len > sys.maxsize (py3k r62990)
Messages
msg65989 Author: Alexander Belopolsky (belopolsky) Date: 2008-04-30 04:35
On Tue, Apr 29, 2008 at 10:36 PM, Guido van Rossum <guido@python.org> 
wrote:
..
>  Let's also fix __len__() so that it returns sys.{maxint,maxsize} when
>  the result doesn't fit in a Py_ssize_t.

http://mail.python.org/pipermail/python-3000/2008-April/013343.html

With the attached patch, given

class x:
    def __len__(self):
        return 2**100

len(x()) and len(range(2**100)) will return sys.maxsize.
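A minimal sketch of the truncating behaviour proposed here, written in pure Python rather than as the C patch. The helper name `truncated_len` is hypothetical and is not part of len.diff; it only handles classes whose `__len__` is defined in Python, because built-ins such as range already go through the C-level slot.

```python
import sys


class X:
    def __len__(self):
        return 2**100


def truncated_len(obj):
    """Hypothetical helper: report the length, clamped to sys.maxsize."""
    # Call __len__ through the type so the raw Python int is seen before
    # len() tries to squeeze it into a Py_ssize_t.
    n = type(obj).__len__(obj)
    return min(n, sys.maxsize)


if __name__ == "__main__":
    print(truncated_len(X()) == sys.maxsize)
    print(truncated_len([1, 2, 3]))
```

Note that the clamped result is indistinguishable from a container whose length really is sys.maxsize, which is the objection raised later in the thread.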
msg65994 Author: Raymond Hettinger (rhettinger) Date: 2008-04-30 06:32
Wouldn't it be better to raise OverflowError or somesuch?
msg66001 Author: Alexander Belopolsky (belopolsky) Date: 2008-04-30 13:42
On Wed, Apr 30, 2008 at 2:32 AM, Raymond Hettinger
<report@bugs.python.org> wrote:

>  Wouldn't it be better to raise OverflowError or somesuch?

That's what the current code does.  I don't know what Guido's full
rationale is, but I guess the idea is that len(..) is not supposed to
raise an exception on sizeable objects.

Here is a quote from another message:

"""
__len__ will always be problematic when there are more values than can
be counted in a signed C long; maybe we should do what the Java
collections package does: for once, Java chooses practicality over
purity, and simply states that if the length doesn't fit, the largest
number that does fit is returned (i.e. for us that would be
sys.maxsize in 3.0, sys.maxint in 2.x).
"""
-- Guido van Rossum, 2008-04-30
http://mail.python.org/pipermail/python-3000/2008-April/013340.html

I suspect, however, that part of Java's motivation for this behavior
is that exceptions need to be declared and declaring the length method
as throwing OverflowError would make many programmers very unhappy.
msg66013 Author: Antoine Pitrou (pitrou) Date: 2008-04-30 18:31
Gasp, having len() return something other than the true container size
sounds horrible. At least raising OverflowError makes it clear that
something wrong is going on...
msg66046 Author: Benjamin Peterson (benjamin.peterson) Date: 2008-05-01 21:16
If you're interested, I asked a Java newsgroup:
http://groups.google.com/group/comp.lang.java.programmer/browse_thread/thread/fddbc3b1f9fec125#
msg66459 Author: Antoine Pitrou (pitrou) Date: 2008-05-09 08:47
Well, apparently the Java guys think raising an exception would have been
a much better idea than the behaviour they are stuck with.
There's also an interesting proposal there:

""" The ReturnValueTooBigException could even have a method declared as
"long size()" that reports the actual size of the collection. """
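As a rough illustration of that proposal translated to Python, an exception could carry the true length as an attribute. Everything named here (`LengthTooLargeError`, its `size` attribute, and `checked_len`) is hypothetical and does not exist in Python or in either attached patch.

```python
import sys


class LengthTooLargeError(OverflowError):
    """Hypothetical exception that remembers the real length."""

    def __init__(self, size):
        super().__init__("Length too large")
        self.size = size


class X:
    def __len__(self):
        return 2**100


def checked_len(obj):
    """Hypothetical helper: return the length or raise with the true size."""
    n = type(obj).__len__(obj)  # raw value, before any Py_ssize_t conversion
    if n > sys.maxsize:
        raise LengthTooLargeError(n)
    return n


if __name__ == "__main__":
    try:
        checked_len(X())
    except LengthTooLargeError as exc:
        print(exc.size == 2**100)
```

Subclassing OverflowError means code that already catches the existing exception would keep working under this sketch.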
msg66573 Author: Rodrigo Bernardo Pimentel (rbp) Date: 2008-05-10 19:46
I think returning sys.{maxint,maxsize} in this case is a plain lie.
That's not practicality, that's giving back false information.

Barring drastic language changes (such as having objects representing
"infinity" or "greater than" - which, of course, won't happen), I think
the current behaviour of raising an exception is the correct one. But,
although I think OverflowError is good enough, the current exception
message is a bit cryptic, especially for anyone who doesn't know C:

"""OverflowError: Python int too large to convert to C ssize_t"""

I've attached a simple patch (modified from Alexander's) to raise:

"""OverflowError: Length too large"""

(I thought about "Object too large", but our problem is actually that
the *length* itself is too large)
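The exception-raising behaviour discussed above can be observed with a short sketch. The helper `len_overflows` is a hypothetical name used only for illustration, and the sketch deliberately does not check the message text, since the wording differs between Python versions.

```python
class X:
    def __len__(self):
        return 2**100


def len_overflows(obj):
    """Hypothetical helper: True if len(obj) raises OverflowError."""
    try:
        len(obj)
    except OverflowError:
        return True
    return False


if __name__ == "__main__":
    for candidate in (X(), range(2**100), [1, 2, 3]):
        print(type(candidate).__name__, len_overflows(candidate))
```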
msg66687 Author: Gregory P. Smith (gregory.p.smith) Date: 2008-05-11 22:38
Agreed, having it lie about the size is the WORST possible behavior
because it will silently hide problems.  Let's not do that.

But I must've missed something: why can't __len__ return the correct
value?  Merely because range() is broken and might use it as input?
That's no excuse.  Fix range().
msg66688 Author: Alexander Belopolsky (belopolsky) Date: 2008-05-11 22:48
On Sun, May 11, 2008 at 6:38 PM, Gregory P. Smith
<report@bugs.python.org> wrote:
..
> But I must've missed something, why can't __len__ return the correct
> value?

The problem is the C signature of the sq_length slot:

typedef Py_ssize_t (*lenfunc)(PyObject *);
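Because the slot returns a Py_ssize_t, the largest length it can report is the largest value of that C type, which is what sys.maxsize exposes. A small sketch that checks this from Python, assuming ctypes is available and that `ctypes.c_ssize_t` matches the interpreter's Py_ssize_t:

```python
import ctypes
import sys

# Width of the C ssize_t type on this platform, in bits.
bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)

# Largest value a signed integer of that width can hold.
ssize_max = 2 ** (bits - 1) - 1

if __name__ == "__main__":
    print(ssize_max == sys.maxsize)
    print(2**100 > ssize_max)
```

Any length above `ssize_max` therefore cannot be returned through the slot without changing its signature.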
History
Date | User | Action | Args
2008-05-11 22:48:22 | belopolsky | set | messages: + msg66688
2008-05-11 22:38:41 | gregory.p.smith | set | nosy: + gregory.p.smith; messages: + msg66687
2008-05-10 19:46:50 | rbp | set | files: + len_message.patch; nosy: + rbp; messages: + msg66573
2008-05-09 08:47:19 | pitrou | set | messages: + msg66459
2008-05-01 21:16:07 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg66046
2008-04-30 18:31:19 | pitrou | set | nosy: + pitrou; messages: + msg66013
2008-04-30 13:42:17 | belopolsky | set | messages: + msg66001
2008-04-30 06:32:11 | rhettinger | set | nosy: + rhettinger; messages: + msg65994
2008-04-30 04:39:25 | belopolsky | set | type: feature request
2008-04-30 04:36:29 | belopolsky | create |