This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Truncate __len__() at sys.maxsize
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.0
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: _doublep, belopolsky, benjamin.peterson, gregory.p.smith, hagen, pitrou, rbp, rhettinger, vstinner
Priority: normal Keywords: patch

Created on 2008-04-30 04:36 by belopolsky, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
len.diff | belopolsky, 2008-04-30 04:36 | patch against py3k revision 62564
len_message.patch | rbp, 2008-05-10 19:46 | Change OverflowError message when len > sys.maxsize (py3k r62990)
Messages (12)
msg65989 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-30 04:35
On Tue, Apr 29, 2008 at 10:36 PM, Guido van Rossum <guido@python.org> 
wrote:
..
>  Let's also fix __len__() so that it returns sys.{maxint,maxsize} when
>  the result doesn't fit in a Py_ssize_t.

http://mail.python.org/pipermail/python-3000/2008-April/013343.html

With the attached patch, given

class x:
    def __len__(self):
        return 2**100

len(x()) and len(range(2**100)) will return sys.maxsize.
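
As an illustration only, the truncating behaviour can be emulated in pure Python. This is a sketch, not the attached C patch: `saturating_len` is a hypothetical helper, and it calls `__len__` directly so that the `Py_ssize_t` conversion done by `len()` is bypassed. It only covers classes whose `__len__` is written in Python; `range.__len__` is implemented in C and performs the conversion itself.

```python
import sys


class x:
    def __len__(self):
        return 2**100


def saturating_len(obj):
    """Hypothetical helper: clamp the result of __len__ to sys.maxsize."""
    # Look the method up on the type, as len() does, but call it directly
    # so the arbitrary-precision int is returned without any C conversion.
    n = type(obj).__len__(obj)
    return min(n, sys.maxsize)


truncated = saturating_len(x())    # clamped to sys.maxsize
exact = saturating_len([1, 2, 3])  # small lengths pass through unchanged
```

Note that the caller cannot tell a clamped result from a container that really holds `sys.maxsize` items.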
msg65994 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-04-30 06:32
Wouldn't it be better to raise OverflowError or somesuch?
msg66001 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-30 13:42
On Wed, Apr 30, 2008 at 2:32 AM, Raymond Hettinger
<report@bugs.python.org> wrote:

>  Wouldn't it be better to raise OverflowError or somesuch?

That's what the current code does.  I don't know what Guido's full
rationale is, but I guess the idea is that len(..) is not supposed to
raise an exception on sizeable objects.
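
A minimal sketch of the current behaviour, assuming CPython 3: both a user-defined `__len__` that returns 2**100 and `range(2**100)` make `len()` raise OverflowError. The class `X` and the helper `len_or_error` are names introduced here for illustration.

```python
class X:
    def __len__(self):
        return 2**100


def len_or_error(obj):
    """Return len(obj), or the OverflowError instance that len() raised."""
    try:
        return len(obj)
    except OverflowError as exc:
        return exc


custom_result = len_or_error(X())              # an OverflowError instance
range_result = len_or_error(range(2**100))     # an OverflowError instance
small_result = len_or_error(range(10))         # an ordinary int
```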

Here is a quote from another message:

"""
__len__ will always be problematic when there are more values than can
be counted in a signed C long; maybe we should do what the Java
collections package does: for once, Java chooses practicality over
purity, and simply states that if the length doesn't fit, the largest
number that does fit is returned (i.e. for us that would be
sys.maxsize in 3.0, sys.maxint in 2.x).
"""
-- Guido van Rossum, 2008-04-30
http://mail.python.org/pipermail/python-3000/2008-April/013340.html

I suspect, however, that part of Java's motivation for this behavior
is that exceptions need to be declared and declaring the length method
as throwing OverflowError would make many programmers very unhappy.
msg66013 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-04-30 18:31
Gasp, having len() return something other than the true container size
sounds horrible. At least raising OverflowError makes it clear that
something wrong is going on...
msg66046 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-05-01 21:16
If you're interested I asked a Java news group:
http://groups.google.com/group/comp.lang.java.programmer/browse_thread/thread/fddbc3b1f9fec125#
msg66459 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-05-09 08:47
Well apparently the Java guys think raising an exception would have been
a much better idea than the behaviour they are stuck with.
There's also an interesting proposal there:

""" The ReturnValueTooBigException could even have a method declared as
"long size()" that reports the actual size of the collection. """
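
Translated to Python, the quoted idea might look roughly like the sketch below. Everything here is hypothetical: `LengthTooLargeError`, its `true_length` attribute, and `checked_len` are illustrative names, not existing or proposed CPython APIs.

```python
import sys


class LengthTooLargeError(OverflowError):
    """Hypothetical exception that carries the real length."""

    def __init__(self, true_length):
        super().__init__("length does not fit in Py_ssize_t")
        self.true_length = true_length


def checked_len(obj):
    """Hypothetical helper: return the length, or raise with the true value."""
    # Call __len__ directly so a Python-level implementation can hand back
    # an int of any size.
    n = type(obj).__len__(obj)
    if n > sys.maxsize:
        raise LengthTooLargeError(n)
    return n


class Huge:
    def __len__(self):
        return 2**100


try:
    checked_len(Huge())
except LengthTooLargeError as exc:
    recovered = exc.true_length  # the caller can still read the real size
```

Subclassing OverflowError keeps existing `except OverflowError` handlers working.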
msg66573 - Author: Rodrigo Bernardo Pimentel (rbp) (Python committer) Date: 2008-05-10 19:46
I think returning sys.{maxint,maxsize} in this case is a plain lie.
That's not practicality, that's giving back false information.

Barring drastic language changes (such as having objects representing
"infinity" or "greater than" - which, of course, won't happen), I think
the current behaviour of raising an exception is the correct one. But,
although I think OverflowError is good enough, the current exception
message is a bit cryptic, especially for anyone who doesn't know C:

"""OverflowError: Python int too large to convert to C ssize_t"""

I've attached a simple patch (modified from Alexander's) to raise:

"""OverflowError: Length too large"""

(I thought about "Object too large", but our problem is actually that
the *length* itself is too large)
msg66687 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-05-11 22:38
Agreed, having it lie about the size is the WORST possible behavior
because it will silently hide problems.  Let's not do that.

But I must've missed something, why can't __len__ return the correct
value?  Merely because range() is broken and might use it as input? 
That's no excuse.  Fix range().
msg66688 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-05-11 22:48
On Sun, May 11, 2008 at 6:38 PM, Gregory P. Smith
<report@bugs.python.org> wrote:
..
> But I must've missed something, why can't __len__ return the correct
> value?

The problem is the C signature of the sq_length slot:

typedef Py_ssize_t (*lenfunc)(PyObject *);
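
To make the constraint concrete: `Py_ssize_t` is a signed C integer, so the largest length the slot can return is `sys.maxsize`. The sketch below checks this from Python, assuming CPython and using `ctypes.c_ssize_t` as a stand-in for `Py_ssize_t`; `fits_in_ssize_t` is a hypothetical helper.

```python
import ctypes
import sys

# Width of the signed size type in bits (typically 32 or 64).
bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)

# Largest value a signed integer of that width can hold.
max_ssize_t = 2**(bits - 1) - 1


def fits_in_ssize_t(n):
    """Hypothetical helper: can n be returned through the sq_length slot?"""
    return 0 <= n <= max_ssize_t


matches_maxsize = (max_ssize_t == sys.maxsize)
huge_fits = fits_in_ssize_t(2**100)
```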
msg72198 - Author: Paul Pogonyshev (_doublep) Date: 2008-08-30 15:34
I'm also absolutely against having len() lie to me.  That would be an
invitation to run into some other hideous error later, e.g. discarding
part of the data because I'd think it wasn't there.  Better to raise an
exception, as now; at least then programmers will know something is wrong
and have a chance to work around it.
msg78285 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-12-25 23:59
Most people disagree with the original idea (len.diff: truncate the 
length to sys.maxsize).

I don't like rbp's patch, which replaces the verbose error message
("Python int too large to convert to C ssize_t") with a shorter one
("Length too large"). When I debug my program, I prefer longer error
messages. The original message contains the C type, ssize_t, which is
useful information.

I dislike both patches. Can we close this issue with 
resolution=invalid?
msg79810 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-14 00:01
The initial proposal (len.diff) was rejected. If you want to change
len() behaviour, reopen this issue or open another one.
History
Date | User | Action | Args
2022-04-11 14:56:33 | admin | set | github: 46975
2009-01-14 00:01:49 | vstinner | set | status: open -> closed; resolution: rejected; messages: + msg79810
2008-12-26 00:00:00 | vstinner | set | nosy: + vstinner; messages: + msg78285
2008-08-30 15:34:07 | _doublep | set | nosy: + _doublep; messages: + msg72198
2008-08-30 13:34:07 | hagen | set | nosy: + hagen
2008-05-11 22:48:22 | belopolsky | set | messages: + msg66688
2008-05-11 22:38:41 | gregory.p.smith | set | nosy: + gregory.p.smith; messages: + msg66687
2008-05-10 19:46:50 | rbp | set | files: + len_message.patch; nosy: + rbp; messages: + msg66573
2008-05-09 08:47:19 | pitrou | set | messages: + msg66459
2008-05-01 21:16:07 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg66046
2008-04-30 18:31:19 | pitrou | set | nosy: + pitrou; messages: + msg66013
2008-04-30 13:42:17 | belopolsky | set | messages: + msg66001
2008-04-30 06:32:11 | rhettinger | set | nosy: + rhettinger; messages: + msg65994
2008-04-30 04:39:25 | belopolsky | set | type: enhancement
2008-04-30 04:36:29 | belopolsky | create |