classification
Title: __len__ called twice in the list() constructor
Type: performance
Stage:
Components: Interpreter Core
Versions: Python 3.9, Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: pablogsal
Nosy List: brett.cannon, eric.snow, kimiguel, pablogsal, rhettinger, terry.reedy
Priority: normal
Keywords:

Created on 2020-03-02 15:36 by kimiguel, last changed 2020-03-09 16:20 by eric.snow.

Messages (6)
msg363186 - (view) Author: Kim-Adeline Miguel (kimiguel) Date: 2020-03-02 15:36
(See #33234)

Recently we added Python 3.8 to our CI test matrix, and we noticed a possible backward incompatibility with the list() constructor.

We found that __len__ is getting called twice, while before 3.8 it was only called once.

Here's an example:

class Foo:
    def __iter__(self):
        print("iter")
        return iter([3, 5, 42, 69])

    def __len__(self):
        print("len")
        return 4

Calling list(Foo()) using Python 3.7 prints:

iter
len

But calling list(Foo()) using Python 3.8 prints:

len
iter
len

It looks like this behaviour was introduced for #33234 with PR GH-9846. 

We realize that this was merged a while back, but at least we wanted to make the team aware of this change in behaviour.
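
A self-contained way to check this on any given interpreter is to record the calls instead of printing them. The sketch below is not part of the original report; `Recorder` and its `calls` attribute are hypothetical names chosen for illustration. The resulting list should be identical everywhere, while the recorded call trace depends on the interpreter version.

```python
import sys


class Recorder:
    """Iterable that records which special methods list() invokes on it."""

    def __init__(self):
        self.calls = []

    def __iter__(self):
        self.calls.append("iter")
        return iter([3, 5, 42, 69])

    def __len__(self):
        self.calls.append("len")
        return 4


rec = Recorder()
result = list(rec)

# Only the call trace is version-dependent; the list contents are not.
print(sys.version_info[:2], result, rec.calls)
```

Comparing `rec.calls` across the interpreters in a CI matrix shows exactly where the order or number of `__len__` calls differs.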
msg363188 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-02 15:44
Why should that be backwards incompatible? The number of times we call `__len__` in the constructor is an implementation detail. It is now called twice because there is an extra check for the preallocation logic, which is detached from the logic of the subsequent list_extend(self, iterable).

On the other hand, there may be a chance for optimization here, but at a very rough first glance, that may require coupling some logic (passing down the calculated length to list_extend() or some helper), which I am not very fond of.
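
The trade-off described here can be modelled roughly in pure Python. This is only a sketch under assumptions, not the C implementation: `Counted`, `extend_model`, `construct_decoupled` and `construct_coupled` are hypothetical names, and `operator.length_hint` stands in for whatever size query the real list_extend() performs.

```python
import operator


class Counted:
    """Hypothetical iterable that counts how often __len__ is called."""

    def __init__(self, items):
        self.items = list(items)
        self.len_calls = 0

    def __iter__(self):
        return iter(self.items)

    def __len__(self):
        self.len_calls += 1
        return len(self.items)


def extend_model(result, iterable, size_hint=None):
    """Stand-in for list_extend(): asks for a size hint unless one is passed in."""
    it = iter(iterable)
    if size_hint is None:
        # length_hint() calls __len__ when the object defines it.
        size_hint = operator.length_hint(iterable, 8)
    # The real code would use size_hint to size the list; the model just appends.
    for item in it:
        result.append(item)
    return result


def construct_decoupled(iterable):
    """Preallocation check and extend step each query the length on their own."""
    result = []
    if hasattr(type(iterable), "__len__"):
        len(iterable)  # first __len__ call: preallocation check
    return extend_model(result, iterable)  # second __len__ call happens inside


def construct_coupled(iterable):
    """Length is computed once and passed down, at the cost of coupling the two steps."""
    result = []
    size = len(iterable) if hasattr(type(iterable), "__len__") else None
    return extend_model(result, iterable, size_hint=size)
```

In this model `construct_decoupled` triggers two `__len__` calls and `construct_coupled` only one, but the helper's signature has to grow an extra parameter, which is the coupling mentioned above.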
msg363588 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-03-07 10:23
The only specification is that len(ob) calls ob.__len__ and that ob.__len__ should return an 'integer >= 0'. (Adding side effects goes beyond that spec.) I agree that a detectable change in list internals is not a bug. Unless there is a realistic performance enhancement in caching the result of the first call, this issue should be closed.
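
The 'integer >= 0' requirement is enforced by len() itself, which a short sketch can demonstrate. The class names and the `try_len` helper are hypothetical and exist only for this illustration.

```python
class NegativeLen:
    def __len__(self):
        return -1  # violates the ">= 0" part of the contract


class NonIntegerLen:
    def __len__(self):
        return "four"  # violates the "integer" part of the contract


def try_len(obj):
    """Return len(obj), or the name of the exception that len() raised."""
    try:
        return len(obj)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


print(try_len([1, 2, 3]))        # 3
print(try_len(NegativeLen()))    # ValueError
print(try_len(NonIntegerLen()))  # TypeError
```

Nothing in this contract says how many times a consumer such as list() may call `__len__`.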
msg363745 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2020-03-09 16:09
FWIW, I encouraged Kim to file this.  Thanks Kim!

While it isn't part of any specification, it is an unexpected change in behavior that led to some test failures.  So I figured it would be worth bringing up. :)  I did find it surprising that we were not caching the result, but don't think that's necessarily a problem.

All that said, the change did not actually break anything other than some tests (not the code they were testing).  So I don't have a problem with closing this.
msg363746 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-09 16:12
Thanks Kim and Eric!

I think it still makes sense to do some quick benchmarking and research on passing down the calculated length. I can try to produce a draft PR so we can discuss with something more tangible.
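
As a rough illustration of what a quick benchmark might look like from the Python side, the sketch below times list() on an iterable with and without `__len__`. It is only an approximation under assumptions: `WithLen`, `WithoutLen` and `bench` are hypothetical names, and a real evaluation of passing down the length would compare two interpreter builds rather than two Python classes.

```python
import timeit


class WithoutLen:
    """Iterable with no __len__, so list() cannot ask for its size."""

    def __init__(self, n):
        self.data = list(range(n))

    def __iter__(self):
        return iter(self.data)


class WithLen(WithoutLen):
    """Same iterable, but exposing a Python-level __len__."""

    def __len__(self):
        return len(self.data)


def bench(obj, number=2000, repeat=3):
    """Best-of-`repeat` time in seconds for `number` calls to list(obj)."""
    return min(timeit.repeat(lambda: list(obj), number=number, repeat=repeat))


for size in (0, 10, 1000):
    print(size, bench(WithLen(size)), bench(WithoutLen(size)))
```

Any extra `__len__` call should matter most for small inputs, where the fixed per-call overhead is a larger share of the total time.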
msg363747 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2020-03-09 16:20
I'm not opposed. :)  I just don't want to impose on your time.
History
Date                 User         Action  Args
2020-03-09 16:20:31  eric.snow    set     status: closed -> open
                                          messages: + msg363747
                                          assignee: pablogsal
                                          resolution: not a bug ->
                                          stage: resolved ->
2020-03-09 16:12:29  pablogsal    set     messages: + msg363746
2020-03-09 16:09:26  eric.snow    set     status: open -> closed
                                          resolution: not a bug
                                          messages: + msg363745
                                          stage: resolved
2020-03-07 10:23:44  terry.reedy  set     type: behavior -> performance
                                          messages: + msg363588
                                          nosy: + terry.reedy
2020-03-02 15:44:00  pablogsal    set     messages: + msg363188
2020-03-02 15:36:16  kimiguel     create