len called twice in the list() constructor #84010

kimadeline · 2020-03-02T15:36:17Z

BPO	39829
Nosy	@brettcannon, @rhettinger, @terryjreedy, @methane, @ericsnowcurrently, @pablogsal, @sweeneyde, @godlygeek, @kimadeline, @thatbirdguythatuknownot
PRs	bpo-39829: Optimize __len__() being called twice in the list() constructor #31816

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/pablogsal'
closed_at = <Date 2022-03-14.01:24:29.714>
created_at = <Date 2020-03-02.15:36:16.527>
labels = ['interpreter-core', '3.11', '3.9', '3.10', 'performance']
title = '__len__ called twice in the list() constructor'
updated_at = <Date 2022-03-14.01:24:38.330>
user = 'https://github.com/kimadeline'

bugs.python.org fields:

activity = <Date 2022-03-14.01:24:38.330>
actor = 'methane'
assignee = 'pablogsal'
closed = True
closed_date = <Date 2022-03-14.01:24:29.714>
closer = 'methane'
components = ['Interpreter Core']
creation = <Date 2020-03-02.15:36:16.527>
creator = 'kimiguel'
dependencies = []
files = []
hgrepos = []
issue_num = 39829
keywords = ['patch']
message_count = 13.0
messages = ['363186', '363188', '363588', '363745', '363746', '363747', '414724', '414727', '414840', '414889', '415026', '415111', '415112']
nosy_count = 10.0
nosy_names = ['brett.cannon', 'rhettinger', 'terry.reedy', 'methane', 'eric.snow', 'pablogsal', 'Dennis Sweeney', 'godlygeek', 'kimiguel', 'Crowthebird']
pr_nums = ['31816']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue39829'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

kimadeline · 2020-03-02T15:36:16Z

(See bpo-33234)

Recently we added Python 3.8 to our CI test matrix, and we noticed a possible backward incompatibility with the list() constructor.

We found that __len__ is getting called twice, while before 3.8 it was only called once.

Here's an example:

class Foo:
 def __iter__(self):
  print("iter")
  return iter([3, 5, 42, 69])

 def __len__(self):
  print("len")
  return 4

Calling list(Foo()) using Python 3.7 prints:

iter
len

But calling list(Foo()) using Python 3.8 prints:

len
iter
len

It looks like this behaviour was introduced for bpo-33234 with PR #54055.

We realize that this was merged a while back, but at least we wanted to make the team aware of this change in behaviour.

pablogsal · 2020-03-02T15:44:00Z

Why should that be backwards incompatible? The number of times we can __len__ on the constructor is an implementation detail. The reason is called now twice is because there is an extra check for the preallocation logic, which is detached from the logic of the subsequent list_extend(self, iterable).

On the other hand, there may be a chance for optimization here, but on a very rough first plan, that may require coupling some logic (passing down the calculated length to list_extend() or some helper, which I am not very fond of.

terryjreedy · 2020-03-07T10:23:44Z

The only specification is that len(ob) calls ob.__len__ and that ob.__len__ should return an 'integer >= 0'. (Adding side effects goes beyond that spec.) I agree that a detectable internal in list is not a bug. Unless there is a realistic performance enhancement in caching the result of the first call, this issue should be closed.

ericsnowcurrently · 2020-03-09T16:09:27Z

FWIW, I encouraged Kim to file this. Thanks Kim!

While it isn't part of any specification, it is an unexpected change in behavior that led to some test failures. So I figured it would be worth bringing up. :) I did find it surprising that we were not caching the result, but don't think that's necessarily a problem.

All that said, the change did not actually break anything other than some tests (not the code they were testing). So I don't have a problem with closing this.

pablogsal · 2020-03-09T16:12:29Z

Thanks Kim and Eric!

I think it still makes sense to do some quick benchmarking and research on passing down the calculated length. I can try to produce a draft PR so we can discuss with something more tangible.

ericsnowcurrently · 2020-03-09T16:20:31Z

I'm not opposed. :) I just don't want to impose on your time.

godlygeek · 2022-03-08T05:17:04Z

Pardon me for necroing an old issue, but someone pointed out the surprising behavior of __len__ being called twice by list(iterable), and it caught my curiosity.

372d705 made it so that list.__init__(iterable) calls iterable.__len__() before calling list.extend(), to preallocate exactly the right amount of space, rather than allowing list.extend() to grow the array. That's because list.extend() can over-allocate.

What if instead, we made it so that list.extend(iterable) doesn't over-allocate when called on an empty list? In the two places where list_extend calls list_resize to grow the array, we'd check if self->ob_item == NULL and if so call list_preallocate_exact instead, and we'd remove the call to list_preallocate_exact from list___init___impl.

It seems like that ought to achieve the same goal as making __init__ call preallocate exactly, without requiring the extra call to __len__.

sweeneyde · 2022-03-08T06:02:47Z

Related to Matt's idea is https://bugs.python.org/issue43574

thatbirdguythatuknownot · 2022-03-10T11:05:29Z

Matt's idea leads to some speedups when implemented correctly (pardon me but I have no idea how to use pyperf):

list({}): Mean +- std dev: [orig] 109 ns +- 1 ns -> [modif] 103 ns +- 1 ns: 1.06x faster
list({1: 2}): Mean +- std dev: [orig] 125 ns +- 1 ns -> [modif] 118 ns +- 1 ns: 1.05x faster
list({(1, 2, 3): 4}): Mean +- std dev: [orig] 125 ns +- 1 ns -> [modif] 118 ns +- 1 ns: 1.05x faster
list((3, 3, 4)): Mean +- std dev: [orig] 89.2 ns +- 4.5 ns -> [modif] 82.9 ns +- 4.6 ns: 1.08x faster
list(()): Mean +- std dev: [orig] 70.1 ns +- 0.8 ns -> [modif] 65.5 ns +- 0.8 ns: 1.07x faster
list({0, 1, 2, ...}): Mean +- std dev: [orig] 74.7 us +- 3.6 us -> [modif] 67.6 us +- 1.6 us: 1.11x faster
list({9, 3}): Mean +- std dev: [orig] 131 ns +- 2 ns -> [modif] 126 ns +- 4 ns: 1.04x faster
list(set()): Mean +- std dev: [orig] 115 ns +- 6 ns -> [modif] 110 ns +- 2 ns: 1.05x faster
list([]): Mean +- std dev: [orig] 73.2 ns +- 5.5 ns -> [modif] 67.8 ns +- 3.4 ns: 1.08x faster
list([1, 2, 1, 1]): Mean +- std dev: [orig] 93.5 ns +- 9.8 ns -> [modif] 87.9 ns +- 8.6 ns: 1.06x faster
list([1, 2, 1, 2, 1, 2]): Mean +- std dev: [orig] 93.0 ns +- 3.1 ns -> [modif] 87.0 ns +- 2.7 ns: 1.07x faster

Benchmark hidden because not significant (3): list({0: 0, 1: ...}), list((4, 5, 1, ...)), list([4, 1, 3, ...])

Geometric mean: 1.05x faster

Changes compared here: main...thatbirdguythatuknownot:patch-17

methane · 2022-03-11T05:54:15Z

Changes compared here: main...thatbirdguythatuknownot:patch-17

Looks good to me. Would you create a pull request?

thatbirdguythatuknownot · 2022-03-13T05:11:02Z

Looks good to me. Would you create a pull request?

Created a pull request (31816).

methane · 2022-03-14T01:24:06Z

New changeset 2153daf by Crowthebird in branch 'main':
bpo-39829: Fix __len__() is called twice in list() constructor (GH-31816)
2153daf

methane · 2022-03-14T01:24:38Z

Thanks.

This works around python/cpython#84010, which otherwise causes extra SQL statements and hence test failures on Python 3.8-3.10.

kimadeline mannequin added type-bug An unexpected behavior, bug, or error 3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 2, 2020

terryjreedy added performance Performance or resource usage and removed type-bug An unexpected behavior, bug, or error labels Mar 7, 2020

ericsnowcurrently closed this as completed Mar 9, 2020

ericsnowcurrently added the invalid label Mar 9, 2020

ericsnowcurrently closed this as completed Mar 9, 2020

ericsnowcurrently added the invalid label Mar 9, 2020

ericsnowcurrently reopened this Mar 9, 2020

ericsnowcurrently removed the invalid label Mar 9, 2020

ericsnowcurrently assigned pablogsal Mar 9, 2020

ericsnowcurrently reopened this Mar 9, 2020

thatbirdguythatuknownot mannequin added 3.10 only security fixes 3.11 only security fixes labels Mar 13, 2022

iritkatriel removed the 3.8 only security fixes label Mar 13, 2022

methane closed this as completed Mar 14, 2022

ezio-melotti transferred this issue from another repository Apr 10, 2022

jugmac00 pushed a commit to jugmac00/launchpad that referenced this issue Dec 5, 2022

Cache return value of POFilesByPOTemplates.__len__

ac9960e

This works around python/cpython#84010, which otherwise causes extra SQL statements and hence test failures on Python 3.8-3.10.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

len called twice in the list() constructor #84010

len called twice in the list() constructor #84010

kimadeline mannequin commented Mar 2, 2020

kimadeline mannequin commented Mar 2, 2020

pablogsal commented Mar 2, 2020

terryjreedy commented Mar 7, 2020

ericsnowcurrently commented Mar 9, 2020

pablogsal commented Mar 9, 2020

ericsnowcurrently commented Mar 9, 2020

godlygeek mannequin commented Mar 8, 2022

sweeneyde commented Mar 8, 2022

thatbirdguythatuknownot mannequin commented Mar 10, 2022

methane commented Mar 11, 2022

thatbirdguythatuknownot mannequin commented Mar 13, 2022

methane commented Mar 14, 2022

methane commented Mar 14, 2022

__len__ called twice in the list() constructor #84010

__len__ called twice in the list() constructor #84010

Comments

kimadeline mannequin commented Mar 2, 2020

kimadeline mannequin commented Mar 2, 2020

pablogsal commented Mar 2, 2020

terryjreedy commented Mar 7, 2020

ericsnowcurrently commented Mar 9, 2020

pablogsal commented Mar 9, 2020

ericsnowcurrently commented Mar 9, 2020

godlygeek mannequin commented Mar 8, 2022

sweeneyde commented Mar 8, 2022

thatbirdguythatuknownot mannequin commented Mar 10, 2022

methane commented Mar 11, 2022

thatbirdguythatuknownot mannequin commented Mar 13, 2022

methane commented Mar 14, 2022

methane commented Mar 14, 2022

len called twice in the list() constructor #84010

len called twice in the list() constructor #84010