Message 341330 - Python tracker



Author	serhiy.storchaka
Recipients	rhettinger, serhiy.storchaka
Date	2019-05-03.09:49:33
Message-id	<1556876973.98.0.990157093134.issue36781@roundup.psfhosted.org>

Content

To count the number of items that satisfy a certain condition you can use either

    sum(1 for x in data if pred(x))

or

    sum(pred(x) for x in data)

where pred(x) is a boolean expression.
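
As a small illustration of the two forms, here is a sketch. The helper names count_filtered and count_bools are made up for this example and are not part of any patch. Both give the same count as long as pred(x) returns a real bool:

```python
def count_filtered(data, pred):
    # Yields a 1 only for the items where pred(x) is true.
    return sum(1 for x in data if pred(x))


def count_bools(data, pred):
    # Yields a bool for every item; True adds 1 and False adds 0.
    return sum(pred(x) for x in data)


def is_even(x):
    return x % 2 == 0


data = range(10)
print(count_filtered(data, is_even))  # 5
print(count_bools(data, is_even))     # 5
```

If pred(x) returned some other truthy value, such as 2, the second form would add that value rather than 1, so the two are interchangeable only for genuine boolean results.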

The latter form is shorter but slower, for two reasons:

1. The generator expression has to yield an item for every element of data, not only for those where pred(x) is true.

2. sum() is optimized for integers and floats, but not for bools.

The first cause is out of the scope of this issue, but sum() can be optimized for bools.
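
Both causes can be seen in a few lines. This is only a sketch: the names are illustrative, and the comment about an exact-type check is an assumption about how CPython's fast path in sum() is written, not something stated in this message.

```python
data = list(range(10))


def pred(x):
    return x % 2 == 0


# Cause 1: the filtered form produces one item per match,
# the boolean form produces one item per element of data.
filtered_items = [1 for x in data if pred(x)]
bool_items = [pred(x) for x in data]
print(len(filtered_items), len(bool_items))  # 5 10

# Cause 2: bool is a subclass of int, but its type is not exactly int.
# Assumption: a fast path that tests for the exact int type would
# therefore skip bools and fall back to generic addition.
print(isinstance(True, int))  # True
print(type(True) is int)      # False

# The result is the same either way: an ordinary int.
total = sum(bool_items)
print(total, type(total).__name__)  # 5 int
```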

$ ./python -m timeit -s "a = [True] * 10**6" -- "sum(a)"
Unpatched:  10 loops, best of 5: 22.3 msec per loop
Patched:    50 loops, best of 5: 6.26 msec per loop

$ ./python -m timeit -s "a = list(range(10**6))" -- "sum(x % 2 == 0 for x in a)"
Unpatched:  5 loops, best of 5: 89.8 msec per loop
Patched:    5 loops, best of 5: 67.5 msec per loop

$ ./python -m timeit -s "a = list(range(10**6))" -- "sum(1 for x in a if x % 2 == 0)"
5 loops, best of 5: 53.9 msec per loop
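
To repeat this comparison on a single interpreter without the command line, something like the following should work. It is a sketch under assumptions: N is reduced to 10**5 so it finishes quickly, best_time and CASES are names invented here, and the absolute numbers will differ from those above.

```python
import timeit

N = 10**5  # assumption: smaller than the 10**6 used above, to keep the run short

# Each case is (setup, statement), mirroring the three timeit commands.
CASES = {
    "sum of bools": ("a = [True] * N", "sum(a)"),
    "sum(pred(x) ...)": ("a = list(range(N))", "sum(x % 2 == 0 for x in a)"),
    "sum(1 ... if pred(x))": ("a = list(range(N))", "sum(1 for x in a if x % 2 == 0)"),
}


def best_time(setup, stmt, repeat=5, number=5):
    # Best of `repeat` runs, reported as seconds per single execution.
    times = timeit.repeat(stmt, setup=setup, repeat=repeat,
                          number=number, globals={"N": N})
    return min(times) / number


results = {label: best_time(setup, stmt) for label, (setup, stmt) in CASES.items()}

for label, seconds in results.items():
    print(f"{label:24s} {seconds * 1000:.3f} msec per loop")
```

Running it under an unpatched and a patched build and comparing the first two rows would correspond to the Unpatched/Patched pairs above.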
