Message 291530 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	serhiy.storchaka
Recipients	rhettinger, serhiy.storchaka
Date	2017-04-12.08:39:22
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1491986363.23.0.567694316296.issue30049@psf.upfronthosting.co.za>
In-reply-to

Content
Some operations cache the value of the tp_iternext slot and call it in a loop. But calling tp_iternext can run arbitrary code, release GIL and change the tp_iternext slot in the same or other thread. This can leads to visible behavior difference, such as different iterated values, different raised exceptions or infinite loop. In the past this even could cause a crash (see issue3720), but seems now this is impossible. Examples (list constructor caches tp_iternext, tuple constructor doesn't): >>> def make_iter(): ... class Iterator: ... def __iter__(self): ... return self ... def __next__(self): ... del Iterator.__next__ ... return 1 ... return Iterator() ... >>> tuple(make_iter()) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'Iterator' object is not iterable >>> list(make_iter()) Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: __next__ >>> >>> def make_iter2(): ... it2 = iter((2,)) ... def subiter(): ... Iterator.__next__ = Iterator.next2 ... yield 1 ... class Iterator(filter): ... def next2(self): ... return next(it2) ... return Iterator(lambda x: True, subiter()) ... >>> tuple(make_iter2()) (1, 2) >>> list(make_iter2()) [1] The tp_iternext is cached for performance, and removing the caching can cause performance regression. But actually the difference is very small. I found a measurable difference (up to 5%) only in following artificial examples in which tp_iternext is called in very tight and long loops: $ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]1000000+[1]' -- 'next(filter(None, a))'python-orig: ..................... 47.2 ms +- 0.2 ms python: ..................... 49.7 ms +- 0.3 ms Mean +- std dev: [python-orig] 47.2 ms +- 0.2 ms -> [python] 49.7 ms +- 0.3 ms: 1.05x slower (+5%) $ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat, islice' -- 'next(islice(repeat(1), 1000000, None))' python-orig: ..................... 15.4 ms +- 0.1 ms python: ..................... 16.0 ms +- 0.2 ms Mean +- std dev: [python-orig] 15.4 ms +- 0.1 ms -> [python] 16.0 ms +- 0.2 ms: 1.04x slower (+4%) $ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat; from collections import deque' -- 'deque(repeat(1, 1000000), 0)' python-orig: ..................... 14.2 ms +- 0.1 ms python: ..................... 14.8 ms +- 0.2 ms Mean +- std dev: [python-orig] 14.2 ms +- 0.1 ms -> [python] 14.8 ms +- 0.2 ms: 1.05x slower (+5%) In all other other cases, when involved creation of a collection (list, bytearray, deque (with maxlen != 0) constructors, ''.join), or calling other code (builtins all, max, map, itertools functions), or for shorter loops the difference is hardly distinguished from the random noise. $ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]1000' -- 'list(iter(a))' python-orig: ..................... 31.8 us +- 0.3 us python: ..................... 31.8 us +- 0.4 us Mean +- std dev: [python-orig] 31.8 us +- 0.3 us -> [python] 31.8 us +- 0.4 us: 1.00x faster (-0%) Not significant! $ ./python -m perf timeit --compare-to ./python-orig -s 'a = [1]1000' -- 'all(a)' python-orig: ..................... 47.4 us +- 0.2 us python: ..................... 48.0 us +- 0.3 us Mean +- std dev: [python-orig] 47.4 us +- 0.2 us -> [python] 48.0 us +- 0.3 us: 1.01x slower (+1%) $ ./python -m perf timeit --compare-to ./python-orig -s 'a = [1]1000' -- 'max(a)' python-orig: ..................... 108 us +- 1 us python: ..................... 108 us +- 1 us Mean +- std dev: [python-orig] 108 us +- 1 us -> [python] 108 us +- 1 us: 1.00x faster (-0%) Not significant! $ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000000+[1]' -- 'next(filter(lambda x: x, a))' python-orig: ..................... 527 ms +- 8 ms python: ..................... 528 ms +- 2 ms Mean +- std dev: [python-orig] 527 ms +- 8 ms -> [python] 528 ms +- 2 ms: 1.00x slower (+0%) Not significant! $ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat, islice' -- 'next(islice(repeat(1), 100, None))' python-orig: ..................... 4.72 us +- 0.05 us python: ..................... 4.72 us +- 0.04 us Mean +- std dev: [python-orig] 4.72 us +- 0.05 us -> [python] 4.72 us +- 0.04 us: 1.00x faster (-0%) Not significant! $ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat; from collections import deque' -- 'deque(repeat(1, 100), 0)' python-orig: ..................... 4.16 us +- 0.11 us python: ..................... 4.11 us +- 0.05 us Mean +- std dev: [python-orig] 4.16 us +- 0.11 us -> [python] 4.11 us +- 0.05 us: 1.01x faster (-1%)

Some operations cache the value of the tp_iternext slot and call it in a loop. But calling tp_iternext can run arbitrary code, release GIL and change the tp_iternext slot in the same or other thread. This can leads to visible behavior difference, such as different iterated values, different raised exceptions or infinite loop. In the past this even could cause a crash (see issue3720), but seems now this is impossible.

Examples (list constructor caches tp_iternext, tuple constructor doesn't):

>>> def make_iter():
...     class Iterator:
...         def __iter__(self):
...             return self
...         def __next__(self):
...             del Iterator.__next__
...             return 1
...     return Iterator()
... 
>>> tuple(make_iter())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Iterator' object is not iterable
>>> list(make_iter())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __next__
>>> 
>>> def make_iter2():
...     it2 = iter((2,))
...     def subiter():
...         Iterator.__next__ = Iterator.next2
...         yield 1
...     class Iterator(filter):
...         def next2(self):
...             return next(it2)
...     return Iterator(lambda x: True, subiter())
... 
>>> tuple(make_iter2())
(1, 2)
>>> list(make_iter2())
[1]

The tp_iternext is cached for performance, and removing the caching can cause performance regression. But actually the difference is very small. I found a measurable difference (up to 5%) only in following artificial examples in which tp_iternext is called in very tight and long loops:

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000000+[1]' -- 'next(filter(None, a))'python-orig: ..................... 47.2 ms +- 0.2 ms
python: ..................... 49.7 ms +- 0.3 ms

Mean +- std dev: [python-orig] 47.2 ms +- 0.2 ms -> [python] 49.7 ms +- 0.3 ms: 1.05x slower (+5%)

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat, islice' -- 'next(islice(repeat(1), 1000000, None))'
python-orig: ..................... 15.4 ms +- 0.1 ms
python: ..................... 16.0 ms +- 0.2 ms

Mean +- std dev: [python-orig] 15.4 ms +- 0.1 ms -> [python] 16.0 ms +- 0.2 ms: 1.04x slower (+4%)

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat; from collections import deque' -- 'deque(repeat(1, 1000000), 0)'
python-orig: ..................... 14.2 ms +- 0.1 ms
python: ..................... 14.8 ms +- 0.2 ms

Mean +- std dev: [python-orig] 14.2 ms +- 0.1 ms -> [python] 14.8 ms +- 0.2 ms: 1.05x slower (+5%)

In all other other cases, when involved creation of a collection (list, bytearray, deque (with maxlen != 0) constructors, ''.join), or calling other code (builtins all, max, map, itertools functions), or for shorter loops the difference is hardly distinguished from the random noise.

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000' -- 'list(iter(a))'
python-orig: ..................... 31.8 us +- 0.3 us
python: ..................... 31.8 us +- 0.4 us

Mean +- std dev: [python-orig] 31.8 us +- 0.3 us -> [python] 31.8 us +- 0.4 us: 1.00x faster (-0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [1]*1000' -- 'all(a)'
python-orig: ..................... 47.4 us +- 0.2 us
python: ..................... 48.0 us +- 0.3 us

Mean +- std dev: [python-orig] 47.4 us +- 0.2 us -> [python] 48.0 us +- 0.3 us: 1.01x slower (+1%)

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [1]*1000' -- 'max(a)'
python-orig: ..................... 108 us +- 1 us
python: ..................... 108 us +- 1 us

Mean +- std dev: [python-orig] 108 us +- 1 us -> [python] 108 us +- 1 us: 1.00x faster (-0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000000+[1]' -- 'next(filter(lambda x: x, a))'
python-orig: ..................... 527 ms +- 8 ms
python: ..................... 528 ms +- 2 ms

Mean +- std dev: [python-orig] 527 ms +- 8 ms -> [python] 528 ms +- 2 ms: 1.00x slower (+0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat, islice' -- 'next(islice(repeat(1), 100, None))'
python-orig: ..................... 4.72 us +- 0.05 us
python: ..................... 4.72 us +- 0.04 us

Mean +- std dev: [python-orig] 4.72 us +- 0.05 us -> [python] 4.72 us +- 0.04 us: 1.00x faster (-0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat; from collections import deque' -- 'deque(repeat(1, 100), 0)'
python-orig: ..................... 4.16 us +- 0.11 us
python: ..................... 4.11 us +- 0.05 us

Mean +- std dev: [python-orig] 4.16 us +- 0.11 us -> [python] 4.11 us +- 0.05 us: 1.01x faster (-1%)

History
Date	User	Action	Args
2017-04-12 08:39:23	serhiy.storchaka	set	recipients: + serhiy.storchaka, rhettinger
2017-04-12 08:39:23	serhiy.storchaka	set	messageid: <1491986363.23.0.567694316296.issue30049@psf.upfronthosting.co.za>
2017-04-12 08:39:23	serhiy.storchaka	link	issue30049 messages
2017-04-12 08:39:22	serhiy.storchaka	create