classification
Title: Don't cache tp_iternext
Type: behavior Stage: resolved
Components: Extension Modules, Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: josh.r, rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2017-04-12 08:39 by serhiy.storchaka, last changed 2017-04-19 00:41 by rhettinger. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1088 closed serhiy.storchaka, 2017-04-12 08:47
Messages (3)
msg291530 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-12 08:39
Some operations cache the value of the tp_iternext slot and call it in a loop. But calling tp_iternext can run arbitrary code, release the GIL, and change the tp_iternext slot in the same or another thread. This can lead to visible behavior differences, such as different iterated values, different raised exceptions, or an infinite loop. In the past this could even cause a crash (see issue3720), but that seems impossible now.

Examples (list constructor caches tp_iternext, tuple constructor doesn't):

>>> def make_iter():
...     class Iterator:
...         def __iter__(self):
...             return self
...         def __next__(self):
...             del Iterator.__next__
...             return 1
...     return Iterator()
... 
>>> tuple(make_iter())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Iterator' object is not iterable
>>> list(make_iter())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __next__
>>> 
>>> def make_iter2():
...     it2 = iter((2,))
...     def subiter():
...         Iterator.__next__ = Iterator.next2
...         yield 1
...     class Iterator(filter):
...         def next2(self):
...             return next(it2)
...     return Iterator(lambda x: True, subiter())
... 
>>> tuple(make_iter2())
(1, 2)
>>> list(make_iter2())
[1]
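The divergence above can be modelled in pure Python. This is a rough sketch, not CPython's actual C code: slot_next stands in for slot_tp_iternext (which re-looks up __next__ on the type at every call), the cached consumer keeps calling the slot function it fetched before the loop (list()-style), and the fresh consumer re-reads the slot each pass (tuple()-style), where deleting __next__ from the class empties the slot and the object stops looking iterable:

```python
def make_iter():
    class Iterator:
        def __iter__(self):
            return self
        def __next__(self):
            del Iterator.__next__
            return 1
    return Iterator()

def slot_next(obj):
    # Rough model of slot_tp_iternext for classes defined in Python:
    # __next__ is looked up on the type at every call.
    try:
        method = type(obj).__next__
    except AttributeError:
        raise AttributeError('__next__') from None
    return method(obj)

def consume_cached(it):
    # list()-style: the tp_iternext slot is read once before the loop.
    nxt = slot_next
    result = []
    while True:
        try:
            result.append(nxt(it))  # AttributeError once __next__ is gone
        except StopIteration:
            return result

def consume_fresh(it):
    # tuple()-style: the slot is re-read every pass; deleting __next__
    # clears the slot, so the object no longer looks iterable.
    result = []
    while True:
        if getattr(type(it), '__next__', None) is None:
            raise TypeError(f"{type(it).__name__!r} object is not iterable")
        try:
            result.append(slot_next(it))
        except StopIteration:
            return result
```

With this model, consume_cached(make_iter()) raises AttributeError: __next__ and consume_fresh(make_iter()) raises TypeError, matching the list() and tuple() behavior shown above.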

The tp_iternext slot is cached for performance, and removing the caching can cause a performance regression. But in practice the difference is very small. I found a measurable difference (up to 5%) only in the following artificial examples, in which tp_iternext is called in very tight and long loops:

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000000+[1]' -- 'next(filter(None, a))'
python-orig: ..................... 47.2 ms +- 0.2 ms
python: ..................... 49.7 ms +- 0.3 ms

Mean +- std dev: [python-orig] 47.2 ms +- 0.2 ms -> [python] 49.7 ms +- 0.3 ms: 1.05x slower (+5%)

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat, islice' -- 'next(islice(repeat(1), 1000000, None))'
python-orig: ..................... 15.4 ms +- 0.1 ms
python: ..................... 16.0 ms +- 0.2 ms

Mean +- std dev: [python-orig] 15.4 ms +- 0.1 ms -> [python] 16.0 ms +- 0.2 ms: 1.04x slower (+4%)

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat; from collections import deque' -- 'deque(repeat(1, 1000000), 0)'
python-orig: ..................... 14.2 ms +- 0.1 ms
python: ..................... 14.8 ms +- 0.2 ms

Mean +- std dev: [python-orig] 14.2 ms +- 0.1 ms -> [python] 14.8 ms +- 0.2 ms: 1.05x slower (+5%)

In all other cases, when creating a collection is involved (the list, bytearray, and deque (with maxlen != 0) constructors, ''.join), when other code is called (the builtins all and max, map, itertools functions), or for shorter loops, the difference is hardly distinguishable from random noise.

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000' -- 'list(iter(a))'
python-orig: ..................... 31.8 us +- 0.3 us
python: ..................... 31.8 us +- 0.4 us

Mean +- std dev: [python-orig] 31.8 us +- 0.3 us -> [python] 31.8 us +- 0.4 us: 1.00x faster (-0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [1]*1000' -- 'all(a)'
python-orig: ..................... 47.4 us +- 0.2 us
python: ..................... 48.0 us +- 0.3 us

Mean +- std dev: [python-orig] 47.4 us +- 0.2 us -> [python] 48.0 us +- 0.3 us: 1.01x slower (+1%)

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [1]*1000' -- 'max(a)'
python-orig: ..................... 108 us +- 1 us
python: ..................... 108 us +- 1 us

Mean +- std dev: [python-orig] 108 us +- 1 us -> [python] 108 us +- 1 us: 1.00x faster (-0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000000+[1]' -- 'next(filter(lambda x: x, a))'
python-orig: ..................... 527 ms +- 8 ms
python: ..................... 528 ms +- 2 ms

Mean +- std dev: [python-orig] 527 ms +- 8 ms -> [python] 528 ms +- 2 ms: 1.00x slower (+0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat, islice' -- 'next(islice(repeat(1), 100, None))'
python-orig: ..................... 4.72 us +- 0.05 us
python: ..................... 4.72 us +- 0.04 us

Mean +- std dev: [python-orig] 4.72 us +- 0.05 us -> [python] 4.72 us +- 0.04 us: 1.00x faster (-0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat; from collections import deque' -- 'deque(repeat(1, 100), 0)'
python-orig: ..................... 4.16 us +- 0.11 us
python: ..................... 4.11 us +- 0.05 us

Mean +- std dev: [python-orig] 4.16 us +- 0.11 us -> [python] 4.11 us +- 0.05 us: 1.01x faster (-1%)
msg291818 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2017-04-18 02:13
I'm having a hard time thinking of legitimate cases where replacing __next__ mid-iteration is a thing. Doing so retroactively on the class (so it changes all outstanding iterators, no matter what state they're in) seems dubious at best, and it wouldn't be thread-safe in many different ways if iterators of that type are being used in multiple threads.

Is the dynamic type lookup on every iteration necessary too? Would failing to do it cause crashes if an evil iterator reassigns self.__class__ or something?

I feel like if segfaults aren't a possibility (I haven't been able to cause one on Py 3.5 by reassigning the class's __next__ or the instance's __class__), it's not our job to fix intentionally broken code at the expense of all other code. Raymond okayed the previous issue because it was a segfaulting bug, but it seems like this is just a "broken code is broken" bug.
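The __class__-reassignment probe described above can be sketched like this (a hypothetical stress test, not code from the issue): retyping a live iterator mid-iteration only redirects which __next__ runs on the next pass, rather than crashing, because the slot lookup goes through the instance's current type.

```python
# Hypothetical probe: retype a live iterator mid-iteration and confirm
# the result is merely odd behaviour, not a crash.
class Evil:
    def __iter__(self):
        return self
    def __next__(self):
        self.__class__ = Done  # swap the instance's class mid-loop
        return 'a'

class Done:
    def __iter__(self):
        return self
    def __next__(self):
        raise StopIteration

# The second lookup finds Done.__next__, which ends the iteration.
print(list(Evil()))
```

On CPython this prints ['a']: the first call returns 'a' and reclasses the instance, and the second call dispatches to Done.__next__, raising StopIteration cleanly.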
msg291854 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-04-19 00:41
> it seems like this is just a "broken code is broken" bug

I agree.  Unless there were a crasher of some sort, I prefer to keep the existing code, some of which has been deployed successfully for well over a decade.
History
Date User Action Args
2017-04-19 00:41:26 rhettinger set status: open -> closed
resolution: rejected
messages: + msg291854

stage: patch review -> resolved
2017-04-18 02:13:33 josh.r set nosy: + josh.r
messages: + msg291818
2017-04-12 11:07:10 rhettinger set assignee: rhettinger
2017-04-12 08:47:13 serhiy.storchaka set pull_requests: + pull_request1231
2017-04-12 08:39:23 serhiy.storchaka create