Message 361810 - Python tracker

Author	vstinner
Recipients	SilentGhost, Stefan Pochmann, paul.moore, rhettinger, serhiy.storchaka, steve.dower, terry.reedy, tim.golden, vstinner, zach.ware
Date	2020-02-11.13:00:47
Message-id	<1581426048.08.0.76213788951.issue39521@roundup.psfhosted.org>

Content

I am closing this issue. It's likely just a hiccup in the PGO compilation, which is not something we can easily control. The good news is that the common code path, iter(list), is efficient ;-)


> The code for listiter_next() and listreviter_next() is almost the same. 

Right. It cannot explain a 2x slowdown.


> python -m timeit -s "a = list(range(1000))" "list(iter(a))"
> 50000 loops, best of 5: 5.73 usec per loop

That is around 5.73 ns per item (5.73 usec per loop, spread over 1000 items). This is almost "nothing": just a few CPU cycles. With such a microbenchmark, you are very close to the bare metal. You have to take into account low-level CPU metrics like CPU cache usage.
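
For reference, the conversion from the timeit figure to a per-item cost can be sketched as follows. The helper name ns_per_item and the 3 GHz clock speed are assumptions made for illustration; they are not taken from the reported machine.

```python
def ns_per_item(usec_per_loop: float, items_per_loop: int) -> float:
    """Convert a timeit 'usec per loop' figure into nanoseconds per item."""
    return usec_per_loop * 1000.0 / items_per_loop


# Figures quoted above: 5.73 usec per loop for a list of 1000 items.
per_item = ns_per_item(5.73, 1000)
print(f"{per_item:.2f} ns per item")

# Assumption: a 3 GHz CPU, i.e. 3 cycles per nanosecond. The real clock
# speed of the benchmark machine is not known.
ASSUMED_GHZ = 3.0
print(f"about {per_item * ASSUMED_GHZ:.0f} cycles per item at {ASSUMED_GHZ} GHz")
```

At that scale, the cost of a single cache miss or branch misprediction is comparable to the whole per-item budget.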


> Another possible cause is that this is just a random build outcome due to PGO or incidental branch mis-prediction from aliasing (as described in https://stackoverflow.com/a/17906589/1001643 ).

If someone cares about such a microbenchmark, I suggest getting access to a profiling tool and measuring CPU cache usage and similar metrics. On Linux, the "perf" command can be used. I don't know the performance tooling on Windows; maybe look at the Intel developer tools.

I expect that list(iter(a)) uses the CPU better (cache? branch predictor?) than list(reversed(a)), because of how listiter_next() and listreviter_next() have been optimized.
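
To compare the two code paths on a given build, a minimal sketch using the timeit module could look like this. The helper name best_ns_per_item and the loop counts are arbitrary choices for illustration, and the result depends on the build and the machine, so no particular ratio should be expected.

```python
import timeit


def best_ns_per_item(stmt, n_items=1000, number=2000, repeat=5):
    """Return the best-of-`repeat` time for `stmt`, in ns per list item."""
    a = list(range(n_items))
    # timeit.repeat() returns the total time of `number` runs, once per repeat.
    timings = timeit.repeat(stmt, globals={"a": a}, number=number, repeat=repeat)
    return min(timings) / number / n_items * 1e9


forward = best_ns_per_item("list(iter(a))")
backward = best_ns_per_item("list(reversed(a))")
print(f"list(iter(a)):     {forward:.2f} ns per item")
print(f"list(reversed(a)): {backward:.2f} ns per item")
print(f"ratio reversed/iter: {backward / forward:.2f}")
```

Taking the minimum over several repeats reduces noise from other processes, but it does not remove effects such as code placement, which are fixed for a given binary.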

Bad code placement has a high performance cost in such microbenchmarks. See:

* https://llvmdevelopersmeetingbay2016.sched.org/event/8YzY/causes-of-performance-instability-due-to-code-placement-in-x86
* https://vstinner.github.io/analysis-python-performance-issue.html

History
Date	User	Action	Args
2020-02-11 13:00:48	vstinner	set	recipients: + vstinner, rhettinger, terry.reedy, paul.moore, tim.golden, SilentGhost, zach.ware, serhiy.storchaka, steve.dower, Stefan Pochmann
2020-02-11 13:00:48	vstinner	set	messageid: <1581426048.08.0.76213788951.issue39521@roundup.psfhosted.org>
2020-02-11 13:00:48	vstinner	link	issue39521 messages
2020-02-11 13:00:47	vstinner	create