Message361810
I'm closing this issue. It's likely just a hiccup in the PGO compilation, which is not something we can easily control. The good news is that the common code path, iter(list), is efficient ;-)
> The code for listiter_next() and listreviter_next() is almost the same.
Right. It cannot explain a 2x slowdown.
> python -m timeit -s "a = list(range(1000))" "list(iter(a))"
> 50000 loops, best of 5: 5.73 usec per loop
That means around 5.73 ns per iteration. This is almost "nothing": just a few CPU cycles. For such a microbenchmark, you are very close to the bare metal. You have to take into account low-level CPU metrics such as CPU cache usage.
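To spell out the arithmetic: the quoted figure is 5.73 usec per loop, and each loop walks a 1000-item list. A minimal sketch of the conversion, where `ns_per_item` is a hypothetical helper name used only for illustration:

```python
def ns_per_item(usec_per_loop, items_per_loop):
    """Convert a timeit 'usec per loop' figure into nanoseconds per item."""
    return usec_per_loop * 1000.0 / items_per_loop

# Figures quoted above: 5.73 usec per loop, 1000 list items per loop.
print(f"{ns_per_item(5.73, 1000):.2f} ns per item")  # prints "5.73 ns per item"
```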
> Another possible cause is that this is just a random build outcome due to PGO or incidental branch mis-prediction from aliasing (as described in https://stackoverflow.com/a/17906589/1001643 ).
If someone cares about such a microbenchmark, I suggest getting access to a profiling tool and measuring CPU cache usage and similar metrics. On Linux, the "perf" command can be used. I don't know the performance tooling on Windows; maybe search the Intel developer tools.
I expect that list(iter(a)) makes better use of the CPU (cache? branch predictor?) than list(reversed(a)), because of how listiter_next() and listreviter_next() have been optimized.
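For anyone who wants to compare the two code paths on their own build, here is a rough sketch using the standard timeit module. The helper name `best_usec` and the loop counts are assumptions chosen for illustration; the results depend heavily on the machine, the compiler, and the PGO outcome, so treat the ratio as indicative only.

```python
import timeit

def best_usec(stmt, setup="a = list(range(1000))", number=2000, repeat=5):
    """Return the best-of-`repeat` time for one execution of `stmt`, in usec."""
    times = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # Each entry is the total time in seconds for `number` executions.
    return min(times) / number * 1e6

forward = best_usec("list(iter(a))")
backward = best_usec("list(reversed(a))")

print(f"list(iter(a)):     {forward:.2f} usec per loop")
print(f"list(reversed(a)): {backward:.2f} usec per loop")
print(f"ratio reversed/iter: {backward / forward:.2f}")
```

Taking the minimum over several repeats reduces noise from other processes, but it does not remove effects such as code placement or branch-predictor aliasing, which need a tool like perf to observe.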
Bad code placement has a high performance cost in such microbenchmarks. See:
* https://llvmdevelopersmeetingbay2016.sched.org/event/8YzY/causes-of-performance-instability-due-to-code-placement-in-x86
* https://vstinner.github.io/analysis-python-performance-issue.html

Date | User | Action | Args
2020-02-11 13:00:48 | vstinner | set | recipients: + vstinner, rhettinger, terry.reedy, paul.moore, tim.golden, SilentGhost, zach.ware, serhiy.storchaka, steve.dower, Stefan Pochmann
2020-02-11 13:00:48 | vstinner | set | messageid: <1581426048.08.0.76213788951.issue39521@roundup.psfhosted.org>
2020-02-11 13:00:48 | vstinner | link | issue39521 messages
2020-02-11 13:00:47 | vstinner | create |