This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: reversed(mylist) much slower on Python 3.8 32-bit for Windows
Type: performance Stage: resolved
Components: Interpreter Core, Windows Versions: Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, Stefan Pochmann, paul.moore, rhettinger, serhiy.storchaka, steve.dower, terry.reedy, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2020-02-01 21:42 by Stefan Pochmann, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (8)
msg361195 - (view) Author: Stefan Pochmann (Stefan Pochmann) * Date: 2020-02-01 21:42
Somehow `reversed` became much slower than `iter` on lists:

List with 1,000 elements:

> python -m timeit -s "a = list(range(1000))" "list(iter(a))"
50000 loops, best of 5: 5.73 usec per loop

> python -m timeit -s "a = list(range(1000))" "list(reversed(a))"
20000 loops, best of 5: 14.2 usec per loop

List with 1,000,000 elements:

> python -m timeit -s "a = list(range(250)) * 4000" "list(iter(a))"
50 loops, best of 5: 7.08 msec per loop

> python -m timeit -s "a = list(range(250)) * 4000" "list(reversed(a))"
20 loops, best of 5: 15.5 msec per loop

On another machine I tried ten different Python versions and found out it's only version 3.8.1 and only the 32-bit version:

                32-bit              64-bit
CPython     iter   reversed     iter   reversed
 3.5.4      19.8     19.9       22.4     22.7
 3.6.8      19.8     19.9       22.3     22.6
 3.7.6      19.9     19.9       22.3     22.5
 3.8.1      19.8     24.9       22.4     22.6

Another time with 3.8.0 instead of 3.8.1:

                32-bit              64-bit
CPython     iter   reversed     iter   reversed
 3.5.4      19.5     19.6       21.9     22.2
 3.6.8      19.5     19.7       21.8     22.1
 3.7.6      19.5     19.6       21.7     22.0
 3.8.0      19.4     24.5       21.7     22.1

I used the "Stable Releases" "executable installer"s from here:
https://www.python.org/downloads/windows/

More details here: https://stackoverflow.com/q/60005302/12671057
msg361196 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2020-02-01 21:47
Your 3.8.0 numbers are similar to 3.8.1
msg361197 - (view) Author: Stefan Pochmann (Stefan Pochmann) * Date: 2020-02-01 21:51
Oh right. The times are correct, I just summarized wrongly there.
msg361198 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-02-01 22:13
The code for listiter_next() and listreviter_next() is almost the same. 

The main difference that the code for reversed saves the index to Py_ssize_t variable.  Maybe that causes a 32-bit to 64-bit conversion and back.  The change was made on 30 March 2016 in fbb1c5ee068d209e33f6e15ecb4821d5d8b107fa

Another possible cause is that this is just a random build outcome due to PGO or incidental branch mis-prediction from aliasing (as described in https://stackoverflow.com/a/17906589/1001643 ).

On the python.org mac 64-bit build, I'm not seeing any difference, so this is unique to the Windows 32-bit build.

$ python3.7 -m timeit -r11 -s "a = list(range(250)) * 4000" "list(iter(a))"
100 loops, best of 11: 3.08 msec per loop
$ python3.7 -m timeit -r11 -s "a = list(range(250)) * 4000" "list(reversed(a))"
100 loops, best of 11: 3.04 msec per loop
$ python3.8 -m timeit -r11 -s "a = list(range(250)) * 4000" "list(iter(a))"
100 loops, best of 11: 3.07 msec per loop
$ python3.8 -m timeit -r11 -s "a = list(range(250)) * 4000" "list(reversed(a))"
100 loops, best of 11: 3.01 msec per loop
msg361662 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-02-09 20:12
I do not see the within-version slowdown on fresh Windows 10, Pro 64, 32-bit debug builds.

3.7: 26.4, 27.0
3.8: 26.8, 27.1  # installed normal 64 bit is 7.4, 7.4
3.9: 30.2, 30.1  # 10% slower
msg361810 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-11 13:00
I close this issue. It's likely just a hiccup in the PGO compilation. It's not the thing that we can easily control. The good thing is that the common code path iter(list) is efficient ;-)


> The code for listiter_next() and listreviter_next() is almost the same. 

Right. It cannot explain a 2x slowdown.


> python -m timeit -s "a = list(range(1000))" "list(iter(a))"
> 50000 loops, best of 5: 5.73 usec per loop

It means around 5.73 ns per iteration. This is almost "nothing": just a few CPU cycles. For such microbenchmark, you are very close to the bare metal. You have to take in account CPU low-level metrics like usage of the CPU caches.


> Another possible cause is that this is just a random build outcome due to PGO or incidental branch mis-prediction from aliasing (as described in https://stackoverflow.com/a/17906589/1001643 ).

If someone cares about such microbenchmark, I suggest to get access to a profiling tool and measure the CPU cache usage and other metrics like that. On Linux, I know the "perf" command which can be used. I don't know performance tooling on Windows. Maybe search in Intel developer tools.

I expect that list(iter(a)) better uses the CPU (cache? branch predictor?) than list(reversed(a)), because of how listiter_next() and listreviter_next() have been optimized.

Bad code placement has a high cost on performance on such microbenchmarks. See:

* https://llvmdevelopersmeetingbay2016.sched.org/event/8YzY/causes-of-performance-instability-due-to-code-placement-in-x86
* https://vstinner.github.io/analysis-python-performance-issue.html
msg361827 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-02-11 17:48
Just FYI, there is no PGO run on the official 32-bit builds. I only run it on the 64-bit build.

It might be worth investigating differences between the 3.7 and 3.8 implementations of reversed().
msg361833 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-11 22:02
> Just FYI, there is no PGO run on the official 32-bit builds. I only run it on the 64-bit build.

Aaaah :-) That's good to know ;-) It can explain the performance difference.
History
Date User Action Args
2022-04-11 14:59:26adminsetgithub: 83702
2020-02-11 22:02:53vstinnersetmessages: + msg361833
2020-02-11 17:48:42steve.dowersetmessages: + msg361827
2020-02-11 13:00:48vstinnersetstatus: open -> closed
resolution: not a bug
messages: + msg361810

stage: resolved
2020-02-09 20:12:42terry.reedysetnosy: + terry.reedy, paul.moore, tim.golden, zach.ware, steve.dower
messages: + msg361662
components: + Windows
2020-02-01 22:13:40rhettingersetnosy: + rhettinger
messages: + msg361198
2020-02-01 21:51:17Stefan Pochmannsetmessages: + msg361197
2020-02-01 21:48:23Stefan Pochmannsettitle: reversed(mylist) much slower on Python 3.8.1 32-bit for Windows -> reversed(mylist) much slower on Python 3.8 32-bit for Windows
2020-02-01 21:47:04SilentGhostsetnosy: + SilentGhost, vstinner, serhiy.storchaka
messages: + msg361196
components: + Interpreter Core, - Build
2020-02-01 21:42:09Stefan Pochmanncreate