
classification
Title: Python built with clang -O0 allocates 10x more stack memory than clang -O3 on a Python function call
Type: performance Stage: patch review
Components: Build Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: corona10, methane, pablogsal, vstinner
Priority: normal Keywords: patch

Created on 2022-02-01 13:06 by vstinner, last changed 2022-04-11 14:59 by admin.

Files
File name            Uploaded
stack_overflow-4.py  vstinner, 2022-02-01 13:40
Pull Requests
URL       Status  Linked
PR 31052  merged  vstinner, 2022-02-01 13:12
PR 31058  merged  vstinner, 2022-02-01 16:24
Messages (14)
msg412252 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 13:06
Measured using this script on the main branch (commit 108e66b6d23efd0fc2966163ead9434b328c5f17):
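---
import _testcapi  # CPython's internal test helper module, not a public API
def f():
    # Resuming the generator via next() re-enters the C evaluation loop once more
    yield _testcapi.stack_pointer()
# Assuming a downward-growing C stack: outer pointer minus inner pointer = bytes used
print(_testcapi.stack_pointer() - next(f()))
---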

Stack usage depending on the compiler and compiler optimization level:

* clang -O0: 9,104 bytes
* clang -Og: 736 bytes
* gcc -O0: 6,784 bytes
* gcc -Og: 624 bytes

-O0 allocates around 10x more stack memory than -Og.
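A quick arithmetic check of the "around 10x" figure, using only the four measurements listed above (the dictionary layout is just an illustration):

```python
# Stack usage (bytes) of one next(f()) call, copied from the list above.
STACK_BYTES = {
    ("clang", "-O0"): 9104,
    ("clang", "-Og"): 736,
    ("gcc", "-O0"): 6784,
    ("gcc", "-Og"): 624,
}


def o0_over_og(compiler: str) -> float:
    """Return how many times more stack -O0 uses than -Og for one compiler."""
    return STACK_BYTES[(compiler, "-O0")] / STACK_BYTES[(compiler, "-Og")]


for cc in ("clang", "gcc"):
    print(f"{cc}: -O0 uses {o0_over_og(cc):.1f}x the stack of -Og")
```

This prints 12.4x for clang and 10.9x for gcc, so "around 10x" slightly understates the clang case.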

Moreover, "./configure --with-pydebug CC=clang" uses -O0 in CFLAGS because the "clang --help" output doesn't contain "-Og". I'm working on a configure change to use -Og with clang versions that support it.
msg412253 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 13:15
GH-31052 enables -Og when building with clang and ./configure --with-pydebug, so the example now uses 736 bytes instead of 9,104 bytes.
msg412254 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 13:16
This issue is a follow-up of bpo-46542 "test_json and test_lib2to3 crash on s390x Fedora Clang 3.x buildbot".
msg412255 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 13:22
Previous issues about stack memory usage, work done in 2017:

* bpo-28870: Reduce stack consumption of PyObject_CallFunctionObjArgs() and like
* bpo-29227: Reduce C stack consumption in function calls
* bpo-29465: Modify _PyObject_FastCall() to reduce stack consumption
29464

I summarized the results in the "Stack consumption" section of my article: https://vstinner.github.io/contrib-cpython-2017q1.html
msg412256 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 13:25
See also bpo-30866: "Add _testcapi.stack_pointer() to measure the C stack consumption".
msg412258 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 13:40
stack_overflow-4.py: updated script from bpo-30866 to measure stack memory usage before Python crashes or raises a RecursionError.

I had to modify the script since calling a Python function from a Python function no longer allocates (additional) memory on the stack! See bpo-45256 "Remove the usage of the C stack in Python to Python calls".
msg412260 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 13:46
stack_overflow-4.py output depending on the compiler and compiler flags.

gcc -O3 (./configure):
---
test_python_call: 11904 calls before crash, stack: 704.1 bytes/call
test_python_iterator: 17460 calls before crash, stack: 480.0 bytes/call
test_python_getitem: 245760 calls before recursion error, stack: 0.2 bytes/call

=> total: 275124 calls, 1184.3 bytes per call
---

It's better than stack memory usage in 2017: https://bugs.python.org/issue30866#msg297826


clang -O3 (./configure CC=clang):
---
test_python_call: 10270 calls before crash, stack: 816.1 bytes/call
test_python_iterator: 14155 calls before crash, stack: 592.0 bytes/call
test_python_getitem: 245760 calls before recursion error, stack: 0.3 bytes/call

=> total: 270185 calls, 1408.4 bytes per call
---

clang uses slightly more stack memory per call than gcc: 816.1 vs 704.1 bytes per Python call, and 592.0 vs 480.0 bytes per iterator call.
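The "=> total" lines can be reproduced from the per-test rows: total calls and "bytes per call" are plain sums of the three tests. Multiplying calls by bytes/call for the two tests that crash gives just under 8 MiB in every case, which is consistent with the common 8 MiB default stack limit on Linux (`ulimit -s` = 8192); the messages do not state the limit, so treat that as an inference.

```python
# Per-test results copied from the two outputs above:
# (calls before crash or RecursionError, stack bytes per call)
RESULTS = {
    "gcc -O3": {
        "test_python_call": (11904, 704.1),
        "test_python_iterator": (17460, 480.0),
        "test_python_getitem": (245760, 0.2),
    },
    "clang -O3": {
        "test_python_call": (10270, 816.1),
        "test_python_iterator": (14155, 592.0),
        "test_python_getitem": (245760, 0.3),
    },
}


def totals(rows):
    """Sum calls and bytes/call the way the '=> total' line appears to."""
    calls = sum(c for c, _ in rows.values())
    per_call = round(sum(b for _, b in rows.values()), 1)
    return calls, per_call


for build, rows in RESULTS.items():
    calls, per_call = totals(rows)
    print(f"{build}: total {calls} calls, {per_call} bytes per call")
    for test in ("test_python_call", "test_python_iterator"):
        c, b = rows[test]
        # Approximate stack consumed before the crash, in MiB.
        print(f"  {test}: ~{c * b / 2**20:.2f} MiB used")
```

Both builds exhaust roughly the same amount of stack, so clang's lower call counts follow directly from its larger per-call frames.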

I didn't try PGO or LTO yet.
msg412261 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 13:47
New changeset 0515eafe55ce7699e3bbc3c1555f08073d43b790 by Victor Stinner in branch 'main':
bpo-46600: ./configure --with-pydebug uses -Og with clang (GH-31052)
https://github.com/python/cpython/commit/0515eafe55ce7699e3bbc3c1555f08073d43b790
msg412278 - Author: Pablo Galindo Salgado (pablogsal) (Python committer) Date: 2022-02-01 15:25
PR 31052 seems to have broken a bunch of buildbots. If no fix is provided in 24 hours, we will need to revert :(
msg412286 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 16:06
> PR 31052 seems to have broken a bunch of buildbots. If no fix is provided in 24 hours, we will need to revert :(

test_gdb fails if Python is built with clang -Og. I don't think that it's a regression: previously, buildbots using clang only built Python with -O0 or -O3.

I'm investigating the test_gdb issue: it's easy to reproduce on Linux (clang 13.0.0). I may skip test_gdb if Python is built with clang -Og.
msg412294 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-01 17:12
New changeset bebaa95fd0f44babf8b6bcffd8f2908c73ca259e by Victor Stinner in branch 'main':
bpo-46600: Fix test_gdb.test_pycfunction() for clang -Og (GH-31058)
https://github.com/python/cpython/commit/bebaa95fd0f44babf8b6bcffd8f2908c73ca259e
msg412334 - Author: Inada Naoki (methane) (Python committer) Date: 2022-02-02 02:50
FWIW, it seems that -O0 doesn't merge local variables that live in different code paths or have disjoint lifetimes.

For example, see _Py_abspath():

```c
    if (path[0] == '\0' || !wcscmp(path, L".")) {
        wchar_t cwd[MAXPATHLEN + 1];
        // (snip)
    }
    // (snip)
    wchar_t cwd[MAXPATHLEN + 1];
```

wchar_t is 4 bytes and MAXPATHLEN is 4096 on Linux, so each cwd is 16,388 bytes.
-O0 allocates 32,856 bytes of stack for _Py_abspath() and -Og allocates 16,440 bytes.
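The frame sizes above can be checked by hand. A minimal sketch, assuming the Linux values from the message (4-byte wchar_t, MAXPATHLEN = 4096) and that the larger 32,856-byte frame is the -O0 build: it holds two copies of cwd, while the -Og frame holds only one.

```python
# Values assumed for Linux, as stated in the message.
WCHAR_T_SIZE = 4   # sizeof(wchar_t) in bytes
MAXPATHLEN = 4096

cwd_bytes = WCHAR_T_SIZE * (MAXPATHLEN + 1)  # one wchar_t cwd[MAXPATHLEN + 1]
print(cwd_bytes)  # 16388

# Measured frame sizes quoted in the message.
O0_FRAME, OG_FRAME = 32856, 16440

# At -O0 both block-scoped cwd arrays get their own stack slot; the -Og
# number suggests they share one slot because their lifetimes never overlap.
print(O0_FRAME - 2 * cwd_bytes)  # 80: presumably other locals at -O0
print(OG_FRAME - cwd_bytes)      # 52: presumably other locals at -Og
```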

I don't know which specific optimization flag in -Og merges local variables, but I think -Og is very important for _PyEval_EvalFrameDefault() since it has many local variables in huge switch-case statements: -Og allocates 312 bytes for it, while -O0 allocates 8,280 bytes.

By the way, clang 13 has a `-fstack-usage` option like gcc, but clang 12 doesn't.
Since Ubuntu 20.04 only has clang 12, I use `-fstack-size-section` and https://github.com/mvanotti/stack-sizes to get stack sizes.
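A minimal sketch for ranking functions by frame size from `-fstack-usage` output (gcc, and clang 13 or newer per the message). It assumes the tab-separated `.su` layout GCC writes, one line per function: `file.c:LINE:COL:function<TAB>bytes<TAB>qualifier`. The helper names `parse_su` and `largest_frames` are hypothetical, and the sketch does not read the binary section that the clang 12 workaround above relies on.

```python
from pathlib import Path


def parse_su(text):
    """Parse -fstack-usage output into (function, bytes, qualifier) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # "file.c:LINE:COL:function", "bytes", "static" / "dynamic,bounded"
        location, size, qualifier = line.split("\t", 2)
        function = location.split(":", 3)[3]
        entries.append((function, int(size), qualifier))
    return entries


def largest_frames(su_dir, top=10):
    """Return the `top` largest stack frames found in .su files under su_dir."""
    entries = []
    for path in Path(su_dir).rglob("*.su"):
        entries.extend(parse_su(path.read_text()))
    return sorted(entries, key=lambda e: e[1], reverse=True)[:top]
```

Assuming the build added `-fstack-usage` to CFLAGS, running `largest_frames(".")` from the build tree should list the biggest frames, which is where large per-function numbers like those quoted above would show up first.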
msg412348 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-02-02 10:04
> For example, see _Py_abspath

For functions which are commonly called at runtime, it may be worth manually merging large local variables to save stack memory when Python is built with -O0. _Py_abspath(), if I recall correctly, is only called at startup, so it shouldn't be a big issue in practice.
msg412407 - Author: Inada Naoki (methane) (Python committer) Date: 2022-02-03 00:00
I didn't mean that _Py_abspath is a problem. I just used it to illustrate why -O0 and -Og differ so much.

We can easily reduce its stack usage, but it is much less of a problem than _PyEval_EvalFrameDefault().
It is difficult to reduce the stack usage of _PyEval_EvalFrameDefault() with -O0.
History
Date                 User       Action  Args
2022-04-11 14:59:55  admin      set     github: 90758
2022-02-03 00:00:28  methane    set     messages: + msg412407
2022-02-02 10:04:49  vstinner   set     messages: + msg412348
2022-02-02 05:16:18  corona10   set     nosy: + corona10
2022-02-02 02:50:12  methane    set     nosy: + methane
                                        messages: + msg412334
2022-02-01 17:12:36  vstinner   set     messages: + msg412294
2022-02-01 16:24:05  vstinner   set     pull_requests: + pull_request29240
2022-02-01 16:06:29  vstinner   set     messages: + msg412286
2022-02-01 15:25:55  pablogsal  set     nosy: + pablogsal
                                        messages: + msg412278
2022-02-01 13:47:26  vstinner   set     messages: + msg412261
2022-02-01 13:46:28  vstinner   set     messages: + msg412260
2022-02-01 13:40:57  vstinner   set     files: + stack_overflow-4.py
                                        messages: + msg412258
2022-02-01 13:25:28  vstinner   set     messages: + msg412256
2022-02-01 13:22:24  vstinner   set     messages: + msg412255
2022-02-01 13:16:29  vstinner   set     messages: + msg412254
2022-02-01 13:15:31  vstinner   set     messages: + msg412253
2022-02-01 13:12:00  vstinner   set     keywords: + patch
                                        stage: patch review
                                        pull_requests: + pull_request29234
2022-02-01 13:06:00  vstinner   create