msg412252 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 13:06 |
Measure using this script on the main branch (commit 108e66b6d23efd0fc2966163ead9434b328c5f17):
---
import _testcapi
def f(): yield _testcapi.stack_pointer()
print(_testcapi.stack_pointer() - next(f()))
---
Stack usage depending on the compiler and compiler optimization level:
* clang -O0: 9,104 bytes
* clang -Og: 736 bytes
* gcc -O0: 6,784 bytes
* gcc -Og: 624 bytes
-O0 allocates around 10x more stack memory than -Og.
Moreover, "./configure --with-pydebug CC=clang" uses -O0 in CFLAGS, because "clang --help" output doesn't contain "-Og". I'm working on a configure change to use -Og with clang when it supports it.
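As a quick check of the "around 10x" figure, here is a minimal sketch that only redoes the arithmetic on the four numbers listed above. The dictionary and the helper name `o0_to_og_ratio` are made up for illustration; they are not part of CPython or of the measurement script.

```python
# Stack usage in bytes, copied from the measurements listed above.
stack_bytes = {
    "clang": {"-O0": 9104, "-Og": 736},
    "gcc": {"-O0": 6784, "-Og": 624},
}


def o0_to_og_ratio(compiler: str) -> float:
    """Return how many times more stack -O0 uses than -Og (hypothetical helper)."""
    usage = stack_bytes[compiler]
    return usage["-O0"] / usage["-Og"]


for compiler in stack_bytes:
    ratio = o0_to_og_ratio(compiler)
    print(f"{compiler}: -O0 uses {ratio:.1f}x the stack of -Og")
```

With these numbers the ratio is a bit above 12x for clang and a bit under 11x for gcc.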
|
msg412253 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 13:15 |
GH-31052 enables -Og when using clang with ./configure --with-pydebug, so the example uses 736 bytes instead of 9,104 bytes.
|
msg412254 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 13:16 |
This issue is a follow-up of bpo-46542 "test_json and test_lib2to3 crash on s390x Fedora Clang 3.x buildbot".
|
msg412255 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 13:22 |
Previous issues about stack memory usage, work done in 2017:
* bpo-28870: Reduce stack consumption of PyObject_CallFunctionObjArgs() and like
* bpo-29227: Reduce C stack consumption in function calls
* bpo-29465: Modify _PyObject_FastCall() to reduce stack consumption
29464
I summarized the results in the "Stack consumption" section of my article: https://vstinner.github.io/contrib-cpython-2017q1.html
|
msg412256 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 13:25 |
See also bpo-30866: "Add _testcapi.stack_pointer() to measure the C stack consumption".
|
msg412258 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 13:40 |
stack_overflow-4.py: Updated script from bpo-30866 to measure stack memory usage before Python crashes or raises a RecursionError.
I had to modify the script since calling a Python function from a Python function no longer allocates (additional) memory on the stack! See bpo-45256 "Remove the usage of the C stack in Python to Python calls".
|
msg412260 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 13:46 |
stack_overflow-4.py output depending on the compiler and compiler flags.
gcc -O3 (./configure):
---
test_python_call: 11904 calls before crash, stack: 704.1 bytes/call
test_python_iterator: 17460 calls before crash, stack: 480.0 bytes/call
test_python_getitem: 245760 calls before recursion error, stack: 0.2 bytes/call
=> total: 275124 calls, 1184.3 bytes per call
---
It's better than stack memory usage in 2017: https://bugs.python.org/issue30866#msg297826
clang -O3 (./configure CC=clang):
---
test_python_call: 10270 calls before crash, stack: 816.1 bytes/call
test_python_iterator: 14155 calls before crash, stack: 592.0 bytes/call
test_python_getitem: 245760 calls before recursion error, stack: 0.3 bytes/call
=> total: 270185 calls, 1408.4 bytes per call
---
clang allocates a little bit more memory on the stack than gcc.
I didn't try PGO or LTO yet.
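A rough sanity check on these numbers, as a sketch only: multiplying the calls before crash by the bytes per call should give roughly the size of the thread stack. The 8 MiB limit below is an assumption (a common Linux default for "ulimit -s"), it is not stated in the output above, and the names `crash_results` and `stack_consumed` are made up for illustration. test_python_getitem is left out because it ends with a RecursionError rather than a crash.

```python
MIB = 1024 * 1024
# Assumption: the default "ulimit -s" of 8192 KiB; not stated in the output.
ASSUMED_STACK_LIMIT = 8 * MIB

# (calls before crash, stack bytes per call), copied from the output above.
crash_results = {
    "gcc -O3 test_python_call": (11904, 704.1),
    "gcc -O3 test_python_iterator": (17460, 480.0),
    "clang -O3 test_python_call": (10270, 816.1),
    "clang -O3 test_python_iterator": (14155, 592.0),
}


def stack_consumed(calls: int, bytes_per_call: float) -> float:
    """Total stack bytes used when the crash happened (hypothetical helper)."""
    return calls * bytes_per_call


for name, (calls, per_call) in crash_results.items():
    used = stack_consumed(calls, per_call)
    print(f"{name}: {used / MIB:.2f} MiB of the assumed "
          f"{ASSUMED_STACK_LIMIT // MIB} MiB")
```

All four products land just under 8 MiB, which is consistent with each test crashing once the stack is exhausted.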
|
msg412261 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 13:47 |
New changeset 0515eafe55ce7699e3bbc3c1555f08073d43b790 by Victor Stinner in branch 'main':
bpo-46600: ./configure --with-pydebug uses -Og with clang (GH-31052)
https://github.com/python/cpython/commit/0515eafe55ce7699e3bbc3c1555f08073d43b790
|
msg412278 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2022-02-01 15:25 |
PR 31052 seems to have broken a bunch of buildbots. If no fix is provided in 24 hours, we will need to revert :(
|
msg412286 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 16:06 |
> PR 31052 seems to have broken a bunch of buildbots. If no fix is provided in 24 hours, we will need to revert :(
test_gdb fails if Python is built with clang -Og. I don't think that it's a regression. It's just that previously, buildbots using clang only built Python with -O0 or -O3.
I'm investigating the test_gdb issue: it's easy to reproduce on Linux (clang 13.0.0). I may skip test_gdb if Python is built with clang -Og.
|
msg412294 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-01 17:12 |
New changeset bebaa95fd0f44babf8b6bcffd8f2908c73ca259e by Victor Stinner in branch 'main':
bpo-46600: Fix test_gdb.test_pycfunction() for clang -Og (GH-31058)
https://github.com/python/cpython/commit/bebaa95fd0f44babf8b6bcffd8f2908c73ca259e
|
msg412334 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2022-02-02 02:50 |
FWIW, it seems -O0 doesn't merge local variables that live in different code paths or have different lifetimes.
For example, see _Py_abspath
```
if (path[0] == '\0' || !wcscmp(path, L".")) {
wchar_t cwd[MAXPATHLEN + 1];
//(snip)
}
//(snip)
wchar_t cwd[MAXPATHLEN + 1];
```
wchar_t is 4 bytes and MAXPATHLEN is 4096 on Linux. So each cwd is 16388 bytes.
-O0 allocates 32856 bytes for it and -Og allocates 16440 bytes for it.
I don't know which specific optimization flag in -Og merges local variables, but I think -Og is very important for _PyEval_EvalFrameDefault() since it has many local variables in huge switch-case statements.
-Og allocates 312 bytes for it and -O0 allocates 8280 bytes for it.
By the way, clang 13 has the `-fstack-usage` option like gcc, but clang 12 doesn't have it.
Since Ubuntu 20.04 has only clang 12, I use `-fstack-size-segment` and https://github.com/mvanotti/stack-sizes to get stack size.
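The _Py_abspath numbers can be checked with a little arithmetic. This is only a sketch: it takes the larger frame size (32856 bytes) as the -O0 figure and the smaller one (16440 bytes) as the -Og figure, and the variable names are made up for illustration.

```python
WCHAR_T_SIZE = 4      # bytes, on Linux
MAXPATHLEN = 4096     # on Linux

# Size of one "wchar_t cwd[MAXPATHLEN + 1]" buffer.
cwd_bytes = WCHAR_T_SIZE * (MAXPATHLEN + 1)

# Frame sizes reported for _Py_abspath (larger one taken as -O0).
frame_o0 = 32856
frame_og = 16440

# How many whole cwd buffers fit in each frame, and what is left over
# for the other locals, saved registers, padding, and so on.
buffers_o0, rest_o0 = divmod(frame_o0, cwd_bytes)
buffers_og, rest_og = divmod(frame_og, cwd_bytes)

print(f"one cwd buffer: {cwd_bytes} bytes")
print(f"-O0: {buffers_o0} buffers + {rest_o0} bytes")
print(f"-Og: {buffers_og} buffer + {rest_og} bytes")
```

The -O0 frame has room for two separate cwd buffers while the -Og frame has room for only one, which matches the idea that -Og lets the two arrays share the same stack slot.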
|
msg412348 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2022-02-02 10:04 |
> For example, see _Py_abspath
For functions which are commonly called in Python at runtime, it may be worth manually merging large local variables to save a few bytes on the stack when Python is built with -O0. _Py_abspath() is only called at startup, if I recall correctly, so it should not be a big issue in practice.
|
msg412407 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2022-02-03 00:00 |
I didn't mean that _Py_abspath is a problem. I just used it to show why -O0 and -Og are so different.
We can reduce its stack usage easily, but it is less of a problem than _PyEval_EvalFrameDefault.
It is difficult to reduce stack usage of _PyEval_EvalFrameDefault with -O0.
|
Date | User | Action | Args |
2022-04-11 14:59:55 | admin | set | github: 90758 |
2022-02-03 00:00:28 | methane | set | messages: + msg412407 |
2022-02-02 10:04:49 | vstinner | set | messages: + msg412348 |
2022-02-02 05:16:18 | corona10 | set | nosy: + corona10 |
2022-02-02 02:50:12 | methane | set | nosy: + methane; messages: + msg412334 |
2022-02-01 17:12:36 | vstinner | set | messages: + msg412294 |
2022-02-01 16:24:05 | vstinner | set | pull_requests: + pull_request29240 |
2022-02-01 16:06:29 | vstinner | set | messages: + msg412286 |
2022-02-01 15:25:55 | pablogsal | set | nosy: + pablogsal; messages: + msg412278 |
2022-02-01 13:47:26 | vstinner | set | messages: + msg412261 |
2022-02-01 13:46:28 | vstinner | set | messages: + msg412260 |
2022-02-01 13:40:57 | vstinner | set | files: + stack_overflow-4.py; messages: + msg412258 |
2022-02-01 13:25:28 | vstinner | set | messages: + msg412256 |
2022-02-01 13:22:24 | vstinner | set | messages: + msg412255 |
2022-02-01 13:16:29 | vstinner | set | messages: + msg412254 |
2022-02-01 13:15:31 | vstinner | set | messages: + msg412253 |
2022-02-01 13:12:00 | vstinner | set | keywords: + patch; stage: patch review; pull_requests: + pull_request29234 |
2022-02-01 13:06:00 | vstinner | create | |