msg395316 - (view) |
Author: Ken Jin (kj) * |
Date: 2021-06-08 09:47 |
Hi all, I created this issue after discussion in https://bugs.python.org/issue39573#msg395206:
In issue39573 "[C API] Make PyObject an opaque structure in the limited C API" the commit f3fa63ec75fdbb4a08a10957a5c631bf0c4a5970 ("Py_TYPE becomes a static inline function") may have caused test_recursion_in_except_handler to fail only on windows debug builds (according to git bisect anyways) due to higher stack consumption causing a stack overflow.
That test was added in 4e7a69bdb63a104587759d7784124492dcdd496e (Dec 2020), and was passing until last week. Currently another test (bpo-11105, test_ast test_recursion_direct) is masking the broken test, but it's also a stack overflow in windows debug. When I reverted commit f3fa63ec75fdbb4a08a10957a5c631bf0c4a5970, test_recursion_in_except_handler passes but test_recursion_direct still fails.
For test_recursion_direct, the overflow occurs after the Py_Enter/LeaveRecursiveCall guard, when calling some other function like _PyObject_LookupAttr.
I can think of a few possible ways to fix this (in no particular order):
1. Increase stack size for windows builds from 2MB (some valid points against this though, see msg395177).
2. Decrease the global recursion limit only for windows debug.
3. Custom recursion limits for each module depending on OS, which Batuhan has been working on for AST module at GH-26550.
4. Skip the tests on windows debug, since most people on windows use release builds anyways which are unaffected.
Thanks for your time! Which approach would you prefer?
|
msg395320 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2021-06-08 11:02 |
There are many buildbot failures today and this is getting a bit out of hand so I'm starting with a revert of f3fa63ec75fdbb4a08a10957a5c631bf0c4a5970 until we have a better plan.
|
msg395322 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2021-06-08 11:24 |
New changeset 6d518bb3a11f9b16098f45b21a13ebe8f537f045 by Pablo Galindo in branch 'main':
bpo-44348: Revert "bpo-39573: Py_TYPE becomes a static inline function (GH-26493)" (GH-26596)
https://github.com/python/cpython/commit/6d518bb3a11f9b16098f45b21a13ebe8f537f045
|
msg395340 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2021-06-08 16:51 |
> Decrease the global recursion limit only for windows debug.
This is already what we do, so if someone has increased stack usage, they should also decrease the value here.
Hopefully the non-debug value doesn't have to be reduced. In that case, it's worth looking at the stack allocations to reduce it (e.g. I once fixed this by moving a rarely used buffer into a dynamic allocation rather than a local, which saved about 1KB/frame).
|
msg395392 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-06-09 08:06 |
See also bpo-44360 "test_compile segfault on AMD64 Ubuntu 3.x".
|
msg395451 - (view) |
Author: Ken Jin (kj) * |
Date: 2021-06-09 18:07 |
@Steve thanks for the info,
> This is already what we do, so if someone has increased stack usage, they should also decrease the value here.
I see the recursion limit set in _PyEval_InitState but I don't see any special settings for windows. Is it somewhere else or am I missing something here :( ?
BTW, I'm thinking about changing the CI checks for windows to test on debug builds. I find it strange that the macOS and ubuntu CIs test on debug builds while windows seemingly doesn't[1]. If we change it, similar errors may be caught earlier at the PR stage rather than when it's committed and causes the buildbots to fail. What do you think?
[1] https://github.com/python/cpython/blob/main/.github/workflows/build.yml#L95
|
msg395514 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2021-06-10 07:47 |
New changeset 54cb63863f19a7c64d9a3a5fd97bdfc0dd7ab374 by Mark Shannon in branch 'main':
bpo-44348: Move trace-info to thread-state (GH-26623)
https://github.com/python/cpython/commit/54cb63863f19a7c64d9a3a5fd97bdfc0dd7ab374
|
msg400992 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-09-03 13:27 |
See bpo-45094: "Consider using __forceinline and __attribute__((always_inline)) on static inline functions (Py_INCREF, Py_TYPE) for debug builds".
|
msg401150 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-09-06 16:13 |
Another *working* approach to fix this issue: bpo-45115 "Windows: enable compiler optimizations when building Python in debug mode".
|
msg401161 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2021-09-06 17:08 |
It should also be possible to reduce the stack size of each frame. We've done that before, because there were multiple temporary buffers allocated on the stack that were trivially moved into functions. I bet we can also reduce the number of indirections, as last time I had to investigate a native stack we seem to have bloated up to 6-7 C frames for each Python frame.
Failing that, I'd rather have an option to build with higher stack size just for tests (or users who care that much - my experience is that people tend to care more about many Python processes rather than high recursion). That only affects python[w][_uwp].exe.
We could also add a threading API to allow creating new threads with larger stack size. That would allow us to write tests that can do it, and also enable users who may run into it. I'd prefer Windows just be able to expand the stack better at runtime, but the reserved address space apparently wins the back-compat argument.
|
msg401185 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-09-06 22:34 |
test_recursion_in_except_handler() creates chained of exceptions. When an exception is deallocated, it calls the deallocator of another exception, etc.
* recurse_in_except() sub-test creates chains of 11 nested deallocator calls
* recurse_in_body_and_except() sub-test creates a chain of 8192 nested deallocator calls
I'm not sure how recurse_in_body_and_except() can creates a chain which is so long, knowing that the test sets the recursion limit to 44 frames (default Python limit is 1000).
|
msg401194 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-09-06 23:30 |
bench.py: pyperf benchmarks on BaseException_dealloc().
Benchmarks results on PR 28190 on Fedora 34 with LTO (I didn't use PGO nor CPU pinning). It says "faster" but it's likely noise in the build or in the benchmark (again, I didn't use CPU pinning, and I used my laptop while the benchmark was running. IMO it means that the trashcan overhead is not significant on this benchmark.
$ python3 -m pyperf compare_to ref.json trashcan.json --table
+----------------+---------+-----------------------+
| Benchmark | ref | trashcan |
+================+=========+=======================+
| list 10_000 | 1.27 ms | 1.18 ms: 1.08x faster |
+----------------+---------+-----------------------+
| chain 10_000 | 3.94 ms | 3.89 ms: 1.01x faster |
+----------------+---------+-----------------------+
| Geometric mean | (ref) | 1.01x faster |
+----------------+---------+-----------------------+
Benchmark hidden because not significant (6): list 10, list 100, list 1000, chain 10, chain 100, chain 1000
|
msg401195 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-09-06 23:31 |
Using trashcan, it's possible to create a long longer than 10 000 exceptions. But currently, Python does crash in this case, so I stopped the benchmark at 10 000.
|
msg401269 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-09-07 13:42 |
New changeset fb305092a5d7894b41f122c1a1117b3abf4c567e by Victor Stinner in branch 'main':
bpo-44348: BaseException deallocator uses trashcan (GH-28190)
https://github.com/python/cpython/commit/fb305092a5d7894b41f122c1a1117b3abf4c567e
|
msg401271 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-09-07 13:43 |
Even if Python 3.9 and 3.10 have the idea, I don't think that long exception chains are common enough to justify a backport. I prefer to leave the code as it it in stable branches, and see how things go with the change in the main branch. I close the issue.
|
msg401305 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-09-07 16:40 |
I rejected my two other ideas to fix this issue:
* bpo-45094: Consider using __forceinline and __attribute__((always_inline)) on static inline functions (Py_INCREF, Py_TYPE) for debug builds
* bpo-45115: Windows: enable compiler optimizations when building Python in debug mode
|
msg401307 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-09-07 16:45 |
I marked bpo-27277 as a duplicate of this issue.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:46 | admin | set | github: 88514 |
2021-09-07 16:45:25 | vstinner | set | messages:
+ msg401307 |
2021-09-07 16:45:07 | vstinner | link | issue27277 superseder |
2021-09-07 16:40:43 | vstinner | set | messages:
+ msg401305 |
2021-09-07 13:43:45 | vstinner | set | status: open -> closed versions:
- Python 3.9, Python 3.10 messages:
+ msg401271
resolution: fixed stage: patch review -> resolved |
2021-09-07 13:42:19 | vstinner | set | messages:
+ msg401269 |
2021-09-06 23:31:23 | vstinner | set | messages:
+ msg401195 |
2021-09-06 23:30:10 | vstinner | set | files:
+ bench.py
messages:
+ msg401194 |
2021-09-06 22:34:24 | vstinner | set | messages:
+ msg401185 |
2021-09-06 22:21:55 | vstinner | set | pull_requests:
+ pull_request26617 |
2021-09-06 17:08:31 | steve.dower | set | messages:
+ msg401161 |
2021-09-06 16:13:24 | vstinner | set | messages:
+ msg401150 |
2021-09-03 13:27:41 | vstinner | set | messages:
+ msg400992 |
2021-06-10 07:47:08 | Mark.Shannon | set | messages:
+ msg395514 |
2021-06-09 18:07:33 | kj | set | messages:
+ msg395451 |
2021-06-09 18:06:59 | kj | set | pull_requests:
+ pull_request25213 |
2021-06-09 11:01:27 | Mark.Shannon | set | nosy:
+ Mark.Shannon pull_requests:
+ pull_request25208
|
2021-06-09 10:31:02 | orsenthil | set | nosy:
+ orsenthil
|
2021-06-09 08:06:57 | vstinner | set | messages:
+ msg395392 |
2021-06-08 16:51:19 | steve.dower | set | messages:
+ msg395340 |
2021-06-08 11:24:57 | pablogsal | set | messages:
+ msg395322 |
2021-06-08 11:07:12 | pablogsal | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request25179 |
2021-06-08 11:02:30 | pablogsal | set | messages:
+ msg395320 |
2021-06-08 09:47:57 | kj | create | |