classification
Title: test_exceptions.ExceptionTests.test_recursion_in_except_handler stack overflow on Windows debug builds
Type: crash Stage: resolved
Components: Windows Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, Mark.Shannon, kj, orsenthil, pablogsal, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2021-06-08 09:47 by kj, last changed 2021-09-07 16:45 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
bench.py vstinner, 2021-09-06 23:30
Pull Requests
URL Status Linked Edit
PR 26596 merged pablogsal, 2021-06-08 11:07
PR 26623 merged Mark.Shannon, 2021-06-09 11:01
PR 26627 closed kj, 2021-06-09 18:06
PR 28190 merged vstinner, 2021-09-06 22:21
Messages (17)
msg395316 - (view) Author: Ken Jin (kj) * (Python committer) Date: 2021-06-08 09:47
Hi all, I created this issue after discussion in https://bugs.python.org/issue39573#msg395206:

In issue39573 "[C API] Make PyObject an opaque structure in the limited C API" the commit f3fa63ec75fdbb4a08a10957a5c631bf0c4a5970 ("Py_TYPE becomes a static inline function") may have caused test_recursion_in_except_handler to fail only on windows debug builds (according to git bisect anyways) due to higher stack consumption causing a stack overflow.

That test was added in 4e7a69bdb63a104587759d7784124492dcdd496e (Dec 2020), and was passing until last week. Currently another test (bpo-11105, test_ast test_recursion_direct) is masking the broken test, but it's also a stack overflow in windows debug. When I reverted commit f3fa63ec75fdbb4a08a10957a5c631bf0c4a5970, test_recursion_in_except_handler passes but test_recursion_direct still fails.

For test_recursion_direct, the overflow occurs after the Py_Enter/LeaveRecursiveCall guard, when calling some other function like _PyObject_LookupAttr.

I can think of a few possible ways to fix this (in no particular order):

1. Increase stack size for windows builds from 2MB (some valid points against this though, see msg395177).
2. Decrease the global recursion limit only for windows debug.
3. Custom recursion limits for each module depending on OS, which Batuhan has been working on for AST module at GH-26550.
4. Skip the tests on windows debug, since most people on windows use release builds anyways which are unaffected.

Thanks for your time! Which approach would you prefer?
msg395320 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-08 11:02
There are many buildbot failures today and this is getting a bit out of hand so I'm starting with a revert of f3fa63ec75fdbb4a08a10957a5c631bf0c4a5970 until we have a better plan.
msg395322 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-08 11:24
New changeset 6d518bb3a11f9b16098f45b21a13ebe8f537f045 by Pablo Galindo in branch 'main':
bpo-44348: Revert "bpo-39573: Py_TYPE becomes a static inline function (GH-26493)" (GH-26596)
https://github.com/python/cpython/commit/6d518bb3a11f9b16098f45b21a13ebe8f537f045
msg395340 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-06-08 16:51
> Decrease the global recursion limit only for windows debug.

This is already what we do, so if someone has increased stack usage, they should also decrease the value here.

Hopefully the non-debug value doesn't have to be reduced. In that case, it's worth looking at the stack allocations to reduce it (e.g. I once fixed this by moving a rarely used buffer into a dynamic allocation rather than a local, which saved about 1KB/frame).
msg395392 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-09 08:06
See also bpo-44360 "test_compile segfault on AMD64 Ubuntu 3.x".
msg395451 - (view) Author: Ken Jin (kj) * (Python committer) Date: 2021-06-09 18:07
@Steve thanks for the info,

> This is already what we do, so if someone has increased stack usage, they should also decrease the value here.

I see the recursion limit set in _PyEval_InitState but I don't see any special settings for windows. Is it somewhere else or am I missing something here :( ?

BTW, I'm thinking about changing the CI checks for windows to test on debug builds. I find it strange that the macOS and ubuntu CIs test on debug builds while windows seemingly doesn't[1]. If we change it, similar errors may be caught earlier at the PR stage rather than when it's committed and causes the buildbots to fail. What do you think?

[1] https://github.com/python/cpython/blob/main/.github/workflows/build.yml#L95
msg395514 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-06-10 07:47
New changeset 54cb63863f19a7c64d9a3a5fd97bdfc0dd7ab374 by Mark Shannon in branch 'main':
bpo-44348: Move trace-info to thread-state (GH-26623)
https://github.com/python/cpython/commit/54cb63863f19a7c64d9a3a5fd97bdfc0dd7ab374
msg400992 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-03 13:27
See bpo-45094: "Consider using __forceinline and __attribute__((always_inline)) on static inline functions (Py_INCREF, Py_TYPE) for debug builds".
msg401150 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-06 16:13
Another *working* approach to fix this issue: bpo-45115 "Windows: enable compiler optimizations when building Python in debug mode".
msg401161 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-09-06 17:08
It should also be possible to reduce the stack size of each frame. We've done that before, because there were multiple temporary buffers allocated on the stack that were trivially moved into functions. I bet we can also reduce the number of indirections, as last time I had to investigate a native stack we seem to have bloated up to 6-7 C frames for each Python frame.

Failing that, I'd rather have an option to build with higher stack size just for tests (or users who care that much - my experience is that people tend to care more about many Python processes rather than high recursion). That only affects python[w][_uwp].exe.

We could also add a threading API to allow creating new threads with larger stack size. That would allow us to write tests that can do it, and also enable users who may run into it. I'd prefer Windows just be able to expand the stack better at runtime, but the reserved address space apparently wins the back-compat argument.
msg401185 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-06 22:34
test_recursion_in_except_handler() creates chained of exceptions. When an exception is deallocated, it calls the deallocator of another exception, etc.

* recurse_in_except() sub-test creates chains of 11 nested deallocator calls
* recurse_in_body_and_except() sub-test creates a chain of 8192 nested deallocator calls

I'm not sure how recurse_in_body_and_except() can creates a chain which is so long, knowing that the test sets the recursion limit to 44 frames (default Python limit is 1000).
msg401194 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-06 23:30
bench.py: pyperf benchmarks on BaseException_dealloc().

Benchmarks results on PR 28190 on Fedora 34 with LTO (I didn't use PGO nor CPU pinning). It says "faster" but it's likely noise in the build or in the benchmark (again, I didn't use CPU pinning, and I used my laptop while the benchmark was running. IMO it means that the trashcan overhead is not significant on this benchmark.

$ python3 -m pyperf compare_to ref.json trashcan.json --table
+----------------+---------+-----------------------+
| Benchmark      | ref     | trashcan              |
+================+=========+=======================+
| list 10_000    | 1.27 ms | 1.18 ms: 1.08x faster |
+----------------+---------+-----------------------+
| chain 10_000   | 3.94 ms | 3.89 ms: 1.01x faster |
+----------------+---------+-----------------------+
| Geometric mean | (ref)   | 1.01x faster          |
+----------------+---------+-----------------------+

Benchmark hidden because not significant (6): list 10, list 100, list 1000, chain 10, chain 100, chain 1000
msg401195 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-06 23:31
Using trashcan, it's possible to create a long longer than 10 000 exceptions. But currently, Python does crash in this case, so I stopped the benchmark at 10 000.
msg401269 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-07 13:42
New changeset fb305092a5d7894b41f122c1a1117b3abf4c567e by Victor Stinner in branch 'main':
bpo-44348: BaseException deallocator uses trashcan (GH-28190)
https://github.com/python/cpython/commit/fb305092a5d7894b41f122c1a1117b3abf4c567e
msg401271 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-07 13:43
Even if Python 3.9 and 3.10 have the idea, I don't think that long exception chains are common enough to justify a backport. I prefer to leave the code as it it in stable branches, and see how things go with the change in the main branch. I close the issue.
msg401305 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-07 16:40
I rejected my two other ideas to fix this issue:

* bpo-45094: Consider using __forceinline and __attribute__((always_inline)) on static inline functions (Py_INCREF, Py_TYPE) for debug builds
* bpo-45115: Windows: enable compiler optimizations when building Python in debug mode
msg401307 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-07 16:45
I marked bpo-27277 as a duplicate of this issue.
History
Date User Action Args
2021-09-07 16:45:25vstinnersetmessages: + msg401307
2021-09-07 16:45:07vstinnerlinkissue27277 superseder
2021-09-07 16:40:43vstinnersetmessages: + msg401305
2021-09-07 13:43:45vstinnersetstatus: open -> closed
versions: - Python 3.9, Python 3.10
messages: + msg401271

resolution: fixed
stage: patch review -> resolved
2021-09-07 13:42:19vstinnersetmessages: + msg401269
2021-09-06 23:31:23vstinnersetmessages: + msg401195
2021-09-06 23:30:10vstinnersetfiles: + bench.py

messages: + msg401194
2021-09-06 22:34:24vstinnersetmessages: + msg401185
2021-09-06 22:21:55vstinnersetpull_requests: + pull_request26617
2021-09-06 17:08:31steve.dowersetmessages: + msg401161
2021-09-06 16:13:24vstinnersetmessages: + msg401150
2021-09-03 13:27:41vstinnersetmessages: + msg400992
2021-06-10 07:47:08Mark.Shannonsetmessages: + msg395514
2021-06-09 18:07:33kjsetmessages: + msg395451
2021-06-09 18:06:59kjsetpull_requests: + pull_request25213
2021-06-09 11:01:27Mark.Shannonsetnosy: + Mark.Shannon
pull_requests: + pull_request25208
2021-06-09 10:31:02orsenthilsetnosy: + orsenthil
2021-06-09 08:06:57vstinnersetmessages: + msg395392
2021-06-08 16:51:19steve.dowersetmessages: + msg395340
2021-06-08 11:24:57pablogsalsetmessages: + msg395322
2021-06-08 11:07:12pablogsalsetkeywords: + patch
stage: patch review
pull_requests: + pull_request25179
2021-06-08 11:02:30pablogsalsetmessages: + msg395320
2021-06-08 09:47:57kjcreate