msg387348 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-02-19 20:26 |
AMD64 Windows10 3.x, build 856:
https://buildbot.python.org/all/#/builders/146/builds/856
17 tests failed:
test_exceptions test_fileio test_io test_isinstance test_json
test_lib2to3 test_logging test_pickle test_pickletools
test_plistlib test_richcmp test_runpy test_sys test_threading
test_traceback test_typing test_xml_etree
It may be a regression caused by:
"bpo-43166: Disable ceval.c optimisations for Windows debug builds (GH-24485)"
https://github.com/python/cpython/commit/b74396c3167cc780f01309148db02709bc37b432
The latest successful build was 11 days ago (Feb 9):
https://buildbot.python.org/all/#/builders/146/builds/819
----
0:01:46 load avg: 14.25 [ 61/426/1] test_xml_etree crashed (Exit code 3221225725) -- running: test_tokenize (46.4 sec), test_largefile (1 min 10 sec)
Windows fatal exception: stack overflow
Current thread 0x0000110c (most recent call first):
File "D:\buildarea\3.x.bolen-windows10\build\lib\xml\etree\ElementTree.py", line 178 in __repr__
File "D:\buildarea\3.x.bolen-windows10\build\lib\xml\etree\ElementTree.py", line 178 in __repr__
File "D:\buildarea\3.x.bolen-windows10\build\lib\xml\etree\ElementTree.py", line 178 in __repr__
...
======================================================================
FAIL: test_recursion_limit (test.test_threading.ThreadingExceptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "D:\buildarea\3.x.bolen-windows10\build\lib\test\test_threading.py", line 1203, in test_recursion_limit
self.assertEqual(p.returncode, 0, "Unexpected error: " + stderr.decode())
AssertionError: 3221225725 != 0 : Unexpected error:
0:05:09 load avg: 29.65 [140/426/3] test_traceback crashed (Exit code 3221225725) -- running: test_concurrent_futures (2 min 6 sec), test_mmap (2 min)
Windows fatal exception: stack overflow
Current thread 0x00000118 (most recent call first):
File "D:\buildarea\3.x.bolen-windows10\build\lib\test\test_traceback.py", line 1154 in f
File "D:\buildarea\3.x.bolen-windows10\build\lib\test\test_traceback.py", line 1156 in f
File "D:\buildarea\3.x.bolen-windows10\build\lib\test\test_traceback.py", line 1156 in f
...
0:06:14 load avg: 29.15 [151/426/4] test_lib2to3 crashed (Exit code 3221225725) -- running: test_multiprocessing_spawn (31.7 sec), test_capi (1 min 3 sec), test_mmap (3 min 4 sec)
Windows fatal exception: stack overflow
Current thread 0x00001b90 (most recent call first):
File "D:\buildarea\3.x.bolen-windows10\build\lib\lib2to3\pytree.py", line 496 in generate_matches
File "D:\buildarea\3.x.bolen-windows10\build\lib\lib2to3\pytree.py", line 845 in generate_matches
...
|
msg387351 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2021-02-19 20:40 |
Looks like someone increased the size of locals in the ceval functions.
If any new buffers have been added recently, moving the processing into a helper function or using a single buffer instead of allocating them in separate if/case blocks can help reduce it.
|
msg387352 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-02-19 20:46 |
If someone wants to measure the stack memory usage per function call, _testcapi.stack_pointer() can be used: see bpo-30866.
|
msg387816 - (view) |
Author: David Bolen (db3l) * |
Date: 2021-02-28 20:04 |
I don't think it's actually any change in ceval per se, or any new buffers, just how the compiler code generation has changed due to this commit.
Based on some local testing, the triggering issue is the exclusion of the optimization pragma entirely in debug mode. It appears that the existing code required the pragma to be enabled to stay within the available stack space.
A quick reproduction is running test_pickletools. It fails abruptly and quickly, at least on the worker, without the pragma, but runs to completion if enabled.
Perhaps that aspect of the commit should be reverted, maybe with a separate flag for debugging ceval (but not normal testing)? Alternatively, is there a way to increase the stack in debug mode to compensate? Though I guess that would risk missing a release-build only stack issue.
|
msg387825 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2021-03-01 01:05 |
Thanks David!
So the crash only happens in debug mode?
PEP 651 promises to take reduce the C stack usage in the future. Therefore, I would be okay with rolling this back for the time being.
If we do roll it back, maybe add a comment to ceval.c explaining
that even in debug mode this file is optimized?
(The change to the pragma can stay, those two letters have been unused for a decade.)
|
msg387826 - (view) |
Author: David Bolen (db3l) * |
Date: 2021-03-01 01:39 |
I hadn't tested release mode earlier, since the commit only removed the pragma in debug builds, but I just built a release build with the commit in place on the worker and it seems fine.
So barring stack changes (enlarging or improving usage) it appears the optimize pragma is required on Windows, at least in order to pass the current test suite.
|
msg387840 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2021-03-01 04:41 |
Thanks! That's new info, at least, it wasn't clear to me from Victor's
original comment that this was for a debug build (even though as you say it
can be deduced from context).
Do you feel like submitting a PR along the lines I suggested?
|
msg387853 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2021-03-01 11:04 |
PEP 651 would prevent any crashes, although some of the tests might still fail with a StackOverflowException, if they weren't expecting a RecursionError.
|
msg387943 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2021-03-02 17:26 |
We already have a separate recursion limit for debug builds on Windows, so it's probably best all around to reduce that (if it's only recursion tests that are failing). That way, we won't forget to fix up debugging later if/when we fix recursion.
|
msg387966 - (view) |
Author: David Bolen (db3l) * |
Date: 2021-03-02 20:43 |
Steve, where is that configured? If reducing that further would resolve the crashes while retaining ceval debugging, maybe that's a reasonable trade-off, though based on my testing, reverting still seems simpler.
Right now the debug build on the buildbot appears use the standard recursion limit of 1000, and some quick grep-fu didn't find a clear spot (either internally or just for the tests) where it would be reduced. Could perhaps how it's done be something that doesn't influence buildbot builds?
The failing tests do all appear to be recursion tests of one form or another. Based on further testing, we'd need to have a recursion limit of 290 or below to get most of the test suite to pass. Presumably something like 250 for a bit of head room.
However, lowering the limit appears to cover only most of the failures, not all. Some tests directly play with the limits (like test_exceptions.test_recursion_in_except_handler and test_sys.test_recursionlimit_recovery) and still get into trouble, aborting the process, no matter how low the default recursion limit is. so those tests also need changes to pass. In addition, lowering to 290 created one new failure - test_compile.test_extended_arg now fails with a recursion error. It seems to need a limit of about 900 to pass; I guess the test could temporarily reset or something but that seems especially kludgy.
Out of curiousity, are these these failures not occurring elsewhere? It seems like something that would be happening in general (at least for anyone compiling debug builds). I thought of checking CI builds, but they appear to use release mode, so would be unaffected.
|
msg387978 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2021-03-02 22:30 |
Hmm, yeah, it seems like the debug-specific definition has gone. Wonder when that happened.
In that case, go ahead and revert, but I'd suggest doing it by just removing the new Py_DEBUG condition rather than undoing all the changes. Should be nothing wrong with the rest of it.
IIRC, PR builds run under release because the compile-time increase is more than made up by the test time decrease.
|
msg387983 - (view) |
Author: David Bolen (db3l) * |
Date: 2021-03-02 23:11 |
Yes, that was the idea (revert the PyDEBUG condition only); it sounds like everyone is on the same page.
I've been doing buildbot duties only for a while (no code contributions since long before the current process), so there may be a short delay while I figure out this newfangled PR thing...
|
msg388072 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2021-03-04 03:09 |
New changeset 131d5516409791b170b09a6ef8ed8463c9b09015 by db3l in branch 'master':
bpo-43271: Re-enable ceval.c optimizations for Windows debug builds (GH-24739)
https://github.com/python/cpython/commit/131d5516409791b170b09a6ef8ed8463c9b09015
|
msg388073 - (view) |
Author: David Bolen (db3l) * |
Date: 2021-03-04 03:30 |
The win10 buildbot is green again, so I think this can be closed.
|
msg388074 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2021-03-04 04:21 |
Thanks for all your research and the PR!
|
msg388289 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-03-08 19:28 |
Thanks for the fix David Bolen ;-)
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:41 | admin | set | github: 87437 |
2022-03-21 07:09:09 | neonene | set | nosy:
+ neonene, pablogsal
pull_requests:
+ pull_request30112 |
2021-03-08 19:28:19 | vstinner | set | messages:
+ msg388289 |
2021-03-04 04:21:30 | gvanrossum | set | status: open -> closed resolution: fixed messages:
+ msg388074
stage: patch review -> resolved |
2021-03-04 03:30:07 | db3l | set | messages:
+ msg388073 |
2021-03-04 03:09:56 | gvanrossum | set | messages:
+ msg388072 |
2021-03-04 01:58:32 | db3l | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request23510 |
2021-03-02 23:11:16 | db3l | set | messages:
+ msg387983 |
2021-03-02 22:30:39 | steve.dower | set | messages:
+ msg387978 |
2021-03-02 20:43:45 | db3l | set | messages:
+ msg387966 |
2021-03-02 17:26:29 | steve.dower | set | messages:
+ msg387943 |
2021-03-01 11:04:29 | Mark.Shannon | set | messages:
+ msg387853 |
2021-03-01 04:41:20 | gvanrossum | set | messages:
+ msg387840 |
2021-03-01 01:39:40 | db3l | set | messages:
+ msg387826 |
2021-03-01 01:05:03 | gvanrossum | set | messages:
+ msg387825 |
2021-02-28 20:04:32 | db3l | set | nosy:
+ db3l messages:
+ msg387816
|
2021-02-19 23:38:46 | pablogsal | set | priority: normal -> release blocker |
2021-02-19 22:51:04 | gvanrossum | set | nosy:
+ gvanrossum, Mark.Shannon
|
2021-02-19 20:46:42 | vstinner | set | messages:
+ msg387352 |
2021-02-19 20:40:21 | steve.dower | set | messages:
+ msg387351 |
2021-02-19 20:26:54 | vstinner | create | |