
Author peadar
Recipients bkabrda, markmcclain, opoplawski, peadar, vstinner
Date 2019-05-24.16:15:26
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1558714527.85.0.131061792966.issue21131@roundup.psfhosted.org>
In-reply-to
Content
Hi - we ran into what looks like exactly this issue, sporadically, on x86_64, and tracked down the root cause.

When faulthandler.c uses sigaltstack(2), the alternate stack is set up with a buffer of only SIGSTKSZ bytes. That is, sadly, just 8 KiB on Linux/x86_64.
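
For reference, the setup amounts to something like the following (a simplified sketch of the pattern, not the exact faulthandler.c source):

    #include <signal.h>
    #include <stdlib.h>

    static stack_t stack;

    static void setup_alt_stack(void)
    {
        /* One shared buffer of SIGSTKSZ bytes -- 8 KiB on Linux/x86_64 --
           registered as the alternate signal stack. */
        stack.ss_flags = 0;
        stack.ss_size = SIGSTKSZ;
        stack.ss_sp = malloc(stack.ss_size);
        if (stack.ss_sp != NULL)
            sigaltstack(&stack, NULL);
    }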

When a signal is raised, before the handler is called, the kernel stores the machine state on the user's (possibly "alternate") stack. The size of that state is very much variable, depending on the CPU.

When we chain to the previous signal handler in the sigaction variant of the faulthandler code, we re-raise the signal while the existing handler's frame is still on the stack, so the kernel saves a second copy of the CPU state.
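
The pattern is roughly this (an illustrative sketch only; the handler and variable names are mine, not faulthandler's):

    #include <signal.h>

    static struct sigaction previous_action;  /* the handler we chain to */

    static void fault_handler(int signum)
    {
        /* ... dump tracebacks etc., running on the alternate stack ... */

        /* Re-install the previous handler and re-raise.  The kernel now
           pushes a second signal frame -- and a second machine-state
           save -- onto the same small alternate stack. */
        sigaction(signum, &previous_action, NULL);
        raise(signum);
    }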

Finally, when any part of that signal handler has to invoke a function that requires the dynamic linker's intervention to resolve, it will call some form of _dl_runtime_resolve* - likely _dl_runtime_resolve_xsave or _dl_runtime_resolve_xsavec.

These functions also have to save machine state. So, how big is the machine state? Well, it depends on the CPU.
On one machine I have access to, where /proc/cpuinfo shows "Intel(R) Xeon(R) CPU E5-2640 v4", I see:

> (gdb) p _rtld_local_ro._dl_x86_cpu_features.xsave_state_size
> $1 = 896

On another machine, reporting as "Intel(R) Xeon(R) Gold 5118 CPU", I have:

> (gdb) p _rtld_local_ro._dl_x86_cpu_features.xsave_state_size
> $1 = 2560
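
(If it helps anyone reproduce this, the same number can be read straight from CPUID: leaf 0xD, sub-leaf 0 returns in EBX the size of the XSAVE area for the state components currently enabled in XCR0. A quick sketch, assuming GCC/clang's <cpuid.h>:)

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;
        /* CPUID.(EAX=0Dh, ECX=0): EBX = size in bytes of the XSAVE area
           required by the features currently enabled in XCR0. */
        if (__get_cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx))
            printf("xsave state size: %u bytes\n", ebx);
        return 0;
    }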

This means that the stack space required just to hold three sets of CPU state (3 × 2560 = 7680 bytes) is over 7.5 KiB. And for the signal handler frames it's actually worse: more like 3.25 KiB per frame, since each frame carries more than the xsave area alone. A chained signal handler that needs to invoke dynamic linking will therefore consume more than the default stack space allocated in faulthandler.c in machine-state saves alone. So the failing test is failing because it's scribbling on random memory before the allocated stack space.

My guess is that the architectures this previously manifested on have larger stack demands for signal handling than x86_64, but newer x86_64 processors are clearly starting to get tickled by this.

The fix is pretty simple - just allocate more stack space. The attached patch uses pthread_attr_getstacksize to find the system's default thread stack size, uses that as the default, and also defines an absolute minimum stack size of 1 MiB. This fixes the issue on our machine with the big xsave state size. (I'm sure I'm getting the feature-test macros wrong for detecting pthreads availability.)
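
The gist of it is something like this (a sketch of the approach, not the attached patch verbatim; the 1 MiB floor is just the constant I chose):

    #include <pthread.h>
    #include <signal.h>
    #include <stdlib.h>

    #define ALTSTACK_MIN_SIZE (1024 * 1024)   /* absolute minimum: 1 MiB */

    static stack_t stack;

    static int setup_bigger_alt_stack(void)
    {
        size_t size = ALTSTACK_MIN_SIZE;
    #ifdef _POSIX_THREADS
        pthread_attr_t attr;
        size_t default_size;
        /* Use the system's default thread stack size if it is larger
           than our floor. */
        if (pthread_attr_init(&attr) == 0) {
            if (pthread_attr_getstacksize(&attr, &default_size) == 0
                && default_size > size)
                size = default_size;
            pthread_attr_destroy(&attr);
        }
    #endif
        stack.ss_flags = 0;
        stack.ss_size = size;
        stack.ss_sp = malloc(stack.ss_size);
        if (stack.ss_sp == NULL)
            return -1;
        return sigaltstack(&stack, NULL);
    }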

Also, I think that in a threaded environment using the altstack might not be the best choice: multiple threads handling signals on that stack will wind up stomping on the same memory. Is there a strong reason to maintain this altstack behaviour?