Author vstinner
Recipients pablogsal, seberg, vstinner
Date 2020-12-30.11:38:22
Message-id <1609328302.87.0.0458429530287.issue40522@roundup.psfhosted.org>
Content
One GIL per interpreter requires storing the tstate per thread; I don't see any other option. We need to replace the global _PyRuntime atomic variable with a TLS variable. I'm trying to reduce the overhead, but it's hard to beat the performance of an atomic variable.
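
To make the trade-off concrete, here is a minimal sketch of the two storage strategies. The `demo_*` names and the `demo_tstate` structure are hypothetical stand-ins, not CPython's actual definitions, and the relaxed memory ordering is an assumption made for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; not the real structure. */
typedef struct demo_tstate {
    int id;
} demo_tstate;

/* Single GIL: one process-wide slot holds the running thread's state. */
static _Atomic(demo_tstate *) demo_global_current = NULL;

/* One GIL per interpreter: several threads can run at once, so each
   thread needs its own slot (C11 thread-local storage). */
static _Thread_local demo_tstate *demo_tls_current = NULL;

static demo_tstate *demo_get_atomic(void)
{
    /* Relaxed atomic load of a pointer (ordering is an assumption). */
    return atomic_load_explicit(&demo_global_current, memory_order_relaxed);
}

static void demo_set_atomic(demo_tstate *tstate)
{
    atomic_store_explicit(&demo_global_current, tstate, memory_order_relaxed);
}

static demo_tstate *demo_get_tls(void)
{
    return demo_tls_current;
}

static void demo_set_tls(demo_tstate *tstate)
{
    demo_tls_current = tstate;
}
```

With the atomic variable, every thread reads the same slot; with the TLS variable, each thread only ever sees the value it stored itself.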

That's also why I modified many internal C functions to pass tstate explicitly to their subfunctions: to avoid any possible overhead of getting tstate.

https://vstinner.github.io/cpython-pass-tstate.html
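
A rough sketch of the idea, using hypothetical `demo_*` helpers rather than real CPython functions: the implicit style looks up tstate in every helper, while the explicit style looks it up once and passes it down. The `demo_lookups` counter exists only to make the difference visible and is not thread-safe.

```c
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct demo_tstate {
    int recursion_depth;
} demo_tstate;

static _Thread_local demo_tstate *demo_current = NULL;
static int demo_lookups = 0;  /* illustration only, not thread-safe */

static demo_tstate *demo_get_tstate(void)
{
    demo_lookups++;
    return demo_current;
}

/* Implicit style: each helper fetches the thread state itself. */
static void demo_enter_implicit(void) { demo_get_tstate()->recursion_depth++; }
static void demo_leave_implicit(void) { demo_get_tstate()->recursion_depth--; }

static void demo_call_implicit(void)
{
    demo_enter_implicit();   /* lookup 1 */
    demo_leave_implicit();   /* lookup 2 */
}

/* Explicit style: helpers receive tstate as a parameter. */
static void demo_enter(demo_tstate *tstate) { tstate->recursion_depth++; }
static void demo_leave(demo_tstate *tstate) { tstate->recursion_depth--; }

static void demo_call_explicit(void)
{
    demo_tstate *tstate = demo_get_tstate();  /* single lookup */
    demo_enter(tstate);
    demo_leave(tstate);
}
```

Whatever a single lookup costs, the explicit style pays it once per call chain instead of once per helper.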


Pablo:
> In MacOS is quite challenging to activate LTO, so normally optimized builds are only done with PGO.

Oh right, I forgot macOS. I should check how TLS is compiled on macOS. IMO two MOV instructions instead of one MOV is not a major performance bottleneck.

Ideally, we would avoid the pthread_getspecific() function, which is less efficient than a TLS variable. The glibc implementation uses an array for a few variables (first 32 variables?) and then a slower hash table.
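
For comparison, a minimal sketch of both access paths on a POSIX system with a C11 compiler. The `demo_*` names are hypothetical. The key-based path goes through a library call on every read, whereas a TLS variable can typically be read with a load relative to the thread pointer.

```c
#include <pthread.h>
#include <stddef.h>

/* Path 1: POSIX thread-specific data, accessed through a key. */
static pthread_key_t demo_key;

/* Must be called once before any get/set through the key.
   Returns 0 on success, an error number otherwise. */
static int demo_key_init(void)
{
    return pthread_key_create(&demo_key, NULL);
}

static int demo_set_via_key(void *value)
{
    return pthread_setspecific(demo_key, value);
}

static void *demo_get_via_key(void)
{
    /* Function call into the threading library on every read. */
    return pthread_getspecific(demo_key);
}

/* Path 2: a C11 thread-local variable, accessed like a normal variable. */
static _Thread_local void *demo_slot = NULL;

static void demo_set_via_tls(void *value)
{
    demo_slot = value;
}

static void *demo_get_via_tls(void)
{
    return demo_slot;
}
```

Both paths return the value stored by the calling thread; only the cost of the read differs.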


Pablo:
> Also in Windows I am not sure is possible to use LTO. Same for many other platforms.

I will check how it's implemented on Windows.

We cannot use TLS on all platforms, since it requires C11 features which are not available on all platforms. Also, the implementation depends on the architecture.
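
One possible way to handle this, sketched with hypothetical `DEMO_*` / `demo_*` names (this is not the code used in CPython): select a TLS keyword at compile time when one is known to exist, and fall back to a pthread key otherwise. The `__STDC_NO_THREADS__` test is a deliberately conservative assumption.

```c
#include <stddef.h>

/* Pick a thread-local storage keyword if the compiler offers one. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
    && !defined(__STDC_NO_THREADS__)
#  define DEMO_THREAD_LOCAL _Thread_local       /* C11 keyword */
#elif defined(_MSC_VER)
#  define DEMO_THREAD_LOCAL __declspec(thread)  /* MSVC extension */
#elif defined(__GNUC__)
#  define DEMO_THREAD_LOCAL __thread            /* GCC/Clang extension */
#endif

#ifdef DEMO_THREAD_LOCAL

/* Fast path: a plain thread-local variable. */
static DEMO_THREAD_LOCAL void *demo_state = NULL;

static int demo_state_init(void) { return 0; }
static void demo_state_set(void *p) { demo_state = p; }
static void *demo_state_get(void) { return demo_state; }

#else

/* Fallback: POSIX thread-specific data behind the same interface. */
#include <pthread.h>

static pthread_key_t demo_state_key;

static int demo_state_init(void)
{
    return pthread_key_create(&demo_state_key, NULL);
}

static void demo_state_set(void *p)
{
    (void)pthread_setspecific(demo_state_key, p);
}

static void *demo_state_get(void)
{
    return pthread_getspecific(demo_state_key);
}

#endif
```

Callers use `demo_state_init()`, `demo_state_set()` and `demo_state_get()` without knowing which storage mechanism was selected.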