Issue 30747: _Py_atomic_* not actually atomic on Windows with MSVC

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74932

classification

Title:	_Py_atomic_* not actually atomic on Windows with MSVC
Type:	behavior	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	Paxxi, Segev Finer, ethan smith, jyasskin, mark.dickinson, paul.moore, pitrou, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Priority:	normal	Keywords:

Created on 2017-06-24 19:35 by Paxxi, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 2383	merged	Paxxi, 2017-06-24 20:16
PR 3140	merged	Segev Finer, 2017-08-18 17:27

Messages (16)
msg296786 - (view)	Author: Pär Björklund (Paxxi) *	Date: 2017-06-24 19:35
_Py_atomic_store and _Py_atomic_load are not implemented as atomic operations on Windows when using Visual Studio to build. This might cause hard to troubleshoot behaviour, especially in third party hosting applications..
msg296793 - (view)	Author: Steve Dower (steve.dower) *	Date: 2017-06-24 22:59
Unless you've got an example of this causing actual issues, it should only go into 3.7. All platforms supported by Windows guarantee atomicity for aligned, pointer-sized reads and writes, but there appear to be memory fencing semantics in here too. I don't understand entirely why we need this code at all, so I'll let Victor or Serhiy comment.
msg296816 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-06-25 09:06
The general issue those macros want to prevent is that modern CPUs have a tendency to execute a lot of stuff out-of-order, including memory operations. From the perspective of a single hardware core (or thread, really), that's fine since it has a logically consistent view of the machine's state (it knows which operations have been reordered, which values have been committed to cache or not, etc.). But what happens when another hardware core examines in-memory state at the same time? It might find values changing in a different order than the programmer had intended. If it's important that the visible order hasn't been changed, you have a bug. Note that C "volatile" is not enough: it only prevents the compiler from re-ordering or eliding memory accesses, but not the CPU. As such, "volatile" is only useful if you have a single word-sized piece of state that you need to inspect from several threads at once. But the eval loop uses several of them, and therefore needs to prevent the CPU from writing or reading them in the wrong order (which may produce synchronization bugs such as deadlocks). Also note that traditionally, x86 has a "strong" memory ordering model which prevents nasty kinds of reorderings to happen. But other architectures such as ARM or SPARC can have much weaker memory models (IIRC, we had sporadic hangs on an ARM buildbot a long time ago, because of that, but I can't find a reference). I admit I'm unable to vouch that the current code is correct. Jeffrey Yasskin did the original change adding the Py_atomic types and using them in the GIL implementation.
msg296820 - (view)	Author: Pär Björklund (Paxxi) *	Date: 2017-06-25 10:33
Antoine said it best. It's very hard to prove that this code is correct or incorrect as it requires multiple threads accessing the same variable and very specific timings to produce an actual issue. My PR only solved half of the issue because I didn't really have a good idea for how to handle the load macro but I think it out today so I'll be updating the PR. Would there be any interest of implementing them for MSVC/ARM as well? It's basically the same code so not much work, however I don't have a platform to actually test it.
msg296823 - (view)	Author: Steve Dower (steve.dower) *	Date: 2017-06-25 13:26
> Would there be any interest of implementing them for MSVC/ARM as well Sure, since you're there. It's not easy to test, but I know people who are doing it, so it'll get noticed eventually. Maybe there's some sort of stress test we can write that is likely to encounter issues? I've done that before to detect concurrency issues in particularly complex code.
msg296846 - (view)	Author: Jeffrey Yasskin (jyasskin) *	Date: 2017-06-26 05:10
Has enough time passed that you can use the C11 atomic types and operations instead of special-casing these for each compiler? (e.g. http://en.cppreference.com/w/c/atomic/atomic_store)
msg296856 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-06-26 08:27
Unfortunately not: """Python versions greater than or equal to 3.6 use C89 with several select C99 features: [...]""" (PEP 7).
msg296857 - (view)	Author: Pär Björklund (Paxxi) *	Date: 2017-06-26 08:30
Microsoft don't spend much time on the C compiler features, still lacking C99 features so I don't have much hope of getting C11 support anytime soon or at all. One could of course implement a cross platform stdatomic library that matches the C11 spec.
msg300191 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-08-12 09:19
New changeset e664d7f89d2b9960d9049237136396e824795cac by Antoine Pitrou (Pär Björklund) in branch 'master': bpo-30747: Attempt to fix atomic load/store (#2383) https://github.com/python/cpython/commit/e664d7f89d2b9960d9049237136396e824795cac
msg300310 - (view)	Author: Steve Dower (steve.dower) *	Date: 2017-08-15 19:45
This commit added a lot of warnings - http://buildbot.python.org/all/builders/AMD64%20Windows10%203.x/builds/1056 Most are of the form: warning C4133: 'function': incompatible types - from 'volatile uintptr_t ' to 'volatile int ' I can't obviously see where they are coming from, but someone please fix them.
msg300330 - (view)	Author: Pär Björklund (Paxxi) *	Date: 2017-08-16 07:36
My guess would be the cast from uintptr_t to intptr_t in the return type. I'll look into this, when possible, should have some time later this week or over the weekend.
msg300609 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-08-20 22:45
New changeset 0267128aa4dc7c383c341c19833c5d506eae9c93 by Antoine Pitrou (Segev Finer) in branch 'master': bpo-9566 & bpo-30747: Silence warnings from pyatomic.h macros (#3140) https://github.com/python/cpython/commit/0267128aa4dc7c383c341c19833c5d506eae9c93
msg300611 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-08-20 22:47
Thanks to Segev Finer for submitting the PR which fixed the warnings!
msg319482 - (view)	Author: Ethan Smith (ethan smith) *	Date: 2018-06-13 18:28
When working on clang-cl support, I was advised here https://reviews.llvm.org/D47672#1131325 that we may be using hardware lock elision incorrectly. Copying from there: > I also spoke to Andi Kleen here at Intel to make sure I got these inline assembly versions correct. And he's not sure CPython should be using these the way it is. It looks like they try to use the HLE versions anytime the memory order is acquire/release. But HLE isn't suitable for every acquire/release. I believe if we just use the simple _InterlockedExchange/_InterlockedCompareExchange intrinsics, things should be safer.
msg319539 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2018-06-14 19:32
I would be ok with reverting to the non-HLE variants. Does anyone want to test the performance implications on TSX-enabled hardware?
msg319541 - (view)	Author: Pär Björklund (Paxxi) *	Date: 2018-06-14 19:54
The HLE variants were simply chosen to match the semantics on other platforms with regard to aquire/release. If Intel engineers say the plain versions are better that's good enough for me. It would be interesting seeing some benchmarks but I don't have any idea on how to reliably test the non happy path. On Thu, 14 Jun 2018, 21:32 Antoine Pitrou, <report@bugs.python.org> wrote: > > Antoine Pitrou <pitrou@free.fr> added the comment: > > I would be ok with reverting to the non-HLE variants. Does anyone want to > test the performance implications on TSX-enabled hardware? > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <https://bugs.python.org/issue30747> > _______________________________________ >

History
Date	User	Action	Args
2022-04-11 14:58:48	admin	set	github: 74932
2018-06-14 19:54:35	Paxxi	set	messages: + msg319541
2018-06-14 19:32:43	pitrou	set	messages: + msg319539
2018-06-13 18:28:35	ethan smith	set	nosy: + ethan smith messages: + msg319482
2017-08-20 22:47:44	pitrou	set	nosy: + Segev Finer messages: + msg300611
2017-08-20 22:46:23	pitrou	set	status: open -> closed resolution: fixed stage: needs patch -> resolved
2017-08-20 22:45:48	pitrou	set	messages: + msg300609
2017-08-18 17:27:53	Segev Finer	set	pull_requests: + pull_request3175
2017-08-16 07:36:10	Paxxi	set	messages: + msg300330
2017-08-15 19:45:03	steve.dower	set	status: closed -> open resolution: fixed -> (no value) messages: + msg300310 stage: backport needed -> needs patch
2017-08-12 09:19:47	pitrou	set	status: open -> closed resolution: fixed stage: patch review -> backport needed
2017-08-12 09:19:35	pitrou	set	messages: + msg300191
2017-06-26 08:30:48	Paxxi	set	messages: + msg296857
2017-06-26 08:27:13	pitrou	set	messages: + msg296856
2017-06-26 05:10:17	jyasskin	set	messages: + msg296846
2017-06-25 13:26:34	steve.dower	set	messages: + msg296823
2017-06-25 10:33:10	Paxxi	set	messages: + msg296820
2017-06-25 09:06:51	pitrou	set	nosy: + pitrou, jyasskin messages: + msg296816
2017-06-25 03:41:29	serhiy.storchaka	set	nosy: + mark.dickinson
2017-06-24 22:59:05	steve.dower	set	nosy: + vstinner, serhiy.storchaka messages: + msg296793 versions: - Python 3.5, Python 3.6
2017-06-24 22:05:11	pitrou	set	nosy: + paul.moore, tim.golden, zach.ware, steve.dower stage: patch review type: enhancement -> behavior versions: + Python 3.5
2017-06-24 20:16:29	Paxxi	set	pull_requests: + pull_request2433
2017-06-24 19:35:59	Paxxi	create