classification
Title: ctypes should support atomic operations
Type: enhancement
Stage:
Components: ctypes
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Daniel Colascione, amaury.forgeotdarc, belopolsky, davin, meador.inge, pitrou, rhettinger, serhiy.storchaka
Priority: normal
Keywords:

Created on 2017-10-01 01:50 by Daniel Colascione, last changed 2022-04-11 14:58 by admin.

Messages (14)
msg303442 - (view) Author: Daniel Colascione (Daniel Colascione) Date: 2017-10-01 01:50
Say we're using multiprocessing to share a counter between two processes and we want to atomically increment that counter. Right now, we need to protect that counter with a multiprocessing semaphore of some sort, then 1) acquire the semaphore, 2) read-modify-write the counter value, and 3) release the semaphore. What if we're preempted by a GIL-acquire request after step #1 but before step #3? We'll hold the semaphore until the OS scheduler gets around to running us again, which might be a while in the case of compute-bound tasks (especially if these tasks call C code that doesn't release the GIL).

Now, if some other process wants to increment the counter, it needs to wait on the first process's GIL! That partially defeats the purpose of multiprocessing: one of the nice things about multiprocessing is avoiding GIL-introduced latency!

If ctypes supported atomic operations, we could skip steps #1 and #3 entirely and operate directly on the shared memory region. Every operating system that supports threads at all also supports some kind of compare-and-exchange primitive. Compare-and-exchange is sufficient for avoiding the GIL contention I describe above.
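
For illustration, the lock-protected pattern described in this message (acquire, read-modify-write, release) might look like the sketch below. It assumes a `multiprocessing.Value` as the shared counter; the helper names `locked_increment` and `run_workers` are made up for this example and do not come from the report.

```python
import multiprocessing as mp


def locked_increment(counter, times):
    """Increment a shared counter `times` times, taking its lock for each update."""
    for _ in range(times):
        # 1) acquire the lock, 2) read-modify-write, 3) release the lock.
        # A process that stalls between steps 1 and 3 blocks every other
        # process that wants to touch the counter.
        with counter.get_lock():
            counter.value += 1


def run_workers(n_workers=2, times=10_000):
    """Spawn worker processes that all increment one shared C int."""
    counter = mp.Value("i", 0)  # shared C int with an associated lock
    workers = [
        mp.Process(target=locked_increment, args=(counter, times))
        for _ in range(n_workers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

`run_workers()` should be called from under an `if __name__ == "__main__":` guard in a script, so that start methods which re-import the main module work. Because every update happens under the lock, the result should equal `n_workers * times`.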
msg303469 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-10-01 17:19
> Compare-and-exchange is sufficient for avoiding the GIL contention
> I describe above.

If Python objects are involved, it is more complicated than you suggest. Possibly, multiprocessing can offer a shared counter that creates integer objects on demand and that offers guaranteed atomic increments and decrements (as semaphores do).

> one of the nice things about multiprocessing is avoiding 
> GIL-introduced latency!

The primary way it achieves this benefit is by avoiding shared state altogether.
msg303471 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-01 17:45
While the use case is reasonable (if specialized), I'm not sure ctypes is the place to expose such functionality, which can be quite extensive (see https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html). Perhaps as a separate package on PyPI?
msg303473 - (view) Author: Daniel Colascione (Daniel Colascione) Date: 2017-10-01 17:55
On Oct 1, 2017 10:19 AM, "Raymond Hettinger" <report@bugs.python.org> wrote:

> Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:
>
> > Compare-and-exchange is sufficient for avoiding the GIL contention
> > I describe above.
>
> If Python objects are involved, it is more complicated than you suggest.

Python objects are not involved. We're talking about memory manipulation on
the same level as ctypes.memmove.

> Possibly, multiprocessing can offer a shared counter that creates integer
> objects on demand and that offers guaranteed atomic increments and
> decrements (as semaphores do).

Why would it, when ctypes can provide generic functionality?

> > one of the nice things about multiprocessing is avoiding
> > GIL-introduced latency!
>
> The primary way it achieves this benefit is by avoiding shared state
> altogether.

Well, yes, but sometimes shared state is unavoidable, and it's best to
manipulate it as efficiently as possible.

----------
nosy: +davin, pitrou, rhettinger

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue31654>
_______________________________________
msg303474 - (view) Author: Daniel Colascione (Daniel Colascione) Date: 2017-10-01 18:04
On Oct 1, 2017 10:46 AM, "Antoine Pitrou" <report@bugs.python.org> wrote:

> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> While the use case is reasonable (if specialized),

It's not that specialized. You might want atomic updates for coordinating
with C APIs that expect callers to have this capability.

> I'm not sure ctypes is the place to expose such functionality, which can be
> quite extensive (see
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).

You don't need to provide all of those builtins. Users can build them in
Python out of atomic-compare-and-exchange. Only compare and exchange needs
C support. It's not very much code.

> Perhaps as a separate package on PyPI?

I have little interest in a separate PyPI module. I don't want to have to
distribute custom-compiled extension modules.
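
As a rough sketch of the idea in this message (building other atomic operations in Python on top of compare-and-exchange), the loop below implements fetch-and-add. Note that `compare_exchange` here is a hypothetical stand-in: ctypes has no such function, and this version only emulates the primitive with a `threading.Lock`, so it is not atomic across processes. A real version would need the C support under discussion.

```python
import ctypes
import threading

# Stand-in only: emulates an atomic compare-and-exchange with a lock.
# It is NOT a ctypes API and is not atomic across processes.
_cas_lock = threading.Lock()


def compare_exchange(cell, expected, desired):
    """Store `desired` in `cell` if it holds `expected`; return True on success."""
    with _cas_lock:
        if cell.value == expected:
            cell.value = desired
            return True
        return False


def fetch_add(cell, delta):
    """Add `delta` to `cell` using only compare_exchange; return the old value."""
    while True:
        old = cell.value
        # Retry if another thread changed the cell between the read and the swap.
        if compare_exchange(cell, old, old + delta):
            return old
```

The retry loop is the part that would run as Python bytecode; only `compare_exchange` itself would have to be provided natively.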

msg303475 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-01 18:12
On 01/10/2017 at 20:04, Daniel Colascione wrote:
> 
> It's not that specialized. You might want atomic updates for coordinating
> with C APIs that expect callers to have this capability.

That does sound specialized to me :-) Can you give an example of such a
C API?

> You don't need to provide all of those builtins. Users can build them in
> Python out of atomic-compare-and-exchange. Only compare and exchange needs
> C support. It's not very much code.

I'm assuming you're suggesting to write a loop with an
atomic-compare-and-exchange.  Bytecode execution in CPython being slow,
it means you risk a lot more contention (and busy looping) than if the
primitive was written in C.  Perhaps even a semaphore would be faster :-)

> I have little interest in a separate PyPI module. I don't want to have to
> distribute custom-compiled extension modules.

Understood, but that's not enough of an argument to put something in the
standard library...

You might want to float your idea on python-ideas to see if you get
support from people who have a similar need:
https://mail.python.org/mailman/listinfo/python-ideas
msg303476 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-01 18:14
Note that if there is already a C API to perform atomic ops, you can simply use ctypes to invoke that API.  Unfortunately, the aforementioned GCC builtins seem to be only available as intrinsics (at least I couldn't find a shared library that exposes the __atomic_* functions on my system).
msg303488 - (view) Author: Daniel Colascione (Daniel Colascione) Date: 2017-10-01 19:42
On Sun, Oct 1, 2017 at 11:14 AM, Antoine Pitrou <report@bugs.python.org>
wrote:

>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> Note that if there is already a C API to perform atomic ops, you can
> simply use ctypes to invoke that API.  Unfortunately, the aforementioned
> GCC builtins seem to be only available as intrinsics (at least I couldn't
> find a shared library that exposes the __atomic_* functions on my system).
>

Right. I performed the same search. On Windows, at least
InterlockedCompareExchange is exported from kernel32.
msg303489 - (view) Author: Daniel Colascione (Daniel Colascione) Date: 2017-10-01 19:51
On Sun, Oct 1, 2017 at 11:12 AM, Antoine Pitrou <report@bugs.python.org>
wrote:

>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> On 01/10/2017 at 20:04, Daniel Colascione wrote:
> >
> > It's not that specialized. You might want atomic updates for coordinating
> > with C APIs that expect callers to have this capability.
>
> That does sound specialized to me :-) Can you give an example of such a
> C API?
>

The Linux futex protocol, as described in man futex(7), comes to mind.
Maybe you want to manipulate C++ shared_ptr objects --- these objects also
rely on atomic operations. For these facilities, you need atomic operations
for *correctness*. Taking a mutex as an alternative is not an option
because there is no C-side mutex to take.

> > You don't need to provide all of those builtins. Users can build them in
> > Python out of atomic-compare-and-exchange. Only compare and exchange
> > needs C support. It's not very much code.
>
> I'm assuming you're suggesting to write a loop with an
> atomic-compare-and-exchange.  Bytecode execution in CPython being slow,
> it means you risk a lot more contention (and busy looping) than if the
> primitive was written in C.  Perhaps even a semaphore would be faster :-)
>

It's still faster than waiting several milliseconds for the GIL. Bytecode
isn't *that* slow --- according to ipython, this operation should take a
few hundred nanoseconds. Besides, in a JIT implementation, it'll be as fast
as native code.

>
> > I have little interest in a separate PyPI module. I don't want to have to
> > distribute custom-compiled extension modules.
>
> Understood, but that's not enough of an argument to put something in the
> standard library...
>
> You might want to float your idea on python-ideas to see if you get
> support from people who have a similar need:
> https://mail.python.org/mailman/listinfo/python-ideas
>
>
I don't understand the opposition to this feature request. It's a trivial
amount of code (invoke a compiler intrinsic), makes the API more complete,
and addresses a real, specific use case as well as some other hypothetical
use cases. It costs nothing to add this functionality to the standard
library. The standard library already includes a whole web server and HTTP
client, a diff engine, various database engines, a facility for parsing
email, an NNTP client, a GUI system, and a facility for "determin[ing] the
type of sound [a] file". Why can the standard library include all of these
facilities and not a simple facility for performing a very common kind of
memory operation? Standard library support for this functionality is
essential, as it's not possible to implement in pure Python.
msg303491 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-01 21:01
On 01/10/2017 at 21:51, Daniel Colascione wrote:
> 
>> That does sound specialized to me :-) Can you give an example of such a
>> C API?
> 
> The Linux futex protocol, as described in man futex(7), comes to mind.
> Maybe you want to manipulate C++ shared_ptr objects --- these objects also
> rely on atomic operations.

That's even more specialized than I expected...

> It's still faster than waiting several milliseconds for the GIL.

Are you talking about https://bugs.python.org/issue31653? If so, it's
just waiting for an appropriate PR to be filed.

> I don't understand the opposition to this feature request. It's a trivial
> amount of code (invoke a compiler intrinsic), makes the API more complete,
> and addresses a real, specific use case as well as some other hypothetical
> use cases.

That's a compiler-dependent compiler intrinsic (or perhaps a whole range
of them, given there are different widths to cater for), an API wrapping
it, plus some documentation and tests, that we have to maintain until
the end of time (at least nominally).

> The standard library already includes a whole web server and HTTP
> client, a diff engine, various database engines, a facility for parsing
> email, an NNTP client, a GUI system, and a facility for "determin[ing] the
> type of sound [a] file".

It was determined at the time that the use cases for these justified the
effort of maintaining them in the stdlib. For a couple of these (such as
"determining the type of a sound file" or even an NNTP client), I expect
the decision would be different nowadays :-)

Perhaps other core developers will disagree with me and agree to include
(i.e. review, maintain) this functionality.  I simply am not convinced
it deserves being included, but that's not a veto.
msg303494 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-01 21:17
It is not clear to me what API is needed, but I agree with Antoine that ctypes doesn't look like the appropriate place for it. Maybe in multiprocessing or subprocess, or in a low-level module providing primitives for multiprocessing or subprocess?
msg303495 - (view) Author: Daniel Colascione (Daniel Colascione) Date: 2017-10-01 21:33
On Sun, Oct 1, 2017 at 2:01 PM, Antoine Pitrou <report@bugs.python.org>
wrote:

>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> On 01/10/2017 at 21:51, Daniel Colascione wrote:
> >
> >> That does sound specialized to me :-) Can you give an example of such a
> >> C API?
> >
> > The Linux futex protocol, as described in man futex(7), comes to mind.
> > Maybe you want to manipulate C++ shared_ptr objects --- these objects
> > also rely on atomic operations.
>
> That's even more specialized than I expected...
>

Huh? Both are very generic.

>
> > It's still faster than waiting several milliseconds for the GIL.
>
> Are you talking about https://bugs.python.org/issue31653? If so, it's
> just waiting for an appropriate PR to be filed.
>

This is a separate issue. That's about thrashing around less when we take a
lock. This issue is about process A not having to wait on process B to
schedule a thread in order to perform a simple operation on memory that
both processes own.

>
> > I don't understand the opposition to this feature request. It's a trivial
> > amount of code (invoke a compiler intrinsic), makes the API more
> > complete, and addresses a real, specific use case as well as some other
> > hypothetical use cases.
>
> That's a compiler-dependent compiler intrinsic (or perhaps a whole range
> of them, given there are different widths to cater for), an API wrapping
> it, plus some documentation and tests, that we have to maintain until
> the end of time (at least nominally).
>

It's trivial and easy to support conditionally. SCM_RIGHTS is "specialized"
and not supported on all systems, yet it's in stdlib.

>
> > The standard library already includes a whole web server and HTTP
> > client, a diff engine, various database engines, a facility for parsing
> > email, an NNTP client, a GUI system, and a facility for "determin[ing]
> > the type of sound [a] file".
>
> It was determined at the time that the use cases for these justified the
> effort of maintaining them in the stdlib. For a couple of these (such as
> "determining the type of a sound file" or even an NNTP client), I expect
> the decision would be different nowadays :-)
>
> Perhaps other core developers will disagree with me and agree to include
> (i.e. review, maintain) this functionality.  I simply am not convinced
> it deserves being included, but that's not a veto.
>

msg303496 - (view) Author: Daniel Colascione (Daniel Colascione) Date: 2017-10-01 21:35
On Sun, Oct 1, 2017 at 2:01 PM, Antoine Pitrou <report@bugs.python.org>
wrote:

> Perhaps other core developers will disagree with me and agree to include
> (i.e. review, maintain) this functionality.  I simply am not convinced
> it deserves being included, but that's not a veto.

ctypes is a library for operating on native memory and working with native
functions. Performing atomic operations on memory is definitely within its
scope. Why does ctypes include memmove? Why memmove and not
compare-and-exchange? What evidence, if any, would convince you?
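
For context on the comparison with memmove: `ctypes.memmove(dst, src, count)` is an existing ctypes function that copies raw bytes between native buffers. A minimal sketch of that level of memory manipulation follows; the helper name `copy_buffer` is made up for this example.

```python
import ctypes


def copy_buffer(data):
    """Copy `data` into a fresh native buffer with ctypes.memmove."""
    src = ctypes.create_string_buffer(data)      # len(data) + 1 bytes (trailing NUL)
    dst = ctypes.create_string_buffer(len(src))  # zero-filled buffer of the same size
    # Raw byte copy between two native memory regions; no Python objects
    # are created for the payload itself.
    ctypes.memmove(dst, src, len(src))
    return dst
```

A compare-and-exchange helper would operate on native memory in the same way, except that it would update one word atomically instead of copying a range of bytes.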
msg303499 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-01 21:53
On 01/10/2017 at 23:33, Daniel Colascione wrote:
> 
> Huh? Both are very generic.

"Specialized" as in "I didn't expect anyone would want to do such a
thing in pure Python".

> SCM_RIGHTS is "specialized"
> and not supported on all systems, yet it's in stdlib.

Because passing fds between processes was considered useful enough (it's
actually used by multiprocessing itself, for example to implement the
forkserver model).

And regardless, trying to point to other (more or less exotic) features
of the stdlib is not a convincing argument to add a new feature.
History
Date                 User               Action  Args
2022-04-11 14:58:53  admin              set     github: 75835
2017-10-01 21:53:21  pitrou             set     messages: + msg303499
2017-10-01 21:35:18  Daniel Colascione  set     messages: + msg303496
2017-10-01 21:33:20  Daniel Colascione  set     messages: + msg303495
2017-10-01 21:17:47  serhiy.storchaka   set     nosy: + serhiy.storchaka; messages: + msg303494; versions: + Python 3.7, - Python 3.8
2017-10-01 21:01:30  pitrou             set     messages: + msg303491
2017-10-01 19:51:25  Daniel Colascione  set     messages: + msg303489
2017-10-01 19:42:26  Daniel Colascione  set     messages: + msg303488
2017-10-01 18:14:36  pitrou             set     messages: + msg303476
2017-10-01 18:12:08  pitrou             set     messages: + msg303475
2017-10-01 18:04:41  Daniel Colascione  set     messages: + msg303474
2017-10-01 17:55:33  Daniel Colascione  set     messages: + msg303473
2017-10-01 17:45:59  pitrou             set     messages: + msg303471
2017-10-01 17:19:48  rhettinger         set     nosy: + rhettinger, pitrou, davin; messages: + msg303469
2017-10-01 14:31:42  SilentGhost        set     nosy: + amaury.forgeotdarc, belopolsky, meador.inge
2017-10-01 01:50:47  Daniel Colascione  create