classification
Title: delete misleading faq entry about atomic operations
Type: behavior Stage: patch review
Components: Documentation Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: bjs, docs@python, graingert, jeff.allen, pablogsal, pitrou, serhiy.storchaka, steven.daprano
Priority: normal Keywords: patch

Created on 2021-10-11 20:25 by graingert, last changed 2021-10-12 23:09 by bjs.

Pull Requests
URL Status Linked Edit
PR 28886 closed graingert, 2021-10-11 20:27
Messages (8)
msg403700 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-10-11 23:12
Why do you say that the FAQ is misleading?

If it is misleading, it should be replaced with a more correct answer, not just deleted.
msg403714 - (view) Author: Jeff Allen (jeff.allen) * Date: 2021-10-12 07:42
I'm interested in Thomas' reasons, but here are some of mine (as far as I understand things):

1. It is specific to one interpreter implemented in C, equipped with a GIL, and on certain assumptions about the byte code interpreter and the implementation of built-ins, that may not hold long-term. 

2. In x = L[i], the index and assignment are distinct actions (in today's byte code), allowing L or i to change before x is assigned. This applies to multiple other of the examples.

3. A compiler (even a CPU) is free to re-order operations and cache values in unguessable ways, on the assumption of a single thread.

4. Code written on these principals is fragile. It only takes the replacement of a built-in with sub-class redefining __getitem__ (to support some worthy aim elsewhere in the code) to invalidate it.

5. sort() is not atomic if an element is of a type that overrides comparison in Python. (Nor is modifying a dictionary if __hash__ or __eq__ are redefined.)
 

If you want retain the question, with a better answer, the last sentence is good: "When in doubt, use a mutex!", accompanied by "Always be in doubt."
msg403715 - (view) Author: Thomas Grainger (graingert) * Date: 2021-10-12 08:07
it's as part of this discussion in https://mail.python.org/archives/list/python-dev@python.org/thread/ABR2L6BENNA6UPSPKV474HCS4LWT26GY/#IAOCDDCJ653NBED3G2J2YBWD7HHPFHT6 and others in #python-dev 

specifically https://github.com/python/cpython/blob/2f92e2a590f0e5d2d3093549f5af9a4a1889eb5a/Objects/dictobject.c#L2582-L2586

regarding if any of the items are builtins or not: the faq entry lists (L, L1, L2 are lists, D, D1, D2 are dicts, x, y are objects, i, j are ints) so I read that to mean x and y are user defined objects with user defined comparison and equality methods
msg403716 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-10-12 08:35
I'm also surprised to learn that `L.sort()` and `D1.update(D2)` are supposed to be atomic. They certainly are not in the general case.

Remember, any Python code can release the GIL (because the GIL is released periodically in the interpreter loop). Any DECREF can also release the GIL (because it may trigger the execution of arbitrary destructors). This restricts a lot which operations can be safely considered atomic.
msg403725 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-10-12 11:03
Jeff makes an excellent point about the docs failing to distinguish 
between language guarantees, implementation guarantees, and things which 
are merely true sometimes.

On the other hand, we only need document what is true *now*, not what 
may be true in some long distant future.

On Tue, Oct 12, 2021 at 07:42:07AM +0000, Jeff Allen wrote:

> 2. In x = L[i], the index and assignment are distinct actions (in 
> today's byte code), allowing L or i to change before x is assigned.

Does that matter though? I think that's a distinction that makes no 
difference.

We know that another thread could change the L or the i before the 
assignment, if they are global. But once the L[i] lookup has occurred, 
it doesn't matter if they change. It's not going to affect what value 
gets bound to the x.

So in a practical sense, we can say that once the lookup L[i] has 
occurred, the binding might as well be atomic. I think that's what the 
entry is trying to say. Have I missed something?

> 3. A compiler (even a CPU) is free to re-order operations and cache 
> values in unguessable ways, on the assumption of a single thread.

The CPU doesn't operate at the level of Python byte code though, and 
there are limits to what the compiler will do. It's not going to reorder 
things in ways that change the semantics of the code (that would be a 
compiler bug). Its not going to reorder this code:

    x = 1
    print(x)
    x = 2

so that "2" gets printed. So I don't see that this objection is 
relevant.

> 4. Code written on these principals is fragile. It only takes the 
> replacement of a built-in with sub-class redefining __getitem__ (to 
> support some worthy aim elsewhere in the code) to invalidate it.

The FAQ entry could be more forceful that it is only talking about 
certain built-in types, and once you change things to another type, the 
promises may no longer hold.

But we should not hold it against the FAQs that the promises made about 
one type don't apply to other types.

> 5. sort() is not atomic if an element is of a type that overrides 
> comparison in Python. (Nor is modifying a dictionary if __hash__ or 
> __eq__ are redefined.)

Indeed, and the FAQ should be more forceful about that proviso.
msg403729 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-12 12:36
sort() is atomic, even if GIL is released during executing custom __lt__. It is guaranteed that no operations on the list in other threads can affect the result of sort().

I do not understand what non-atomic you see in x = L[i]. The value of x is determined by values of L and i at the start of the operation. GIL is not released during indexing L, and if it is released between indexing and assignment, it does not affect the result.

The FAQ answer is specially about built-in types, it is not related to types with overwritten __getitem__ etc.

The most questionable examples are dict operations. But even they provide some kind of atomacity. But you perhaps need to know internals to understand limitations.

We perhaps should explicitly document what non-trivial operations are atomical (for example list and dict copying is atomic) and whether atomacity is the part of the language specification or CPython implementation detail. In many places in the stdlib the code relies on GIL for atomacity.
msg403754 - (view) Author: Jeff Allen (jeff.allen) * Date: 2021-10-12 19:34
Thomas wrote:
> it's as part of this discussion in https://mail.python.org/archives/list/python-dev@python.org/thread/ABR2L6BENNA6UPSPKV474HCS4LWT26GY/#IAOCDDCJ653NBED3G2J2YBWD7HHPFHT6 and others in #python-dev 

That's where I noticed it, but it seemed the wrong place to explore this way.

Steven is right, I'm over-stating the case. And although valid that this is CPython specific, it's well sign-posted and I'm just being thin-skinned.

Serhiy writes:
> sort() is atomic, even if GIL is released during executing custom __lt__. It is guaranteed that no operations on the list in other threads can affect the result of sort().

The strategy noted here: https://github.com/python/cpython/blob/2d21612f0dd84bf6d0ce35bcfcc9f0e1a41c202d/Objects/listobject.c#L2261-L2265
does guarantee that, which I hadn't noticed. What if during the release of the GIL, another thread appends to L? In my simple experiment I get a ValueError and the modifications are lost. I think that is not thread-safe.

Serhiy also writes:

> I do not understand what non-atomic you see in x = L[i]. The value of x is determined by values of L and i at the start of the operation. GIL is not released during indexing L, and if it is released between indexing and assignment, it does not affect the result.

and Steven:

> Does that matter though? I think that's a distinction that makes no 
difference.

> We know that another thread could change the L or the i before the 
assignment, if they are global. But once the L[i] lookup has occurred, 
it doesn't matter if they change. It's not going to affect what value 
gets bound to the x.

Fair enough. Atomicity is a bit slippery, I find. It depends where the critical region starts. Thinking again, it's not the assignment that's the issue ...

L is pushed
i is pushed
__getitem__ is called
x is popped

It is possible, if i and L are accessible to another thread and change after L is pushed, that x is given a value composed from an i and an L that never existed concurrently in the view of the other thread. Straining at gnats here, but atomicity is a strong claim.

And on the point about re-ordering and CPUs, I can't imagine re-ordering that effectively changes the order of byte codes. But do CPython threads run in separate CPUs, or is that only when we have multiple interpreters? If so, and L were in a hot memory location (either the variable or its content), this could be inconsistent between threads. Sorry, I don't know the memory coherence CPython has: I know I couldn't rely on it in Java.

I'm just arguing that the section gives advice that is *nearly* always right, which is a horrible thing to debug. I'll stop stirring.
msg403765 - (view) Author: Ben (bjs) * Date: 2021-10-12 23:09
The problem with the FAQs is that it's over-simplifying things to the point where it can sometimes mislead.

Notably, it says the GIL protects these operations; but as Antoine points out,  many operations on datatypes drop back into Python (including potential decrefs)

Concerns about non-atomic evaluation of the composition of operations in these statements is mostly due to the way the FAQ is presented,  it should be made clearer *which* operations it's describing to be atomic.
(Otherwise you get questions like "is x = L[x] atomic?")

graingert said the following might be useful, so:
Going through each of the points of the FAQ...

The following seem relatively straight-forward and non-controversial(?):
x = L[i]
x = L.pop()
x = y
L.append(x)
L1.extend(L2)

I'm not even sure what it *means* when it says the following:
D.keys()

The following probably have some caveats:
D[x] = y

These appear to be the suspect ones:
D1.update(D2)
L.sort()
L[i:j] = L2
x.field = y

Exploring each in more detail...

dict.keys is just a mystery to me, maybe this mattered in Python 2 but these are view objects now, or maybe I am missing something?

dict.__setitem__ needs clarification really, surely the actual setting of the item is "atomic" in that other threads will either see the dict with or without the item and not halfway through some resizing operation or something, but in doing the setting it may trigger many __eq__ calls on the other keys
(either during the resize itself, or just during probing).

The dict.update case seems like it should hold if both dicts have keys made of only other builtin types so that the GIL can continue to protect.  If the keys of either are custom objects with their own __eq__ then the "atomicity" of the operation is in question
as the __eq__ can happen "during" the update.
Imagine two update()s to the same dict,  if the keys have custom __eq__'s then the (concurrent) composition of the two may give some mix of the two dictionaries overlapping keys.
(Note that the __hash__ doesn't matter as it is not recomputed on an update)

For list.sort it's more subtle,
there is built-in protection to make it somewhat atomic
which means that append()s and extend()s shouldn't be lost
but concurrent threads might suddenly see the list be emptied and concurrent len()/L.pop() see sequentially inconsistent results.

For list.__setitem__ it's clear it's non-atomic in the case that the elements of the list are custom objects with their own __del__, and the FAQ does infact mention this case (at the bottom).

Attribute assignment is odd,  I can't see how that can be described as "atomic" for arbitrary objects. There is no way the FAQ really means that x and y are instances of `object`.

There are questions about operations that are potentially missing(?) from the list:
len(L)
D1.copy()
L1 += L2  (or does "extend" cover this too?)
... etc,  and other datatypes (tuples are an obvious question here)

It's not clear why the FAQ picked these exact operations out specifically.

Fundamentally this FAQ tries to be both a language definition ("You can rely on these operations being atomic") but also somewhat of an implementation-dependent description ("this is what is true in CPython").
Perhaps the best long-term solution would be to remove this "FAQ" and either move more detailed discussion about atomicity guarantees for various operations to the actual docs for the built-in data structures or to relax the guarantees the language gives -- asking people to use mutexes/async libraries more and only guaranteeing enough to make those cases work.
History
Date User Action Args
2021-10-12 23:09:38bjssetnosy: + bjs
messages: + msg403765
2021-10-12 19:34:36jeff.allensetmessages: + msg403754
2021-10-12 12:36:26serhiy.storchakasetmessages: + msg403729
2021-10-12 11:03:31steven.dapranosetmessages: + msg403725
2021-10-12 08:35:49pitrousetversions: + Python 3.9, Python 3.10, Python 3.11
nosy: + pitrou, pablogsal, serhiy.storchaka

messages: + msg403716

type: behavior
2021-10-12 08:07:48graingertsetmessages: + msg403715
2021-10-12 07:42:07jeff.allensetnosy: + jeff.allen
messages: + msg403714
2021-10-11 23:12:36steven.dapranosetnosy: + steven.daprano
messages: + msg403700
2021-10-11 20:27:35graingertsetkeywords: + patch
stage: patch review
pull_requests: + pull_request27181
2021-10-11 20:25:32graingertcreate