Message 403765 - Python tracker



Author	bjs
Recipients	bjs, docs@python, graingert, jeff.allen, pablogsal, pitrou, serhiy.storchaka, steven.daprano
Date	2021-10-12.23:09:38
Message-id	<1634080178.95.0.554086559469.issue45435@roundup.psfhosted.org>

Content

The problem with the FAQ is that it oversimplifies things to the point where it can sometimes mislead.

Notably, it says the GIL protects these operations, but as Antoine points out, many operations on these datatypes can call back into Python code (including via decrefs, which can trigger __del__).

Concerns about the non-atomic evaluation of the *composition* of operations in these statements are mostly due to the way the FAQ is presented; it should be made clearer *which* operations it is describing as atomic.
(Otherwise you get questions like "is x = L[x] atomic?")
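A quick way to see the distinction is CPython's dis module: the statement x = L[x] compiles to several bytecode instructions (loads, the subscript, a store), and only the subscript step is the list operation the FAQ is talking about. A sketch; exact opcode names vary between CPython versions (e.g. BINARY_SUBSCR in older releases), so only the overall shape matters here.

```python
import dis


def read_item(L, x):
    # The statement in question: one line, but not one operation.
    x = L[x]


# Collect the opcode names CPython generates for the function body.
ops = [ins.opname for ins in dis.get_instructions(read_item)]
print(ops)  # several loads, a subscript and a store; names depend on the version
```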

graingert said the following might be useful, so here goes, point by point through the FAQ...

The following seem relatively straightforward and non-controversial(?):
x = L[i]
x = L.pop()
x = y
L.append(x)
L1.extend(L2)

I'm not even sure what it *means* when it says the following:
D.keys()

The following probably have some caveats:
D[x] = y

These appear to be the suspect ones:
D1.update(D2)
L.sort()
L[i:j] = L2
x.field = y

Exploring each in more detail...

dict.keys is just a mystery to me. Maybe this mattered in Python 2, where D.keys() built a new list, but these are view objects now; or maybe I am missing something?
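To illustrate the view semantics (plain Python 3 behaviour, nothing thread-specific): creating the view is trivial, but the view reads the live dict every time it is used, so iterating over D.keys() is not a snapshot, and a change in the dict's size during iteration raises RuntimeError.

```python
D = {"a": 1}
keys = D.keys()          # a view onto D, not a copy
D["b"] = 2
print(list(keys))        # ['a', 'b']: the view already sees the new key

error = None
try:
    for k in keys:
        D[k + "!"] = 0   # changes the dict's size while the view is being iterated
except RuntimeError as exc:
    error = exc
    print("RuntimeError:", exc)
```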

dict.__setitem__ really needs clarification. Surely the actual setting of the item is "atomic", in that other threads will see the dict either with or without the item and never halfway through some resizing operation, but setting the item may trigger many __eq__ calls on the other keys (either during the resize itself, or just during probing).
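A sketch of the probing case: when keys collide on hash, CPython compares the new key against the existing ones by calling __eq__, so arbitrary Python code runs inside D[x] = y. The Key class and its constant __hash__ are purely illustrative, chosen to force collisions.

```python
class Key:
    """Illustrative key type: every instance hashes the same, forcing collisions."""

    eq_calls = 0  # class-wide counter of __eq__ invocations

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberate: all keys share one probe sequence

    def __eq__(self, other):
        Key.eq_calls += 1  # arbitrary Python code, running mid-insertion
        if not isinstance(other, Key):
            return NotImplemented
        return self.name == other.name


D = {Key("a"): 1, Key("b"): 2, Key("c"): 3}
Key.eq_calls = 0
D[Key("d")] = 4   # new key: compared against the colliding existing keys
print(Key.eq_calls)
```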

The dict.update case seems like it should hold if both dicts have keys made only of builtin types, so that the GIL can continue to protect it. If the keys of either dict are custom objects with their own __eq__, then the "atomicity" of the operation is in question, as the __eq__ can run "during" the update.
Imagine two update()s to the same dict: if the keys have custom __eq__ methods, then the (concurrent) composition of the two may leave some mix of the two dictionaries' values for their overlapping keys.
(Note that __hash__ doesn't matter, as it is not recomputed on an update.)
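A sketch of the update case, counting calls on an illustrative Key class: in CPython, merging one dict into another reuses the hashes already stored in D2, so __hash__ is not called again, but __eq__ still runs for each key that collides with one already in D1.

```python
class Key:
    """Illustrative key type that counts calls to __hash__ and __eq__."""

    hash_calls = 0
    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        Key.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        Key.eq_calls += 1  # runs "during" D1.update(D2)
        if not isinstance(other, Key):
            return NotImplemented
        return self.name == other.name


D1 = {Key("shared"): "from D1"}
D2 = {Key("shared"): "from D2"}
Key.hash_calls = Key.eq_calls = 0

D1.update(D2)
print(Key.eq_calls, Key.hash_calls)  # __eq__ was called; __hash__ was not (CPython)
print(list(D1.values()))             # ['from D2']
```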

For list.sort it's more subtle: there is built-in protection that makes it somewhat atomic, which means that append()s and extend()s shouldn't be lost, but concurrent threads might suddenly see the list emptied, and concurrent len()/L.pop() calls may see sequentially inconsistent results.
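The "emptied" behaviour is easy to observe from a key function, which CPython calls while L.sort() is in progress. A minimal sketch; the key functions are illustrative stand-ins for code that another thread might run concurrently.

```python
L = [3, 1, 2]
observed_lengths = []


def key(item):
    # Runs during L.sort(); any other code looking at L now sees the same thing.
    observed_lengths.append(len(L))
    return item


L.sort(key=key)
print(observed_lengths)  # [0, 0, 0]: the list appears empty while it is being sorted
sorted_result = list(L)  # [1, 2, 3]


def mutating_key(item):
    L.append(item)  # mutation during the sort is detected
    return item


error = None
try:
    L.sort(key=mutating_key)
except ValueError as exc:
    error = exc
    print("ValueError:", exc)  # list modified during sort
```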

For list.__setitem__ it's clearly non-atomic when the elements of the list are custom objects with their own __del__, and the FAQ does in fact mention this case (at the bottom).
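A sketch of the __del__ case, relying on CPython's reference counting to finalize the replaced element immediately (the Tracked class is illustrative): the finalizer runs in the middle of the item assignment, after the new element is already stored, and can execute arbitrary Python code.

```python
def demo():
    log = []
    shared = []

    class Tracked:
        def __init__(self, name):
            self.name = name

        def __del__(self):
            # Runs when the last reference is dropped, i.e. inside the assignment below.
            log.append((self.name, [item.name for item in shared]))

    shared.append(Tracked("old"))
    shared[0] = Tracked("new")   # "old" is finalized during this statement
    snapshot = list(log)
    shared.clear()               # finalize "new" as well before returning
    return snapshot


print(demo())  # [('old', ['new'])] in CPython
```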

Attribute assignment is odd; I can't see how it can be described as "atomic" for arbitrary objects. There is no way the FAQ really means only the case where x and y are instances of `object`.
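To illustrate why x.field = y can't be atomic in general: a class may define a property setter (or __setattr__, or another descriptor), so the assignment runs arbitrary Python code. In this sketch the Point class and its hook argument are hypothetical; the hook simply stands in for another thread running between the setter's two stores.

```python
class Point:
    """Hypothetical class whose xy attribute is backed by a property setter."""

    def __init__(self, hook=None):
        self._x = 0
        self._y = 0
        self._hook = hook  # hypothetical stand-in for "another thread runs now"

    @property
    def xy(self):
        return (self._x, self._y)

    @xy.setter
    def xy(self, value):
        self._x = value[0]
        if self._hook is not None:
            self._hook(self)  # anything could observe the half-updated object here
        self._y = value[1]


seen = []
p = Point(hook=lambda point: seen.append(point.xy))
p.xy = (1, 2)   # looks like one attribute assignment...
print(seen)     # [(1, 0)]: ...but an intermediate state was observable
print(p.xy)     # (1, 2)
```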

There are questions about operations that are potentially missing(?) from the list:
len(L)
D1.copy()
L1 += L2  (or does "extend" cover this too?)
... and so on, as well as other datatypes (tuples are an obvious question here).
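On L1 += L2 versus L1.extend(L2): for a list both mutate the same object in place, but the augmented assignment also rebinds the target afterwards, which dis makes visible. A sketch; the in-place opcode name differs between CPython versions (INPLACE_ADD before 3.11, BINARY_OP after), so only the presence of a store is checked.

```python
import dis


def augmented(L1, L2):
    L1 += L2        # in-place add, then a store back to L1


def extended(L1, L2):
    L1.extend(L2)   # a single method call, no rebinding


def store_ops(func):
    """Return the opcode names in func that write to a name, attribute or item."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("STORE_")]


print(store_ops(augmented))  # e.g. ['STORE_FAST']
print(store_ops(extended))   # []

# Both forms mutate the original list object in place.
a, b = [1], [1]
alias_a, alias_b = a, b
a += [2]
b.extend([2])
print(alias_a is a, alias_b is b, a == b)  # True True True
```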

It's not clear why the FAQ picked these exact operations out specifically.

Fundamentally, this FAQ tries to be both a language definition ("You can rely on these operations being atomic") and, to some extent, an implementation-dependent description ("this is what is true in CPython").
Perhaps the best long-term solution would be to remove this "FAQ" and either move a more detailed discussion of the atomicity guarantees for the various operations into the actual docs for the built-in data structures, or relax the guarantees the language gives: ask people to use mutexes/async libraries more, and guarantee only enough to make those cases work.
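For comparison, the "use mutexes" style: when several operations must appear as one step (here a read-modify-write on a dict), hold a threading.Lock around them instead of relying on the atomicity of any individual built-in operation. A minimal sketch; LockedCounter is a hypothetical name.

```python
import threading


class LockedCounter:
    """Hypothetical helper: a dict of counters whose updates are serialized by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, key):
        # get() followed by __setitem__ is a compound operation; the lock makes it one step.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)


counter = LockedCounter()


def worker():
    for _ in range(10_000):
        counter.increment("hits")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.get("hits"))  # 40000
```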

History
Date	User	Action	Args
2021-10-12 23:09:38	bjs	set	recipients: + bjs, pitrou, steven.daprano, docs@python, serhiy.storchaka, jeff.allen, graingert, pablogsal
2021-10-12 23:09:38	bjs	set	messageid: <1634080178.95.0.554086559469.issue45435@roundup.psfhosted.org>
2021-10-12 23:09:38	bjs	link	issue45435 messages
2021-10-12 23:09:38	bjs	create