classification
Title: Make dict.setdefault() atomic
Type: behavior Stage: committed/rejected
Components: Interpreter Core Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Ramchandra Apte, eric.araujo, gruszczy, jcea, jcon, meador.inge, pitrou, python-dev, rhettinger
Priority: normal Keywords: patch

Created on 2011-12-03 00:30 by rhettinger, last changed 2012-02-27 00:06 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
13521.patch gruszczy, 2011-12-11 19:53 review
13521_2.patch gruszczy, 2011-12-11 20:03 review
13521_27_1.patch gruszczy, 2011-12-18 18:00 review
13521_27_3.patch gruszczy, 2012-01-15 14:01
13521_27_4.patch gruszczy, 2012-02-26 20:40 review
Messages (33)
msg148782 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-12-03 00:30
A dialog on python-dev suggests that dict.setdefault() was intended to be atomic and that a number of experienced developers have deployed code relying on its atomicity:  http://mail.python.org/pipermail/python-3000/2007-July/008693.html

The actual implementation shows a second call to PyDict_Setitem() which can call PyObject_Hash() which can call arbitrary Python code.   http://hg.python.org/cpython/file/2.7/Objects/dictobject.c#l1937

This fix is straight-forward, use the results of the initial lookup to insert the default object.  This will make the operation atomic and it will make it faster.
msg148784 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-03 01:27
+1.
msg148815 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-03 18:27
Atomic and faster... +1.
msg149142 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2011-12-10 04:28
+1 for atomic and more robust
msg149164 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-12-10 16:08
maniram: We try to keep bug reports focused on one thing, and we don’t generally collect votes (we prefer that people vote with patches, tests and messages with content—for example, saying exactly what “more robust” should be).  Here I think Antoine and Jesús said +1 to agree that this should be changed even in stable versions.

Raymond: I think I could make the patch, but I’m not sure about testing the new behavior.
msg149195 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-10 22:31
Eric, overload "__hash__()" and check that is only called once, while now would be called twice.
msg149245 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-12-11 19:53
I have written a patch and a test, but since it's changing C code, I am far from being sure if it's achieve the expected behavior in the right way. There are also tests and running whole test suite didn't bring any errors.
msg149246 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-12-11 19:53
Also: I'll be happy to work further on this patch, if I get some comments and advice.
msg149247 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-11 19:59
If _PyDict_SetItemUsingHash is module-private, it should be declared "static". Also, better if it follows the usual naming of static functions inside that C file (i.e. "dict_some_lowercase_name").
msg149249 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-12-11 20:03
Done.
msg149255 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-11 22:43
The patch looks ok to me. I'll let Raymond make the final call.
msg149324 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-12-12 16:09
Thanks for the idea Jesús, even though I didn’t get the change to use it :)

Filip: Is there a reasons for using a nonlocal count rather than e.g. self.count?  Otherwise the test looks good.
msg149328 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-12 16:48
> Filip: Is there a reasons for using a nonlocal count rather than e.g.
> self.count?  Otherwise the test looks good.

Eric, please, could we stop such pointless nitpicking about one's
stylistic preferences? "nonlocal" is as good as any other solution here,
and reviewing is not supposed to be a pedantry contest. There are tons
of serious issues to be solved where your energy would be better spent,
IM(NSH)O.
msg149329 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-12-12 16:53
I haven't given it much thought, when I was making the choice of using nonlocal rather than self.count. I was rather excited to see, if the change will work as I wanted it to. If you believe it would be better to use an attribute, I'll be happy to change it.
msg149330 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-12 16:55
Well, if this is going to be applied to 2.7, "nonlocal" is invalid syntax there. I guess that being able to use the same test in 2.7 and 3.2/3.3 would be nice. Style, but justified.
msg149332 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-12 17:09
> Well, if this is going to be applied to 2.7, "nonlocal" is invalid
> syntax there.

Ah, good point.
msg149333 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-12-12 17:17
I'll send a patch, when I get home from work.
msg149335 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-12-12 17:20
Let's just start with a working 2.7 patch and go from there.
msg149350 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-12-12 19:43
I have tried to port patch to python2.7, but apparently I must be doing something wrong, because while most tests pass, several fail (including tests for unittest itself).

I would like to continue working on this test, but incoming week is pretty intensive for me. Is it ok if I returned to working on this during the weekend?
msg149367 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-13 00:37
> I would like to continue working on this test, but incoming week is
> pretty intensive for me. Is it ok if I returned to working on this
> during the weekend?

Of course. There's no rush.
(I'm really not sure it should go in 2.7 and 3.2; this looks more like a feature than a bug, since atomicity has never been documented)
msg149375 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-12-13 02:26
Thanks, I'll look at it over the next few days.
msg149785 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-12-18 18:00
Patch for 2.7.
msg150878 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2012-01-08 16:50
Bump! There was no activity here for two weeks. Is my patch for 2.7 ok or should I do something more about it?
msg150886 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-08 17:58
Looking at the patch again, I think this isn't enough.
setdefault() will still call the lookup routine twice which, in the general case (i.e. lookdict() not lookdict_unicode()), can call arbitrary Python code through e.g. __eq__ methods.
msg150908 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2012-01-08 21:30
I understand you are talking about this call:

mp->ma_lookup(mp, key, hash);

I haven't noticed that earlier. I'll try to provide a better fix (for 2.7 first, after we agree it's good enough, I will provide one for 3.3).

Do you have any idea, how I might this part? I mean how to check, that this is called only once?
msg150912 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-08 21:42
> Do you have any idea, how I might this part? I mean how to check, that
> this is called only once?

Checking that __eq__ is called only once should be a good proxy.
msg150914 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2012-01-08 21:46
Thanks, I'll try that. Like before I will probably find time next weekend.
msg151288 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2012-01-15 14:01
No more double lookup.
msg151296 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-15 17:20
> No more double lookup.

Your patch doesn't check hashed2.eq_count.
Since the dict specification doesn't say on which instance __eq__ will
be called when doing a lookup, the patch should either check
``hashed1.eq_count + hashed2.eq_count``, or make eq_count a class
attribute.

A nit: be careful not to use tabs in C files.
msg154384 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2012-02-26 20:40
Fixed both issues.
msg154421 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-26 23:48
New changeset 5c52e7c6d868 by Antoine Pitrou in branch '2.7':
Issue #13521: dict.setdefault() now does only one lookup for the given key, making it "atomic" for many purposes.
http://hg.python.org/cpython/rev/5c52e7c6d868
msg154422 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-27 00:05
New changeset 90572ccda12c by Antoine Pitrou in branch '3.2':
Issue #13521: dict.setdefault() now does only one lookup for the given key, making it "atomic" for many purposes.
http://hg.python.org/cpython/rev/90572ccda12c

New changeset 3dfa98cf2e26 by Antoine Pitrou in branch 'default':
Issue #13521: dict.setdefault() now does only one lookup for the given key, making it "atomic" for many purposes.
http://hg.python.org/cpython/rev/3dfa98cf2e26
msg154423 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-27 00:06
Thank you Filip! I've now committed the patch.
History
Date User Action Args
2012-05-16 16:49:58pitroulinkissue5730 superseder
2012-02-27 00:06:32pitrousetstatus: open -> closed
resolution: fixed
messages: + msg154423

stage: patch review -> committed/rejected
2012-02-27 00:05:45python-devsetmessages: + msg154422
2012-02-26 23:48:58python-devsetnosy: + python-dev
messages: + msg154421
2012-02-26 20:40:41gruszczysetfiles: + 13521_27_4.patch

messages: + msg154384
2012-01-15 17:20:03pitrousetmessages: + msg151296
2012-01-15 14:01:29gruszczysetfiles: + 13521_27_3.patch

messages: + msg151288
2012-01-08 21:46:34gruszczysetmessages: + msg150914
2012-01-08 21:42:19pitrousetmessages: + msg150912
2012-01-08 21:30:00gruszczysetmessages: + msg150908
2012-01-08 17:58:43pitrousetmessages: + msg150886
2012-01-08 16:50:39gruszczysetmessages: + msg150878
2011-12-18 18:00:55gruszczysetfiles: + 13521_27_1.patch

messages: + msg149785
2011-12-13 02:26:57rhettingersetmessages: + msg149375
2011-12-13 00:37:30pitrousetmessages: + msg149367
2011-12-12 19:43:48gruszczysetmessages: + msg149350
2011-12-12 17:20:10rhettingersetmessages: + msg149335
2011-12-12 17:17:52gruszczysetmessages: + msg149333
2011-12-12 17:09:34pitrousetmessages: + msg149332
2011-12-12 16:55:40jceasetmessages: + msg149330
2011-12-12 16:53:35gruszczysetmessages: + msg149329
2011-12-12 16:48:16pitrousetmessages: + msg149328
2011-12-12 16:09:19eric.araujosetmessages: + msg149324
2011-12-11 22:43:20pitrousetmessages: + msg149255
2011-12-11 20:03:27gruszczysetfiles: + 13521_2.patch

messages: + msg149249
2011-12-11 19:59:12pitrousetmessages: + msg149247
stage: needs patch -> patch review
2011-12-11 19:53:59gruszczysetmessages: + msg149246
2011-12-11 19:53:36gruszczysetfiles: + 13521.patch

nosy: + gruszczy
messages: + msg149245

keywords: + patch
2011-12-10 22:31:30jceasetmessages: + msg149195
2011-12-10 16:08:15eric.araujosetnosy: + eric.araujo

messages: + msg149164
versions: - Python 3.1
2011-12-10 04:28:36Ramchandra Aptesetnosy: + Ramchandra Apte
messages: + msg149142
2011-12-10 04:01:31jconsetnosy: + jcon
2011-12-03 18:27:44jceasetnosy: + jcea
messages: + msg148815
2011-12-03 18:02:26meador.ingesetnosy: + meador.inge

stage: needs patch
2011-12-03 02:15:57rhettingersetassignee: rhettinger
2011-12-03 01:27:27pitrousetnosy: + pitrou
messages: + msg148784
2011-12-03 00:30:28rhettingercreate