This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Raise an error if a dict is modified during a lookup
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Jim.Jewett, Mark.Shannon, gvanrossum, python-dev, rhettinger, vstinner
Priority: low Keywords: patch

Created on 2012-03-05 21:19 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
dict_lookup.patch vstinner, 2012-03-05 21:19 review
nomodify.patch vstinner, 2012-03-05 21:20 review
Messages (29)
msg154976 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-03-05 21:19
Lib/test/crashers/nasty_eq_vs_dict.py does crash Python because of an
infinite loop in the dictionary lookup. The script modifies the
dictionary at each lookup, whereas Python tries a new lookup each time
that the dictionary is modified.

I proposed to make the lookup fail with a RuntimeError if the
dictionary has been modified during a lookup. It should not occur if
you are not doing something evil.
msg154977 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-03-05 21:20
Another possibility is to avoid the modification of a dictionary
during a lookup: nomodify.patch.
msg154989 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2012-03-05 23:28
I much prefer dict_lookup.patch to nomodify.patch.
It doesn't increase the memory use of dict. One extra word per dict could be a lot of memory for a large application.

There is a very subtle semantic change, but I think it is a positive one.
Raising a runtimne seesm sensible as the dict iterators already raise a RuntimeError if the size of the dict changes.
msg154991 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-03-05 23:43
> I much prefer dict_lookup.patch to nomodify.patch.
> It doesn't increase the memory use of dict. One extra word
> per dict could be a lot of memory for a large application.

nomodify.patch is the correct fix, but I agree that dict_lookup.patch is better (and sufficient) in practive.

> Raising a runtimne seesm sensible as the dict iterators already
> raise a RuntimeError if the size of the dict changes.

Yes, that's how I chose the exception.

>>> d={k: str(k) for k in range(10)}
>>> for k in d:
...  del d[k]
... 
RuntimeError: dictionary changed size during iteration

>>> d={k for k in range(10)}
>>> for k in d:
...  d.remove(k)
... 
RuntimeError: Set changed size during iteration

>>> d=list(range(10))
>>> def keyfunc(x):
...  d.append(1)
...  return x
... 
>>> d.sort(key=keyfunc)
ValueError: list modified during sort
msg154993 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012-03-05 23:50
Yeah, dict_lookup.patch seems fine.
msg154994 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-06 00:13
New changeset 934aaf2191d0 by Victor Stinner in branch 'default':
Close #14205: dict lookup raises a RuntimeError if the dict is modified during
http://hg.python.org/cpython/rev/934aaf2191d0
msg155021 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-03-06 16:35
Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against.

Would it be worth adding a counter to lookdict, so that one modification is OK, but 5 aren't?
msg155022 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-03-06 16:54
> Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

The issue was triggered without threads.If the __eq__ method of the
objects used for keys use C functions releasing the GIL, you may
trigger the issue.

> Would it be worth adding a counter to lookdict, so that one modification is OK, but 5 aren't?

If you implement a special type modifying the dict that contains the
object, you should implement your own retry function on lookup failure
(catch RuntimeError).

Honestly, if you get RuntimeError, you should change your program, not
retry the lookup!
msg155023 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2012-03-06 16:56
Jim Jewett wrote:
> Jim Jewett <jimjjewett@gmail.com> added the comment:
> 
> Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

So, they are writing to a dict in one thread while reading from the same
dict in another thread, without any external locks and with keys written
in Python.
> 
> I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against.
> 
I suspect, they are already seeing sporadic failures.
I think raising an exception is better than weird failures.

We should document the new behaviour, as it is a change in semantics.
msg155024 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-03-06 16:59
> If the __eq__ method of the
> objects used for keys use C functions releasing the GIL, you may
> trigger the issue.

Oh, I mean: trigger the issue with threads. I hope that your objects
don't call C functions like open() in their __eq__() method!
msg155026 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012-03-06 17:21
On Tue, Mar 6, 2012 at 8:56 AM, Mark Shannon <report@bugs.python.org> wrote:
>
> Mark Shannon <mark@hotpy.org> added the comment:
>
> Jim Jewett wrote:
>> Jim Jewett <jimjjewett@gmail.com> added the comment:
>>
>> Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?
>
> So, they are writing to a dict in one thread while reading from the same
> dict in another thread, without any external locks and with keys written
> in Python.
>>
>> I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against.
>>
> I suspect, they are already seeing sporadic failures.
> I think raising an exception is better than weird failures.
>
> We should document the new behaviour, as it is a change in semantics.

+1 on everything you said.
msg155027 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-03-06 17:43
On Tue, Mar 6, 2012 at 11:56 AM, Mark Shannon wrote:

> Jim Jewett:
>> Can't this be triggered by non-malicious code that just happened
>> to have a python comparison and get hit with a thread switch?

> So, they are writing to a dict in one thread while reading from the
> same dict in another thread, without any external locks and with
> keys written in Python.

Correct.  For example, it could be a configuration manager, or a
cache, or even a worklist.  If they're just adding new keys, or even
deleting other (==> NOT the one being looked up) keys, why should that
keep them from finding the existing, unchanged keys?

>> I'm not sure how often it happens, but today it would not be visible
>> to the user; after the patch, users will see a sporadic failure that
>> they can't easily defend against.

> I suspect, they are already seeing sporadic failures.

How?

The chain terminates as soon as the dict doesn't resize; without
malicious intent, the odds of hitting several resizes in a row are so
miniscule that it probably hasn't even slowed them down.
msg155029 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012-03-06 18:05
On Tue, Mar 6, 2012 at 9:43 AM, Jim Jewett <report@bugs.python.org> wrote:
>
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> On Tue, Mar 6, 2012 at 11:56 AM, Mark Shannon wrote:
>
>> Jim Jewett:
>>> Can't this be triggered by non-malicious code that just happened
>>> to have a python comparison and get hit with a thread switch?
>
>> So, they are writing to a dict in one thread while reading from the
>> same dict in another thread, without any external locks and with
>> keys written in Python.
>
> Correct.  For example, it could be a configuration manager, or a
> cache, or even a worklist.  If they're just adding new keys, or even
> deleting other (==> NOT the one being looked up) keys, why should that
> keep them from finding the existing, unchanged keys?

Use a lock or a built-in key class. I realize that neither is ideal,
but then, neither was the old situation.

>>> I'm not sure how often it happens, but today it would not be visible
>>> to the user; after the patch, users will see a sporadic failure that
>>> they can't easily defend against.
>
>> I suspect, they are already seeing sporadic failures.
>
> How?
>
> The chain terminates as soon as the dict doesn't resize; without
> malicious intent, the odds of hitting several resizes in a row are so
> miniscule that it probably hasn't even slowed them down.

Now I'm torn. If we'd have this RuntimeError from the start, would we
consider it a flaw in the dict implementation or a feature? The
RuntimeError when changing a dict's size while iterating over it is
definitely a feature (so as to allow the implementation to rehash
everything upon insert/delete). But this is not quite the same. Or is
it? On the one hand I think the scenario is pretty unlikely (mostly
because it involves a user-defined comparison); OTOH it would be quite
nasty to debug. Or would it? You do get a decent error message...

Note that Victor's alternate fix (nomodify.diff) has the same problem
-- it just raises RuntimeError in the mutating thread rather than in
the lookup thread.
msg155037 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-03-06 20:33
On Tue, Mar 6, 2012 at 1:05 PM, Guido van Rossum

Jim Jewett:
>>...  If they're just adding new keys, or even deleting other
>> (==> NOT the one being looked up) keys, why should that
>> keep them from finding the existing, unchanged keys?

>> ... The chain terminates as soon as the dict doesn't resize; without
>> malicious intent, the odds of hitting several resizes in a row are so
>> miniscule that it probably hasn't even slowed them down.

> Now I'm torn. If we'd have this RuntimeError from the start, would we
> consider it a flaw in the dict implementation or a feature?

I would personally have considered it a flaw.  Wrapping all dict
access just in case some other code added a weird key seems ugly and
inefficient.

> RuntimeError when changing a dict's size while iterating over it is
> definitely a feature (so as to allow the implementation to rehash
> everything upon insert/delete).

In that case, you've explicitly asked for the whole set (or at least a
snapshot); the RuntimeError indicates that you can't get it.  Maybe
there should be a way to say "I don't need the set to be perfect", but
there isn't, and you aren't getting what you actually did ask for, so
an Error is appropriate.

But in this case, you're asking for a specific key, and the key is
there.  It may even be a perfectly normal string, that just happened
to hash-collide with something some other code inserted.

The RuntimeError exposes a race condition to user code -- and it may
not be the code that stuck the oddball class in the dict in the first
place.

> Note that Victor's alternate fix (nomodify.diff) has the same problem
> -- it just raises RuntimeError in the mutating thread rather than in
> the lookup thread.

Right; I'm saying that raising an error is the wrong thing to do.

But there is an analogy to the hash-code change vs collision counting.
 Breaking an application that got unlucky once is wrong.  But if the
bad luck *continues* to the point where it would only occur one time
in a zillion, RuntimeError is better than a likely infinite loop.

(It really was a question initially, but I've now convinced at least myself.)

I would rather see something like replacing

   349             else {
   350                 /* The compare did major nasty stuff to the
   351                  * dict:  start over.
   352                  * XXX A clever adversary could prevent this
   353                  * XXX from terminating.
   354                  */
   355                 return lookdict(mp, key, hash);

with

   349             else {
   350                 /* The compare did major nasty stuff to the
   351                  * dict:  start over.
   352                  * XXX A clever adversary could prevent this
   353                  * XXX from terminating.
   354                  */
   355                 return lookdict_paranoid(mp, key, hash, 1);

where lookdict_paranoid is a near-copy of lookdict that adds a
paranoia parameter and replaces those same lines with

            else {
                /* The compare did major nasty stuff *again*. */
                if (paranoia < PARANOIA_LEVEL) {
                    return lookdict_paranoid(mp, key, hash, paranoia+1);
                }
                /* Something is wrong; raise a RuntimeError. */
            }
msg155147 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-08 01:50
New changeset 0255bafbccf2 by Victor Stinner in branch 'default':
Issue #14205: document the change of dict[key] behaviour if dict is modified
http://hg.python.org/cpython/rev/0255bafbccf2
msg155194 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-03-09 00:43
Guido: So, what do you think? Do you like the new behaviour, or would you prefer a softer solution like retrying N times? Python < 3.3 retries the lookup an unlimited number of times which lead to a crash, that's the issue fixed by my change.
msg155196 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012-03-09 00:50
I'm still torn.  Can you implement the counter without adding an extra field to the dict object?  I worry that we'll get the same objection we had when MAL proposed collision counting as a solution to the hash DoS attack.

The numbers 0, 1 and infinity are special; all others are to be treated with skepticism.

I'm prepared to make the current semantic change a 3.3 feature, but I'm not sure it's safe to backport.
msg155197 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-03-09 00:56
> Can you implement the counter without adding an extra field to the dict object?

Add a counter requires to change the prototype of the C lookup function:
PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, Py_hash_t hash);

I don't know if it is a problem for ABI compatibility or extension
modules. I suppose that it is safer and simpler to not change it :-)

> I worry that we'll get the same objection we had when MAL proposed collision counting as a solution to the hash DoS attack.
>
> The numbers 0, 1 and infinity are special; all others are to be treated with skepticism.

Is it really useful to only retry once? Retrying once means that it
would work in most cases, but not in some corner cases. For example,
retrying once may hide the problem if you have only 1 client (1
thread), but you get the exception when you have more clients (more
threads).

Or do I misunderstand the problem?
msg155199 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012-03-09 01:01
On Thu, Mar 8, 2012 at 4:56 PM, STINNER Victor <report@bugs.python.org> wrote:
>
> STINNER Victor <victor.stinner@gmail.com> added the comment:
>
>> Can you implement the counter without adding an extra field to the dict object?
>
> Add a counter requires to change the prototype of the C lookup function:
> PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, Py_hash_t hash);
>
> I don't know if it is a problem for ABI compatibility or extension
> modules. I suppose that it is safer and simpler to not change it :-)

Agreed.

>> I worry that we'll get the same objection we had when MAL proposed collision counting as a solution to the hash DoS attack.
>>
>> The numbers 0, 1 and infinity are special; all others are to be treated with skepticism.
>
> Is it really useful to only retry once? Retrying once means that it
> would work in most cases, but not in some corner cases. For example,
> retrying once may hide the problem if you have only 1 client (1
> thread), but you get the exception when you have more clients (more
> threads).
>
> Or do I misunderstand the problem?

Sorry, I was joking; anyway, 1 in this context would mean do the
lookup exactly once (so 0 would mean not to do it at all). One retry
would effectively mean try it twice, which would have to be treated
skeptically. :-)

Given all this I think we should keep it as you have committed and add
it to the docs and whatsnew.
msg155205 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-03-09 04:28
On Thu, Mar 8, 2012 at 7:43 PM, STINNER Victor wrote:
> Python < 3.3 retries the lookup an unlimited number of times which
> lead to a crash, that's the issue fixed by my change.

How does it lead to a crash?  I think it just leads to an infinite
loop (which is worse) if the data is actually malicious, but
eventually fixes itself if the data is merely unlucky or even
ill-behaved.

Guido van Rossume:
>> Can you implement the counter without adding an extra field to the dict object?

Yes.  The logic does add complexity, but there is no per-dict charge,
and the extra logic can be isolated to a function that will normally
not be paged in.

> Add a counter requires to change the prototype of the C lookup
> function:
>    PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, Py_hash_t hash);

> I don't know if it is a problem for ABI compatibility or extension
> modules. I suppose that it is safer and simpler to not change it :-)

That should still be the signature for the primary lookup; it can do
its own delegation to a private function with an extra parameter
if/when needed.

> The numbers 0, 1 and infinity are special; all others are to be
> treated with skepticism.

Yes; it isn't clear whether to cut off after 1 retry or 3 or 10...

> Is it really useful to only retry once? Retrying once means that it
> would work in most cases, but not in some corner cases. For example,
> retrying once may hide the problem if you have only 1 client (1
> thread), but you get the exception when you have more clients (more
> threads).

Without threads, there shouldn't be any problems for innocent code.
Even with threads, the number of resets is not (for innocent code)
tied to the number of alternative threads; only to the fact that they
can run mid-lookup.

-jJ
msg155214 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2012-03-09 09:00
Unmodified CPython (without the patch) already passes the new test in the patch.

The unmodified code already raises a Runtime error; a recursion limit exceeded error!
msg155219 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-03-09 09:44
> Unmodified CPython (without the patch) already passes the new test in the patch.

You should try Lib/test/crashers/nasty_eq_vs_dict.py, not my test.
msg155230 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-09 13:04
New changeset a428f85de29c by Victor Stinner in branch 'default':
Issue #14205: Document the dict lookup change in What's New in Python 3.3
http://hg.python.org/cpython/rev/a428f85de29c
msg155231 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-03-09 13:05
Guido> Given all this I think we should keep it as you have committed
Guido> and add it to the docs and whatsnew.

I updated What's New in Python 3.3 document. I also wrote an email to python-dev to notify all developers of this change. Let's close the issue.
msg155232 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2012-03-09 13:52
STINNER Victor wrote:
> STINNER Victor <victor.stinner@gmail.com> added the comment:
> 
> Guido> Given all this I think we should keep it as you have committed
> Guido> and add it to the docs and whatsnew.
> 
> I updated What's New in Python 3.3 document. I also wrote an email to python-dev to notify all developers of this change. Let's close the issue.
> 

The tests still need to fixed

test_mutating_lookuptest doesn't fail on the unmodified version of cpython.
Lib/test/crashers/nasty_eq_vs_dict.py has been removed, but
no equivalent test has been added to Lib/test/test_dict.py

There should be at least one failing test for this issue.
msg155266 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-09 21:59
New changeset 70fbb02d588c by Victor Stinner in branch 'default':
Issue #14205: Fix test_dict.test_mutating_lookup()
http://hg.python.org/cpython/rev/70fbb02d588c
msg156858 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-03-26 19:35
See http://bugs.python.org/issue14417
msg157334 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012-04-01 23:26
Was the crasher ever converted into a unittest? I think that should be done regardless of the outcome of the ongoing discussion about this issue.
msg160543 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-05-13 18:50
New changeset 93748e2d64e3 by Antoine Pitrou in branch 'default':
Issue #14417: Mutating a dict during lookup now restarts the lookup instead of raising a RuntimeError (undoes issue #14205).
http://hg.python.org/cpython/rev/93748e2d64e3
History
Date User Action Args
2022-04-11 14:57:27adminsetgithub: 58413
2012-05-13 18:50:18python-devsetmessages: + msg160543
2012-04-01 23:26:20gvanrossumsetmessages: + msg157334
2012-03-26 19:35:43Jim.Jewettsetmessages: + msg156858
2012-03-09 21:59:09python-devsetmessages: + msg155266
2012-03-09 13:52:47Mark.Shannonsetmessages: + msg155232
2012-03-09 13:05:00vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg155231
2012-03-09 13:04:09python-devsetmessages: + msg155230
2012-03-09 09:44:08vstinnersetmessages: + msg155219
2012-03-09 09:00:08Mark.Shannonsetmessages: + msg155214
2012-03-09 04:28:09Jim.Jewettsetmessages: + msg155205
2012-03-09 01:01:47gvanrossumsetmessages: + msg155199
2012-03-09 00:56:16vstinnersetmessages: + msg155197
2012-03-09 00:50:02gvanrossumsetmessages: + msg155196
2012-03-09 00:43:10vstinnersetmessages: + msg155194
2012-03-08 01:50:33python-devsetmessages: + msg155147
2012-03-07 15:06:02rhettingersetassignee: rhettinger ->
2012-03-06 20:33:48Jim.Jewettsetmessages: + msg155037
2012-03-06 18:05:09gvanrossumsetmessages: + msg155029
2012-03-06 17:43:10Jim.Jewettsetmessages: + msg155027
2012-03-06 17:29:08vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
2012-03-06 17:21:44gvanrossumsetmessages: + msg155026
2012-03-06 16:59:19vstinnersetmessages: + msg155024
2012-03-06 16:56:28Mark.Shannonsetmessages: + msg155023
2012-03-06 16:54:11vstinnersetmessages: + msg155022
2012-03-06 16:35:37Jim.Jewettsetnosy: + Jim.Jewett
messages: + msg155021
2012-03-06 00:13:02python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg154994

resolution: fixed
stage: resolved
2012-03-05 23:50:47gvanrossumsetmessages: + msg154993
2012-03-05 23:49:20rhettingersetpriority: normal -> low
assignee: rhettinger
2012-03-05 23:43:12vstinnersetnosy: + gvanrossum
messages: + msg154991
2012-03-05 23:28:56Mark.Shannonsetnosy: + Mark.Shannon
messages: + msg154989
2012-03-05 22:14:40pitrousetnosy: + rhettinger
2012-03-05 21:20:58vstinnersetfiles: + nomodify.patch

messages: + msg154977
2012-03-05 21:19:54vstinnercreate