This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Jim.Jewett
Recipients Jim.Jewett, Mark.Shannon, gvanrossum, python-dev, rhettinger, vstinner
Date 2012-03-06.20:33:47
SpamBayes Score 9.436896e-16
Marked as misclassified No
Message-id <CA+OGgf6MnC6cr+2yZppAMHa5UO9W31O3gPmtYZYybk0SpCXbkw@mail.gmail.com>
In-reply-to <CAP7+vJLupCP47JUCaTF3YJNEZF0NXjiuUXJf11qKYjkKcVK4dg@mail.gmail.com>
Content
On Tue, Mar 6, 2012 at 1:05 PM, Guido van Rossum

Jim Jewett:
>>...  If they're just adding new keys, or even deleting other
>> (==> NOT the one being looked up) keys, why should that
>> keep them from finding the existing, unchanged keys?

>> ... The chain terminates as soon as the dict doesn't resize; without
>> malicious intent, the odds of hitting several resizes in a row are so
>> miniscule that it probably hasn't even slowed them down.

> Now I'm torn. If we'd have this RuntimeError from the start, would we
> consider it a flaw in the dict implementation or a feature?

I would personally have considered it a flaw.  Wrapping all dict
access just in case some other code added a weird key seems ugly and
inefficient.

> RuntimeError when changing a dict's size while iterating over it is
> definitely a feature (so as to allow the implementation to rehash
> everything upon insert/delete).

In that case, you've explicitly asked for the whole set (or at least a
snapshot); the RuntimeError indicates that you can't get it.  Maybe
there should be a way to say "I don't need the set to be perfect", but
there isn't, and you aren't getting what you actually did ask for, so
an Error is appropriate.

But in this case, you're asking for a specific key, and the key is
there.  It may even be a perfectly normal string, that just happened
to hash-collide with something some other code inserted.

The RuntimeError exposes a race condition to user code -- and it may
not be the code that stuck the oddball class in the dict in the first
place.

> Note that Victor's alternate fix (nomodify.diff) has the same problem
> -- it just raises RuntimeError in the mutating thread rather than in
> the lookup thread.

Right; I'm saying that raising an error is the wrong thing to do.

But there is an analogy to the hash-code change vs collision counting.
 Breaking an application that got unlucky once is wrong.  But if the
bad luck *continues* to the point where it would only occur one time
in a zillion, RuntimeError is better than a likely infinite loop.

(It really was a question initially, but I've now convinced at least myself.)

I would rather see something like replacing

   349             else {
   350                 /* The compare did major nasty stuff to the
   351                  * dict:  start over.
   352                  * XXX A clever adversary could prevent this
   353                  * XXX from terminating.
   354                  */
   355                 return lookdict(mp, key, hash);

with

   349             else {
   350                 /* The compare did major nasty stuff to the
   351                  * dict:  start over.
   352                  * XXX A clever adversary could prevent this
   353                  * XXX from terminating.
   354                  */
   355                 return lookdict_paranoid(mp, key, hash, 1);

where lookdict_paranoid is a near-copy of lookdict that adds a
paranoia parameter and replaces those same lines with

            else {
                /* The compare did major nasty stuff *again*. */
                if (paranoia < PARANOIA_LEVEL) {
                    return lookdict_paranoid(mp, key, hash, paranoia+1);
                }
                /* Something is wrong; raise a RuntimeError. */
            }
History
Date User Action Args
2012-03-06 20:33:48Jim.Jewettsetrecipients: + Jim.Jewett, gvanrossum, rhettinger, vstinner, Mark.Shannon, python-dev
2012-03-06 20:33:48Jim.Jewettlinkissue14205 messages
2012-03-06 20:33:47Jim.Jewettcreate