Message 155037 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	Jim.Jewett
Recipients	Jim.Jewett, Mark.Shannon, gvanrossum, python-dev, rhettinger, vstinner
Date	2012-03-06.20:33:47
SpamBayes Score	9.436896e-16
Marked as misclassified	No
Message-id	<CA+OGgf6MnC6cr+2yZppAMHa5UO9W31O3gPmtYZYybk0SpCXbkw@mail.gmail.com>
In-reply-to	<CAP7+vJLupCP47JUCaTF3YJNEZF0NXjiuUXJf11qKYjkKcVK4dg@mail.gmail.com>

Content
On Tue, Mar 6, 2012 at 1:05 PM, Guido van Rossum Jim Jewett: >>... If they're just adding new keys, or even deleting other >> (==> NOT the one being looked up) keys, why should that >> keep them from finding the existing, unchanged keys? >> ... The chain terminates as soon as the dict doesn't resize; without >> malicious intent, the odds of hitting several resizes in a row are so >> miniscule that it probably hasn't even slowed them down. > Now I'm torn. If we'd have this RuntimeError from the start, would we > consider it a flaw in the dict implementation or a feature? I would personally have considered it a flaw. Wrapping all dict access just in case some other code added a weird key seems ugly and inefficient. > RuntimeError when changing a dict's size while iterating over it is > definitely a feature (so as to allow the implementation to rehash > everything upon insert/delete). In that case, you've explicitly asked for the whole set (or at least a snapshot); the RuntimeError indicates that you can't get it. Maybe there should be a way to say "I don't need the set to be perfect", but there isn't, and you aren't getting what you actually did ask for, so an Error is appropriate. But in this case, you're asking for a specific key, and the key is there. It may even be a perfectly normal string, that just happened to hash-collide with something some other code inserted. The RuntimeError exposes a race condition to user code -- and it may not be the code that stuck the oddball class in the dict in the first place. > Note that Victor's alternate fix (nomodify.diff) has the same problem > -- it just raises RuntimeError in the mutating thread rather than in > the lookup thread. Right; I'm saying that raising an error is the wrong thing to do. But there is an analogy to the hash-code change vs collision counting. Breaking an application that got unlucky once is wrong. But if the bad luck continues to the point where it would only occur one time in a zillion, RuntimeError is better than a likely infinite loop. (It really was a question initially, but I've now convinced at least myself.) I would rather see something like replacing 349 else { 350 /* The compare did major nasty stuff to the 351 * dict: start over. 352 * XXX A clever adversary could prevent this 353 * XXX from terminating. 354 / 355 return lookdict(mp, key, hash); with 349 else { 350 / The compare did major nasty stuff to the 351 * dict: start over. 352 * XXX A clever adversary could prevent this 353 * XXX from terminating. 354 / 355 return lookdict_paranoid(mp, key, hash, 1); where lookdict_paranoid is a near-copy of lookdict that adds a paranoia parameter and replaces those same lines with else { / The compare did major nasty stuff again. / if (paranoia < PARANOIA_LEVEL) { return lookdict_paranoid(mp, key, hash, paranoia+1); } / Something is wrong; raise a RuntimeError. */ }

On Tue, Mar 6, 2012 at 1:05 PM, Guido van Rossum

Jim Jewett:
>>...  If they're just adding new keys, or even deleting other
>> (==> NOT the one being looked up) keys, why should that
>> keep them from finding the existing, unchanged keys?

>> ... The chain terminates as soon as the dict doesn't resize; without
>> malicious intent, the odds of hitting several resizes in a row are so
>> miniscule that it probably hasn't even slowed them down.

> Now I'm torn. If we'd have this RuntimeError from the start, would we
> consider it a flaw in the dict implementation or a feature?

I would personally have considered it a flaw.  Wrapping all dict
access just in case some other code added a weird key seems ugly and
inefficient.

> RuntimeError when changing a dict's size while iterating over it is
> definitely a feature (so as to allow the implementation to rehash
> everything upon insert/delete).

In that case, you've explicitly asked for the whole set (or at least a
snapshot); the RuntimeError indicates that you can't get it.  Maybe
there should be a way to say "I don't need the set to be perfect", but
there isn't, and you aren't getting what you actually did ask for, so
an Error is appropriate.

But in this case, you're asking for a specific key, and the key is
there.  It may even be a perfectly normal string, that just happened
to hash-collide with something some other code inserted.

The RuntimeError exposes a race condition to user code -- and it may
not be the code that stuck the oddball class in the dict in the first
place.

> Note that Victor's alternate fix (nomodify.diff) has the same problem
> -- it just raises RuntimeError in the mutating thread rather than in
> the lookup thread.

Right; I'm saying that raising an error is the wrong thing to do.

But there is an analogy to the hash-code change vs collision counting.
 Breaking an application that got unlucky once is wrong.  But if the
bad luck *continues* to the point where it would only occur one time
in a zillion, RuntimeError is better than a likely infinite loop.

(It really was a question initially, but I've now convinced at least myself.)

I would rather see something like replacing

   349             else {
   350                 /* The compare did major nasty stuff to the
   351                  * dict:  start over.
   352                  * XXX A clever adversary could prevent this
   353                  * XXX from terminating.
   354                  */
   355                 return lookdict(mp, key, hash);

with

   349             else {
   350                 /* The compare did major nasty stuff to the
   351                  * dict:  start over.
   352                  * XXX A clever adversary could prevent this
   353                  * XXX from terminating.
   354                  */
   355                 return lookdict_paranoid(mp, key, hash, 1);

where lookdict_paranoid is a near-copy of lookdict that adds a
paranoia parameter and replaces those same lines with

            else {
                /* The compare did major nasty stuff *again*. */
                if (paranoia < PARANOIA_LEVEL) {
                    return lookdict_paranoid(mp, key, hash, paranoia+1);
                }
                /* Something is wrong; raise a RuntimeError. */
            }

History
Date	User	Action	Args
2012-03-06 20:33:48	Jim.Jewett	set	recipients: + Jim.Jewett, gvanrossum, rhettinger, vstinner, Mark.Shannon, python-dev
2012-03-06 20:33:48	Jim.Jewett	link	issue14205 messages
2012-03-06 20:33:47	Jim.Jewett	create