This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author gvanrossum
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-20.17:39:24
SpamBayes Score 3.330669e-16
Marked as misclassified No
Message-id <CAP7+vJJjNfzfaibbXXrXDjkzy_HC20H4GNoGbLQmK4o8087WRg@mail.gmail.com>
In-reply-to <1327035517.22.0.845370766756.issue13703@psf.upfronthosting.co.za>
Content
On Thu, Jan 19, 2012 at 8:58 PM, Frank Sievertsen <report@bugs.python.org>wrote:

>
> Frank Sievertsen <python@sievertsen.de> added the comment:
>
> >> That's true. But without the suffix, I can pretty easy and efficient
> >> guess the prefix by just seeing the result of a few well-chosen and
> >> short repr(dict(X)). I suppose that's harder with the suffix.
>
> > Since the hash function is known, it doesn't make things much
> > harder. Without suffix you just need hash('') to find out what
> > the prefix is. With suffix, two values are enough
>
> This is obvious and absolutely correct!
>
> But it's not what I talked about. I didn't talk about the result of
> hash(X), but about the result of repr(dict([(str: val), (str:
> val)....])), which is more likely to happen and not so trivial
> (if you want to know more than the last 8 bits)
>
> IMHO this problem shows that we can't advice dict() or set() for
> (potential dangerous) user-supplied keys at the moment.
>
> I prefer randomization because it fixes this problem. The
> collision-counting->exception prevents a software from becoming slow,
> but it doesn't make it work as expected.
>

That depends. If collision counting prevents the DoS attack that may be
"work as expected", assuming you believe (as I do) that "real life" data
won't ever have that many collisions.

Note that every web service is vulnerable to some form of DoS where a
sufficient number of malicious requests will keep all available servers
occupied so legitimate requests suffer delays and timeouts. The defense is
to have sufficient capacity so that a potential attacker would need a large
amount of resources to do any real damage. The hash collision attack vastly
reduces the amount of resources needed to bring down a service; crashing
early moves the balance of power significantly back, and that's all we can
ask for.

Sure, you can catch the exception. But when you get the exception,
> probably you wanted to add the items for a reason: Because you want
> them to be in the dict and that's how your software works.
>

No, real data would never make this happen, so it's a "don't care" case (at
least for the vast majority of services). An attacker could also send you
such a large amount of data that your server runs out of memory, or starts
swapping (which is almost worse). But that requires for the attacker to
have enough bandwidth to send you that data. Or they could send you very
many requests. Same requirement.

All we need to guard for here is the unfortunate multiplication of the
attacker's effort due to the behavior of the collision-resolution code in
the dict implementation. Beyond that it's every app for itself.

> Imagine an irc-server using a dict to store the connected users, using
> the nicknames as keys. Even if the irc-server catches the unexpected
> exception while connecting a new user (when adding his/her name to the
> dict), an attacker could connect 999 special-named users to prevent a
> specific user from connecting in future.
>

Or they could use many other tactics. At this point the attack is specific
to this IRC implementation and it's no longer Python's responsibility.

> Collision-counting->exception can make it possible to inhibit a
> specific future add to the dict. The outcome is highly application
> dependent.
>
> I think it fixes 95% of the attack-vectors, but not all and it adds a
> few new risks. However, of course it's much better then doing nothing
> to fix the problem.
>

Right -- it vastly increases the effort needed to attack any particular
service, and does not affect any behavior of existing Python apps.
History
Date User Action Args
2012-01-20 17:39:26gvanrossumsetrecipients: + gvanrossum, lemburg, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-20 17:39:25gvanrossumlinkissue13703 messages
2012-01-20 17:39:24gvanrossumcreate