Author lemburg
Recipients Arfrever, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, jcea, lemburg, pitrou, terry.reedy, vstinner
Date 2012-01-04.16:42:04
SpamBayes Score 3.30291e-14
Marked as misclassified No
Message-id <>
In-reply-to <>
Some comments:

1. The security implications in all this is being somewhat overemphasized.

There are many ways you can do a DoS attack on web servers. It's the
responsibility of the used web frameworks and servers to deal with
the possible cases.

It's a good idea to provide some way to protect against hash
collision attacks, but that will only solve one possible way of
causing a resource attack on a server.

There are other ways you can generate lots of CPU overhead with
little data input (e.g. think of targeting the search feature on
many Zope/Plone sites).

In order to protect against such attacks in general, we'd have to
provide a way to control CPU time and e.g. raise an exception if too
much time is being spent on a simple operation such as a key insertion.
This can be done using timers, signals or even under OS control.

The easiest way to protect against the hash collision attack is by
limiting the POST/GET/HEAD request size.

The second best way would be to limit the number of parameters that a
web framework accepts for POST/GET/HEAD request.

2. Changing the semantics of hashing in a dot release is not allowed.

If randomization of the hash start vector or some other method is
enabled by default in a dot release, this will change the semantics
of any application switching to that dot release.

The hash values of Python objects are not only used by the Python
dictionary implementation, but also by other storage mechanisms
such as on-disk dictionaries, inter-process object exchange via
share memory, memcache, etc.

Hence, if changed, the hash change should be disabled per default
for dot releases and enabled for 3.3.

3. Changing the way strings are hashed doesn't solve the problem.

Hash values of other types can easily be guessed as well, e.g.
take integers which use a trivial hash function.

We'd have to adapt all hash functions of the basic types in Python
or come up with a generic solution using e.g. double-hashing
in the dictionary/set implementations.

4. By just using a random start vector you change the absolute
hash values for specific objects, but not the overall hash sequence
or its period.

An attacker only needs to create many hash collisions, not
specific ones. It's the period of the hash function that's
important in such attacks and that doesn't change when moving to
a different start vector.

5. Hashing needs to be fast.

It's one of the most used operations in Python. Please get experts into
the boat like Tim Peters and Christian Tismer, who both have worked
on the dict implementation and the hash functions, before experimenting
with ad-hoc fixes.

6. Counting collisions could solve the issue without having to
change hashing.

Another idea would be counting the collisions and raising an
exception if the number of collisions exceed a certain

Such a change would work for all hashable Python objects and
protect against the attack without changing any hash function.

Marc-Andre Lemburg


::: Try our new mxODBC.Connect Python Database Interface for free ! :::: Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
Date User Action Args
2012-01-04 16:42:06lemburgsetrecipients: + lemburg, gvanrossum, barry, georg.brandl, terry.reedy, jcea, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, Arfrever, alex, dmalcolm, Mark.Shannon, Zhiping.Deng, PaulMcMillan
2012-01-04 16:42:05lemburglinkissue13703 messages
2012-01-04 16:42:05lemburgcreate