Message 150616 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	lemburg
Recipients	Arfrever, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, jcea, lemburg, pitrou, terry.reedy, vstinner
Date	2012-01-04.16:42:04
SpamBayes Score	3.3029135e-14
Marked as misclassified	No
Message-id	<4F048156.30006@egenix.com>
In-reply-to	<1325656838.58.0.858793240155.issue13703@psf.upfronthosting.co.za>

Content
Some comments: 1. The security implications in all this is being somewhat overemphasized. There are many ways you can do a DoS attack on web servers. It's the responsibility of the used web frameworks and servers to deal with the possible cases. It's a good idea to provide some way to protect against hash collision attacks, but that will only solve one possible way of causing a resource attack on a server. There are other ways you can generate lots of CPU overhead with little data input (e.g. think of targeting the search feature on many Zope/Plone sites). In order to protect against such attacks in general, we'd have to provide a way to control CPU time and e.g. raise an exception if too much time is being spent on a simple operation such as a key insertion. This can be done using timers, signals or even under OS control. The easiest way to protect against the hash collision attack is by limiting the POST/GET/HEAD request size. The second best way would be to limit the number of parameters that a web framework accepts for POST/GET/HEAD request. 2. Changing the semantics of hashing in a dot release is not allowed. If randomization of the hash start vector or some other method is enabled by default in a dot release, this will change the semantics of any application switching to that dot release. The hash values of Python objects are not only used by the Python dictionary implementation, but also by other storage mechanisms such as on-disk dictionaries, inter-process object exchange via share memory, memcache, etc. Hence, if changed, the hash change should be disabled per default for dot releases and enabled for 3.3. 3. Changing the way strings are hashed doesn't solve the problem. Hash values of other types can easily be guessed as well, e.g. take integers which use a trivial hash function. We'd have to adapt all hash functions of the basic types in Python or come up with a generic solution using e.g. double-hashing in the dictionary/set implementations. 4. By just using a random start vector you change the absolute hash values for specific objects, but not the overall hash sequence or its period. An attacker only needs to create many hash collisions, not specific ones. It's the period of the hash function that's important in such attacks and that doesn't change when moving to a different start vector. 5. Hashing needs to be fast. It's one of the most used operations in Python. Please get experts into the boat like Tim Peters and Christian Tismer, who both have worked on the dict implementation and the hash functions, before experimenting with ad-hoc fixes. 6. Counting collisions could solve the issue without having to change hashing. Another idea would be counting the collisions and raising an exception if the number of collisions exceed a certain threshold. Such a change would work for all hashable Python objects and protect against the attack without changing any hash function. Thanks, -- Marc-Andre Lemburg eGenix.com ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

Some comments:

1. The security implications in all this is being somewhat overemphasized.

There are many ways you can do a DoS attack on web servers. It's the
responsibility of the used web frameworks and servers to deal with
the possible cases.

It's a good idea to provide some way to protect against hash
collision attacks, but that will only solve one possible way of
causing a resource attack on a server.

There are other ways you can generate lots of CPU overhead with
little data input (e.g. think of targeting the search feature on
many Zope/Plone sites).

In order to protect against such attacks in general, we'd have to
provide a way to control CPU time and e.g. raise an exception if too
much time is being spent on a simple operation such as a key insertion.
This can be done using timers, signals or even under OS control.

The easiest way to protect against the hash collision attack is by
limiting the POST/GET/HEAD request size.

The second best way would be to limit the number of parameters that a
web framework accepts for POST/GET/HEAD request.

2. Changing the semantics of hashing in a dot release is not allowed.

If randomization of the hash start vector or some other method is
enabled by default in a dot release, this will change the semantics
of any application switching to that dot release.

The hash values of Python objects are not only used by the Python
dictionary implementation, but also by other storage mechanisms
such as on-disk dictionaries, inter-process object exchange via
share memory, memcache, etc.

Hence, if changed, the hash change should be disabled per default
for dot releases and enabled for 3.3.

3. Changing the way strings are hashed doesn't solve the problem.

Hash values of other types can easily be guessed as well, e.g.
take integers which use a trivial hash function.

We'd have to adapt all hash functions of the basic types in Python
or come up with a generic solution using e.g. double-hashing
in the dictionary/set implementations.

4. By just using a random start vector you change the absolute
hash values for specific objects, but not the overall hash sequence
or its period.

An attacker only needs to create many hash collisions, not
specific ones. It's the period of the hash function that's
important in such attacks and that doesn't change when moving to
a different start vector.

5. Hashing needs to be fast.

It's one of the most used operations in Python. Please get experts into
the boat like Tim Peters and Christian Tismer, who both have worked
on the dict implementation and the hash functions, before experimenting
with ad-hoc fixes.

6. Counting collisions could solve the issue without having to
change hashing.

Another idea would be counting the collisions and raising an
exception if the number of collisions exceed a certain
threshold.

Such a change would work for all hashable Python objects and
protect against the attack without changing any hash function.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

History
Date	User	Action	Args
2012-01-04 16:42:06	lemburg	set	recipients: + lemburg, gvanrossum, barry, georg.brandl, terry.reedy, jcea, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, Arfrever, alex, dmalcolm, Mark.Shannon, Zhiping.Deng, PaulMcMillan
2012-01-04 16:42:05	lemburg	link	issue13703 messages
2012-01-04 16:42:05	lemburg	create