Message 151685 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	lemburg
Recipients	Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date	2012-01-20.11:17:31
SpamBayes Score	1.5629165e-12
Marked as misclassified	No
Message-id	<4F194D43.4060705@egenix.com>
In-reply-to	<CAH_1eM0cdFYPJgbMgbR60o-9B_t_4prfxfMcayCO7ZUjqg1yHQ@mail.gmail.com>

Content
Charles-François Natali wrote: > > Anyway, I still think that the hash randomization is the right way to > go, simply because it does solve the problem, whereas the collision > counting doesn't: Martin made a very good point on python-dev with his > database example. For completeness, I quote Martin here: """ The main issue with that approach is that it allows a new kind of attack. An attacker now needs to find 1000 colliding keys, and submit them one-by-one into a database. The limit will not trigger, as those are just database insertions. Now, if the applications also as a need to read the entire database table into a dictionary, that will suddenly break, and not for the attacker (which would be ok), but for the regular user of the application or the site administrator. So it may be that this approach actually simplifies the attack, making the cure worse than the disease. """ Martin is correct in that it is possible to trick an application into building some data pool which can then be used as indirect input for an attack. What I don't see is what's wrong with the application raising an exception in case it finds such data in an untrusted source (reading arbitrary amounts of user data from a database is just as dangerous as reading such data from any other source). The exception will tell the programmer to be more careful and patch the application not to read untrusted data without additional precautions. It will also tell the maintainer of the application that there was indeed an attack on the system which may need to be tracked down. Note that the collision counting demo patch is trivial - I just wanted to demonstrate how it works. As already mentioned, there's room for improvement: If Python objects were to provide an additional method for calculating a universal hash value (based on an integer input parameter), the dictionary in question could use this to rehash itself and avoid the attack. Think of this as "randomization when needed". (*) Since the dict would still detect the problem, it could also raise a warning to inform the maintainer of the application. So you get the best of both worlds and randomization would only kick in when it's really needed to keep the application running.

Charles-François Natali wrote:
> 
> Anyway, I still think that the hash randomization is the right way to
> go, simply because it does solve the problem, whereas the collision
> counting doesn't: Martin made a very good point on python-dev with his
> database example.

For completeness, I quote Martin here:

"""
The main issue with that approach is that it allows a new kind of attack.

An attacker now needs to find 1000 colliding keys, and submit them
one-by-one into a database. The limit will not trigger, as those are
just database insertions.

Now, if the applications also as a need to read the entire database
table into a dictionary, that will suddenly break, and not for the
attacker (which would be ok), but for the regular user of the
application or the site administrator.

So it may be that this approach actually simplifies the attack, making
the cure worse than the disease.
"""

Martin is correct in that it is possible to trick an application
into building some data pool which can then be used as indirect
input for an attack.

What I don't see is what's wrong with the application raising
an exception in case it finds such data in an untrusted source
(reading arbitrary amounts of user data from a database is just
as dangerous as reading such data from any other source).

The exception will tell the programmer to be more careful and
patch the application not to read untrusted data without
additional precautions.

It will also tell the maintainer of the application that there
was indeed an attack on the system which may need to be
tracked down.

Note that the collision counting demo patch is trivial - I just
wanted to demonstrate how it works. As already mentioned, there's
room for improvement:

If Python objects were to provide an additional
method for calculating a universal hash value (based on an
integer input parameter), the dictionary in question could
use this to rehash itself and avoid the attack. Think of this
as "randomization when needed". (*)

Since the dict would still detect the problem, it could also
raise a warning to inform the maintainer of the application.

So you get the best of both worlds and randomization would only
kick in when it's really needed to keep the application running.

History
Date	User	Action	Args
2012-01-20 11:17:32	lemburg	set	recipients: + lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-20 11:17:32	lemburg	link	issue13703 messages
2012-01-20 11:17:31	lemburg	create