This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author lemburg
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-20.11:17:31
SpamBayes Score 1.5629165e-12
Marked as misclassified No
Message-id <4F194D43.4060705@egenix.com>
In-reply-to <CAH_1eM0cdFYPJgbMgbR60o-9B_t_4prfxfMcayCO7ZUjqg1yHQ@mail.gmail.com>
Content
Charles-François Natali wrote:
> 
> Anyway, I still think that the hash randomization is the right way to
> go, simply because it does solve the problem, whereas the collision
> counting doesn't: Martin made a very good point on python-dev with his
> database example.

For completeness, I quote Martin here:

"""
The main issue with that approach is that it allows a new kind of attack.

An attacker now needs to find 1000 colliding keys, and submit them
one-by-one into a database. The limit will not trigger, as those are
just database insertions.

Now, if the applications also as a need to read the entire database
table into a dictionary, that will suddenly break, and not for the
attacker (which would be ok), but for the regular user of the
application or the site administrator.

So it may be that this approach actually simplifies the attack, making
the cure worse than the disease.
"""

Martin is correct in that it is possible to trick an application
into building some data pool which can then be used as indirect
input for an attack.

What I don't see is what's wrong with the application raising
an exception in case it finds such data in an untrusted source
(reading arbitrary amounts of user data from a database is just
as dangerous as reading such data from any other source).

The exception will tell the programmer to be more careful and
patch the application not to read untrusted data without
additional precautions.

It will also tell the maintainer of the application that there
was indeed an attack on the system which may need to be
tracked down.

Note that the collision counting demo patch is trivial - I just
wanted to demonstrate how it works. As already mentioned, there's
room for improvement:

If Python objects were to provide an additional
method for calculating a universal hash value (based on an
integer input parameter), the dictionary in question could
use this to rehash itself and avoid the attack. Think of this
as "randomization when needed". (*)

Since the dict would still detect the problem, it could also
raise a warning to inform the maintainer of the application.

So you get the best of both worlds and randomization would only
kick in when it's really needed to keep the application running.
History
Date User Action Args
2012-01-20 11:17:32lemburgsetrecipients: + lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-20 11:17:32lemburglinkissue13703 messages
2012-01-20 11:17:31lemburgcreate