Author dmalcolm
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, haypo, jcea, lemburg, mark.dickinson, merwok, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, zbysz
Date 2012-01-20.22:55:14
SpamBayes Score 1.16462e-13
Marked as misclassified No
Message-id <1327100080.4992.149.camel@surprise>
In-reply-to <1325854341.31.0.572411522028.issue13703@psf.upfronthosting.co.za>
Content
On Fri, 2012-01-06 at 12:52 +0000, Marc-Andre Lemburg wrote:
> Marc-Andre Lemburg <mal@egenix.com> added the comment:
> 
> Demo patch implementing the collision limit idea for Python 2.7.
> 
> ----------
> Added file: http://bugs.python.org/file24151/hash-attack.patch
> 

Marc: is this the latest version of your patch?

Whether or not we go with collision counting and/or adding a random salt
to hashes and/or something else, I've had a go at updating your patch

Although debate on python-dev seems to have turned against the
collision-counting idea, based on flaws reported by Frank Sievertsen
http://mail.python.org/pipermail/python-dev/2012-January/115726.html
it seemed to me to be worth at least adding some test cases to flesh out
the approach.  Note that the test cases deliberately avoid containing
"hostile" data.

Am attaching an updated version which:
  * adds various FIXMEs (my patch isn't ready yet, but I wanted to get
more eyes on this)

  * introduces a new TooManyHashCollisions exception, and uses that
rather than KeyError (currently it extends BaseException; am not sure
where it should sit in the exception hierarchy).

  * adds debug text to the above exception, including the repr() and
hash of the key for which the issue was triggered:
  TooManyHashCollisions: 1001 hash collisions within dict at key
ChosenHash(999, 42) with hash 42

  * moves exception-setting to a helper function, to avoid duplicated
code

  * adds a sys.max_dict_collisions, though currently with just a
copy-and-paste of the 1000 value from dictobject.c

  * starts adding a test suite to test_dict.py, using a ChosenHash
helper class (to avoid having to publish hostile data), and a context
manager for ensuring the timings of various operations fall within sane
bounds, so I can do things like this:
        with self.assertFasterThan(seconds=TIME_LIMIT) as cm:
            for i in range(sys.max_dict_collisions -1 ):
                key = ChosenHash(i, 42)
                d[key] = 0

The test suite reproduces the TooManyHashCollisions response to a basic
DoS, and also "successfully" fails due to scenario 2 in Frank's email
above (assuming I understood his email correctly).

Presumably this could also incorporate a reproducer for scenario 1 in
this email, though I don't have one yet (but I don't want to make
hostile data public).

The patch doesn't yet do anything for sets.

Hope this is helpful
Dave
Files
File name Uploaded
hash-collision-counting-dmalcolm-2012-01-20-001.patch dmalcolm, 2012-01-20.22:55:12
History
Date User Action Args
2012-01-20 22:55:17dmalcolmsetrecipients: + dmalcolm, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, haypo, christian.heimes, benjamin.peterson, merwok, grahamd, Arfrever, v+python, alex, zbysz, skrah, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-20 22:55:15dmalcolmlinkissue13703 messages
2012-01-20 22:55:14dmalcolmcreate