Author pitrou
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, haypo, jcea, lemburg, mark.dickinson, merwok, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, zbysz
Date 2012-01-21.18:57:38
SpamBayes Score 4.70179e-14
Marked as misclassified No
Message-id <1327172141.3382.12.camel@localhost.localdomain>
In-reply-to <1327165340.4992.235.camel@surprise>
Content
> In this patch, rather than reset the count each time, I keep track of
> the total number of calls to insertdict() that have happened for each
> "large dict" (i.e. for which ma_table != ma_smalltable), and the total
> number of probe iterations that have been needed to service those
> insertions/overwrites.  It raises the exception when the *number of
> probe iterations per insertion* exceeds a threshold factor (rather than
> merely comparing the number of iterations against the current ma_used of
> the dict).

This sounds much more robust than the previous attempt.

> When attacked, this leads to exceptions like this:
> AlgorithmicComplexityError: dict construction used 1697 probes whilst
> performing 53 insertions (len() == 58) at key 58 with hash 42

We'll have to discuss the name of the exception and the error message :)

> Caveats:
> * no overflow handling (what happens after 2**32 modifications to a
> long-lived dict on a 32-bit build?) - though that's fixable.

How do you suggest to fix it?

> * no idea what the scaling factor for the threshold should be (there may
> also be a deep mathematical objection here, based on how big-O notation
> is defined in terms of an arbitrary scaling factor and limit)

I'd make the threshold factor a constant, e.g. 64 or 128 (it should not
be too small, to avoid false positives).
We're interested in the actual slowdown factor, which a constant factor
models adequately. It's the slowdown factor which makes a DOS attack
using this technique efficient. Whether or not dict construction truely
degenerates into a O(n**2) operation is less relevant.

There needs to be a way to disable it: an environment variable would be
the minimum IMO.
Also, in 3.3 there should probably be a sys function to enable or
disable it at runtime. Not sure it should be backported since it's a new
API.
History
Date User Action Args
2012-01-21 18:57:39pitrousetrecipients: + pitrou, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, haypo, christian.heimes, benjamin.peterson, merwok, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-21 18:57:38pitroulinkissue13703 messages
2012-01-21 18:57:38pitroucreate