Message 151739 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	pitrou
Recipients	Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date	2012-01-21.18:57:38
SpamBayes Score	4.7017945e-14
Marked as misclassified	No
Message-id	<1327172141.3382.12.camel@localhost.localdomain>
In-reply-to	<1327165340.4992.235.camel@surprise>

Content
> In this patch, rather than reset the count each time, I keep track of > the total number of calls to insertdict() that have happened for each > "large dict" (i.e. for which ma_table != ma_smalltable), and the total > number of probe iterations that have been needed to service those > insertions/overwrites. It raises the exception when the number of > probe iterations per insertion exceeds a threshold factor (rather than > merely comparing the number of iterations against the current ma_used of > the dict). This sounds much more robust than the previous attempt. > When attacked, this leads to exceptions like this: > AlgorithmicComplexityError: dict construction used 1697 probes whilst > performing 53 insertions (len() == 58) at key 58 with hash 42 We'll have to discuss the name of the exception and the error message :) > Caveats: > * no overflow handling (what happens after 2*32 modifications to a > long-lived dict on a 32-bit build?) - though that's fixable. How do you suggest to fix it? > no idea what the scaling factor for the threshold should be (there may > also be a deep mathematical objection here, based on how big-O notation > is defined in terms of an arbitrary scaling factor and limit) I'd make the threshold factor a constant, e.g. 64 or 128 (it should not be too small, to avoid false positives). We're interested in the actual slowdown factor, which a constant factor models adequately. It's the slowdown factor which makes a DOS attack using this technique efficient. Whether or not dict construction truely degenerates into a O(n**2) operation is less relevant. There needs to be a way to disable it: an environment variable would be the minimum IMO. Also, in 3.3 there should probably be a sys function to enable or disable it at runtime. Not sure it should be backported since it's a new API.

> In this patch, rather than reset the count each time, I keep track of
> the total number of calls to insertdict() that have happened for each
> "large dict" (i.e. for which ma_table != ma_smalltable), and the total
> number of probe iterations that have been needed to service those
> insertions/overwrites.  It raises the exception when the *number of
> probe iterations per insertion* exceeds a threshold factor (rather than
> merely comparing the number of iterations against the current ma_used of
> the dict).

This sounds much more robust than the previous attempt.

> When attacked, this leads to exceptions like this:
> AlgorithmicComplexityError: dict construction used 1697 probes whilst
> performing 53 insertions (len() == 58) at key 58 with hash 42

We'll have to discuss the name of the exception and the error message :)

> Caveats:
> * no overflow handling (what happens after 2**32 modifications to a
> long-lived dict on a 32-bit build?) - though that's fixable.

How do you suggest to fix it?

> * no idea what the scaling factor for the threshold should be (there may
> also be a deep mathematical objection here, based on how big-O notation
> is defined in terms of an arbitrary scaling factor and limit)

I'd make the threshold factor a constant, e.g. 64 or 128 (it should not
be too small, to avoid false positives).
We're interested in the actual slowdown factor, which a constant factor
models adequately. It's the slowdown factor which makes a DOS attack
using this technique efficient. Whether or not dict construction truely
degenerates into a O(n**2) operation is less relevant.

There needs to be a way to disable it: an environment variable would be
the minimum IMO.
Also, in 3.3 there should probably be a sys function to enable or
disable it at runtime. Not sure it should be backported since it's a new
API.

History
Date	User	Action	Args
2012-01-21 18:57:39	pitrou	set	recipients: + pitrou, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-21 18:57:38	pitrou	link	issue13703 messages
2012-01-21 18:57:38	pitrou	create