Message 151735 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	dmalcolm
Recipients	Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date	2012-01-21.17:02:53
SpamBayes Score	3.330669e-16
Marked as misclassified	No
Message-id	<1327165340.4992.235.camel@surprise>
In-reply-to	<1327155912.3382.4.camel@localhost.localdomain>

Content
On Sat, 2012-01-21 at 14:27 +0000, Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > > > Thoughts? (apart from "ugh! it's ugly!" yes I know - it's late here) > > Is it guaranteed that no usage pattern can render this protection > inefficient? What if a dict is constructed by intermingling lookups and > inserts? > Similarly, what happens with e.g. the common use case of > dictdefault(list), where you append() after the lookup/insert? Does some > key distribution allow the attack while circumventing the protection? Yes, I agree that I was making an unrealistic assumption about usage patterns. There was also some global state (the "is_inserting" variable). I've tweaked the approach somewhat, moved the global to be per-dict, and am attaching a revised version of the patch: amortized-probe-counting-dmalcolm-2012-01-21-003.patch In this patch, rather than reset the count each time, I keep track of the total number of calls to insertdict() that have happened for each "large dict" (i.e. for which ma_table != ma_smalltable), and the total number of probe iterations that have been needed to service those insertions/overwrites. It raises the exception when the number of probe iterations per insertion exceeds a threshold factor (rather than merely comparing the number of iterations against the current ma_used of the dict). I believe this means that it's tracking and checking every time the dict is modified, and (I hope) protects us against any data that drives the dict implementation away from linear behavior (because that's essentially what it's testing for). [the per-dict stats are reset each time that it shrinks down to using ma_smalltable again, but I think at-risk usage patterns in which that occurs are uncommon relative to those in which it doesn't]. When attacked, this leads to exceptions like this: AlgorithmicComplexityError: dict construction used 1697 probes whilst performing 53 insertions (len() == 58) at key 58 with hash 42 i.e we have a dictionary containing 58 keys, which has seen 53 insert/overwrite operations since transitioning to the non-ma_smalltable representation (at size 6); presumably it has 128 PyDictEntries. Servicing those 53 operations has required a total 1697 iterations through the probing loop, or a little over 32 probes per insert. I just did a full run of the test suite (using run_tests.py), and it mostly passed the new tests I've added (included the test for scenario 2 from Frank's email). There were two failures: ====================================================================== FAIL: test_inheritance (test.test_pep352.ExceptionClassTests) ---------------------------------------------------------------------- AssertionError: 1 != 0 : {'AlgorithmicComplexityError'} not accounted for ---------------------------------------------------------------------- which is obviously fixable (given a decision on where the exception lives in the hierarchy) and this one: test test_mutants crashed -- Traceback (most recent call last): File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/regrtest.py", line 1214, in runtest_inner the_package = __import__(abstest, globals(), locals(), []) File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 159, in <module> test(100) File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 156, in test test_one(random.randrange(1, 100)) File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 132, in test_one dict2keys = fill_dict(dict2, range(n), n) File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 118, in fill_dict Horrid(random.choice(candidates)) AlgorithmicComplexityError: dict construction used 2753 probes whilst performing 86 insertions (len() == 64) at key Horrid(86) with hash 42 though that seems to be deliberately degenerate code. Caveats: * no overflow handling (what happens after 2*32 modifications to a long-lived dict on a 32-bit build?) - though that's fixable. no idea what the scaling factor for the threshold should be (there may also be a deep mathematical objection here, based on how big-O notation is defined in terms of an arbitrary scaling factor and limit) * not optimized; I haven't looked at performance yet * doesn't cover set(), though that also has spare space (I hope) via its own smalltable array. BTW, note that although I've been working on this variant of the collision counting approach, I'm not opposed to the hash randomization approach, or to adding extra checks in strategic places within the stdlib: I'm keen to get some kind of appropriate fix approved by the upstream Python development community so I can backport it to the various recent-to-ancient versions of CPython I support in RHEL (and Fedora), before we start seeing real-world attacks. Hope this is helpful Dave

On Sat, 2012-01-21 at 14:27 +0000, Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
> > Thoughts? (apart from "ugh! it's ugly!" yes I know - it's late here)
> 
> Is it guaranteed that no usage pattern can render this protection
> inefficient? What if a dict is constructed by intermingling lookups and
> inserts?
> Similarly, what happens with e.g. the common use case of
> dictdefault(list), where you append() after the lookup/insert? Does some
> key distribution allow the attack while circumventing the protection?

Yes, I agree that I was making an unrealistic assumption about usage
patterns.  There was also some global state (the "is_inserting"
variable).

I've tweaked the approach somewhat, moved the global to be per-dict, and
am attaching a revised version of the patch:
   amortized-probe-counting-dmalcolm-2012-01-21-003.patch

In this patch, rather than reset the count each time, I keep track of
the total number of calls to insertdict() that have happened for each
"large dict" (i.e. for which ma_table != ma_smalltable), and the total
number of probe iterations that have been needed to service those
insertions/overwrites.  It raises the exception when the *number of
probe iterations per insertion* exceeds a threshold factor (rather than
merely comparing the number of iterations against the current ma_used of
the dict).  I believe this means that it's tracking and checking every
time the dict is modified, and (I hope) protects us against any data
that drives the dict implementation away from linear behavior (because
that's essentially what it's testing for).  [the per-dict stats are
reset each time that it shrinks down to using ma_smalltable again, but I
think at-risk usage patterns in which that occurs are uncommon relative
to those in which it doesn't].

When attacked, this leads to exceptions like this:
AlgorithmicComplexityError: dict construction used 1697 probes whilst
performing 53 insertions (len() == 58) at key 58 with hash 42

i.e we have a dictionary containing 58 keys, which has seen 53
insert/overwrite operations since transitioning to the non-ma_smalltable
representation (at size 6); presumably it has 128 PyDictEntries.
Servicing those 53 operations has required a total 1697 iterations
through the probing loop, or a little over 32 probes per insert.

I just did a full run of the test suite (using run_tests.py), and it
mostly passed the new tests I've added (included the test for scenario 2
from Frank's email).

There were two failures:
======================================================================
FAIL: test_inheritance (test.test_pep352.ExceptionClassTests)
----------------------------------------------------------------------
AssertionError: 1 != 0 : {'AlgorithmicComplexityError'} not accounted
for
----------------------------------------------------------------------
which is obviously fixable (given a decision on where the exception
lives in the hierarchy)

and this one:
test test_mutants crashed -- Traceback (most recent call last):
  File
"/home/david/coding/python-hg/cpython-count-collisions/Lib/test/regrtest.py", line 1214, in runtest_inner
    the_package = __import__(abstest, globals(), locals(), [])
  File
"/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 159, in <module>
    test(100)
  File
"/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 156, in test
    test_one(random.randrange(1, 100))
  File
"/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 132, in test_one
    dict2keys = fill_dict(dict2, range(n), n)
  File
"/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 118, in fill_dict
    Horrid(random.choice(candidates))
AlgorithmicComplexityError: dict construction used 2753 probes whilst
performing 86 insertions (len() == 64) at key Horrid(86) with hash 42
though that seems to be deliberately degenerate code.

Caveats:
* no overflow handling (what happens after 2**32 modifications to a
long-lived dict on a 32-bit build?) - though that's fixable.
* no idea what the scaling factor for the threshold should be (there may
also be a deep mathematical objection here, based on how big-O notation
is defined in terms of an arbitrary scaling factor and limit)
* not optimized; I haven't looked at performance yet
* doesn't cover set(), though that also has spare space (I hope) via its
own smalltable array.

BTW, note that although I've been working on this variant of the
collision counting approach, I'm not opposed to the hash randomization
approach, or to adding extra checks in strategic places within the
stdlib: I'm keen to get some kind of appropriate fix approved by the
upstream Python development community so I can backport it to the
various recent-to-ancient versions of CPython I support in RHEL (and
Fedora), before we start seeing real-world attacks.

Hope this is helpful
Dave

Files
File name	Uploaded
amortized-probe-counting-dmalcolm-2012-01-21-003.patch	dmalcolm, 2012-01-21.17:02:53

History
Date	User	Action	Args
2012-01-21 17:02:57	dmalcolm	set	recipients: + dmalcolm, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-21 17:02:55	dmalcolm	link	issue13703 messages
2012-01-21 17:02:53	dmalcolm	create