Message 151956 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	dmalcolm
Recipients	Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date	2012-01-25.17:49:18
SpamBayes Score	2.220446e-16
Marked as misclassified	No
Message-id	<1327513722.2388.20.camel@surprise>
In-reply-to	<1327495499.2388.12.camel@surprise>

Content
On Wed, 2012-01-25 at 12:45 +0000, Dave Malcolm wrote: > Dave Malcolm <dmalcolm@redhat.com> added the comment: > > I've found a bug in my patch; insertdict writes the old non-randomized > hash value into me_hash at: > ep->me_hash = hash; > rather than using the randomized hash, leading to issues when tested > against a real attack. I'm attaching a revised version of the patch that should fix the above issue: hybrid-approach-dmalcolm-2012-01-25-002.patch Changes relative to -001.patch: * updated insertdict() so that when it write ep->me_hash, it uses the correct hash value. Unfortunately there doesn't seem to be a good way of reusing the value we calculated in the "paranoid" ma_lookup callbacks, without breaking ABI (suggestions welcome), so we call PyObject_RandomizedHash again. * slightly reworked the two _paranoid ma_lookup callbacks to capture the randomized hash as a local variable, in case there's a way of reusing it in insertdict() * when lookdict() calls into itself, it now calls mp->ma_lookup instead * don't generate a fatal error with an unknown ma_lookup callback. With this, I'm able to insert 200,000 non-equal PyUnicodeObject with hash()==0 into a dict on a 32-bit build --with-pydebug in 2.2 seconds; it can retrieve all the values correctly in about 4 seconds [compare with ~1.5 hours of CPU churn for inserting the same data on an optimized build without the patch on the same guest]. The amortized ratio of total work done per modification increases linearly when under an O(N^2) attack, and the dict switches itself to paranoid mode 56 insertions after ma_table stops using ma_smalltable (that's when we start tracking stats). After the transition to paranoid mode, it drops to an average of a little under 2 probes per insertion (the amortized ratio seems to be converging to about 1.9 probes per key insertion at the point where my hostile test data runs out).

On Wed, 2012-01-25 at 12:45 +0000, Dave Malcolm wrote:
> Dave Malcolm <dmalcolm@redhat.com> added the comment:
> 
> I've found a bug in my patch; insertdict writes the old non-randomized
> hash value into me_hash at:
>         ep->me_hash = hash;
> rather than using the randomized hash, leading to issues when tested
> against a real attack.

I'm attaching a revised version of the patch that should fix the above
issue:
  hybrid-approach-dmalcolm-2012-01-25-002.patch

Changes relative to -001.patch:
* updated insertdict() so that when it write ep->me_hash, it uses the
correct hash value.  Unfortunately there doesn't seem to be a good way
of reusing the value we calculated in the "paranoid" ma_lookup
callbacks, without breaking ABI (suggestions welcome),  so we call
PyObject_RandomizedHash again.
* slightly reworked the two _paranoid ma_lookup callbacks to capture the
randomized hash as a local variable, in case there's a way of reusing it
in insertdict()
* when lookdict() calls into itself, it now calls mp->ma_lookup instead
* don't generate a fatal error with an unknown ma_lookup callback.

With this, I'm able to insert 200,000 non-equal PyUnicodeObject with
hash()==0 into a dict on a 32-bit build --with-pydebug in 2.2 seconds;
it can retrieve all the values correctly in about 4 seconds [compare
with ~1.5 hours of CPU churn for inserting the same data on an optimized
build without the patch on the same guest].

The amortized ratio of total work done per modification increases
linearly when under an O(N^2) attack, and the dict switches itself to
paranoid mode 56 insertions after ma_table stops using ma_smalltable
(that's when we start tracking stats).  After the transition to paranoid
mode, it drops to an average of a little under 2 probes per insertion
(the amortized ratio seems to be converging to about 1.9 probes per key
insertion at the point where my hostile test data runs out).

Files
File name	Uploaded
hybrid-approach-dmalcolm-2012-01-25-002.patch	dmalcolm, 2012-01-25.17:49:17

History
Date	User	Action	Args
2012-01-25 17:49:22	dmalcolm	set	recipients: + dmalcolm, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-25 17:49:18	dmalcolm	link	issue13703 messages
2012-01-25 17:49:18	dmalcolm	create