This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author dmalcolm
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-25.17:49:18
SpamBayes Score 2.220446e-16
Marked as misclassified No
Message-id <1327513722.2388.20.camel@surprise>
In-reply-to <1327495499.2388.12.camel@surprise>
Content
On Wed, 2012-01-25 at 12:45 +0000, Dave Malcolm wrote:
> Dave Malcolm <dmalcolm@redhat.com> added the comment:
> 
> I've found a bug in my patch; insertdict writes the old non-randomized
> hash value into me_hash at:
>         ep->me_hash = hash;
> rather than using the randomized hash, leading to issues when tested
> against a real attack.

I'm attaching a revised version of the patch that should fix the above
issue:
  hybrid-approach-dmalcolm-2012-01-25-002.patch

Changes relative to -001.patch:
* updated insertdict() so that when it write ep->me_hash, it uses the
correct hash value.  Unfortunately there doesn't seem to be a good way
of reusing the value we calculated in the "paranoid" ma_lookup
callbacks, without breaking ABI (suggestions welcome),  so we call
PyObject_RandomizedHash again.
* slightly reworked the two _paranoid ma_lookup callbacks to capture the
randomized hash as a local variable, in case there's a way of reusing it
in insertdict()
* when lookdict() calls into itself, it now calls mp->ma_lookup instead
* don't generate a fatal error with an unknown ma_lookup callback.

With this, I'm able to insert 200,000 non-equal PyUnicodeObject with
hash()==0 into a dict on a 32-bit build --with-pydebug in 2.2 seconds;
it can retrieve all the values correctly in about 4 seconds [compare
with ~1.5 hours of CPU churn for inserting the same data on an optimized
build without the patch on the same guest].

The amortized ratio of total work done per modification increases
linearly when under an O(N^2) attack, and the dict switches itself to
paranoid mode 56 insertions after ma_table stops using ma_smalltable
(that's when we start tracking stats).  After the transition to paranoid
mode, it drops to an average of a little under 2 probes per insertion
(the amortized ratio seems to be converging to about 1.9 probes per key
insertion at the point where my hostile test data runs out).
Files
File name Uploaded
hybrid-approach-dmalcolm-2012-01-25-002.patch dmalcolm, 2012-01-25.17:49:17
History
Date User Action Args
2012-01-25 17:49:22dmalcolmsetrecipients: + dmalcolm, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-25 17:49:18dmalcolmlinkissue13703 messages
2012-01-25 17:49:18dmalcolmcreate