
Author gregory.p.smith
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-18.23:25:37
SpamBayes Score 3.7797543e-13
Marked as misclassified No
Message-id <CAGE7PNKD0eh4MyKFcQ5Zd-OLt7i8-mgMJmxcWupS4Xs6Ncdsgw@mail.gmail.com>
In-reply-to <CAP7+vJKcyJDzMfHWd2Lz=nbfHQc8g6xVDmSBn83KZ1q-=_Wikw@mail.gmail.com>
Content
On Wed, Jan 18, 2012 at 1:10 PM, Guido van Rossum
<report@bugs.python.org> wrote:
> On Wed, Jan 18, 2012 at 1:05 PM, Antoine Pitrou <report@bugs.python.org> wrote:
> >
> > I would hope 3.3 only gets randomized hashing. Collision counting is a
> > hack to make bugfix releases 99.999%-compatible instead of 99.9% ;)
> >
>
> Really? I'd expect the difference to be more than 2 nines. The randomized
> hashing has two problems: (a) change in dict order; (b) hash varies between
> processes. I cannot imagine counterexamples to the collision counting that
> weren't constructed specifically as counterexamples.

For the purposes of 3.3 I'd prefer to just have randomized hashing and
not the collision counting in order to keep things from getting too
complicated.  But I will not object if we opt to do both.

As much as the counting idea rubs me the wrong way, I agree that even
if it were on by default, most non-contrived code would never trigger
it, and it is easy to document that anyone actually impeded by it can
work around it by disabling it.
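To make the counting idea concrete, here is a minimal sketch (not CPython's actual implementation; the class names, the linear probing, and the deliberately tiny limit are all illustrative — a real threshold would be far larger):

```python
class CollisionLimitExceeded(Exception):
    pass

class Colliding:
    """Keys engineered to share one hash bucket (attacker-style input)."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return isinstance(other, Colliding) and self.n == other.n

class CountingDict:
    # Deliberately tiny limit so the sketch is easy to exercise;
    # the real proposal would use a far larger threshold.
    LIMIT = 5

    def __init__(self):
        self._slots = [None] * 8  # (key, value) pairs or None

    def _probe(self, key):
        # Find the slot for `key`, raising once too many occupied
        # slots are traversed -- the "collision counting" idea.
        mask = len(self._slots) - 1
        i = hash(key) & mask
        collisions = 0
        while True:
            entry = self._slots[i]
            if entry is None or entry[0] == key:
                return i
            collisions += 1
            if collisions > self.LIMIT:
                raise CollisionLimitExceeded(repr(key))
            i = (i + 1) & mask  # linear probing for simplicity

    def __setitem__(self, key, value):
        self._slots[self._probe(key)] = (key, value)
        used = sum(e is not None for e in self._slots)
        if used * 3 >= len(self._slots) * 2:  # keep load factor < 2/3
            self._grow()

    def __getitem__(self, key):
        entry = self._slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _grow(self):
        entries = [e for e in self._slots if e is not None]
        self._slots = [None] * (len(self._slots) * 2)
        for k, v in entries:
            self._slots[self._probe(k)] = (k, v)
```

Non-colliding workloads never come near the limit, while a stream of `Colliding` keys raises `CollisionLimitExceeded` as soon as one probe chain exceeds `LIMIT`.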

The concern I have with that approach, from a web service point of
view, is that it too can be gamed in the rarer server situation where
someone manages to fill a persistent data structure with enough
colliding values to be _close_ to the limit: the application then dies
while trying to process every future request that _hits_ the limit (a
persistent 500-error DOS rather than an exception raised only in the
one offending request that deserved its 500 anyway). Not nearly as
likely a scenario, but it is one I'd keep an eye open for with an
attacker hat on.
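The "parked at the limit" failure mode can be simulated in a few lines; `COLLISION_LIMIT`, `Store`, and `handle_request` are all hypothetical names, and the chain counter stands in for a real table's worst collision chain:

```python
COLLISION_LIMIT = 1000  # assumed threshold; a real value may differ

class Store:
    """Stands in for a long-lived server-side structure; we track only
    the length of its worst collision chain."""
    def __init__(self):
        self.worst_chain = 0

    def add_colliding(self, n):
        # Raise the moment the chain would cross the limit, the way a
        # collision-counting dict would fail mid-insert.
        if self.worst_chain + n > COLLISION_LIMIT:
            raise RuntimeError("collision limit exceeded")
        self.worst_chain += n

def handle_request(store):
    # Every later request touches and extends the same poisoned bucket.
    try:
        store.add_colliding(1)
        return 200
    except RuntimeError:
        return 500

store = Store()
store.add_colliding(COLLISION_LIMIT)          # attacker fills to the limit
statuses = [handle_request(store) for _ in range(3)]
```

Once the attacker has filled the chain, every subsequent request returns 500: the exception no longer lands on the offending request but on all of its successors.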

MvL's suggestion of using AVL trees for hash bucket slots, instead of
our linear slot-finding algorithm, is a better way to fix the
underlying problem because it never devolves into linear behavior at
all. It is naturally more complicated, but could likely even be done
while maintaining ABI compatibility. I haven't pondered designs or
performance costs for that; likely a memory hit, one or two extra
indirect lookups in the normal case, and some additional complexity in
the iteration case.
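As a rough stand-in for the tree-bucket idea, here is a toy chained table where each bucket is a sorted list searched with `bisect` — the same O(log n) lookup bound per bucket that an AVL tree would give. The class name and layout are illustrative, and keys that land in one bucket are assumed mutually orderable; note too that `list.insert` still shifts elements, so only lookups (not updates) get the logarithmic bound here:

```python
import bisect

class TreeBucketDict:
    """Toy chained hash table: each bucket is a sorted list of
    (key, value) tuples, so a fully colliding bucket is searched in
    O(log n) comparisons instead of O(n) linear probing."""

    def __init__(self, nbuckets=8):
        self._buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        b = self._bucket(key)
        # (key,) sorts just before (key, anything), so bisect_left
        # lands exactly on an existing entry for `key` if present.
        i = bisect.bisect_left(b, (key,))
        if i < len(b) and b[i][0] == key:
            b[i] = (key, value)          # replace existing entry
        else:
            b.insert(i, (key, value))    # keep the bucket sorted

    def __getitem__(self, key):
        b = self._bucket(key)
        i = bisect.bisect_left(b, (key,))
        if i < len(b) and b[i][0] == key:
            return b[i][1]
        raise KeyError(key)
```

Even if an attacker forces every key into one bucket, lookups degrade to O(log n) rather than O(n) — which is the point of the tree-bucket approach, without any limit to count against.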

-gps
History
Date User Action Args
2012-01-18 23:25:38  gregory.p.smith  set  recipients: + gregory.p.smith, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-18 23:25:37  gregory.p.smith  link  issue13703 messages
2012-01-18 23:25:37  gregory.p.smith  create