
Author gregory.p.smith
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-18.23:25:37
SpamBayes Score 3.7797543e-13
Marked as misclassified No
Message-id <CAGE7PNKD0eh4MyKFcQ5Zd-OLt7i8-mgMJmxcWupS4Xs6Ncdsgw@mail.gmail.com>
In-reply-to <CAP7+vJKcyJDzMfHWd2Lz=nbfHQc8g6xVDmSBn83KZ1q-=_Wikw@mail.gmail.com>
Content
On Wed, Jan 18, 2012 at 1:10 PM, Guido van Rossum
<report@bugs.python.org> wrote:
> On Wed, Jan 18, 2012 at 1:05 PM, Antoine Pitrou <report@bugs.python.org> wrote:
> >
> > I would hope 3.3 only gets randomized hashing. Collision counting is a
> > hack to make bugfix releases 99.999%-compatible instead of 99.9% ;)
> >
>
> Really? I'd expect the difference to be more than 2 nines. The randomized
> hashing has two problems: (a) change in dict order; (b) hash varies between
> processes. I cannot imagine counterexamples to the collision counting that
> weren't constructed specifically as counterexamples.

For the purposes of 3.3 I'd prefer to just have randomized hashing and
not the collision counting in order to keep things from getting too
complicated.  But I will not object if we opt to do both.

As much as the counting idea rubs me the wrong way, I agree that even
if it were on by default, most non-contrived code would never trigger
it, and it is easy to document that anyone actually impeded by it can
work around it by disabling it.
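To make the counting idea concrete, here is a minimal sketch (not CPython's actual implementation; the class names, the linear probing, and the deliberately tiny limit are all illustrative — a real threshold would be far larger):

```python
class CollisionLimitExceeded(Exception):
    pass

class Colliding:
    """Keys engineered to share one hash bucket (attacker-style input)."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return isinstance(other, Colliding) and self.n == other.n

class CountingDict:
    # Deliberately tiny limit so the sketch is easy to exercise;
    # the real proposal would use a far larger threshold.
    LIMIT = 5

    def __init__(self):
        self._slots = [None] * 8  # (key, value) pairs or None

    def _probe(self, key):
        # Find the slot for `key`, raising once too many occupied
        # slots are traversed -- the "collision counting" idea.
        mask = len(self._slots) - 1
        i = hash(key) & mask
        collisions = 0
        while True:
            entry = self._slots[i]
            if entry is None or entry[0] == key:
                return i
            collisions += 1
            if collisions > self.LIMIT:
                raise CollisionLimitExceeded(repr(key))
            i = (i + 1) & mask  # linear probing for simplicity

    def __setitem__(self, key, value):
        self._slots[self._probe(key)] = (key, value)
        used = sum(e is not None for e in self._slots)
        if used * 3 >= len(self._slots) * 2:  # keep load factor < 2/3
            self._grow()

    def __getitem__(self, key):
        entry = self._slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _grow(self):
        entries = [e for e in self._slots if e is not None]
        self._slots = [None] * (len(self._slots) * 2)
        for k, v in entries:
            self._slots[self._probe(k)] = (k, v)
```

Non-colliding workloads never come near the limit, while a stream of `Colliding` keys raises `CollisionLimitExceeded` as soon as one probe chain exceeds `LIMIT`.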

The concern I have with that approach, from a web service point of
view, is that it too can be gamed in the rarer server situation where
someone manages to fill a persistent data structure with enough
colliding values to be _close_ to the limit: the application then dies
while trying to process every future request that _hits_ the limit (a
persistent 500-error DOS rather than an exception raised only in the
one offending request that deserved its 500 anyway). Not nearly as
likely a scenario, but it is one I'd keep an eye open for with an
attacker hat on.
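The "parked at the limit" failure mode can be simulated in a few lines; `COLLISION_LIMIT`, `Store`, and `handle_request` are all hypothetical names, and the chain counter stands in for a real table's worst collision chain:

```python
COLLISION_LIMIT = 1000  # assumed threshold; a real value may differ

class Store:
    """Stands in for a long-lived server-side structure; we track only
    the length of its worst collision chain."""
    def __init__(self):
        self.worst_chain = 0

    def add_colliding(self, n):
        # Raise the moment the chain would cross the limit, the way a
        # collision-counting dict would fail mid-insert.
        if self.worst_chain + n > COLLISION_LIMIT:
            raise RuntimeError("collision limit exceeded")
        self.worst_chain += n

def handle_request(store):
    # Every later request touches and extends the same poisoned bucket.
    try:
        store.add_colliding(1)
        return 200
    except RuntimeError:
        return 500

store = Store()
store.add_colliding(COLLISION_LIMIT)          # attacker fills to the limit
statuses = [handle_request(store) for _ in range(3)]
```

Once the attacker has filled the chain, every subsequent request returns 500: the exception no longer lands on the offending request but on all of its successors.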

MvL's suggestion of using AVL trees for hash bucket slots, instead of
our linear slot-finding algorithm, is a better way to fix the
underlying problem because it never devolves into linear behavior at
all. It is naturally more complicated, but could likely even be done
while maintaining ABI compatibility. I haven't pondered designs or
performance costs for that; likely a memory hit, one or two extra
indirect lookups in the normal case, and some additional complexity in
the iteration case.
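As a rough stand-in for the tree-bucket idea, here is a toy chained table where each bucket is a sorted list searched with `bisect` — the same O(log n) lookup bound per bucket that an AVL tree would give. The class name and layout are illustrative, and keys that land in one bucket are assumed mutually orderable; note too that `list.insert` still shifts elements, so only lookups (not updates) get the logarithmic bound here:

```python
import bisect

class TreeBucketDict:
    """Toy chained hash table: each bucket is a sorted list of
    (key, value) tuples, so a fully colliding bucket is searched in
    O(log n) comparisons instead of O(n) linear probing."""

    def __init__(self, nbuckets=8):
        self._buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        b = self._bucket(key)
        # (key,) sorts just before (key, anything), so bisect_left
        # lands exactly on an existing entry for `key` if present.
        i = bisect.bisect_left(b, (key,))
        if i < len(b) and b[i][0] == key:
            b[i] = (key, value)          # replace existing entry
        else:
            b.insert(i, (key, value))    # keep the bucket sorted

    def __getitem__(self, key):
        b = self._bucket(key)
        i = bisect.bisect_left(b, (key,))
        if i < len(b) and b[i][0] == key:
            return b[i][1]
        raise KeyError(key)
```

Even if an attacker forces every key into one bucket, lookups degrade to O(log n) rather than O(n) — which is the point of the tree-bucket approach, without any limit to count against.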

-gps
History
Date User Action Args
2012-01-18 23:25:38  gregory.p.smith  set  recipients: + gregory.p.smith, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-18 23:25:37  gregory.p.smith  link  issue13703 messages
2012-01-18 23:25:37  gregory.p.smith  create