
Author dmalcolm
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-25.20:23:39
Message-id <1327522985.2388.57.camel@surprise>
In-reply-to <1327519559.3428.33.camel@localhost.localdomain>
Content
I think you're right: the C instance will stop matching "abc" during
lookup within such a dict, since the dict will be using the secondary
hash for "abc", but hash() for the C instance.

It will still match outside of the dict, and within other dicts.

So yes, this would be a subtle semantic change when under attack.
Bother.
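
To make the semantic change concrete, here is a minimal sketch. It assumes
a hypothetical class C whose instances hash and compare like "abc", and it
uses a toy ToyDict with a stand-in secondary_hash(); none of these names
come from the actual patch, and the real dict implementation differs.

```python
SALT = 0x5BD1E995  # hypothetical nonzero constant, so the result always differs


def secondary_hash(s):
    # Hypothetical stand-in for the patch's secondary str hash: any
    # function whose result differs from hash(s) shows the effect.
    return hash(s) ^ SALT


class C:
    """Hypothetical class whose instances hash and compare like "abc"."""

    def __hash__(self):
        return hash("abc")

    def __eq__(self, other):
        return other == "abc"


class ToyDict:
    """Toy chained table; when under_attack, str keys use secondary_hash."""

    def __init__(self, under_attack=False):
        self.under_attack = under_attack
        self.buckets = {}

    def _hash(self, key):
        if self.under_attack and type(key) is str:
            return secondary_hash(key)
        return hash(key)

    def __setitem__(self, key, value):
        bucket = self.buckets.setdefault(self._hash(key), [])
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (k, value)
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        for k, v in self.buckets.get(self._hash(key), []):
            if k == key:
                return v
        raise KeyError(key)


def lookup_matches(table):
    """Store "abc", then report whether a C instance finds that entry."""
    table["abc"] = 1
    try:
        return table[C()] == 1
    except KeyError:
        return False
```

With these definitions, C() == "abc" still holds and an ordinary dict keyed
by "abc" still finds C(). Only the table that has switched to the secondary
hash misses, because "abc" and the C instance land in different buckets.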

Having said that, note that within the typical attack scenarios (HTTP
headers, HTTP POST, XML-RPC, JSON), we have a pure-str dict (or
sometimes a pure-bytes dict).  Potentially I could come up with a patch
that only performs this change for such a case (pure-str is easier,
given that we already track this), which would avoid the semantic change
you identify, whilst still providing protection against a wide range of
attacks.
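
A rough sketch of that restriction, under stated assumptions: SALT,
is_pure_str() and choose_hash() are hypothetical names for illustration,
and the check scans the keys here, whereas the real dict already tracks
the pure-str property internally. What a lookup with a non-str key should
do in such a dict is not covered above and is left out.

```python
SALT = 0x5BD1E995  # hypothetical nonzero constant for the stand-in hash


def is_pure_str(d):
    # Exact type check: a str subclass could override __hash__ or __eq__.
    return all(type(k) is str for k in d)


def choose_hash(d, under_attack):
    """Pick the hash function a toy dict would use for its keys."""
    if under_attack and is_pure_str(d):
        # Stand-in for the secondary hash; only pure-str dicts switch.
        return lambda s: hash(s) ^ SALT
    # Mixed-key dicts keep hash(), so equal-but-different-type keys
    # continue to match as before.
    return hash
```

For example, choose_hash({"a": 1}, True) returns the stand-in secondary
hash, while choose_hash({"a": 1, 1: 2}, True) returns the builtin hash.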

Is it worth me working on this?

> > > Also, the level of complication is far higher than in any other of the
> > > proposed approaches so far (I mean those with patches), which isn't
> > > really a good thing.
> > 
> > So would I.  I want something I can backport, though.
> 
> Well, your approach sounds like it subtly and unpredictably changes the
> behaviour of dicts when there are too many collisions, so I'm not sure
> it's a good idea to backport it, either.
> 
> If we don't want to backport full hash randomization, I think I much
> prefer raising a BaseException when there are too many collisions,
> rather than this kind of (excessively) sophisticated workaround. You
> *are* changing a fundamental datatype in a rather complicated way.

Well, each approach has pros and cons, and we've circled around between
hash randomization vs collision counting vs other approaches for several
weeks.  I've supplied patches for 3 different approaches.

Is this discussion likely to reach a conclusion soon?  Would it be
regarded as rude if I unilaterally ship something close to:
  backport-of-hash-randomization-to-2.7-dmalcolm-2012-01-23-001.patch
in RHEL/Fedora, so that my users have some protection they can enable if
they get attacked? (see http://bugs.python.org/msg151847).  If I do
this, I can post the patches here in case other distributors want to
apply them.

As for python.org, who is empowered to make a decision here?  How can we
move this forward?
History
Date                 User      Action  Args
2012-01-25 20:23:41  dmalcolm  set     recipients: + dmalcolm, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-25 20:23:40  dmalcolm  link    issue13703 messages
2012-01-25 20:23:39  dmalcolm  create