Author loewis
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, haypo, jcea, lemburg, loewis, mark.dickinson, merwok, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, zbysz
Date 2012-01-26.22:34:51
SpamBayes Score 1.08935e-12
Marked as misclassified No
Message-id <20120126233450.Horde.fi7KWVNNcXdPIdUKkOc26PA@webmail.df.eu>
In-reply-to <1327615964.2219.35.camel@surprise>
Content
> as soon as any key insertion or lookup occurs where the key isn't
> exactly one of the correct types, the dict flattens any AVL trees back
> into the regular flat representation (and switches to lookdict for
> ma_lookup), analogous to the existing ma_lookup transition on dicts.

Correct.

> TREE_DELETE() invocations as appropriate; ultimately, the AVL macros
> call back to within node_cmp():
>    PyObject_Compare(left->key, right->key)

Correct.

> I suspect that you can't plug arbitrary user-defined types into it,
> since there's no way to guarantee that ordering and comparison work in
> the ways that the AVL lookup requires.

That's all true. It would be desirable to automatically determine which
types also support total order in addition to hashing, alas, there is
currently no protocol for it. On the contrary, Python has moved away
of assuming that all objects form a total order.

> [thinking aloud, if a pair of
> objects don't implement comparison at the PyObject_Compare level, is it
> possible to instead simply compare the addresses of the objects?

2.x has an elaborate logic to provide a total order on objects. It
took the entire 1.x and 2.x series to fix issues with that order, only
to recognize that it is not feasible to provide one - hence the introduction
of rich comparisons and the omission of cmp in 3.x.

For the dictionary, using addresses does not work: the order of objects
needs to be consistent with equality (i.e. x < y and x == y must not
simultaneously hold, as must x == y and y < z imply that also x < z,
else the tree lookup won't find the equal keys).

> Going higher-level, I feel that there are plenty of attacks against
> pure-str/bytes dicts, and having protection against them is worthwhile -
> even if there's no direct way to use it to protect the use-case you
> describe.

Indeed. This issue doesn't need to fix *all* possible attacks using
hash collisions. Instead, it needs to cover the common case, and it
needs to allow users to rewrite their code so that they can protect
it against this family of attacks.
History
Date User Action Args
2012-01-26 22:34:52loewissetrecipients: + loewis, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, haypo, christian.heimes, benjamin.peterson, merwok, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-26 22:34:51loewislinkissue13703 messages
2012-01-26 22:34:51loewiscreate