Message 152051 - Python tracker


Author	loewis
Recipients	Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, loewis, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date	2012-01-26.23:43:49
Message-id	<20120127004349.Horde.PVOKDFNNcXdPIeU1WyDHUCA@webmail.df.eu>
In-reply-to	<CAFRnB2VH2m6h3C9q-1k_0A4gG8KYfKc=vCoL+ygv4ZJ_UYJkmQ@mail.gmail.com>

Content

> I'm sorry then, but I'm a little confused.  I think we pretty clearly
> established earlier that requiring users to make changes anywhere they
> stored user data would be dangerous, because these locations are often in
> libraries or other places where the code creating and modifying the
> dictionary has no idea it's user data in it.

I don't consider that established for the specific case of string-like
objects. Users can easily determine whether they use string-like objects,
and if so, in what places, and what data gets put into them.

> The proposed AVL solution fails if it requires users to fundamentally
> restructure their data depending on it's origin.

It doesn't fail at all. Users don't *have* to restructure their code,
let alone fundamentally. Their code may currently be vulnerable, yet
not use string-like objects at all. With the proposed solution, such
code will be fixed for good.

It's true that the solution does not fix all cases of the vulnerability,
but neither does any other proposed solution.

> We have solution that is known to work in all cases: hash randomization.

Well, you *believe* that it fixes the problem, even though it actually
may not, assuming an attacker can somehow reproduce the hash function.

>  There were three discussed issues with it:
>
> a) Code assuming a stable ordering to dictionaries
> b) Code assuming hashes were stable across runs.
> c) Code reimplementing the hashing algorithm of a core datatype that is now
> randomized.
>
> I don't think any of these are realistic issues

I'm fairly certain that code will break in massive ways, despite any
argument that it should not. The question really is:

Do we break code in a massive way, or do we fix the vulnerability
for most users with no code breakage?

I clearly value compatibility much more highly than 100% protection
against a DoS-style attack (against which many other forms of protection
are also available).

> (a) was never a documented, or intended property, indeed it
> breaks all the time, if you insert keys in the wrong order, use a different
> platform, or anything else can change.

Still, a lot of code relies on dictionary order, and successfully so,
in practice. Practicality beats purity.
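
A minimal sketch of the kind of order dependence at stake, under stated
assumptions: it uses the PYTHONHASHSEED environment variable, which did not
exist when this message was written, and it uses a set rather than a dict,
because current CPython dicts preserve insertion order while set iteration
order still follows the hashes. The helper name iteration_order is
hypothetical.

```python
import os
import subprocess
import sys


def iteration_order(words, seed):
    """Return the order in which a fresh interpreter iterates over set(words).

    The child process runs with PYTHONHASHSEED set to `seed`;
    a seed of 0 disables hash randomization.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "import sys; print(' '.join(set(sys.argv[1:])))"
    out = subprocess.check_output(
        [sys.executable, "-c", code] + list(words), env=env
    )
    return out.decode().split()


words = ["alpha", "beta", "gamma", "delta", "epsilon"]

# Without randomization, separate runs iterate in the same order, so code
# that compares output against a recorded string keeps passing.
print(iteration_order(words, 0) == iteration_order(words, 0))

# With different seeds the elements are the same, but the order may not be.
print(iteration_order(words, 1))
print(iteration_order(words, 2))
```

Code that sorts before comparing or printing is unaffected either way; code
that compares the raw iteration order against a stored value is the kind
that would break.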

> (b) For the same reasons code
> relying on (b) only worked if you didn't change anything

That's not true. You cannot practically change the way string hashing works
other than by changing the interpreter source. Hashes *are* currently stable
across runs.
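
As an illustration only, the following sketch compares string hashes across
separate interpreter runs. It assumes an interpreter that supports the
PYTHONHASHSEED environment variable (added after this message was written):
a value of 0 disables randomization, and two different fixed seeds stand in
for two randomized runs. The helper name string_hash is hypothetical.

```python
import os
import subprocess
import sys


def string_hash(text, seed):
    """Return hash(text) as computed by a fresh interpreter process.

    The child process runs with PYTHONHASHSEED set to `seed`.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out)


# Randomization disabled: two independent runs agree, so a hash value
# written to disk by one run can be compared by a later run.
print(string_hash("spam", 0) == string_hash("spam", 0))

# Two different seeds: the values are expected to differ, so any hash
# persisted by one run is meaningless to the other.
print(string_hash("spam", 1), string_hash("spam", 2))
```

Any program that stores hash() results in a file, a cache key, or a
database and reads them back in a later process depends on the first
behaviour.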

> and in practice I'm convinced neither of these were common (if ever existed).

Are you willing to bet the trust people have in Python's bug fix policies
on that? I'm not.

> In summary, I think the case against hash-randomization has been seriously
> overstated, and in no way is more dangerous than having a solution that
> fails to solve the problem comprehensively.  Further, I think it is
> imperative that we reach a consensus on this quickly

Well, I cannot be part of a consensus that involves massive code breakage
in a bug fix release. Lacking consensus, either the release managers or
the BDFL will have to pronounce.

History
Date	User	Action	Args
2012-01-26 23:43:51	loewis	set	recipients: + loewis, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-26 23:43:50	loewis	link	issue13703 messages
2012-01-26 23:43:49	loewis	create