Message 151813 - Python tracker

Author	lemburg
Recipients	Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date	2012-01-23.13:38:25
Message-id	<4F1D62CD.4000408@egenix.com>
In-reply-to	<CAFRnB2Webjv+mEHt9UrQOmVq82+NHsDW3zuyLM-rnqVxhD0LJQ@mail.gmail.com>

Content
Alex Gaynor wrote:
> I'm able to put N pieces of data into the database on successive requests,
> but then *rendering* that data puts it in a dictionary, which renders that
> page unviewable by anyone.

I think you're asking a bit much here :-) A broken app is a broken
app, no matter how nicely Python tries to work around it. If an
app puts too much trust in user data, it will be vulnerable
one way or another, regardless of how the user data enters
the app.

These are the collision counting possibilities we've discussed
so far:

With a collision counting exception you'd get a clear notice that
something in your data and your application is wrong and needs
fixing. The rest of your web app will continue to work fine and
you won't run into a DoS problem taking down your whole web
server.
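
To make the first option concrete, here is a minimal sketch of collision
counting in a toy open-addressing table. It is not the CPython dict
implementation or any proposed patch; the names TooManyCollisions,
CountingTable and max_collisions are made up for illustration, and the
table is fixed-size (no resizing, no deletion).

```python
class TooManyCollisions(Exception):
    """Hypothetical error: one insert collided with too many other keys."""


class CountingTable:
    """Toy fixed-size hash table that counts collisions per insert."""

    def __init__(self, size=64, max_collisions=8, hash_func=hash):
        if max_collisions >= size:
            raise ValueError("max_collisions must be smaller than size")
        self.size = size
        self.max_collisions = max_collisions
        self.hash_func = hash_func
        self.slots = [None] * size

    def insert(self, key, value):
        index = self.hash_func(key) % self.size
        collisions = 0
        # Linear probing: step to the next slot while another key is in the way.
        while self.slots[index] is not None and self.slots[index][0] != key:
            collisions += 1
            if collisions > self.max_collisions:
                # Stop early instead of degrading into a long scan.
                raise TooManyCollisions(
                    "more than %d collisions while inserting %r"
                    % (self.max_collisions, key)
                )
            index = (index + 1) % self.size
        self.slots[index] = (key, value)


# Usage: a web app would catch the error around the one request that
# triggered it, so other requests keep being served.
table = CountingTable()
try:
    table.insert("user-supplied-key", "value")
except TooManyCollisions:
    pass  # reject the request and log it
```

The limit turns pathological input into an explicit error for that one
operation, which is the "clear notice" described above.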

With the proposed enhancement of collision counting + universal hash
function for Python 3.3, you'd get a warning printed to the logs, the
dict implementation would self-heal and your page would still be viewable.
The admin would then see the log entry and get a chance to fix the
problem.
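
A rough sketch of what "warn and self-heal" could look like, again on a
toy fixed-size table and not on the real dict. Everything here is an
assumption made for illustration: the poly_hash family, the next_param
step, the logger name and the SelfHealingTable class are hypothetical and
do not come from any patch on this issue.

```python
import logging

log = logging.getLogger("selfhealing")  # hypothetical logger name

PRIME = 2_147_483_647  # 2**31 - 1


def poly_hash(key, param):
    """Toy polynomial string hash; param selects one member of the family."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (h * param + byte + 1) % PRIME
    return h


def next_param(param):
    """Arbitrary fixed rule for picking the next parameter (no randomness)."""
    return param + 2


class SelfHealingTable:
    def __init__(self, size=64, max_collisions=8, param=31,
                 hash_family=poly_hash, step=next_param, max_rehashes=16):
        self.size = size
        self.max_collisions = max_collisions
        self.param = param
        self.hash_family = hash_family
        self.step = step
        self.max_rehashes = max_rehashes
        self.slots = [None] * size

    def _place(self, slots, param, key, value):
        """Linear probing; return False once the collision limit is exceeded."""
        index = self.hash_family(key, param) % self.size
        for _ in range(self.max_collisions + 1):
            if slots[index] is None or slots[index][0] == key:
                slots[index] = (key, value)
                return True
            index = (index + 1) % self.size
        return False

    def insert(self, key, value):
        if self._place(self.slots, self.param, key, value):
            return
        log.warning("collision limit exceeded for %r; rehashing", key)
        items = [slot for slot in self.slots if slot is not None]
        items.append((key, value))
        param = self.param
        for _ in range(self.max_rehashes):
            param = self.step(param)
            fresh = [None] * self.size
            if all(self._place(fresh, param, k, v) for k, v in items):
                # Only switch over once every item fits under the new parameter.
                self.slots, self.param = fresh, param
                return
        raise RuntimeError("could not rehash within the attempt limit")

    def get(self, key):
        index = self.hash_family(key, self.param) % self.size
        for _ in range(self.max_collisions + 1):
            slot = self.slots[index]
            if slot is None:
                break
            if slot[0] == key:
                return slot[1]
            index = (index + 1) % self.size
        raise KeyError(key)
```

The caller never sees an error in the common case: the warning ends up in
the log for the admin, and the table carries on with a different member of
the hash family.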

Note: Even if Python works around the problem successfully, there's no
guarantee that the data doesn't end up being processed by some other
tool in the chain with similar problems. All this is a work-around
for an application bug, nothing more. Silencing the problem
by e.g. using randomization in the string hash algorithm
doesn't really help in identifying the bug.

Overall, I don't think we should make Python's hash function
non-deterministic. Even with the universal hash function idea,
the dict implementation should use a predefined way of determining
the next hash parameter to use, so that running the application
twice against attack data will still result in the same data
output.
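
One way to read "a predefined way of determining the next hash
parameter": derive each parameter from the previous one with a fixed
rule instead of drawing it from a random source. The sketch below is
only an illustration; INITIAL_PARAM, MULTIPLIER and hash_params are
hypothetical names, and the multiplicative step is just one possible
fixed rule.

```python
from itertools import islice

PRIME = 2_147_483_647   # 2**31 - 1
INITIAL_PARAM = 31      # hypothetical fixed starting parameter
MULTIPLIER = 48271      # hypothetical fixed step constant


def hash_params(start=INITIAL_PARAM):
    """Yield the fixed sequence of hash parameters, beginning with start."""
    param = start
    while True:
        yield param
        param = (param * MULTIPLIER) % PRIME


# Two independent "runs" walk through exactly the same parameters, so the
# same attack data triggers the same rehashes and yields the same output.
run_a = list(islice(hash_params(), 5))
run_b = list(islice(hash_params(), 5))
assert run_a == run_b
```

Because nothing depends on process state or a seed, a problem seen in
production can be replayed on a developer machine with identical results.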

History
Date	User	Action	Args
2012-01-23 13:38:26	lemburg	set	recipients: + lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-23 13:38:26	lemburg	link	issue13703 messages
2012-01-23 13:38:25	lemburg	create