Author lemburg
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, haypo, jcea, lemburg, mark.dickinson, merwok, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, zbysz
Date 2012-01-23.13:38:25
SpamBayes Score 0.0
Marked as misclassified No
Message-id <4F1D62CD.4000408@egenix.com>
In-reply-to <CAFRnB2Webjv+mEHt9UrQOmVq82+NHsDW3zuyLM-rnqVxhD0LJQ@mail.gmail.com>
Content
Alex Gaynor wrote:
> I'm able to put N pieces of data into the database on successive requests,
> but then *rendering* that data puts it in a dictionary, which renders that
> page unviewable by anyone.

I think you're asking a bit much here :-) A broken app is a broken
app, no matter how nice Python tries to work around it. If an
app puts too much trust into user data, it will be vulnerable
one way or another and regardless of how the user data enters
the app.

These are the collision counting possibilities we've discussed
so far:

With an collision counting exception you'd get a clear notice that
something in your data and your application is wrong and needs
fixing. The rest of your web app will continue to work fine and
you won't run into a DoS problem taking down all of your web
server.

With the proposed enhancement of collision counting + universal hash
function for Python 3.3, you'd get a warning printed to the logs, the
dict implementation would self-heal and your page is viewable nonetheless.
The admin would then see the log entry and get a chance to fix the
problem.

Note: Even if Python works around the problem successfully, there's no
guarantee that the data doesn't end up being processed by some other
tool in the chain with similar problems. All this is a work-around
for an application bug, nothing more. Silencing the problem
by e.g. using randomization in the string hash algorithm
doesn't really help in identifying the bug.

Overall, I don't think we should make Python's hash function
non-deterministic. Even with the universal hash function idea,
the dict implementation should use a predefined way of determining
the next hash parameter to use, so that running the application
twice against attack data will still result in the same data
output.
History
Date User Action Args
2012-01-23 13:38:26lemburgsetrecipients: + lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, haypo, christian.heimes, benjamin.peterson, merwok, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-23 13:38:26lemburglinkissue13703 messages
2012-01-23 13:38:25lemburgcreate