Message 150840 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	PaulMcMillan
Recipients	Arach, Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, gz, jcea, lemburg, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner
Date	2012-01-08.02:40:39
Message-id	<CAO_YWRUVxD8Sw148ezEfxqV2b-0hhMqYDC8MdjU3pAjh0NxK2w@mail.gmail.com>
In-reply-to	<1325982780.36.0.940971927339.issue13703@psf.upfronthosting.co.za>

Content

> Alex, I agree the issue has to do with the origin of the data, but the modules listed are the ones that deal with the data supplied by this particular attack.

They deal directly with the data. Do any of them pass the data
further, or does the data stop with them? A short and very incomplete
list of vulnerable modules, in the standard library and beyond,
includes: every single parsing library (json, xml, html, plus all the
third-party libraries that do that), all of numpy (because it processes
data which probably came from a user [yes, integers can trigger the
vulnerability]), difflib, the math module, most database adapters,
anything that parses metadata (including commonly used third-party
libraries like PIL), the tarfile library along with other compressed
format handlers, the csv module, robotparser, plistlib, argparse,
pretty much everything under the heading of "18. Internet Data
Handling" (email, mailbox, mimetypes, etc.), "19. Structured Markup
Processing Tools", "20. Internet Protocols and Support", "21.
Multimedia Services", "22. Internationalization", Tkinter, and all the
os calls that handle filenames. The list is impossibly large, even if
we completely ignore user code. This MUST be fixed at a language level.
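
To illustrate the bracketed remark about integers, here is a minimal sketch, assuming CPython 3.2 or later, where the hash of a non-negative int is its value reduced modulo `sys.hash_info.modulus`. The helper name `colliding_ints` and the residue 5 are made up for illustration; this is not the attack from the original report, only a demonstration that integer keys can be made to share one hash value.

```python
import sys


def colliding_ints(count, residue=5):
    """Return `count` distinct ints that share one hash value on CPython."""
    # CPython-specific assumption: hash(n) == n % modulus for n >= 0.
    modulus = sys.hash_info.modulus  # 2**61 - 1 on typical 64-bit builds
    return [residue + i * modulus for i in range(count)]


keys = colliding_ints(1000)
distinct_hashes = {hash(k) for k in keys}

# Every key lands in the same probe sequence, so each insertion has to
# compare against the keys inserted before it.
table = dict.fromkeys(keys)

print(len(keys), len(distinct_hashes), len(table))
```

On a typical CPython build this should report 1000 keys, a single distinct hash value, and 1000 dict entries, which is the quadratic-insertion pattern the thread is worried about.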

I challenge you to find me 15 standard lib components that are certain
to never handle user-controlled input.

> Note that changing the hash algorithm for a persistent process, even though each process may have a different seed or randomized source, allows attacks for the life of that process, if an attack vector can be created during its lifetime. This is not a problem for systems where each request is handled by a different process, but is a problem for systems where processes are long-running and handle many requests.

This point has been made many times now. I urge you to read the entire
thread on the mailing list. Your "safe" implementation is impractical
because it completely ignores all hash caching (each entry must be
re-hashed for that dict). It is still vulnerable in exactly the way
you mentioned if you ever have any kind of long-lived dict in your
program.
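
The cost of ignoring hash caching can be sketched as follows. This is a toy model, not the proposed patch: `Key`, `cached_hash`, `salted_hash`, the salts, and the multiply-and-xor loop are all hypothetical stand-ins, and the counter simply records how often the full O(len) hash is recomputed when the same key goes into several dicts.

```python
class Key:
    """Hypothetical key that caches its hash, as CPython str objects do."""

    def __init__(self, text):
        self.text = text
        self._cached = None
        self.computations = 0  # number of full hash computations so far

    def _compute(self, salt):
        # Stand-in for an O(len(text)) string hash; not CPython's real one.
        self.computations += 1
        h = salt
        for ch in self.text:
            h = ((h * 1000003) ^ ord(ch)) & 0xFFFFFFFFFFFFFFFF
        return h

    def cached_hash(self):
        # One global hash: computed once, then reused by every dict.
        if self._cached is None:
            self._cached = self._compute(0)
        return self._cached

    def salted_hash(self, salt):
        # Per-dict salt: the cached value cannot be reused, so recompute.
        return self._compute(salt)


dict_salts = [11, 22, 33]  # one made-up salt per hypothetical "safe" dict

shared = Key("x" * 1000)
for _ in dict_salts:
    shared.cached_hash()
cached_cost = shared.computations

salted = Key("x" * 1000)
for salt in dict_salts:
    salted.salted_hash(salt)
salted_cost = salted.computations

print(cached_cost, salted_cost)
```

With a cached hash the key is hashed once no matter how many dicts it enters; with a per-dict salt it is hashed once per dict, which is the overhead being objected to.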

> You have entered the class of people that claim lots of vulnerabilities, without enumeration.

I have enumerated. Stop making this argument.

History
Date	User	Action	Args
2012-01-08 02:40:42	PaulMcMillan	set	recipients: + PaulMcMillan, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, jcea, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, Arfrever, v+python, alex, skrah, dmalcolm, gz, Arach, Mark.Shannon, Zhiping.Deng, Huzaifa.Sidhpurwala
2012-01-08 02:40:41	PaulMcMillan	link	issue13703 messages
2012-01-08 02:40:39	PaulMcMillan	create