
Author njs
Recipients Mark.Shannon, benjamin.peterson, larry, lemburg, njs, pitrou, serhiy.storchaka
Date 2015-09-01.02:38:44
Message-id <1441075127.27.0.543356626649.issue24912@psf.upfronthosting.co.za>
In-reply-to
Content
Mark Shannon wrote:
> So, just make sure that you insert the new object into sys.modules *before* doing any imports or calls to code that could import your module and it will all work fine.

The problem with doing this is that you're now stuck managing two diverging namespaces: the one associated with your new object that other modules can see, and the one where your __init__.py code is doing all those imports and calls. So if you want this to work then you have to totally rewrite your package's startup sequence, OR you have to insert some code like
  sys.modules[__name__].__dict__.update(orig_module.__dict__)
after *every line* of your __init__.py, OR you have to do some really complicated global analysis of every module inside your package to figure out exactly what the state of these two namespaces is at each possible point during the startup sequence and prove that the divergences don't matter...
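
To make the divergence concrete, here's a hypothetical __init__.py that swaps in a new module object up front. This is just a sketch; the package name mypkg, the _replacement variable, and public_api are all made up for illustration:

  # mypkg/__init__.py -- sketch of the "two diverging namespaces" trap
  import sys, types

  _replacement = types.ModuleType(__name__)
  sys.modules[__name__] = _replacement   # other importers now get _replacement...

  def public_api():                      # ...but this binding lands in *this* file's
      return "hello"                     # globals(), which _replacement doesn't share

  # Without a sync like the update() call above after every binding,
  # "import mypkg; mypkg.public_api()" fails with AttributeError.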

The key feature of the metamodule approach is that sys.modules["modname"].__dict__ is always the same object as your __init__.py globals(), so there's no chance of divergence and it can guarantee that merely enabling metamodule for an existing package will always be safe and have no behavioural effect (until you start using the new metamodule features). This guarantee is hugely important given that the first user will probably be numpy, which is a giant crufty package with millions of users.
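
For contrast, here is a minimal sketch of what that identity-preserving setup can look like, assuming module __class__ assignment works as intended; the names _MetaModule and lazy_thing, and the lazily imported _heavy submodule, are invented for illustration:

  # mypkg/__init__.py -- identity-preserving metamodule sketch
  import sys, types

  class _MetaModule(types.ModuleType):
      @property
      def lazy_thing(self):
          # resolved on attribute access instead of at import time
          from . import _heavy
          return _heavy.thing

  # Same module object, same __dict__, same globals() -- only the type changes,
  # so the rest of the package can't observe any divergence.
  sys.modules[__name__].__class__ = _MetaModule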

I promise, we went over all of this on python-dev last year :-)


Marc-André Lemburg wrote:
> Python code will generally assume that it can trust
> builtin types. It doesn't expect 42 + 2 to clear out the root dir,
> just because some package installed from PyPI happens to feel in the
> mood for Easter eggs :-)

The only reason that'd be possible though is because you went and ran some untrusted code with permissions allowing it to clear out the root dir -- the only way to set up this "exploit" is to run untrusted Python code. Basically you've handed someone a gun, and now you're worried because this patch gives them a new and particularly rube-goldbergian method for pulling the trigger...

Except it isn't even a new method; your nasty PyPI package can trivially implement this "easter egg" using only fully-supported features from the stdlib, in any version of Python since 2.5. Here's some nice portable code to do __class__ assignment while dodging *all* the checks in object_set_class:

  from ctypes import *
  def set_class(obj, new_class):
      # CPython-specific: id() is the object's address, and ob_type is the
      # pointer-sized field right after the refcount in the object header.
      ob_type_offset = object.__basicsize__ - sizeof(c_void_p)
      # Overwrite ob_type in place, skipping the compatibility checks (and the
      # refcount bookkeeping) that object_set_class normally does.
      c_void_p.from_address(id(obj) + ob_type_offset).value = id(new_class)

I mean, obviously ctypes is nasty and breaks the rules, I'm not saying this justifies making __class__ assignment broken as well. But this bug is no more a *security* problem than the existence of ctypes is.

Larry Hastings wrote:
> Consider for a moment Google App Engine. If GAE updated to 3.5 with this bug, users would now have the ability to inject code into other people's programs, because interned ints (and a couple other types) are shared across interpreters.

Okay, fair enough :-). On GAE this *would* be a security bug, because GAE, I guess, runs an extensively modified and audited fork of Python that implements a full sandbox. I assume this is also why it took them ~2 years to upgrade to 2.7, and why they're shipping 3 year old versions of all their libraries, and why they're now starting to move people to a new setup using OS-level sandboxing instead of interpreter-level sandboxing...

Python.org doesn't provide any sandbox guarantees, and this bug is a tiny drop in the bucket compared to what anyone would need to do to add a trustworthy sandbox to CPython 3.5, so I still wouldn't call this release critical. But you're the RM, so here's a patch if you want it :-).

Serhiy Storchaka wrote:
> I'm not sure that allowing __class__ assignment for larger domain of types is desirable. If we will desire that it is not, any enhancements to __class__ assignment should be withdrawn. May be __class__ assignment should be discouraged, deprecated and then disabled for all classes (in 3.6+), and other ways should be proposed to solve problems that are solved with __class__ assignment.

I don't necessarily object to the idea of eventually removing __class__ assignment in some future version of Python. It kind of freaks me out too. (Though Guido seems to like it.)

I really, really, REALLY object to the idea of -- at this point in the release cycle! -- rejecting a new feature that has gone through review on python-dev and that solves a real problem impacting a bunch of people (see all the replies on the numpy-discussion thread linked above from people saying "oh ugh yes this has totally bitten me! please fix it!"), and doing so on the grounds that someone *might* later make an argument for removing it again in 3.7, and that python-dev might eventually agree with that argument? I mean, c'mon.

If it were breaking everything, then that would be grounds for removing it, no question there. But the problems described in this bug report are well understood, and it's trivial to fix them in a conservative way without backing out the original feature.