Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The type of cached objects is mutable #69100

Closed
serhiy-storchaka opened this issue Aug 21, 2015 · 69 comments
Closed

The type of cached objects is mutable #69100

serhiy-storchaka opened this issue Aug 21, 2015 · 69 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Copy link
Member

BPO 24912
Nosy @malemburg, @gvanrossum, @brettcannon, @rhettinger, @ncoghlan, @pitrou, @vstinner, @larryhastings, @benjaminp, @ned-deily, @njsmith, @markshannon, @eltoder, @ericsnowcurrently, @serhiy-storchaka, @erlend-aasland
PRs
  • bpo-43973: Use Py_TPFLAGS_IMMUTABLETYPE to check for class assignments #25714
  • Files
  • issue24912.patch
  • issue24912-v2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/larryhastings'
    closed_at = <Date 2015-09-06.07:53:19.607>
    created_at = <Date 2015-08-21.22:02:50.644>
    labels = ['interpreter-core', 'type-bug', 'release-blocker']
    title = 'The type of cached objects is mutable'
    updated_at = <Date 2021-04-30.11:27:56.656>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2021-04-30.11:27:56.656>
    actor = 'erlendaasland'
    assignee = 'larry'
    closed = True
    closed_date = <Date 2015-09-06.07:53:19.607>
    closer = 'larry'
    components = ['Interpreter Core']
    creation = <Date 2015-08-21.22:02:50.644>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['40310', '40312']
    hgrepos = []
    issue_num = 24912
    keywords = ['patch']
    message_count = 69.0
    messages = ['248981', '248982', '248983', '249019', '249180', '249189', '249198', '249345', '249350', '249351', '249354', '249357', '249360', '249363', '249369', '249388', '249392', '249393', '249396', '249397', '249398', '249438', '249446', '249450', '249455', '249456', '249457', '249467', '249468', '249470', '249471', '249473', '249483', '249484', '249489', '249495', '249499', '249500', '249502', '249504', '249505', '249506', '249509', '249510', '249511', '249604', '249607', '249618', '249623', '249685', '249854', '249869', '249871', '249888', '249889', '249890', '249891', '249893', '249930', '249931', '249934', '249938', '249940', '249943', '249944', '249948', '249954', '249979', '392411']
    nosy_count = 18.0
    nosy_names = ['lemburg', 'gvanrossum', 'brett.cannon', 'rhettinger', 'ncoghlan', 'pitrou', 'vstinner', 'larry', 'benjamin.peterson', 'ned.deily', 'Arfrever', 'njs', 'Mark.Shannon', 'python-dev', 'eltoder', 'eric.snow', 'serhiy.storchaka', 'erlendaasland']
    pr_nums = ['25714']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue24912'
    versions = ['Python 3.5', 'Python 3.6']

    @serhiy-storchaka
    Copy link
    Member Author

    The type of non-heap types can be changed in 3.5. This means that the type of cached immutable objects such as small ints, empty or 1-character strings, etc can be changed.

    >>> class I(int):
    ...     __slots__ = ()
    ...     def __repr__(self):
    ...         return 'Answer to The Ultimate Question of Life, the Universe, and Everything'
    ...     def __add__(self, other):
    ...         return self * other
    ... 
    >>> (42).__class__ = I
    >>> ord('*')
    Answer to The Ultimate Question of Life, the Universe, and Everything
    >>> x = 42; x + 2
    84
    >>> class S(str):
    ...     __slots__ = ()
    ...     def __len__(self):
    ...         return 123
    ... 
    >>> 'a'.__class__ = S
    >>> i = 1; len('abc'[:i])
    123

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Aug 21, 2015
    @pitrou
    Copy link
    Member

    pitrou commented Aug 21, 2015

    Isn't there already an issue open for this?

    @serhiy-storchaka
    Copy link
    Member Author

    I don't remember. I noticed this once in comments on Rietveld, but don't remember the issue.

    @markshannon
    Copy link
    Member

    If there is another issue for this, then it doesn't seem to be a release blocker. I think it should be.

    @larryhastings
    Copy link
    Contributor

    As Python 3.5 Release Manager, my official statement is: Eek!

    @markshannon
    Copy link
    Member

    Eek! indeed :)

    I intend to track down the cause of this on Monday, and hopefully come up with a fix, unless anyone beats me to it.

    Does anyone know if there is another issue for this?

    @larryhastings
    Copy link
    Contributor

    This exciting new feature was added in checkin c0d25de5919e addressing issue bpo-22986. Perhaps the core devs who added it would like to chime in.

    @njsmith
    Copy link
    Contributor

    njsmith commented Aug 29, 2015

    Well, yeah, that indeed sucks.

    Not sure what the best solution is. Some options:

    1. "Don't do that then"

    2. Explicitly add a "__class__" property to every immutable type, that unconditionally errors out on assignment.

    3. Add a hack to typeobject.c checking for the important immutable types

    4. Something cleverer...? A new type flag?

    The immutable types are: int, float, str, tuple, bool, frozenset, complex, bytes, and... anything else?

    @serhiy-storchaka
    Copy link
    Member Author

    Later I had got a crash.

    >>> class S(str): __slots__ = ()
    ... 
    >>> 'a'.__class__ = S
    >>> 
    >>> def f(a): pass
    ... 
    Fatal Python error: non-string found in code slot

    Current thread 0xb7583700 (most recent call first):
    Aborted (core dumped)

    The stdlib is full of implicit caches. Virtually any hashable object can be cached and shared. Why __class__ assignment is allowed at all? There are only two uses of __class__ assignment in the stdlib besides tests (in Lib/importlib/util.py and in Lib/xml/sax/saxutils.py), and in both cases it looks as optimization trick.

    @benjaminp
    Copy link
    Contributor

    Probably the patch on that bug should be reverted.

    @njsmith
    Copy link
    Contributor

    njsmith commented Aug 30, 2015

    Python goes to great lengths to make __class__ assignment work in general (I was surprised too). Historically the exception has been that instances of statically allocated C extension type objects could not have their __class__ reassigned. Semantically, this is totally arbitrary, and Guido even suggested that fixing this would be a good idea in this thread:
    https://mail.python.org/pipermail/python-dev/2014-December/137430.html
    The actual rule that's causing problems here is that *immutable* objects should not allow __class__ reassignment. (And specifically, objects which are assumed to be immutable by the interpreter. Most semantically immutable types in Python are defined by users and their immutability falls under the "consenting adults" rule; it's not enforced by the interpreter. Also this is very different from "hashable" -- even immutability isn't a requirement for being hashable, e.g., all user-defined types are mutable and hashable by default!)

    By accident this category of "immutable-by-interpreter-invariant" has tended to be a subset of "defined in C using the old class definition API", so fixing the one issue uncovered the other.

    This goal that motivated this patch was getting __class__ assignment to work on modules, which are mutable and uncacheable and totally safe. With this patch it becomes possible to e.g. issue deprecation warnings on module attribute access, which lets us finally deprecate some horrible global constants in numpy that have been confusing users for a decade now:
    http://mail.scipy.org/pipermail/numpy-discussion/2015-July/073205.html
    https://pypi.python.org/pypi/metamodule

    I'd *really* prefer not to revert this totally, given the above. (Also, if you read that python-dev message above, you'll notice that the patch also caught and fixed a different really obscure interpreter-crashing bug.)

    I think the Correct solution would be to disallow __class__ assignment specifically for the classes where it violates invariants.

    If this is considered to be release critical -- and I'm not entirely sure it is, given that similar tricks have long been possible and are even referenced in the official docs?
    https://www.reddit.com/r/Python/comments/2441cv/can_you_change_the_value_of_1/ch3dwxt
    https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong
    -- but if it is considered to be release critical, and it's considered too short notice to accomplish a proper fix, then a simple hack would be to add something like

    if (!(oldto->tp_flags & Py_TPFLAGS_HEAPTYPE) && oldto != &PyModule_Type) {
        PyErr_Format(PyExc_TypeError, ...);
        return -1;
    }

    to typeobject.c:object_set_class. As compared to reverting the whole patch, this would preserve the most important case, which is one that we know is safe, and we could then progressively relax the check further in the future...

    @markshannon
    Copy link
    Member

    Please revert c0d25de5919e.
    Breaking the interpreter in order to facilitate some obscure use case is unacceptable.

    If you want to do fancy stuff with modules, fine. Take a look at the source of the six module
    https://bitbucket.org/gutworth/six/src/cd1e81d33eaf3d6944f859c2aa7d5d3f515013c8/six.py?at=default for some tips.

    I think immutability is a fundamental property of an object. The "consenting" adults ideas is fine for accessibility. However, making previously immutable object mutable forces developers to use lots of defensive copying and causes obscure bugs (like this one).

    I do not regard the immutable of numbers as an historical accident, but as vitally important for any sort of numerical reasoning.
    Just take a look at a language with mutable strings to see the problems that causes.

    @njsmith
    Copy link
    Contributor

    njsmith commented Aug 30, 2015

    Wow, Mark, why so hostile? What's so wrong with my proposed less-intrusive patch that you feel the need to call for reversion while condescendingly pointing me to a non-solution? Of course I know about six. There was a whole python-dev thread about how neither its tricks nor any of the other tricks that python 3.4 allows actually do what we need.

    Let me try again.

    We all agree that this bug is a bug and that numbers should be immutable.

    Our old rules for __class__ assignment were also buggy; it was just an accident that they were buggy in a way that happened to prevent a different bug, ie this one. The proper way to enforce the immutability of immutable builtins is to enforce the immutability of immutable builtins, not to enforce the immutability of a bunch of random types based on an implementation detail (that happens to include the immutable builtins). Reverting the patch gives us the latter, which is why I don't think it's the proper fix.

    Now maybe we don't have time for a proper fix. I'm not sure why not -- AFAICT there would not be any disaster if this fix waited for 3.5.1. This is a scary looking bug, but the effect is that it takes something that used to require 3 obscure lines involving ctypes and makes it into 1 obscure line not involving ctypes. Which is bad. But we're hardly going to see an epidemic of people using this loophole to change the type of 1 and then complaining to python-dev that they changed the type of 1, any more than we saw such an epidemic when ctypes was introduced. So we have time to take a deep breath and come up with a proper fix, is all I'm saying. But of course this is Larry's call.

    If it is crucial to get a fix in for 3.5.0, then the least intrusive solution is not to revert the patch wholesale, but rather to add a (hacky but safe) guard to object_set_class. The guard I suggested above is stricter than necessary for correctness, but it catches all the problems described in this bug report, and the cases where it's over-strict are all cases where 3.4 was also over-strict, so there's minimal chance of it causing any regressions.

    Like I said, I'm not sure that's what we want to do. But if it is then I can throw a patch together in a few minutes.

    @markshannon
    Copy link
    Member

    Nathaniel, I'm hostile to this patch remaining in the code base.
    I'm not hostile to you, sorry that I came across as that way.

    The proper way to deal with issues like this is to revert the change and then create a new patch, otherwise it becomes impossible to revert the change if other problems emerge.

    I agree that the bug in __class__ assignment should be fixed, but such a fix should be separate from adding any new features.

    Also, I'm surprised that you assert that you can't do what your metamodule does, without ctypes.
    Your metamodule package is almost there.

    Your statement
    "Depending on the relative order of the assignment to sys.modules and imports of submodules, you can end up with different pieces of code in the same program thinking that mymodule refers to one or the other of these objects."
    is true.

    So, just make sure that you insert the new object into sys.modules *before* doing any imports or calls to code that could import your module and it will all work fine.

    @larryhastings
    Copy link
    Contributor

    I will not ship 3.5.0 with this bug.

    Consider for a moment Google App Engine. If GAE updated to 3.5 with this bug, users would now have the ability to inject code into other people's programs, because interned ints (and a couple other types) are shared across interpreters.

    Reverting the patch gives us back the old behavior. Dismissing the old behavior as "buggy" is unconvincing; it was acceptable enough that it shipped with many versions of Python, and its behavior is predictable and within the boundaries of the language spec.

    Nathaniel, I'm willing to consider fixes for this bug, if the other devs in this thread are satisfied with the fix. But I am *absolutely* leaning towards backing it out for 3.5. My goal is to ship high-quality software, and that means balancing new features against regressions and new exploits. We almost didn't find this in time before 3.5 shipped--not to spread FUD, but what other ramifications of this code are lurking in the object model, waiting to be discovered?

    p.s. I would love it if someone would add a regression test that tried mutating fields on a bunch of interned objects. Certainly such a test would be a precondition of keeping this change.

    @malemburg
    Copy link
    Member

    I agree with Mark. This feature opens up a security hole large enough to drive a train through.

    Looking at the python-dev thread, the only motivation appears to be making module look more like classes. I'd suggest to propose a PEP for making changes to module objects rather than creating a backdoor which let's you implement those changes at the expense of putting the whole interpreter at risk.

    IMO, .__class__ of static types should not be mutable. I can understand why heap types need this feature (to e.g. be able to copy objects without invoking any .__init__() methods of unknown objects as is needed for unpickle style operations), but for static types the set of supported objects is limited and the consequences of calling their constructor is known.

    @njsmith
    Copy link
    Contributor

    njsmith commented Aug 31, 2015

    I need to get to bed so I'll finish up tomorrow, but FYI I have a working patch -- I just want to add some __bases__ assignment test cases to make Larry happy. (Apparently there are no test cases for __bases__ assignment at all currently... :-(.)

    Before anyone panics about security issues, do keep in mind that the patch you're talking about reverting fixed a buffer overflow which I strongly suspect could be used to accomplish arbitrary code execution. This is not a big deal, because all it does it let you turn the ability to execute arbitrary Python code into the ability to execute arbitrary machine code. If this were the JVM then this would be a big deal, but for CPython it isn't -- there are many "vulnerabilities" like this that are included in CPython by default as features, because CPython does not even attempt to provide a secure sandbox. The bug described in the current issue is bad, but security-wise AFAIK it's less bad than arbitrary code execution: it lets you mess with code in other subinterpreters (which is already possible through other means), and it lets you trigger assert checks that abort the interpreter, but AFAICT it doesn't violate memory safety or allow arbitrary code execution.

    @malemburg
    Copy link
    Member

    On 31.08.2015 10:44, Nathaniel Smith wrote:

    Before anyone panics about security issues, do keep in mind that the patch you're talking about
    reverting fixed a buffer overflow which I strongly suspect could be used to accomplish arbitrary
    code execution.
    ... it lets you trigger assert checks that abort the interpreter, but AFAICT it doesn't violate memory safety or allow arbitrary code execution.

    I'm sure a buffer overflow can be fixed in other ways than allowing
    42 to print out the Zen of Python when asked for a repr() ;-)

    And if Serhiy can sneak in an os.system('rm -rf /') into a harmless
    operation such as 42 + 2, I do believe we can call this arbitrary
    code execution, even more so, since the patch only applies to a single
    integer object which happens to be a singleton in CPython.

    The point is: Python code will generally assume that it can trust
    builtin types. It doesn't expect 42 + 2 to clear out the root dir,
    just because some package installed from PyPI happens to feel in the
    mood for Easter eggs :-)

    @serhiy-storchaka
    Copy link
    Member Author

    I agree with Nathaniel, that this bug is not so critical to be release blocker. While it definitely should be fixed, it may wait for 3.5.1. Bug reproducing is not data driven, it needs executing special Python code, and when arbitrary Python code execution is available, there are a lot of other way to crash or compromise the interpreter. But I'm not sure that allowing __class__ assignment for larger domain of types is desirable. If we will desire that it is not, any enhancements to __class__ assignment should be withdrawn. May be __class__ assignment should be discouraged, deprecated and then disabled for all classes (in 3.6+), and other ways should be proposed to solve problems that are solved with __class__ assignment.

    Nathaniel, can you provide a patch, that keeps the fix of a buffer overflow, but withdraws the ability to assign __class__ in cases that were disabled before?

    @pitrou
    Copy link
    Member

    pitrou commented Aug 31, 2015

    __class__ assignment can definitely be useful for monkey-patching, or other, purposes.

    @malemburg
    Copy link
    Member

    On 31.08.2015 11:55, Antoine Pitrou wrote:

    __class__ assignment can definitely be useful for monkey-patching, or other, purposes.

    The feature certainly has its place for user defined types (see the
    unpickle example), but I don't think we should allow this for
    static types.

    @njsmith
    Copy link
    Contributor

    njsmith commented Aug 31, 2015

    Actually, I fail at grep, and actually there are already tons of good tests for __bases__ assignment in test_descr.py. So, patch attached, and analysis follows.

    Background on __class__ assignment
    ----------------------------------

    While it's certainly surprising, assignment to obj.__class__ and to type.__bases__ has been an explicitly supported feature ever since the advent of new-style classes back in 2.ancient, and typeobject.c goes to great lengths to handle all the weird resulting cases (including stuff like "what if __del__ changes the __class__", "what if someone mutates __bases__ while we are in the middle of method lookup inside a metaclass's mro() method", etc.).

    To make this safe, of course, we need to somehow check that after making the assignment then the resulting object will still be a valid, self-consistent Python object. The types in question need to have consistent in-memory representations, that their __dict__ slot (if present) is at the same offset, that their memory management callbacks are consistent (if we mutate an object from type A to type B, then A and B must be written in such a way that it's legal to pass an object allocated with A.tp_new to B.tp_dealloc), etc.

    This compatibility checking occurs in two places: object_set_class and type_set_bases (and several utility functions that they share).

    For historical reasons, there are two different C representations for type objects in Python: the original representation, which involves statically-allocated structs, and the newer representation, which involves heap-allocated structs. Aside from their allocation, there are a bunch of arcane little differences in things like reference counting, how you set them, etc., which are mostly motivated by the need to maintain compatibility with old code using the original representation -- all the places where they differ are places where the newer representation does something more sensible, it's not possible to create a statically-allocated type except by writing C code that *doesn't* use the stable ABI, etc. Otherwise they're supposed to work the same (eliminating differences between built-in and user-defined classes was the whole point of new-style classes!), but all these quirky implementation differences do leak out into little semantic differences in various places.

    One of the places where they've historically leaked out is in __class__ and __bases__ assignment. When the compatibility-checking code for types was originally written, it punted on trying to handle the quirky differences between statically-allocated and heap-allocated type objects, and just declared that all statically-allocated types were incompatible with each other.

    The previous patch
    ------------------

    The patch that is causing all the debate here was reviewed in bpo-22986, after extensive discussion on python-dev, and the actual diff can be seen here:
    https://hg.python.org/cpython/rev/c0d25de5919e/

    It did two things:

    1. It cleaned up a bunch of nasty stuff in the compatibility-checking and assignment code. This fixed some real bugs (e.g., a case where incompatible classes could be considered compatible based on the contents of random memory found by following bogus pointers), and in the process made it robust enough to handle all types, both statically-allocated and heap-allocated.

    2. It removed the check in object_set_class that protected the downstream code from ever seeing statically-allocated types - that's the 'if' check here:
      https://hg.python.org/cpython/rev/c0d25de5919e/#l3.97

    As far as we know, this is all working exactly as designed! For example, Serhiy's int subclass at the beginning of the thread is compatible with the int class in the sense that the modified version of the '42' object is still a valid Python object, all its fields are valid, it won't crash the interpreter when deallocated, etc.

    But of course what we didn't think of is that the interpreter assumes that instances of some particular built-in types are truly immutable, and the interpreter's internal invariants are violated if you can in any way modify an 'int' object in place. Doh.

    This new patch
    --------------

    So the attached patch modifies the original patch by leaving the cleanups in place, but puts back a guard in object_set_class that disallows __class__ assignment for statically-allocated types EXCEPT that it still allows it specifically in the case of ModuleType subclasses. It also adds a big comment explaining the purpose of the check, and adds tests to make sure that __class__ assignment is disallowed for builtin immutable types.

    Analysis of the mergeability of this patch:

    • We still believe that the actual compatibility checking code is strictly better than it used to be, so in any case where these parts of the changes affect Python's semantics, the new results should be better (i.e. less likely to break memory safety and crash the interpreter).

    • The new guard added by this patch is conservative, but allows a strict superset of what 3.4 and earlier allowed, so it won't break any existing code.

    • The one way that the proposed new guard is less conservative than the one in 3.4 is that the new one allows ModuleType objects through. Since we believe that the compatibility-checking code is now correct for statically-allocated types, this should be fine from a memory-safety point of view, and because the interpreter definitely does not assume that ModuleType objects are immutable, then this should be safe from that perspective as well.

    Larry also asked for a "regression test that tried mutating fields on a bunch of interned objects". I'm not sure which fields he was thinking of in particular, but the only entry points that lead to code touched by the original patch are attempts to set __class__ or to set __bases__. *bases__assignment has always been illegal on statically-allocated types, and has remained illegal through all of these patches*, plus it already has a bunch of tests, so I left it alone. This patch adds tests for __class assignment on all the potentially-interned types that I could think of.

    So..... I think that covers all the bases about what this patch is and why it is appropriate for 3.5.0 (assuming that Larry sticks with treating this as a release-critical bug). I'll post a follow-up with replies to a few specific side comments that people made.

    @njsmith
    Copy link
    Contributor

    njsmith commented Sep 1, 2015

    Mark Shannon wrote:

    So, just make sure that you insert the new object into sys.modules *before* doing any imports or calls to code that could import your module and it will all work fine.

    The problem with doing this is that you're now stuck managing two diverging namespaces: the one associated with your new object that other modules can see, and the one where your __init__.py code is doing all those imports and calls. So if you want this to work then you have to totally rewrite your package's startup sequence, OR you have to insert some code like
    sys.modules[name].__dict__.update(orig_module.__dict__)
    after *every line* of your __init__.py, OR you have to do some really complicated global analysis of every module inside your package to figure out exactly what the state of these two namespaces is at each possible point during the startup sequence and prove that the divergences don't matter...

    The key feature of the metamodule approach is that sys.modules["modname"].__dict__ is always the same object as your __init__.py globals(), so there's no change of divergence and it can guarantee that merely enabling metamodule for an existing package will always be safe and have no behavioural effect (until you start using the new metamodule features). This guarantee is hugely important given that the first user will probably be numpy, which is a giant crufty package with millions of users.

    I promise, we went over all of this on python-dev last year :-)

    Mark Lemburg wrote:

    Python code will generally assume that it can trust
    builtin types. It doesn't expect 42 + 2 to clear out the root dir,
    just because some package installed from PyPI happens to feel in the
    mood for Easter eggs :-)

    The only reason that'd be possible though is because you went and ran some untrusted code with permissions allowing it to clear out the root dir -- the only way to set up this "exploit" is to run untrusted Python code. Basically you've handed someone a gun, and now you're worried because this patch gives them a new and particularly rube-goldbergian method for pulling the trigger...

    Except it isn't even a new method; your nasty PyPI package can trivially implement this "easter egg" using only fully-supported features from the stdlib, in any version of Python since 2.5. Here's some nice portable code to do __class__ assignment while dodging *all* the checks in object_set_class:

      from ctypes import *
      def set_class(obj, new_class):
          ob_type_offset = object.__basicsize__ - sizeof(c_void_p)
          c_void_p.from_address(id(obj) + ob_type_offset).value = id(new_class)

    I mean, obviously ctypes is nasty and breaks the rules, I'm not saying this justifies making __class__ assignment broken as well. But this bug is no more a *security* problem than the existence of ctypes is.

    Larry Hasting wrote:

    Consider for a moment Google App Engine. If GAE updated to 3.5 with this bug, users would now have the ability to inject code into other people's programs, because interned ints (and a couple other types) are shared across interpreters.

    Okay, fair enough :-). On GAE this *would* be a security bug because GAE I guess runs an extensively modified and audited fork of Python that implements a full sandbox. I assume this is also why it took them ~2 years to upgrade to 2.7, and why they're shipping 3 year old versions of all their libraries, and why they're now starting to move people to a new setup using OS-level sandboxing instead of interpreter-level sandboxing...

    Python.org doesn't provide any sandbox guarantees, and this bug is a tiny drop in the bucket compared to what anyone will need to do to add a trustworthy sandbox to CPython 3.5, so for me I still wouldn't call this release critical. But you're the RM, so here's a patch if you want it :-).

    Serhiy Storchaka wrote;

    I'm not sure that allowing __class__ assignment for larger domain of types is desirable. If we will desire that it is not, any enhancements to __class__ assignment should be withdrawn. May be __class__ assignment should be discouraged, deprecated and then disabled for all classes (in 3.6+), and other ways should be proposed to solve problems that are solved with __class__ assignment.

    I don't necessarily object to the idea of eventually removing __class__ assignment in some future version of Python. It kind of freaks me out too. (Though Guido seems to like it.)

    I really, really, REALLY object to the idea of -- at this point in the release cycle! -- rejecting a new feature that has gone through review on python-dev, that solves a real problem that's impacting a bunch of people (see all the replies on the numpy-discussion thread linked above of people saying "oh ugh yes this has totally bitten me! please fix it!"), and to do this on the grounds that someone *might* later make an argument for it be removed again in 3.7 and that python-dev might eventually agree with that argument? I mean, c'mon.

    If it were breaking everything, then that would be grounds for removing it, no question there. But the problems described in this bug report are well understood, and it's trivial to fix them in a conservative way without backing out the original feature.

    @njsmith
    Copy link
    Contributor

    njsmith commented Sep 1, 2015

    On further thought, here's a slightly improved version of the patch I posted above.

    The difference is that the first version allowed through attempted __class__ assignments where either the old or new class was a subclass of ModuleType; the new version only allows through attempted assignments if both the old AND new class are a subclass of ModuleType.

    In practice this doesn't make any difference, because the compatibility-checking code will reject any attempt to switch from a ModuleType subclass to a non ModuleType subclass or vice-versa. So both patches are correct. But the new patch is more obviously correct.

    @serhiy-storchaka
    Copy link
    Member Author

    The first time this bug was discovered in bpo-23726.

    @serhiy-storchaka
    Copy link
    Member Author

    What if just add settable __class__ property in ModuleType?

    @njsmith
    Copy link
    Contributor

    njsmith commented Sep 1, 2015

    Doh, apparently I did once know about bpo-23726 and then forgot. Good catch.

    Technically we could add a __class__ field to ModuleType directly, but I think then the ModuleType __class__ setter would basically need to be an exact reimplementation of everything that object_set_class already does? (Since it would still need to check that this particular ModuleType subclass was layout-compatible etc. -- e.g. no new __slots__.) And the end result would behave identically to the patch I just posted, except we'd replace a 2 line check in typeobject.c with some really complicated and bug-prone code in moduleobject.c? I don't think it buys us anything.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Sep 3, 2015

    I've reviewed the patch, and it looks good to me. Key additions:

    • truly immutable types (instances of which may be cached) are now explicitly checked in the test suite to ensure they don't support __class__ assignment, even to a subclass

    • the check for non-heaptypes in __class__ assignment has been restored, with a hardcoded whitelist to bypass that check. The only type on the whitelist for 3.5 is PyModuleType.

    Thus, this patch will also serve to protect any extension types that were relying on the non-heaptype check to prevent __class__ assignment.

    The patch also includes a long block comment before the restored check for non-heap types that explains the history and links out to this issue.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Sep 3, 2015

    I've filed bpo-24991 to suggest reviewing the wider question of how we deal with identifying whether or not a type's instances are expected to be "truly immutable" in the Python 3.6 time frame.

    For 3.5, I think Nathaniel's proposed patch restoring the assumption that non-heap types are immutable by default, and whitelisting __class__ assignment for PyModuleType specifically is a good way to resolve this with minimal code churn late in the release cycle.

    @larryhastings
    Copy link
    Contributor

    Mark, Victor, Benjamin: how do you feel about v2 patch vs rolling back the change entirely?

    @njsmith
    Copy link
    Contributor

    njsmith commented Sep 3, 2015

    Since I know there's a lot of text here for people to process, here's an attempted TL;DR (with inspiration from Serhiy's msg249495):

    There were three parts to the original change:

    1. Bug fixes for typeobject.c
    2. Enabling __class__ assignment for module objects
    3. Enabling __class__ assignment for other objects, in particular all non-heap types, including 'int', 'str', etc.

    Everyone agrees we want to revert (3), which is the bit that has caused all the problems. And I don't think anyone has argued for reverting (1) after learning that it is separable from the rest. So the main question still under discussion is whether we want to revert (2)+(3), or just revert (3) alone.

    (Either of these is easy to do; the attached "v2" patch implements the revert (3) alone option.)

    My position:

    When making changes for rc3 (!), we should definitely revert any functionality that is breaking things, and should definitely not revert any functionality that isn't.

    The (3) change obviously meets this bar, so we'll revert it. But when it comes to module objects specifically -- the (2) change -- then (AFAICT) no-one has identified any problem with the functionality itself, even though it's been specifically discussed and reviewed multiple times over the last year, and been enabled all the way through the pre-release with explicit tests provided, etc. The only argument I see raised against it here is that there might be some more-or-less-equivalent but maybe-more-aesthetic way of accomplishing the same thing, so maybe we should revert it now so we can get in another round of bikeshedding. And if this had been raised a few months ago I'd have been sympathetic... but IMO the rc3 release is too late to be churning functionality based purely on aesthetics, so I think we should revert (3) alone, while leaving (1)+(2) in place.

    @markshannon
    Copy link
    Member

    Larry, of the two choices, I prefer rolling back the change entirely.

    I would like to see the bug fixes for typeobject.c applied, but I see no reason why making the __class__ of module objects assignable should be included.

    @gvanrossum
    Copy link
    Member

    We should have something for rc3, which is imminent. I vote for (1) and (2) if Serhiy thinks the patch is ready.

    @serhiy-storchaka
    Copy link
    Member Author

    bpo-24912-v2.patch LGTM.

    @gvanrossum
    Copy link
    Member

    Serhiy, can you commit it and prepare a PR for Larry?

    @gvanrossum
    Copy link
    Member

    I'm creating the PR for Larry.

    @njsmith
    Copy link
    Contributor

    njsmith commented Sep 5, 2015

    There's already a PR on bitbucket that I made, in case that's helpful. The only difference from the v2 patch are that I rephrased some of the comment in a more neutral way (based on feedback from Mark posted on the PR itself), and added a NEWS entry.

    It does have a slightly more unsightly history graph due to the edit plus merging to resolve a conflict in NEWS.

    @gvanrossum
    Copy link
    Member

    Oh. I feel dumb now. I guess I'll let Larry choose. If it's just a matter of comment text we can always improve it in 3.5.1.

    @njsmith
    Copy link
    Contributor

    njsmith commented Sep 5, 2015

    Sorry, you were too quick for me :-)

    @markshannon
    Copy link
    Member

    Why has the change allowing the __class__ attribute of modules been merged?

    It is not necessary (as I have stated repeatedly) and not known to be safe.

    @larryhastings
    Copy link
    Contributor

    Because the BDFL asked that it be so.

    @markshannon
    Copy link
    Member

    Well, there are important reasons why not to make the __class__ attribute of modules mutable.

    First of all, there is no valid rationale.

    How do know for sure that modifying the class of the sys or builtins module is not going to cause much the same problems the changing the class of ints did?

    @markshannon
    Copy link
    Member

    Oh, and has anyone considered the potential impact this might have on Jython or PyPy? Modules are quite fundamental to the way that Python works.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 5, 2015

    New changeset 27cc5cce0292 by Guido van Rossum in branch '3.5':
    Issue bpo-24912: Prevent __class__ assignment to immutable built-in objects.
    https://hg.python.org/cpython/rev/27cc5cce0292

    New changeset 1c55f169f4ee by Guido van Rossum in branch '3.5':
    Issue bpo-24912: Prevent __class__ assignment to immutable built-in objects. (Merge 3.5.0 -> 3.5)
    https://hg.python.org/cpython/rev/1c55f169f4ee

    New changeset b045465e5dba by Guido van Rossum in branch 'default':
    Issue bpo-24912: Prevent __class__ assignment to immutable built-in objects. (Merge 3.5 -> 3.6)
    https://hg.python.org/cpython/rev/b045465e5dba

    @gvanrossum
    Copy link
    Member

    Mark, please calm down. Modules are incredibly simple objects compared to classes -- and yet you can change the __class__ of a class. You can also directly modify the __dict__ of a module. You can shoot yourself in the foot phenomenally this way, but that's not the point. The point is not to violate *internal* constraints of the interpreter. Changing immutable objects was violating such a constraint. Changing a module's __class__ is not.

    @markshannon
    Copy link
    Member

    Guido,
    I just think this change is misguided. The original rationale for the change just doesn't hold water.

    If you want the change anyway, then fine, but this seems an odd way to introduce it.

    @markshannon markshannon reopened this Sep 5, 2015
    @njsmith
    Copy link
    Contributor

    njsmith commented Sep 5, 2015

    I didn't want to get further into the weeds of debating basic design points on the tracker while blocking rc3, but, for the record -- the proposal in msg249504 that one could keep the namespace of an old and new module object effectively in sync by defining __getattr__ on the new module doesn't work AFAICT. (I assume this is the basis of Mark's assertion that the change is unmotivated.) Going through __getattr__ creates an unacceptable performance hit on attribute access. You can avoid this by copying the namespace entries over (b/c __getattr__ is only called as a fallback when __dict__ lookup fails), but then you have the same problem as before of needing to keep things in sync. (And remember that module globals can change at any time -- the __globals__ pointer for any functions defined in that module will continue to point to the old namespace.)

    Re: PyPy: last I heard their main complaint about __class__ assignment was that it was hard to restrict it to *only* handle the cases that CPython does. (They do have a concept of "heap types"; AFAICT it's just a flag that says "the CPython version of this type has some quirks that we need to check for and emulate".) Not sure about Jython, but I doubt they have a natural heap/non-heap type distinction either.

    Thanks Guido for handling the patches!

    @ericsnowcurrently
    Copy link
    Member

    While I recognize the practicality/purity argument here, I somewhat agree with Mark. Assigning to module.__class__ makes sense for the use case, but it does open up a number of negative possible side effects (e.g. changing sys.__class__). Ideally it would be restricted to just the currently running module, but doing that through __class__ assignment is not a great idea. Having a dedicated builtin function or even syntax would make more sense.

    In some ways I'm reminded of __metaclass__ in Python 2 class definitions. Something like that has come up before but didn't get a lot of momentum at the time. Ultimately I'd like to see a more explicit mechanism to meet the need that motivated the __class__ change here (as well as the replace-this-module-in-sys-modules technique). [1] While you can already accomplish this via a custom finder, that's not the most convenient tool for this sort of situation.

    FWIW, if assigning to sys.modules[name].__class__ is the blessed way then we should make it official in the language reference. The same goes for modules replacing themselves in sys.modules. If we have reservations about making either official (which I for one do [2]) then we should work on a more appropriate official solution.

    [1] It all boils down to customizing the current module's type from within the module rather than from outside (using a custom loader).
    [2] Neither spelling is very friendly.

    @larryhastings
    Copy link
    Contributor

    Marking as closed for now. If we decide it's a problem we can reopen later.

    @vstinner
    Copy link
    Member

    Update: a new Py_TPFLAGS_IMMUTABLETYPE flag was added in bpo-43908, and it's now checked rather than relying on Py_TPFLAGS_HEAPTYPE to decide if a type is "mutable" or not.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests