This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author njs
Recipients Mark.Shannon, benjamin.peterson, larry, lemburg, njs, pitrou, serhiy.storchaka
Date 2015-08-31.23:49:31
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1441064973.26.0.861760351152.issue24912@psf.upfronthosting.co.za>
In-reply-to
Content
Actually, I fail at grep, and actually there are already tons of good tests for __bases__ assignment in test_descr.py. So, patch attached, and analysis follows.


Background on __class__ assignment
----------------------------------

While it's certainly surprising, assignment to obj.__class__ and to type.__bases__ has been an explicitly supported feature ever since the advent of new-style classes back in 2.ancient, and typeobject.c goes to great lengths to handle all the weird resulting cases (including stuff like "what if __del__ changes the __class__", "what if someone mutates __bases__ while we are in the middle of method lookup inside a metaclass's mro() method", etc.).

To make this safe, of course, we need to somehow check that after making the assignment then the resulting object will still be a valid, self-consistent Python object. The types in question need to have consistent in-memory representations, that their __dict__ slot (if present) is at the same offset, that their memory management callbacks are consistent (if we mutate an object from type A to type B, then A and B must be written in such a way that it's legal to pass an object allocated with A.tp_new to B.tp_dealloc), etc.

This compatibility checking occurs in two places: object_set_class and type_set_bases (and several utility functions that they share).

For historical reasons, there are two different C representations for type objects in Python: the original representation, which involves statically-allocated structs, and the newer representation, which involves heap-allocated structs. Aside from their allocation, there are a bunch of arcane little differences in things like reference counting, how you set them, etc., which are mostly motivated by the need to maintain compatibility with old code using the original representation -- all the places where they differ are places where the newer representation does something more sensible, it's not possible to create a statically-allocated type except by writing C code that *doesn't* use the stable ABI, etc. Otherwise they're supposed to work the same (eliminating differences between built-in and user-defined classes was the whole point of new-style classes!), but all these quirky implementation differences do leak out into little semantic differences in various places.

One of the places where they've historically leaked out is in __class__ and __bases__ assignment. When the compatibility-checking code for types was originally written, it punted on trying to handle the quirky differences between statically-allocated and heap-allocated type objects, and just declared that all statically-allocated types were incompatible with each other.


The previous patch
------------------

The patch that is causing all the debate here was reviewed in issue22986, after extensive discussion on python-dev, and the actual diff can be seen here:
    https://hg.python.org/cpython/rev/c0d25de5919e/

It did two things:

1) It cleaned up a bunch of nasty stuff in the compatibility-checking and assignment code. This fixed some real bugs (e.g., a case where incompatible classes could be considered compatible based on the contents of random memory found by following bogus pointers), and in the process made it robust enough to handle all types, both statically-allocated and heap-allocated.

2) It removed the check in object_set_class that protected the downstream code from ever seeing statically-allocated types - that's the 'if' check here:
    https://hg.python.org/cpython/rev/c0d25de5919e/#l3.97

As far as we know, this is all working exactly as designed! For example, Serhiy's int subclass at the beginning of the thread is compatible with the int class in the sense that the modified version of the '42' object is still a valid Python object, all its fields are valid, it won't crash the interpreter when deallocated, etc.

But of course what we didn't think of is that the interpreter assumes that instances of some particular built-in types are truly immutable, and the interpreter's internal invariants are violated if you can in any way modify an 'int' object in place. Doh.


This new patch
--------------

So the attached patch modifies the original patch by leaving the cleanups in place, but puts back a guard in object_set_class that disallows __class__ assignment for statically-allocated types EXCEPT that it still allows it specifically in the case of ModuleType subclasses. It also adds a big comment explaining the purpose of the check, and adds tests to make sure that __class__ assignment is disallowed for builtin immutable types.

Analysis of the mergeability of this patch:

- We still believe that the actual compatibility checking code is strictly better than it used to be, so in any case where these parts of the changes affect Python's semantics, the new results should be better (i.e. less likely to break memory safety and crash the interpreter).

- The new guard added by this patch is conservative, but allows a strict superset of what 3.4 and earlier allowed, so it won't break any existing code.

- The one way that the proposed new guard is less conservative than the one in 3.4 is that the new one allows ModuleType objects through. Since we believe that the compatibility-checking code is now correct for statically-allocated types, this should be fine from a memory-safety point of view, and because the interpreter definitely does not assume that ModuleType objects are immutable,  then this should be safe from that perspective as well.

Larry also asked for a "regression test that tried mutating fields on a bunch of interned objects". I'm not sure which fields he was thinking of in particular, but the only entry points that lead to code touched by the original patch are attempts to set __class__ or to set __bases__. *__bases__assignment has always been illegal on statically-allocated types, and has remained illegal through all of these patches*, plus it already has a bunch of tests, so I left it alone. This patch adds tests for __class__ assignment on all the potentially-interned types that I could think of.

So..... I think that covers all the bases about what this patch is and why it is appropriate for 3.5.0 (assuming that Larry sticks with treating this as a release-critical bug). I'll post a follow-up with replies to a few specific side comments that people made.
History
Date User Action Args
2015-08-31 23:49:34njssetrecipients: + njs, lemburg, pitrou, larry, benjamin.peterson, Mark.Shannon, serhiy.storchaka
2015-08-31 23:49:33njssetmessageid: <1441064973.26.0.861760351152.issue24912@psf.upfronthosting.co.za>
2015-08-31 23:49:33njslinkissue24912 messages
2015-08-31 23:49:31njscreate