Title: Explicitly specify `MyClass.__subclasses__()` returns classes in definition order
Type: Stage:
Components: Versions:
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: BNMetrics, gvanrossum, josh.r, pablogsal, pekka.klarck, xtreak
Priority: normal Keywords:

Created on 2018-09-26 06:18 by pekka.klarck, last changed 2018-11-07 22:41 by gvanrossum.

Messages (14)
msg326420 - (view) Author: Pekka Klärck (pekka.klarck) Date: 2018-09-26 06:18
I had a use case where `MyClass.__subclasses__()` returning classes in the definition order would have made the code simpler. They seemed to be returned in that order, but because the docs didn't say anything about it [1] I did some searching to find out can I trust the order to stay the same also in the future.

Based on my searches it seems that prior to Python 3.5 the information was stored in a list and that preserved the order. In Python 3.5 the information was stored into a dict for performance reasons as part of issue 17936 [2]. Back then dicts weren't ordered, so the order is undefined in Python 3.5, but in Python 3.6+ the order is again preserved. In practice Python 3.5 is the only current version where the order of `__subclasses__()` is not defined.

My proposal is to make the current, and old, behavior of returning classes from `__subclassses__()` in definition order explicit. If nobody has anything against that, I'd be willing to provide a pull request updating docs and adding unit tests making sure the order is not messed up in the future. Let me also know if even this kind of simple changes would need to go through python-ideas or python-dev. The PEP process feels overkill when there are no code changes required.

PS: I know both Antoine and Guido commented issue 17936 [3] that they don't know any use cases for the order. I can explain my use case if someone is interested.

msg327800 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-10-15 22:58
Hi and thank you for the proposal. Could you detail a bit more about your use-case? Thanks
msg327815 - (view) Author: Pekka Klärck (pekka.klarck) Date: 2018-10-16 08:38
My use case was implementing conversion of strings to different objects based on type information got from function arguments. Initially I had a converter class with methods for each conversion (e.g. `_convert_bool`, `_convert_int`) but this class got so big that I decided to split each converter into a separate class (e.g. `BooleanConverter`, `IntegerConverter`) with a common base class. The base class also has a `converter_for` classmethod that is a factory that returns a concrete converter for a certain type.

For the factory to be able to return a correct converter, it obviously needs to know what converters exist and what types they handle. My first idea was to simply go through `cls.__subclasses__()` and return the first one that handles the specified type, but because order matters for us this doesn't work. The reason order matters is that we handle both Boolean and integer types, and `bool` being a subclass of `int` means we need to handle the former first.

Because I couldn't use `cls.__subclasses__()`, I needed to implement a custom way for the converters to register themselves. That wasn't particularly complicated but some extra work anyway.

The code I wrote happens to be open source and visible at [1] if you are interested to look it more closely. The `@TypeConverter.register` stuff could have been avoided is `cls.__subclasses__()` were ordered. It could also be removed once we drop Python 3.5 support *if* order is guaranteed to be preserved going forward.

msg329389 - (view) Author: Luna Chen (BNMetrics) * Date: 2018-11-06 21:36
Hi Pekka Klärck, I would like to work on this issue and make a PR if you haven't done it yet. 

msg329395 - (view) Author: Pekka Klärck (pekka.klarck) Date: 2018-11-06 22:40
Haven't created a PR yet. Go ahead and great one if you have time Luna! We'd need a decision about this too, but if the decision is no, then it would nevertheless be a good idea to mention in the docs that the order is not guaranteed.
msg329400 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-11-07 00:37
I would kindly suggest waiting until there is some consensus on this before doing any Pull Request.
msg329402 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-11-07 03:29
Since this is how it works since 3.6, I think it's reasonable to make this part of the spec. For dictionary order we waited a release between implementing it that way and making it part of the spec; effectively we've already waited two releases (3.6 and 3.7) for this. So I say let's do it. This isn't something that's PEP-worthy anyway.
msg329437 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2018-11-07 21:05
First off, the OP's original case seems like a use case for functools.singledispatch. Not really related to the problem, just thought I'd mention it.

Secondly, are we sure we want to make such a guarantee? That restricts the underlying storage to ordered types (list/dict; possibly tuple at the cost of making modifications slightly more expensive), or an unordered type with additional ordering layered on it (like old-school OrderedDict).

That does tie our hands in the future. For example, it seems like it would be a perfectly reasonable approach for the internal collection of subclasses to be implemented as a weakref.WeakSet (some future version of it implemented in C, rather than the current Python layer version) so as to reduce code duplication and improve handling when a subclass disappears. Right now, tp_subclasses is a dict keyed by the raw memory address of the subclass (understandable, but eww), with a value of a weakref to the subclass itself. There is tons of custom code involved in handling this (e.g. the dict only self-cleans because the dealloc for classes explicitly removes the subclass from the parent classes, but every use of the dict still has to assume weakrefs have gone dead anyway, because of reentrancy issues; these are solved problems in WeakSet which hides all the complexity from the user). Being able to use WeakSets would mean a huge amount of special purpose code in typeobject.c could go away, but guaranteeing ordering would make that more difficult (it would require providing an ordering guarantee for WeakSet, which, being built on set, would likely require ordering guarantees for sets in general, or changing WeakSet to be built on dicts).

There is also (at least) one edge case that would need to be fixed (based on a brief skim of the code). type_set_bases (which handles assignment to __bases__ AFAICT, admittedly a niche use case) simplified its own implementation by making the process of changing __bases__ be to remove itself as a subclass of all of its original bases, then add itself as a subclass of the new bases. This is done even if there are overlaps in the bases, and even if the new bases are the same.

Minimal repro:

    >>> class A: pass
    >>> class B(A): pass
    >>> class C(A): pass
    >>> A.__subclasses__()  # Appear in definition order
    [__main__.B, __main__.C]

    >>> B.__bases__ = B.__bases__    # Should be no-op...
    >>> A.__subclasses__()           # But oops, order changed
    [__main__.C, __main__.B]

I'm not going to claim this is common or useful (I've done something like this exactly once, interactively, while making an OrderedCounter from OrderedDict and Counter back before dicts were ordered; I got the inheritance order wrong and reversed it after the fact), but making the guarantee would be more than just stating it; we'd either have to complicate the code to back it up, or qualify the guarantee with some weird, possibly CPython-specific details.
msg329438 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-11-07 21:25
Thanks for the long post! Clearly there is more here than the eye can easily see.

Nevertheless, I feel that, *in this case*, it's not likely that such a re-implementation will ever happen, so I think it is okay to constrain the future so we can guarantee (the ordering aspect of) the current behavior. The current behavior also *feels* natural, regardless of the validity of the OP's use case.

The edge case of assignment to __bases__ is a good one to call out (in the docs and in the test) but I don't think the current behavior there is sufficiently dicey to change it or to exclude it from the guarantee.
msg329439 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2018-11-07 21:27
I'm also a little skeptical of the OP's proposed use case for other reasons. In any circumstance other than "all classes are defined in the same module", you can't really make useful guarantees about subclass definition order, because:

1. If the converters are defined in multiple modules in a single package, the module with IntConverter could be imported first explicitly, and now BoolConverter will come second.

2. If all the provided converters occur in a single monolithic module, and some other package tries to make a converter for their own int subclass, well, IntConverter is already first in the list of subclasses, so the other package's converter will never be called (unless it's for the direct subclass of int, rather than a grandchild of int, but that's an implementation detail of the OP's project).

Essentially, to write such a class hierarchy properly, you'd need to rejigger the ordering each time a class was registered such that any converter for a parent class was pushed until after the converter for all of its descendant classes (and if there is multiple inheritance involved, you're doomed).

Even ignoring all that, their use case doesn't require explicit registration if they don't want it to. By making a simple metaclass for the root class, the metaclass's __new__ can perform registration on the descendant class's behalf, either with the definition time ordering of the current design, or with a more complicated rejiggering I described that would be necessary to ensure parent classes are considered after child classes (assuming no multiple inheritance).
msg329441 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-11-07 21:41
Josh, I have to ask -- do you have plans to rewrite the __subclasses__
implementation? Because at this point I don't really feel like arguing over
the OP's use case. It looks like you have a strong objection over the
requested feature that's quite independent from the validity of any use
cases, and I'm curious why. The argument "but then we couldn't change the
implementation in some future version" feels pretty weak.
msg329445 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2018-11-07 22:07
I wrote the response to the OP's use case before I saw your response; it wasn't really intended as an additional critique of the proposed change or a counterargument to your post, just a note that the required behavior could be obtained on all versions of Python via metaclasses, including on 3.5.

I have no specific plans to rewrite the typeobject.c, nor make a C implemented WeakSet. I'm just leery of adding language guarantees that limit future development when they:

1. Provide very little benefit (I doubt one package in 10,000 even uses __subclasses__, let alone relies on its ordering)

2. The benefit is achievable without herculean efforts with existing tools (metaclasses can provide the desired behavior with minimal effort at the trivial cost of an additional side-band dict on the root class)

If the guarantee never limits a proposed change, then our best case scenario is we provided a guarantee that benefits almost no one (guaranteed upside minimal). But if it limits a proposed change, we might lose out on a significant improvement in performance, code maintainability, what have you (much larger potential downside). I'm just not seeing enough of a benefit to justify the potential cost.
msg329446 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2018-11-07 22:11
Keep in mind, had this guarantee been in place in 3.4, the improvement in 3.5 couldn't have been made, and issue17936 might have been closed and never addressed, even once dicts were ordered, simply because we never rechecked it. It makes the whole "potential downside" more obvious, because we would have paid that price not so long ago. Knowing that 3.5 improved by breaking this guarantee was part of what made me cautious here.
msg329448 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-11-07 22:41
OK, I don't have time to keep arguing, and I doubt any other core devs
care. We should probably just close it as Won't Fix, since it's not
important enough to keep deliberating.

Luna, I recommend that you pick another project.
Date User Action Args
2018-11-07 22:41:53gvanrossumsetmessages: + msg329448
2018-11-07 22:11:53josh.rsetmessages: + msg329446
2018-11-07 22:07:27josh.rsetmessages: + msg329445
2018-11-07 21:41:38gvanrossumsetmessages: + msg329441
2018-11-07 21:27:47josh.rsetmessages: + msg329439
2018-11-07 21:25:23gvanrossumsetmessages: + msg329438
2018-11-07 21:05:15josh.rsetnosy: + josh.r
messages: + msg329437
2018-11-07 03:29:24gvanrossumsetnosy: + gvanrossum
messages: + msg329402
2018-11-07 00:37:21pablogsalsetmessages: + msg329400
2018-11-06 22:40:15pekka.klarcksetmessages: + msg329395
2018-11-06 21:36:58BNMetricssetnosy: + BNMetrics
messages: + msg329389
2018-10-16 08:38:33pekka.klarcksetmessages: + msg327815
2018-10-15 22:58:19pablogsalsetnosy: + pablogsal
messages: + msg327800
2018-09-26 10:08:43xtreaksetnosy: + xtreak
2018-09-26 06:18:03pekka.klarckcreate