Message 192060 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	gvanrossum
Recipients	ecatmur, gvanrossum, lukasz.langa, rhettinger
Date	2013-06-30.00:12:18
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<CAP7+vJKBvtUpj3VYSkFKbC-qR-seny5bUaWsFTuur+yi=PDA3w@mail.gmail.com>
In-reply-to	<1372526677.94.0.12961643988.issue18244@psf.upfronthosting.co.za>

Content
> Looks like the priority ordering you mention is not yet documented > anywhere. Because up till now it has not been needed -- all you can do with ABCs is use isinstance/issubclass. > It definitely makes sense but I'd like to take a step back for > a moment to consider the following questions: > > 1. What additional functionality do our users get with this ordering? In > other words, what purpose does this new ordering have? > > Apart from solving the conflict we're discussing, I can't see any. There doesn't have to be any other functionality. We're just trying to address how ABCs should be ordered relative to classes explicitly in the MRO for the purpose of @singledispatch. > 2. What disadvantages does this ordering bring to the table? > > I think that simplicity is a feature. This ordering creates > additional complexity in the language. But so does not ordering. The underlying question is how to dispatch when two or more classes in an object's class hierarchy have a different dispatch rule. This is fundamentally a question of ordering. For regular method dispatch, the ordering used is the MRO, and there is always a unique answer: the class that comes first in the MRO wins. (Ambiguities and inconsistencies in the MRO are rejected at class definition time.) This is very convenient because the issue of coming up with a total ordering of base classes is solved once and for all, and because of the total ordering we never have to reject a request to dispatch (of regular methods or attribute lookup) as ambiguous. (Note: I call it a total ordering, but the total ordering is only within a specific class's MRO. Any class's explicit bases are totally ordered in that class's MRO -- but the order of two classes may be different in a different class's MRO. This is actually relevant for the definition of C3, and we'll see this below.) For @singledispatch, we are choosing to support ABCs (because it makes sense), and so we have to think about how handle ABCs that are relevant (isinstance/issubclass returns True) but not in the MRO. Let's introduce terminology so we can talk about different cases easily. Relevant: isinstance/issubclass returns True Explicit: it's in the MRO Implicit: relevant but not explicit Registered: Implicit due to a register() call Inferred: Implicit due to a special method These categories are not always exclusive, e.g. an ABC may be registered for one class in the MRO but inferred for a different one. The registration mechanism tries to avoid outright cycles but otherwise is not as strict about rejecting ambiguities as C3 is for the MRO calculation, and this I believe is the reason we're still debating the order. A simple example of something rejected by C3: - suppose we have classes B and C derived directly from object - class X(B, C) has MRO [X, B, C, object] - class Y(C, B) has MRO [Y, C, B, object] - class Z(X, Y) is rejected because there is no consistent ordering for B and C in the MRO for Z However, if we construct a similar situation using implicit ABCs, e.g. by removing B and C from the list of explicit bases of X and Y, and instead registering X and Y as their virtual subclasses -- then Z(X, Y) is valid and Z is a virtual subclass of both B and C. But there is no ordering preference in this case, so I think it's good if @singledispatch refuses the temptation to guess here, in the case where there are registered dispatches for B and C but none for X, Y or Z. I believe the rest of the discussion is purely about what to do if B was explicitly listed as a base but C wasn't (or vice versa). There are still a few different cases; the simplest is this: class X(B) -- MRO is [X, B, object] class Y(B) -- MRO is [Y, B, object] C.register(X) C.register(Y) class Z(X, Y) -- MRO is [Z, X, Y, B, object] IIUC you want @singledispatch to still reject a call for a Z instance if the only relevant registered dispatches are for B and C -- because you claim that X, Y and Z are "just as much" subclasses of B as they are of C. (It doesn't actually matter whether we use X, Y or Z as an example here -- all of them have the same problem.) But I disagree. First, in the all-explicit example, we can equally say that X is "just as much" a subclass of B as it is of C. And yet, because both appear in the explicit MRO, B wins when dispatching for X -- and C wins when dispatching for Y, because the explicit base classes are ordered differently there. So the property of being "just as much" a base class isn't used -- but the order in the explicit MRO is. It is the nature and intent of ABC registration that it does not have to occur in the same file as the class definition. In particular, it is possible that the class definitions of B, C, X, Y and Z all occur together, but the C.register() calls occur in a different, unrelated module, which is imported only on the whim of the top-level application, or as a side effect of some unrelated import made by the top-level app. This, to me, is enough to consider registered ABC inheritance as a second-class citizen compared to explicit inclusion in the list of bases. Consider this: dispatch(Z) would consider only Z, X, Y, B, object if the C.register() calls were not made, and then it would choose B; but under your rules, if the app changed to cause the C.register() calls to be added, dispatch(Z) would complain. This sounds fragile to me, and it is the main reason why I want explicit ABCs to prevail over "merely" registered ABCs, rather than to be treated equal and possibly cause complaints based on what happened elsewhere in the program. Now let's move on to inferred ABCs. Let's assume C is actually Sized, and both X and Y implicitly subclass Sized because they implement __len__(). On the one hand, this doesn't have the problem with register() calls that may or may not be loaded -- the __len__() definitions are always there. On the other hand, to me this feels even less explicit than using register() -- while presumably everyone who uses register() knows at some basic level about ABCs, I do not make the same assumption about people who write classes that have a __len__() method. Because of Python's long history of duck typing, many __len__() methods in code that is still in use (even if it has evolved in other ways) were probably written before ABCs (or even MROs) were introduced! Now assume X's explicit base class B is actually Iterable, and X implements both __iter__() and __len__(). X's author could easily have made X a subclass of both Iterable and Sized, and thereby avoided the ambiguity. But instead, they explicitly inherited only Iterable, and left Sized implicit. (I guess their understanding of ABCs was limited. :-) I really do think that in this case there is a discernible difference between the author's attitude towards Iterable and Sized -- they cared enough about Iterable to inherit from it. So I think we should assume that if we asked them to add Sized as a base class, they would add it second, thereby resolving the ambiguity in favor of Iterable. And that's why I think that if they left Sized out, we are doing them more of a favor by preferring Iterable than by complaining. Now on to your other examples. > Firstly, [Off-topic English-major nit: it's better to write "first" instead of "firstly". English is not a very consistent language. :-) See http://www.randomhouse.com/wotd/index.pperl?date=20010629 for a nuanced view that still ends up preferring "first".] > there is currently no obvious way for users to distinguish > between implicit (which I called "inferred" above) > subclassing (via implementation) or subclassing by > `abc.register()`. What do you mean by "no obvious way to distinguish"? That you can't tell by merely looking at the MRO and calling isinstance/issubclass? But you could look in the _abc_registry. Or you could scan the source for register() calls. TBH I would be fine if these received exactly the same status -- but explicit inclusion in the MRO still ought to prevail. > This creates the dangerous situation where > backwards incompatibility introduced by switching between those ABC > registration mechanism is nearly impossible to debug. Consider an > example: version A of a library has a type which only implicitly > subclasses an ABC. User code with singledispatch is created that > works with the current state of things. Version B of the same library > uses `ABC.register(Type)` and suddenly the dispatch changes without > any way for the user to see what's going on. A similar example with > explicit subclassing and a different form of registration is easier > to debug, but not much, really. So there are an infinite number of ways to break subtle code like this by subtle changes in inheritance. I see that mostly as a fact of life, not something we must contain at all cost. I suppose even with explicit base classes it's not inconceivable that one developer's "innocent" fix of a class hierarchy breaks another developer's code. (See also http://xkcd.com/1172/ :-) So could "innocently" adding or moving a __len__() method. > Secondly, it creates this awkward situation where dispatch for any > custom `MutableMapping` can be different from the dispatch for > `dict`. Although the latter is a `MutableMapping` "only" by means of > `MutableMapping.register(dict)`, in reality the whole definition of > what a mutable mapping is comes from the behaviour found in `dict`. > Following your point of view, dict is less of a mutable mapping than > a custom type subclassing the ABC explicitly. You're saying the user > should "respect the choice of its author" but it's clearly suboptimal > here. I strongly believe I should be able to swap a mutable mapping > implementation with any other and get consistent dispatch. Hmm... I need to make this situation explicit to think about it. I think you are still looking at a case where there are exactly two possible dispatches, one for Sized and one for Iterable, right? And the issue is that if the user defines class Foo(MutableMapping), Sized and Iterable appear explicitly in Foo's MRO, in that order (because MM explicitly subclasses them in this order), so Sized wins. But, hm, because there's a call to MutableMapping.register(dict) in collections/abc.py, the way I see it, if the only dispatches possible were on Sized and Iterable, Sized should still win, because it comes first in MM's MRO. Now, if that MM.register(dict) call was absent, you might be right. But it's there (and you mention it) so I'm not sure what is going on here -- are you talking about a slightly different example, or about a different rule than I am proposing? > Thirdly, while I can't support this with any concrete examples, > I have a gut feeling that considering all three ABC subclassing > mechanisms to be equally binding will end up as a toolkit with better > composability. The priority ordering on the other hand designs > `abc.register()` and implicit subclassing as inferior MRO wannabes. Ok, when we're talking gut feelings, you're going to have a hard time convincing others that your gut is more finely tuned to digesting Python subtleties than mine. :-) Clearly my gut tells me that explicit inclusion of an ABC in the list of bases is a stronger signal than implicit subclassing; at least part of the reason why my gut feels this way is that the C3 analysis is more strict about rejecting ambiguous orderings outright than registered or inferred ABCs. (But my gut agrees with yours that there's not much to break a tie between a registered and an inferred ABC.) > Last but not least, the priority ordering will further complicate the > implementation of `functools._compose_mro()` and friends. While the > complexity of this code is not the reason of my stance on the matter, > I think it nicely shows how much the user has to keep in her head to > really know what's going on. Especially that we only consider this > ordering to be decisive on a single inheritance level. The implementation of C3 is also complex, and understanding all its rules is hard -- harder than what I had before. But it is a better rule, and that's why we use it. > 3. Why is the ordering MRO -> register -> implicit? (Note: I already relented on the latter arrow.) > The reason I'm asking is that the whole existence of `abc.register()` > seems to come from the following: > > * we want types that predate the creation of an ABC to be considered > its subclasses; We certainly want it enough for isinstance/issubclass to work and for an unambiguous dispatch to work. But we're not sure about the relative ordering in all cases, because there's no order in the registry nor for inferred ABCs, and that may affect when dispatch is considered dispatch. > * we can't use implicit subclassing because either the existence of > methods in question is not enough (e.g. Mapping versus Sequence); > or the methods are added at runtime and don't appear in __dict__. > > Considering the above, one might argue that the following order is > just as well justified: MRO -> implicit -> register. I'm aware that > the decision to put register first is because if the user is unhappy > with the dispatch, she can override the ordering by registering types > which were implicit before. But, while I can register third-party > types, I can't unregister any. In other words, I find this ordering > arbitrary. So, if you can handle "MRO is stronger than registered or inferred" we don't actually disagree on this. :-) > I hope you don't perceive my position as stubborn, I just care enough to > insist on this piece of machinery to be clearly defined and as simple as > possible (but not simpler, sure). Not at al. You may notice I enjoy the debate. :-)

> Looks like the priority ordering you mention is not yet documented
> anywhere.

Because up till now it has not been needed -- all you can do with ABCs
is use isinstance/issubclass.

> It definitely makes sense but I'd like to take a step back for
> a moment to consider the following questions:
>
> 1. What additional functionality do our users get with this ordering? In
>    other words, what purpose does this new ordering have?
>
>    Apart from solving the conflict we're discussing, I can't see any.

There doesn't have to be any other functionality. We're just trying to
address how ABCs should be ordered relative to classes explicitly in
the MRO for the purpose of @singledispatch.

> 2. What disadvantages does this ordering bring to the table?
>
>    I think that simplicity is a feature. This ordering creates
>    additional complexity in the language.

But so does not ordering.

The underlying question is how to dispatch when two or more classes in
an object's class hierarchy have a different dispatch rule. This is
fundamentally a question of ordering. For regular method dispatch, the
ordering used is the MRO, and there is always a unique answer: the
class that comes first in the MRO wins. (Ambiguities and
inconsistencies in the MRO are rejected at class definition time.)
This is very convenient because the issue of coming up with a total
ordering of base classes is solved once and for all, and because of
the total ordering we never have to reject a request to dispatch (of
regular methods or attribute lookup) as ambiguous.

(Note: I call it a total ordering, but the total ordering is only
within a specific class's MRO. Any class's explicit bases are totally
ordered in that class's MRO -- but the order of two classes may be
different in a different class's MRO. This is actually relevant for
the definition of C3, and we'll see this below.)

For @singledispatch, we are choosing to support ABCs (because it makes
sense), and so we have to think about how handle ABCs that are
relevant (isinstance/issubclass returns True) but not in the MRO.

Let's introduce terminology so we can talk about different cases easily.

Relevant: isinstance/issubclass returns True
Explicit: it's in the MRO
Implicit: relevant but not explicit
Registered: Implicit due to a register() call
Inferred: Implicit due to a special method

These categories are not always exclusive, e.g. an ABC may be
registered for one class in the MRO but inferred for a different one.

The registration mechanism tries to avoid outright cycles but
otherwise is not as strict about rejecting ambiguities as C3 is for
the MRO calculation, and this I believe is the reason we're still
debating the order. A simple example of something rejected by C3:

- suppose we have classes B and C derived directly from object
- class X(B, C) has MRO [X, B, C, object]
- class Y(C, B) has MRO [Y, C, B, object]
- class Z(X, Y) is *rejected* because there is no consistent ordering
for B and C in the MRO for Z

However, if we construct a similar situation using implicit ABCs, e.g.
by removing B and C from the list of explicit bases of X and Y, and
instead registering X and Y as their virtual subclasses -- then Z(X,
Y) is valid and Z is a virtual subclass of both B and C. But there is
no ordering preference in this case, so I think it's good if
@singledispatch refuses the temptation to guess here, in the case
where there are registered dispatches for B and C but none for X, Y or
Z.

I believe the rest of the discussion is purely about what to do if B
was explicitly listed as a base but C wasn't (or vice versa). There
are still a few different cases; the simplest is this:

class X(B) -- MRO is [X, B, object]
class Y(B) -- MRO is [Y, B, object]
C.register(X)
C.register(Y)
class Z(X, Y) -- MRO is [Z, X, Y, B, object]

IIUC you want @singledispatch to still reject a call for a Z instance
if the only relevant registered dispatches are for B and C -- because
you claim that X, Y and Z are "just as much" subclasses of B as they
are of C. (It doesn't actually matter whether we use X, Y or Z as an
example here -- all of them have the same problem.) But I disagree.
First, in the all-explicit example, we can equally say that X is "just
as much" a subclass of B as it is of C. And yet, because both appear
in the explicit MRO, B wins when dispatching for X -- and C wins when
dispatching for Y, because the explicit base classes are ordered
differently there. So the property of being "just as much" a base
class isn't used -- but the order in the explicit MRO is.

It is the nature and intent of ABC registration that it does not have
to occur in the same file as the class definition. In particular, it
is possible that the class definitions of B, C, X, Y and Z all occur
together, but the C.register() calls occur in a different, unrelated
module, which is imported only on the whim of the top-level
application, or as a side effect of some unrelated import made by the
top-level app. This, to me, is enough to consider registered ABC
inheritance as a second-class citizen compared to explicit inclusion
in the list of bases.

Consider this: dispatch(Z) would consider only Z, X, Y, B, object if
the C.register() calls were *not* made, and then it would choose B;
but under your rules, if the app changed to cause the C.register()
calls to be added, dispatch(Z) would complain. This sounds fragile to
me, and it is the main reason why I want explicit ABCs to prevail over
"merely" registered ABCs, rather than to be treated equal and possibly
cause complaints based on what happened elsewhere in the program.

Now let's move on to inferred ABCs. Let's assume C is actually Sized,
and both X and Y implicitly subclass Sized because they implement
__len__(). On the one hand, this doesn't have the problem with
register() calls that may or may not be loaded -- the __len__()
definitions are always there. On the other hand, to me this feels even
less explicit than using register() -- while presumably everyone who
uses register() knows at some basic level about ABCs, I do not make
the same assumption about people who write classes that have a
__len__() method. Because of Python's long history of duck typing,
many __len__() methods in code that is still in use (even if it has
evolved in other ways) were probably written before ABCs (or even
MROs) were introduced!

Now assume X's explicit base class B is actually Iterable, and X
implements both __iter__() and __len__(). X's author *could* easily
have made X a subclass of both Iterable and Sized, and thereby avoided
the ambiguity. But instead, they explicitly inherited only Iterable,
and left Sized implicit. (I guess their understanding of ABCs was
limited. :-) I really do think that in this case there is a
discernible difference between the author's attitude towards Iterable
and Sized -- they cared enough about Iterable to inherit from it. So I
think we should assume that *if* we asked them to add Sized as a base
class, they would add it second, thereby resolving the ambiguity in
favor of Iterable. And that's why I think that if they left Sized out,
we are doing them more of a favor by preferring Iterable than by
complaining.

Now on to your other examples.

>    Firstly,

[Off-topic English-major nit: it's better to write "first" instead of
"firstly". English is not a very consistent language. :-) See
http://www.randomhouse.com/wotd/index.pperl?date=20010629 for a
nuanced view that still ends up preferring "first".]

>    there is currently no obvious way for users to distinguish
>    between implicit

(which I called "inferred" above)

>    subclassing (via implementation) or subclassing by
>    `abc.register()`.

What do you mean by "no obvious way to distinguish"? That you can't
tell by merely looking at the MRO and calling isinstance/issubclass?
But you could look in the _abc_registry. Or you could scan the source
for register() calls.

TBH I would be fine if these received exactly the same status -- but
explicit inclusion in the MRO still ought to prevail.

>    This creates the dangerous situation where
>    backwards incompatibility introduced by switching between those ABC
>    registration mechanism is nearly impossible to debug.  Consider an
>    example: version A of a library has a type which only implicitly
>    subclasses an ABC. User code with singledispatch is created that
>    works with the current state of things. Version B of the same library
>    uses `ABC.register(Type)` and suddenly the dispatch changes without
>    any way for the user to see what's going on.  A similar example with
>    explicit subclassing and a different form of registration is easier
>    to debug, but not much, really.

So there are an infinite number of ways to break subtle code like this
by subtle changes in inheritance. I see that mostly as a fact of life,
not something we must contain at all cost. I suppose even with
explicit base classes it's not inconceivable that one developer's
"innocent" fix of a class hierarchy breaks another developer's code.
(See also http://xkcd.com/1172/ :-) So could "innocently" adding or
moving a __len__() method.

>    Secondly, it creates this awkward situation where dispatch for any
>    custom `MutableMapping` can be different from the dispatch for
>    `dict`.  Although the latter is a `MutableMapping` "only" by means of
>    `MutableMapping.register(dict)`, in reality the whole definition of
>    what a mutable mapping is comes from the behaviour found in `dict`.
>    Following your point of view, dict is less of a mutable mapping than
>    a custom type subclassing the ABC explicitly. You're saying the user
>    should "respect the choice of its author" but it's clearly suboptimal
>    here. I strongly believe I should be able to swap a mutable mapping
>    implementation with any other and get consistent dispatch.

Hmm... I need to make this situation explicit to think about it. I
think you are still looking at a case where there are exactly two
possible dispatches, one for Sized and one for Iterable, right? And
the issue is that if the user defines class Foo(MutableMapping), Sized
and Iterable appear explicitly in Foo's MRO, in that order (because MM
explicitly subclasses them in this order), so Sized wins.

But, hm, because there's a call to MutableMapping.register(dict) in
collections/abc.py, the way I see it, if the only dispatches possible
were on Sized and Iterable, Sized should still win, because it comes
first in MM's MRO. Now, if that MM.register(dict) call was absent, you
might be right. But it's there (and you mention it) so I'm not sure
what is going on here -- are you talking about a slightly different
example, or about a different rule than I am proposing?

>    Thirdly, while I can't support this with any concrete examples,
>    I have a gut feeling that considering all three ABC subclassing
>    mechanisms to be equally binding will end up as a toolkit with better
>    composability. The priority ordering on the other hand designs
>    `abc.register()` and implicit subclassing as inferior MRO wannabes.

Ok, when we're talking gut feelings, you're going to have a hard time
convincing others that your gut is more finely tuned to digesting
Python subtleties than mine. :-) Clearly my gut tells me that explicit
inclusion of an ABC in the list of bases is a stronger signal than
implicit subclassing; at least part of the reason why my gut feels
this way is that the C3 analysis is more strict about rejecting
ambiguous orderings outright than registered or inferred ABCs. (But my
gut agrees with yours that there's not much to break a tie between a
registered and an inferred ABC.)

>    Last but not least, the priority ordering will further complicate the
>    implementation of `functools._compose_mro()` and friends. While the
>    complexity of this code is not the reason of my stance on the matter,
>    I think it nicely shows how much the user has to keep in her head to
>    really know what's going on. Especially that we only consider this
>    ordering to be decisive on a single inheritance level.

The implementation of C3 is also complex, and understanding all its
rules is hard -- harder than what I had before. But it is a better
rule, and that's why we use it.

> 3. Why is the ordering MRO -> register -> implicit?

(Note: I already relented on the latter arrow.)

>    The reason I'm asking is that the whole existence of `abc.register()`
>    seems to come from the following:
>
>    * we want types that predate the creation of an ABC to be considered
>      its subclasses;

We certainly want it enough for isinstance/issubclass to work and for
an unambiguous dispatch to work. But we're not sure about the relative
ordering in all cases, because there's no order in the registry nor
for inferred ABCs, and that may affect when dispatch is considered
dispatch.

>    * we can't use implicit subclassing because either the existence of
>      methods in question is not enough (e.g. Mapping versus Sequence);
>      or the methods are added at runtime and don't appear in __dict__.
>
>    Considering the above, one might argue that the following order is
>    just as well justified: MRO -> implicit -> register. I'm aware that
>    the decision to put register first is because if the user is unhappy
>    with the dispatch, she can override the ordering by registering types
>    which were implicit before. But, while I can register third-party
>    types, I can't unregister any. In other words, I find this ordering
>    arbitrary.

So, if you can handle "MRO is stronger than registered or inferred" we
don't actually disagree on this. :-)

> I hope you don't perceive my position as stubborn, I just care enough to
> insist on this piece of machinery to be clearly defined and as simple as
> possible (but not simpler, sure).

Not at al. You may notice I enjoy the debate. :-)

History
Date	User	Action	Args
2013-06-30 00:12:20	gvanrossum	set	recipients: + gvanrossum, rhettinger, lukasz.langa, ecatmur
2013-06-30 00:12:19	gvanrossum	link	issue18244 messages
2013-06-30 00:12:18	gvanrossum	create