Message 191535 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	lukasz.langa
Recipients	ecatmur, lukasz.langa
Date	2013-06-20.19:30:02
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1371756603.69.0.503207353444.issue18244@psf.upfronthosting.co.za>
In-reply-to

Content
I meditated on the nature of the problem. A big issue that Edward pointed out is that at times the computed MRO depends on haystack ordering. We should fix that. However, my conclusion is that whatever home-grown algorithm we come up with, it will still be unintuitive for the end user. We should rather adopt C3 linearisation, which Python uses for calculating the method resolution order. Unfortunately, it doesn't specify what to do with virtual bases (either registered or implied). To solve that we need to extend the algorithm to insert the virtual bases in a "correct" place. We have two issues here: 1. How can we determine where this "correct" place is? 2. How can we determine which order unrelated abstract classes should be put in? The correct place for a virtual base class to appear in the MRO is on the level where the functionality it describes first appears. For example: >>> from collections import * >>> class A(object): pass ... >>> class B(A): ... def __len__(self): ... return 0 # implies Sized ... >>> @Container.register ... class C(object): pass ... >>> class D(object): pass # unrelated ... >>> class X(D, C, B): ... def __call__(self): ... return 0 # implies Callable ... If we calculate the C3 MRO for strict bases only, we get the obvious: >>> from functools import _c3_mro >>> _c3_mro(X) [<class '__main__.X'>, <class '__main__.D'>, <class '__main__.C'>, <class '__main__.B'>, <class '__main__.A'>, <class 'object'>] If we want to insert related abstract base classes, they consistently appear right after the class where functionality has been added (note that the order of the `abcs` here doesn't matter): >>> _c3_mro(X, abcs=[Sized, Callable, Container]) [<class '__main__.X'>, <class 'collections.abc.Callable'>, <class '__main__.D'>, <class '__main__.C'>, <class 'collections.abc.Container'>, <class '__main__.B'>, <class 'collections.abc.Sized'>, <class '__main__.A'>, <class 'object'>] This linearisation might be somewhat surprising but I believe it's the correct way. In fact it's only surprising because it exposes ambiguous dispatch when a class virtually subclasses multiple ABCs. For example, having a simple generic function: >>> from functools import singledispatch >>> from collections import * >>> @singledispatch ... def f(arg): ... return "base" ... >>> f.register(Iterable, lambda arg: "iterable") >>> f.register(Sized, lambda arg: "sized") >>> f.register(Set, lambda arg: "set") this class evaluates reasonably: >>> class C(Sized): ... def __len__(self): ... return 0 ... >>> f(C()) 'sized' However, when we register it for another ABC, there has to be a conflict: >>> Iterable.register(C) <class '__main__.C'> >>> f(C()) Traceback (most recent call last): ... RuntimeError: Ambiguous dispatch: <class 'collections.abc.Iterable'> or <class 'collections.abc.Sized'> It doesn't matter that Sized appears explicitly in the MRO. Class C is-a Sized just as well as it is-a Iterable. However, if we also register it to be a Set as well (which subclasses both Iterable and Sized), this solves the conflict: >>> Set.register(C) <class '__main__.C'> >>> f(C()) 'set' Same goes for the case described in PEP 443 where both ABCs appear explicitly in the MRO. No conflict here becase MRO orders both bases unambiguously: >>> class D(Sized, Iterable): ... def __len__(self): ... return 0 ... def __iter__(self): ... return iter([]) ... >>> f(D()) 'sized' The answer to the second question ("How can we determine in which order to put unrelated abstract classes?") is that we can't in general. There's fortunately a common special case that can help here: ABCs often have subclasses which are also bases of the type we're building the MRO for. For example: when inserting Iterable and Sized to the MRO of dict we can derive the correct order from the fact that both Iterable and Sized have a subclass called Mapping which also happens to be a base of dict. This trick cannot be used to fix every case but it's good enough to handle many. Summing up, the new algorithm is an extension of C3, which makes it much more predictable in edge cases. ABCs are introduced in a way that is much more deterministic (and when it depends on haystack ordering it always ends up as a RuntimeError anyway). My only real gripe with the new algorithm is that it's much larger codewise. On the other hand the ABC-aware C3 implementation could be useful for other purposes as well. Nick, Guido, this needs a serious review. Took me a while to get it right.

I meditated on the nature of the problem. A big issue that Edward pointed out
is that at times the computed MRO depends on haystack ordering. We should fix
that. However, my conclusion is that whatever home-grown algorithm we come up
with, it will still be unintuitive for the end user.

We should rather adopt C3 linearisation, which Python uses for calculating the
method resolution order. Unfortunately, it doesn't specify what to do with
virtual bases (either registered or implied). To solve that we need to extend
the algorithm to insert the virtual bases in a "correct" place. We have two
issues here:

1. How can we determine where this "correct" place is?
2. How can we determine which order unrelated abstract classes should be put
   in?

The correct place for a virtual base class to appear in the MRO is on the level
where the functionality it describes first appears. For example:

  >>> from collections import *
  >>> class A(object): pass
  ...
  >>> class B(A):
  ...   def __len__(self):
  ...     return 0   # implies Sized
  ...
  >>> @Container.register
  ... class C(object): pass
  ...
  >>> class D(object): pass   # unrelated
  ...
  >>> class X(D, C, B):
  ...   def __call__(self):
  ...     return 0   # implies Callable
  ...

If we calculate the C3 MRO for strict bases only, we get the obvious:

  >>> from functools import _c3_mro
  >>> _c3_mro(X)
  [<class '__main__.X'>, <class '__main__.D'>, <class '__main__.C'>,
   <class '__main__.B'>, <class '__main__.A'>, <class 'object'>]

If we want to insert related abstract base classes, they consistently appear
right after the class where functionality has been added (note that the order
of the `abcs` here doesn't matter):

  >>> _c3_mro(X, abcs=[Sized, Callable, Container])
  [<class '__main__.X'>, <class 'collections.abc.Callable'>, <class '__main__.D'>,
   <class '__main__.C'>, <class 'collections.abc.Container'>, <class '__main__.B'>,
   <class 'collections.abc.Sized'>, <class '__main__.A'>, <class 'object'>]

This linearisation might be somewhat surprising but I believe it's the correct
way. In fact it's only surprising because it exposes ambiguous dispatch when a
class virtually subclasses multiple ABCs. For example, having a simple generic
function:

  >>> from functools import singledispatch
  >>> from collections import *
  >>> @singledispatch
  ... def f(arg):
  ...   return "base"
  ...
  >>> f.register(Iterable, lambda arg: "iterable")
  >>> f.register(Sized, lambda arg: "sized")
  >>> f.register(Set, lambda arg: "set")

this class evaluates reasonably:

  >>> class C(Sized):
  ...   def __len__(self):
  ...     return 0
  ...
  >>> f(C())
  'sized'

However, when we register it for another ABC, there has to be a conflict:

  >>> Iterable.register(C)
  <class '__main__.C'>
  >>> f(C())
  Traceback (most recent call last):
  ...
  RuntimeError: Ambiguous dispatch: <class 'collections.abc.Iterable'> or
                                    <class 'collections.abc.Sized'>

It doesn't matter that Sized appears explicitly in the MRO. Class C is-a Sized
just as well as it is-a Iterable. However, if we also register it to be a Set
as well (which subclasses both Iterable and Sized), this solves the conflict:

  >>> Set.register(C)
  <class '__main__.C'>
  >>> f(C())
  'set'

Same goes for the case described in PEP 443 where both ABCs appear explicitly
in the MRO. No conflict here becase MRO orders both bases unambiguously:

  >>> class D(Sized, Iterable):
  ...   def __len__(self):
  ...     return 0
  ...   def __iter__(self):
  ...     return iter([])
  ...
  >>> f(D())
  'sized'

The answer to the second question ("How can we determine in which order to put
unrelated abstract classes?") is that we can't in general. There's fortunately
a common special case that can help here: ABCs often have subclasses which are
also bases of the type we're building the MRO for. For example: when inserting
Iterable and Sized to the MRO of dict we can derive the correct order from the
fact that both Iterable and Sized have a subclass called Mapping which also
happens to be a base of dict. This trick cannot be used to fix every case but
it's good enough to handle many.

Summing up, the new algorithm is an extension of C3, which makes it much more
predictable in edge cases. ABCs are introduced in a way that is much more
deterministic (and when it depends on haystack ordering it always ends up as
a RuntimeError anyway). My only real gripe with the new algorithm is that it's
much larger codewise. On the other hand the ABC-aware C3 implementation could
be useful for other purposes as well. 

Nick, Guido, this needs a serious review. Took me a while to get it right.

History
Date	User	Action	Args
2013-06-20 19:30:04	lukasz.langa	set	recipients: + lukasz.langa, ecatmur
2013-06-20 19:30:03	lukasz.langa	set	messageid: <1371756603.69.0.503207353444.issue18244@psf.upfronthosting.co.za>
2013-06-20 19:30:03	lukasz.langa	link	issue18244 messages
2013-06-20 19:30:03	lukasz.langa	create