This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Define a new __key__ protocol
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.8
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: cheryl.sabella, cvrebert, josh.r, martin.panter, ncoghlan, pitrou, rhettinger, scoder
Priority: normal
Keywords:

Created on 2014-02-15 00:17 by ncoghlan, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (11)
msg211253 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2014-02-15 00:17
This is an idea that would require a PEP, just writing it down here as a permanent record in case someone else wants to run with it.

Currently, the *simplest* way to define a non-identity total ordering on an immutable object is to define __hash__, __eq__ and __lt__ appropriately, and then use functools.total_ordering to add the other comparison methods.

However, many such implementations follow a very similar pattern:

    def __hash__(self):
        return hash(self._calculate_key())
    def __eq__(self, other):
        if isinstance(other, __class__):
            return self._calculate_key() == other._calculate_key()
        return NotImplemented
    def __lt__(self, other):
        if isinstance(other, __class__):
            return self._calculate_key() < other._calculate_key()
        return NotImplemented
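
For reference, a minimal runnable sketch of that pattern as it can be written today with functools.total_ordering. The Version class and its (major, minor) key are hypothetical, chosen only for illustration:

```python
from functools import total_ordering


@total_ordering
class Version:
    """Hypothetical immutable value type ordered by (major, minor)."""

    def __init__(self, major, minor):
        self._major = major
        self._minor = minor

    def _calculate_key(self):
        # Single source of truth for hashing, equality and ordering.
        return (self._major, self._minor)

    def __hash__(self):
        return hash(self._calculate_key())

    def __eq__(self, other):
        if isinstance(other, Version):
            return self._calculate_key() == other._calculate_key()
        return NotImplemented

    def __lt__(self, other):
        if isinstance(other, Version):
            return self._calculate_key() < other._calculate_key()
        return NotImplemented


# total_ordering derives __le__, __gt__ and __ge__ from __eq__ and __lt__.
versions = sorted([Version(1, 10), Version(1, 2), Version(0, 9)])
```

Every method other than _calculate_key is boilerplate that looks the same from one class to the next.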

A "__key__" protocol as an inherent part of the type system could greatly simplify that:

    def __key__(self):
        return self._calculate_key()

The interpreter would then derive appropriate implementations for __hash__ and all the rich comparison methods based on that key calculation and install them when the type object was created.

If the type is mutable (and hence orderable but not hashable), then setting "__hash__ = None" would disable the implicit hashing support (just as it can already be used to explicitly disable hash inheritance).
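
The existing "__hash__ = None" mechanism referred to here can be illustrated with a small sketch. The Tag and MutableTag classes are hypothetical; only the behaviour of assigning None to __hash__ is standard Python:

```python
from collections.abc import Hashable


class Tag:
    """Hypothetical hashable base class, compared and hashed by its label."""

    def __init__(self, label):
        self.label = label

    def __eq__(self, other):
        if isinstance(other, Tag):
            return self.label == other.label
        return NotImplemented

    def __hash__(self):
        return hash(self.label)


class MutableTag(Tag):
    """Subclass whose label may change, so hashing is switched off."""

    __hash__ = None  # stop inheriting Tag.__hash__


base_is_hashable = isinstance(Tag("a"), Hashable)
child_is_hashable = isinstance(MutableTag("a"), Hashable)
```

Calling hash() on a MutableTag instance raises TypeError, while equality comparison keeps working.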

(Inspired by Chris Withers's python-dev thread: https://mail.python.org/pipermail/python-dev/2014-February/132332.html)
msg211254 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2014-02-15 00:26
Note: in conjunction with a class decorator (along the lines of functools.total_ordering), this idea is amenable to experimentation as a third party module. However, any such third party module shouldn't use a reserved name like __key__ - a public name like "calculate_key" would be more appropriate.
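
One way such a decorator might look is sketched below. The name key_ordered and the Card example are hypothetical and not taken from any existing module; the sketch assumes the decorated class provides a public calculate_key() method and leaves an explicit "__hash__ = None" untouched:

```python
import operator


def key_ordered(cls):
    """Hypothetical class decorator: derive __hash__ and the rich
    comparison methods from the class's calculate_key() method."""

    def make_comparison(op):
        def method(self, other):
            if isinstance(other, cls):
                return op(self.calculate_key(), other.calculate_key())
            return NotImplemented
        method.__name__ = f"__{op.__name__}__"
        return method

    for op in (operator.eq, operator.ne, operator.lt,
               operator.le, operator.gt, operator.ge):
        setattr(cls, f"__{op.__name__}__", make_comparison(op))

    # Respect an explicit __hash__ (including "__hash__ = None")
    # defined in the class body; otherwise hash the key.
    if "__hash__" not in cls.__dict__:
        cls.__hash__ = lambda self: hash(self.calculate_key())
    return cls


@key_ordered
class Card:
    """Hypothetical example type ordered by (rank, suit)."""

    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def calculate_key(self):
        return (self.rank, self.suit)


hand = sorted([Card(3, "hearts"), Card(2, "spades"), Card(3, "clubs")])
```

Unlike the core protocol idea, this runs after the type object has been created and pays the usual cost of Python-level method calls on every comparison.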
msg211890 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2014-02-21 23:21
This is a very nice idea, but does it have to be part of the interpreter core, or could it simply be supplied by a decorator in the functools module?

(the main advantage of having it in the interpreter is speed)
msg211926 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2014-02-22 14:14
I suspect it could just be a class decorator (along the lines of
total_ordering), and it should certainly be prototyped on PyPI as such a
decorator (using a different name for the key calculating method). If it
eventually happened, elevation to a core protocol would really be about
defining this as being *preferred* in the cases where it applies, and
that's a fairly weak basis for changing the type constructor.
msg311567 - Author: Cheryl Sabella (cheryl.sabella) (Python committer) Date: 2018-02-03 17:57
I wonder if this would make sense as a parameter to dataclass now.
msg312096 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2018-02-13 02:48
For now, I'm going to close this as "out of date", with the guidance being "Define a data class instead" (since that gets rid of the historical boilerplate a different way: auto-generating suitable methods based on the field declarations).

If somebody comes up with a use case for this protocol idea that isn't adequately covered by data classes, then they can bring it up on python-ideas, and we can look at revisiting the question.
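
As a rough illustration of the "define a data class instead" guidance, the following sketch uses dataclasses from Python 3.7+. The Release class and its fields are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Release:
    """Hypothetical immutable record; the field order defines the sort order."""

    major: int
    minor: int


# eq=True (the default) together with frozen=True generates __hash__;
# order=True generates __lt__, __le__, __gt__ and __ge__.
releases = sorted([Release(1, 10), Release(1, 2), Release(0, 9)])
```

The generated comparison methods compare instances as tuples of their fields, which covers the same ground as the hand-written key method.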
msg312100 - Author: Josh Rosenberg (josh.r) (Python triager) Date: 2018-02-13 04:12
Do data classes let you define some fields as being excluded from the equality/ordering/hashing? I got the impression that if a field existed, it was part of the "key" no matter what, which isn't necessarily correct in the general case. Simple examples would be attributes that equivalent C++ would tag with the mutable keyword; they're not part of the logical state of the instance (e.g. debugging counters or whatever), so they shouldn't be included in the "key".
msg312101 - Author: Josh Rosenberg (josh.r) (Python triager) Date: 2018-02-13 04:17
Ah, never mind. Looks like dataclasses.InitVar fields seem to be the answer to excluding a field from the auto-generated methods.
msg312106 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2018-02-13 07:27
It isn't InitVar that you want for that use case (that's just for passing extra information to __post_init__).

Instead, you want:

    extra_field: int = field(compare=False)  # Excluded from __hash__, __eq__, etc.

You can also exclude a field from __hash__, but keep it in the comparison methods:

    unhashed_field: int = field(hash=False)  # Excluded from __hash__ only
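
A self-contained sketch of both options follows. The Record class, its field names and the default values are hypothetical; field(compare=...) and field(hash=...) are the documented dataclasses parameters:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Record:
    """Hypothetical record with fields excluded from the generated methods."""

    name: str
    # Ignored by __eq__, the ordering methods and __hash__.
    debug_hits: int = field(default=0, compare=False)
    # Still compared, but left out of __hash__.
    payload: tuple = field(default=(), hash=False)


same_logical_state = Record("a", debug_hits=1) == Record("a", debug_hits=99)
different_payload = Record("a", payload=(1,)) != Record("a", payload=(2,))
```

Excluding a compared field from the hash is safe because equal instances still hash equally; it only makes collisions between unequal instances more likely.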
msg312130 - Author: Cheryl Sabella (cheryl.sabella) (Python committer) Date: 2018-02-13 16:09
Thanks, Nick.

When I first came across this issue, I thought that dataclasses would take care of what you wrote below, but after looking at the original discussion on python-dev, I thought the problem was ordering None within a comparison with None being a valid value in SQLite.

For example,
>>> a = [1, None, 'a']
>>> b = [1, 5, 'b']
>>> a == b
False
>>> a < b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'NoneType' and 'int'
msg312178 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2018-02-14 18:08
Allowing for None-first and None-last ordering is a fair use case, but I'm not sure a __key__ protocol is the right answer to that - as your own example shows, it gets tricky when dealing with nested containers.

It may make sense to raise the question on python-ideas for Python 3.8+, though, with Python-side ordering of database records as the main motivating use case.
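
For flat records, None-first and None-last ordering can already be approximated with a sort key, as in the sketch below. The helper names none_first and none_last and the sample rows are hypothetical, and the sketch deliberately does not handle nested containers, which is where the approach gets tricky:

```python
def none_first(row):
    """Hypothetical sort key: None sorts before any other value in a column."""
    # False < True, so (False, None) sorts ahead of (True, value), and the
    # None itself is never compared against a non-None value.
    return tuple((value is not None, value) for value in row)


def none_last(row):
    """Hypothetical sort key: None sorts after any other value in a column."""
    return tuple((value is None, value) for value in row)


rows = [(1, 5, "b"), (1, None, "a"), (0, 7, "c")]
rows_none_first = sorted(rows, key=none_first)
rows_none_last = sorted(rows, key=none_last)
```

Each column value is paired with a boolean flag, so tuple comparison is decided by the flag whenever exactly one of the two values is None.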
History
Date                 User            Action  Args
2022-04-11 14:57:58  admin           set     github: 64831
2018-02-14 18:08:28  ncoghlan        set     messages: + msg312178
2018-02-13 16:09:06  cheryl.sabella  set     messages: + msg312130
2018-02-13 07:27:18  ncoghlan        set     messages: + msg312106
2018-02-13 04:17:33  josh.r          set     messages: + msg312101
2018-02-13 04:12:06  josh.r          set     messages: + msg312100
2018-02-13 02:48:16  ncoghlan        set     status: open -> closed; resolution: out of date; messages: + msg312096; stage: resolved
2018-02-05 04:37:11  rhettinger      set     nosy: + rhettinger
2018-02-03 17:57:09  cheryl.sabella  set     nosy: + cheryl.sabella; messages: + msg311567; versions: + Python 3.8, - Python 3.5
2014-03-06 22:44:32  josh.r          set     nosy: + josh.r
2014-02-22 14:14:43  ncoghlan        set     messages: + msg211926
2014-02-21 23:21:10  pitrou          set     nosy: + pitrou; messages: + msg211890
2014-02-21 21:14:24  scoder          set     nosy: + scoder
2014-02-21 19:34:11  cvrebert        set     nosy: + cvrebert
2014-02-17 02:13:43  martin.panter   set     nosy: + martin.panter
2014-02-15 00:26:02  ncoghlan        set     messages: + msg211254
2014-02-15 00:17:55  ncoghlan        create