Issue 31145: PriorityQueue.put() fails with TypeError if priority_number in tuples of (priority_number, data) are the same.

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/75328

classification

Title:	PriorityQueue.put() fails with TypeError if priority_number in tuples of (priority_number, data) are the same.
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	Mikołaj Babiak, lisroach, ncoghlan, r.david.murray, rhettinger
Priority:	low	Keywords:	patch

Created on 2017-08-08 14:09 by Mikołaj Babiak, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 5153	merged	rhettinger, 2018-01-11 08:50

Messages (9)
msg299922 - (view)	Author: Mikołaj Babiak (Mikołaj Babiak)	Date: 2017-08-08 14:09
# list of tuples in form of (priority_number, data) bugged = [ (-25.691, {'feedback': 13, 'sentiment': 0.309, 'support_ticket': 5}), (-25.691, {'feedback': 11, 'sentiment': 0.309, 'support_ticket': 3}), (-25.0, {'feedback': 23, 'sentiment': 0.0, 'support_ticket': 15}), ] from queue import PriorityQueue pq = PriorityQueue() for item in bugged: pq.put(item) # TypeError: '<' not supported between instances of 'dict' and 'dict' It seems that if priority_numbers are equal, heapq.heapify() falls back to comparing data element from tuple (priority_number, data). I belive this is an undesired behaviour. It is acctually listed as one of implementation challenges on https://docs.python.org/3/library/heapq.html: "Tuple comparison breaks for (priority, task) pairs if the priorities are equal and the tasks do not have a default comparison order." In python 2.7 the issue in not present and PriorityQueue.put() works as expected
msg299929 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-08-08 14:51
I don't see any way to change PriorityQueue to fix this. The non-comparability of dicts likely won't be restored, nor will the lexicographic comparison order of tuples be likely to change. One possible way forward is to provide a wrapper with the desired comparison behavior: pq = PriorityQueue() pq.put(Prioritize(13, task)) task = pq.get().item where Prioritize is implemented something like this: import functools @functools.total_ordering class Prioritize: def __init__(self, priority, item): self.priority = priority self.item = item def __eq__(self, other): return self.priority == other.priority def __lt__(self, other): return self.priority < other.priority from queue import PriorityQueue, Empty from contextlib import suppress bugged = [ (-25.691, {'feedback': 13, 'sentiment': 0.309, 'support_ticket': 5}), (-25.691, {'feedback': 11, 'sentiment': 0.309, 'support_ticket': 3}), (-25.0, {'feedback': 23, 'sentiment': 0.0, 'support_ticket': 15}), ] pq = PriorityQueue() for priority, item in bugged: pq.put(Prioritize(priority, item)) with suppress(Empty): while True: item = pq.get_nowait().item print(item)
msg302870 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-09-24 17:18
Nick, what do you think about this proposal? The problem being solved is that decorated tuples no longer work well for controlling ordering (because so many types are now non-comparable). This provides a wrapper to control which fields are used in comparison. For implementation and API, there are several ways to do it, (accept separate arguments vs accept an existing sequence, standalone class vs a tuple subclass, pure python and/or c-implementation, etc).
msg302902 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2017-09-25 01:49
The main downside I see to that approach is that it would still require quite a few client code changes to restore compatibility for folks upgrading from 2.7, and even though six could add a "six.Prioritize" backport, it would still be difficult for automated tools to work out where such a wrapper would be appropriate. So I'm wondering whether it might be worth defining a heapq.compareitem helper that special cases tuples, such that heapq switched to using a slightly modified definition of tuple comparisons: def compareitem(lhs, rhs): """<= variant that ensures all tuples are orderable""" is not isinstance(lhs, tuple) or not isinstance(rhs, tuple): return lhs <= rhs # Compare tuples up to first unequal pair for lhs_item, rhs_item in zip(lhs, rhs): if lhs_item != rhs_item: try: return lhs_item < rhs_item except TypeError: pass break # All item pairs equal, or unorderable pair found return len(lhs) <= len(rhs) The key difference would be that if the heap-centric tuple comparison encounters a non-equal, unorderable pair of items, it would fall back to just comparing the tuple lengths (just as regular tuple comparison does when all item pairs are equal), rather than letting the TypeError propagate the way the default tuple comparison operator does. The heap invariant would change slightly such that "storage.sort(key=heapq.compareitem)" would reliably preserve the heap invariant without raising an exception, while "storage.sort()" might instead fail with TypeError.
msg302905 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-09-25 02:15
We already have recommendations in the heapq documentation on how to do a work-around. I'm looking at the more general problem of how can we make it easy once again to decorate a value with a sort value (not just for heaps but for anyplace where comparisons are made). I would like our preferred answer to be something better than, "take all your existing functions that use comparisons and make new variants that compute and cache key functions". Instead, I would rather, "keep your existing functions simple and just wrap your data in something that specifies comparison values that are computed just once". The old Schwartzian transform (decorate-compare-undecorate) had broad applicability but was effectively killed when a simple tuple no longer served for decoration. FWIW, the DataClass discussion has also ventured into this territory (the field definitions can specify whether or not a field is included in the rich comparison methods).
msg302911 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2017-09-25 04:46
My rationale for asking "What if we just changed heapq back to working closer to the way it used to work?" is that it's a case where arbitrarily ordering unorderable tuples made sense, and reverting it to the old behaviour is reasonably safe: - some Py3 heapq code that previously raised TypeError would start using an arbitrary ordering instead - Py2 heapq code would get a different arbitrary ordering in Py3, but it would still get an arbitrary ordering I don't feel especially strongly about that though, so if you prefer the approach of defining a new more explicit idiom to replace the old "make a tuple" one, I think a new wrapper type is a reasonable way to go, but using "Prioritize" as a name is probably too specific to the PriorityQueue use case. As a more generic name, "KeyedItem" might work: ``` @functools.total_ordering class KeyedItem: def __init__(self, key, item): self.key = key self.item = item def __eq__(self, other): return self.key == other.key def __lt__(self, other): return self.key < other.key ``` So applying an arbitrary key function would look like: decorated = [KeyedItem(key(v), v) for v in values] And if it was a tuple subclass, it would also work with APIs like the dict constructor.
msg303093 - (view)	Author: Lisa Roach (lisroach) *	Date: 2017-09-27 04:44
I think it is a good idea to have a simple way to add a value to sort on in general, it could have some interesting use-cases. Also I am with Nick a name change would make the broader scope clearer. What I am not sure about is making the comparison have to be between two items of the same wrapper type - since right now we are comparing priority to priority attributes. It makes sense for PriorityQueues, but if we wanted to use this for something more arbitrary like comparing a string with an int priority to an int we end up having to convert both data types to the new wrapper type: str_cmp = KeyedItem(20, 'cat') int_cmp = KeyedItem(30, 30) str_cmp < int_cmp I don't like having to convert to the new wrapper unless it's relevant, I'd rather do: str_cmp = KeyedItem(20, 'cat') str_cmp < 30 It could be instead: class KeyedItem: def __init__(self, key, item): self.key = key self.item = item def __eq__(self, other): if not isinstance(other, KeyedItem): return self.key == other return self.key == other.key def __lt__(self, other): if not isinstance(other, KeyedItem): return self.key < other return self.key < other.key ...
msg307646 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-12-05 07:08
FWIW, the new dataclasses module makes it really easy to create a wrapper: from dataclasses import dataclass, field from typing import Any @dataclass(order=True) class KeyedItem: key: int item: Any=field(compare=False) def f(): pass def g(): pass print(sorted([KeyedItem(10, f), KeyedItem(5, g)])) I'm thinking of just making an example in the docs and closing this out.
msg309839 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-01-12 06:06
New changeset 0c3be9651f4f149f4a78bb7043d26db9e75cabc0 by Raymond Hettinger in branch 'master': bpo-31145: Use dataclasses to create a prioritization wrapper (#5153) https://github.com/python/cpython/commit/0c3be9651f4f149f4a78bb7043d26db9e75cabc0

History
Date	User	Action	Args
2022-04-11 14:58:49	admin	set	github: 75328
2018-01-12 06:07:15	rhettinger	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2018-01-12 06:06:36	rhettinger	set	messages: + msg309839
2018-01-11 08:50:41	rhettinger	set	keywords: + patch stage: patch review pull_requests: + pull_request5008
2017-12-05 07:08:15	rhettinger	set	messages: + msg307646
2017-09-27 04:44:49	lisroach	set	nosy: + lisroach messages: + msg303093
2017-09-25 15:39:08	r.david.murray	set	nosy: + r.david.murray
2017-09-25 04:46:26	ncoghlan	set	messages: + msg302911
2017-09-25 02:15:27	rhettinger	set	messages: + msg302905
2017-09-25 01:49:15	ncoghlan	set	messages: + msg302902
2017-09-24 17:18:22	rhettinger	set	priority: normal -> low nosy: + ncoghlan messages: + msg302870
2017-08-08 14:51:06	rhettinger	set	versions: + Python 3.7, - Python 3.6 nosy: + rhettinger messages: + msg299929 assignee: rhettinger type: behavior -> enhancement
2017-08-08 14:09:13	Mikołaj Babiak	create