classification
Title: PriorityQueue.put() fails with TypeError if priority_number in tuples of (priority_number, data) are the same.
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Mikołaj Babiak, lisroach, ncoghlan, r.david.murray, rhettinger
Priority: low Keywords:

Created on 2017-08-08 14:09 by Mikołaj Babiak, last changed 2017-09-27 04:44 by lisroach.

Messages (7)
msg299922 - (view) Author: Mikołaj Babiak (Mikołaj Babiak) Date: 2017-08-08 14:09
# list of tuples in form of (priority_number, data)
bugged = [
(-25.691, {'feedback': 13, 'sentiment': 0.309, 'support_ticket': 5}), (-25.691, {'feedback': 11, 'sentiment': 0.309, 'support_ticket': 3}), (-25.0, {'feedback': 23, 'sentiment': 0.0, 'support_ticket': 15}),
]

from queue import PriorityQueue

pq = PriorityQueue()

for item in bugged:
   pq.put(item)

# TypeError: '<' not supported between instances of 'dict' and 'dict'

It seems that if priority_numbers are equal, heapq.heapify() falls back to comparing data element from tuple (priority_number, data).
I belive this is an undesired behaviour.
It is acctually listed as one of implementation challenges on https://docs.python.org/3/library/heapq.html:
"Tuple comparison breaks for (priority, task) pairs if the priorities are equal and the tasks do not have a default comparison order."

In python 2.7 the issue in not present and PriorityQueue.put() works as expected
msg299929 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-08-08 14:51
I don't see any way to change PriorityQueue to fix this.  The non-comparability of dicts likely won't be restored, nor will the lexicographic comparison order of tuples be likely to change.

One possible way forward is to provide a wrapper with the desired comparison behavior:

    pq = PriorityQueue()
    pq.put(Prioritize(13, task))
    task = pq.get().item
  
where Prioritize is implemented something like this:

import functools

@functools.total_ordering
class Prioritize:

    def __init__(self, priority, item):
        self.priority = priority
        self.item = item

    def __eq__(self, other):
        return self.priority == other.priority

    def __lt__(self, other):
        return self.priority < other.priority

from queue import PriorityQueue, Empty
from contextlib import suppress

bugged = [
(-25.691, {'feedback': 13, 'sentiment': 0.309, 'support_ticket': 5}), (-25.691, {'feedback': 11, 'sentiment': 0.309, 'support_ticket': 3}), (-25.0, {'feedback': 23, 'sentiment': 0.0, 'support_ticket': 15}),
]

pq = PriorityQueue()

for priority, item in bugged:
    pq.put(Prioritize(priority, item))
with suppress(Empty):
    while True:
        item = pq.get_nowait().item
        print(item)
msg302870 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-09-24 17:18
Nick, what do you think about this proposal?

The problem being solved is that decorated tuples no longer work well for controlling ordering (because so many types are now non-comparable).  This provides a wrapper to control which fields are used in comparison.

For implementation and API, there are several ways to do it, (accept separate arguments vs accept an existing sequence, standalone class vs a tuple subclass, pure python and/or c-implementation, etc).
msg302902 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-09-25 01:49
The main downside I see to that approach is that it would still require quite a few client code changes to restore compatibility for folks upgrading from 2.7, and even though six could add a "six.Prioritize" backport, it would still be difficult for automated tools to work out *where* such a wrapper would be appropriate.

So I'm wondering whether it might be worth defining a heapq.compareitem helper that special cases tuples, such that heapq switched to using a slightly modified definition of tuple comparisons:

    def compareitem(lhs, rhs):
        """<= variant that ensures all tuples are orderable"""
        is not isinstance(lhs, tuple) or not isinstance(rhs, tuple):
            return lhs <= rhs
        # Compare tuples up to first unequal pair
        for lhs_item, rhs_item in zip(lhs, rhs):
            if lhs_item != rhs_item:
                try:
                    return lhs_item < rhs_item
                except TypeError:
                    pass
                break
        # All item pairs equal, or unorderable pair found
        return len(lhs) <= len(rhs)

The key difference would be that if the heap-centric tuple comparison encounters a non-equal, unorderable pair of items, it would fall back to just comparing the tuple lengths (just as regular tuple comparison does when all item pairs are equal), rather than letting the TypeError propagate the way the default tuple comparison operator does.

The heap invariant would change slightly such that "storage.sort(key=heapq.compareitem)" would reliably preserve the heap invariant without raising an exception, while "storage.sort()" might instead fail with TypeError.
msg302905 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-09-25 02:15
We already have recommendations in the heapq documentation on how to do a work-around.   I'm looking at the more general problem of how can we make it easy once again to decorate a value with a sort value (not just for heaps but for anyplace where comparisons are made).

I would like our preferred answer to be something better than, "take all your existing functions that use comparisons and make new variants that compute and cache key functions".   Instead, I would rather, "keep your existing functions simple and just wrap your data in something that specifies comparison values that are computed just once". 

The old Schwartzian transform (decorate-compare-undecorate) had broad applicability but was effectively killed when a simple tuple no longer served for decoration.  

FWIW, the DataClass discussion has also ventured into this territory (the field definitions can specify whether or not a field is included in the rich comparison methods).
msg302911 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-09-25 04:46
My rationale for asking "What if we just changed heapq back to working closer to the way it used to work?" is that it's a case where arbitrarily ordering unorderable tuples made sense, and reverting it to the old behaviour is reasonably safe:

- some Py3 heapq code that previously raised TypeError would start using an arbitrary ordering instead
- Py2 heapq code would get a *different* arbitrary ordering in Py3, but it would still get an arbitrary ordering

I don't feel especially strongly about that though, so if you prefer the approach of defining a new more explicit idiom to replace the old "make a tuple" one, I think a new wrapper type is a reasonable way to go, but using "Prioritize" as a name is probably too specific to the PriorityQueue use case.

As a more generic name, "KeyedItem" might work:

```
@functools.total_ordering
class KeyedItem:

    def __init__(self, key, item):
        self.key = key
        self.item = item

    def __eq__(self, other):
        return self.key == other.key

    def __lt__(self, other):
        return self.key < other.key
```

So applying an arbitrary key function would look like:

    decorated = [KeyedItem(key(v), v) for v in values]

And if it was a tuple subclass, it would also work with APIs like the dict constructor.
msg303093 - (view) Author: Lisa Roach (lisroach) * Date: 2017-09-27 04:44
I think it is a good idea to have a simple way to add a value to sort on in general, it could have some interesting use-cases. Also I am with Nick a name change would make the broader scope clearer.

What I am not sure about is making the comparison have to be between two items of the same wrapper type - since right now we are comparing priority to priority attributes. 

It makes sense for PriorityQueues, but if we wanted to use this for something more arbitrary like comparing a string with an int priority to an int we end up having to convert both data types to the new wrapper type:

  str_cmp = KeyedItem(20, 'cat')
  int_cmp  = KeyedItem(30, 30)
  str_cmp < int_cmp


I don't like having to convert to the new wrapper unless it's relevant, I'd rather do:

  str_cmp = KeyedItem(20, 'cat')
  str_cmp < 30


It could be instead:

 class KeyedItem:
    def __init__(self, key, item):
         self.key = key
         self.item = item
     def __eq__(self, other):
         if not isinstance(other, KeyedItem):
             return self.key == other
         return self.key == other.key
            
     def __lt__(self, other):
	if not isinstance(other, KeyedItem):
            return self.key < other
        return self.key < other.key
     ...
History
Date User Action Args
2017-09-27 04:44:49lisroachsetnosy: + lisroach
messages: + msg303093
2017-09-25 15:39:08r.david.murraysetnosy: + r.david.murray
2017-09-25 04:46:26ncoghlansetmessages: + msg302911
2017-09-25 02:15:27rhettingersetmessages: + msg302905
2017-09-25 01:49:15ncoghlansetmessages: + msg302902
2017-09-24 17:18:22rhettingersetpriority: normal -> low
nosy: + ncoghlan
messages: + msg302870

2017-08-08 14:51:06rhettingersetversions: + Python 3.7, - Python 3.6
nosy: + rhettinger

messages: + msg299929

assignee: rhettinger
type: behavior -> enhancement
2017-08-08 14:09:13Mikołaj Babiakcreate