classification
Title: Support heapfix() and heapremove() APIs in heapq module
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: rajathagasthya, rhettinger, stutzbach, tim.peters
Priority: normal Keywords: patch

Created on 2017-08-12 05:31 by rajathagasthya, last changed 2017-08-13 19:10 by rajathagasthya. This issue is now closed.

Files
File name Uploaded Description Edit
heapq_fix_remove.patch rajathagasthya, 2017-08-12 05:31 review
Messages (6)
msg300190 - (view) Author: Rajath Agasthya (rajathagasthya) * Date: 2017-08-12 05:31
I'd like to suggest a couple of utility methods for the heapq module which I think are useful:

1. heapfix() - Re-establishes heap invariant after item at 'any index' in the heap changes its value.

Currently, the way to fix the heap after an arbitrary item has changed is to just call heapify() on the entire heap. This is inefficient because heapify() tries to perform _siftup() operation on half of the items of the heap to maintain the heap invariant. Doing this is unnecessary because we know exactly which element (or index) was changed and became out of place. With a heapfix() function, we can just "fix" the heap from that position.

2. heapremove() - Removes an item at 'any index' from the heap and maintains heap invariant.

Pretty much same as above. If you remove an item from an arbitrary index, you have to call heapify() to restore the heap invariant.


Supporting these methods require minimal changes to the existing _siftup() and _siftdown() methods (basically just returning position values without changing any logic at all). I have a draft implementation which I've attached as a patch file here. If there's interest in this, I can write some tests and create a Github PR.

Thanks!
msg300210 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-08-13 04:59
-1 I don't think this is the right way to use heaps.  Also, I don't want to introduce any O(n) operations for fine-grained changes of a single element (part of the point of having a heap is to make fine-grained changes cheap).  

FWIW, it isn't common to change an element and then call heapify.  Instead, the usual approach is either mark an entry as invalid or keep a pending deletion list or sets.
msg300223 - (view) Author: Rajath Agasthya (rajathagasthya) * Date: 2017-08-13 17:04
Thanks, Raymond. I think it's fair comment, but I'm not sure which operation is O(n) though. FWIW, Go supports these operations (https://golang.org/pkg/container/heap/), so the usage is there.
msg300226 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2017-08-13 18:46
@Rajath, I suspect Raymond may have been bamboozled because your suggested `heapfix()` and `heapremove()` didn't appear to take any arguments.  If the index is a required argument, then these are O(log N) operations.

I thought about adding them years ago, but didn't find a _plausible_ use case:  the index at which a specific heap item appears is pretty much senseless, and typically varies over a heap's lifetime.  So how does the user know _which_ index to pass?  A linear search over the underlying list to find the index of a specific heap item is probably unacceptable.

The only hacks around that I could think of required a more complex kind of heap; e.g., restricting heap items to objects usable as dict keys, using a hidden dict to map heap items to their current index, and fiddling all the heap operations to keep that dict up to date.

So, in the absence of an obvious way to proceed, I gave up :-)
msg300227 - (view) Author: Rajath Agasthya (rajathagasthya) * Date: 2017-08-13 19:04
> The only hacks around that I could think of required a more complex kind of heap; e.g., restricting heap items to objects usable as dict keys, using a hidden dict to map heap items to their current index, and fiddling all the heap operations to keep that dict up to date.

Actually, this is *exactly* how I ended up thinking about this feature. I was using a separate dict to track the index at which objects are inserted in a heap, using the index to make updates to heap directly, and fixing the heap by calling heapify() every time which was slightly costly. Maybe I chose the wrong data structure for the problem, but constant time getMin() operation was crucial for me. And I didn't really care if inserts (or updates) were costlier. 

But I completely agree that it's a niche use case and I understand if we don't want to support this :)
msg300228 - (view) Author: Rajath Agasthya (rajathagasthya) * Date: 2017-08-13 19:10
And yes, both of these methods do take index as the argument as shown in my patch file. I should've probably specified that. 

The signatures are:

heapfix(heap, pos)
heapremove(heap, pos)
History
Date User Action Args
2017-08-13 19:10:31rajathagasthyasetmessages: + msg300228
2017-08-13 19:04:58rajathagasthyasetmessages: + msg300227
2017-08-13 18:46:55tim.peterssetnosy: + tim.peters
messages: + msg300226
2017-08-13 17:04:32rajathagasthyasetmessages: + msg300223
2017-08-13 04:59:13rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg300210

stage: resolved
2017-08-13 04:50:41rhettingersetassignee: rhettinger
2017-08-12 05:31:52rajathagasthyacreate