Title: Support heapq on typed arrays?
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: da, rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2018-05-21 21:11 by da, last changed 2018-05-22 19:09 by rhettinger. This issue is now closed.

Messages (6)
msg317250 - (view) Author: Diego Argueta (da) * Date: 2018-05-21 21:11
It'd be really great if we could have support for using the `heapq` module on typed arrays from `array`. For example:

import array
import heapq
import random

a = array.array('I', (random.randrange(10) for _ in range(10)))

Right now this code throws a TypeError:

    TypeError: heap argument must be a list

I suppose I could use `bisect` to insert items one by one but I imagine a single call to heapify() would be more efficient, especially if I'm loading the array from a byte string.

From what I can tell the problem lies in the C implementation, since removing the _heapq imports at the end of the heapq module (in 3.6) makes it work.
msg317303 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-05-22 15:53
I don't think we should go down this path.  The efficiency of the C implementation depends on it being tightly coupled to lists.  This tool is used in the schedulers of various async tools (such as Tornando), used for merge(), nsmallest(), and nlargest() all of which depend on this foundational tool being very fast.

Also, I question whether it makes sense at all to be heapifying numpy arrays using standard library tooling.  It numpy arrays actually needed this and needed for it to be efficient, it would need to be implemented natively in numpy.
msg317311 - (view) Author: Diego Argueta (da) * Date: 2018-05-22 17:39
I was referring to the C arrays in the Python standard library:
msg317315 - (view) Author: Diego Argueta (da) * Date: 2018-05-22 17:49
However I do see your point about the speed.
msg317317 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-22 18:06

alist = list(a)
a[:] = alist

And it should be not much slower than using heapq.heapify() directly if it could support general sequences. Using it with array.array would add significant overhead due to boxing.
msg317327 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-05-22 19:09
As noted by Serhiy, the interaction with the Array type would incur significant overhead.  Your fastest approach will be to follow his suggest to first convert to a list and then perform heap manipulations.

Marking this as closed.  Thank you for the suggestion.
Date User Action Args
2018-05-22 19:09:53rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg317327

stage: resolved
2018-05-22 18:06:28serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg317317
2018-05-22 17:49:20dasetmessages: + msg317315
2018-05-22 17:39:04dasetmessages: + msg317311
2018-05-22 15:53:20rhettingersetnosy: + rhettinger

messages: + msg317303
versions: + Python 3.8, - Python 2.7, Python 3.6
2018-05-21 21:11:04dacreate