classification
Title: Make bisect.* functions accept an optional compare function
Type: enhancement Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: jonaskoelker, mciura, minmax, rhettinger
Priority: normal Keywords:

Created on 2005-04-18 19:26 by mciura, last changed 2006-11-22 18:50 by minmax. This issue is now closed.

Messages (7)
msg54448 - (view) Author: Marcin Ciura (mciura) Date: 2005-04-18 19:26
The usability of bisect.bisect_right, bisect.bisect_left,
bisect.insort_right and bisect.insort_left would increase
if they accepted an optional less_than function to compare
the keys.

Here is my use case: I have a sorted list of reversed words
and a parallel list of flags associated with these words
(taken from ispell). The list of reversed words is the one
searched; the flags are the result of a search.

Issue #1:  Now I cannot use simply a list of tuples
(rev_word,flags) and a custom less_than function that
compares only the first elements of two tuples.

Issue #2: When a word is not found in the list, I'd
like to make
an educated guess as to its flags (this makes more
sense in non-English languages, where e.g. infinitives
have a special ending), using bisect_left and bisect_right:

from bisect import *

less_than = lambda x,y: x[:3]<y[:3]
lp = bisect_left(word_list, given_rev_word, lt=less_than)
rp = bisect_right(word_list, given_rev_word, lt=less_than)
# return the union of flag_list[lp:rp]

An example (given_rev_word = 'abcpqr'):
word_list:
'abbx',
'abcaa', <- lp
'abcdd',
'abcss',
'abdf'   <- rp

Currently, the first search could be replaced with
lp = bisect_left(word_list, given_rev_word[:3])
but I can see only non-nice ways to replace the other
search.

Rolling my own class that stores a word and its flags,
with __lt__ depending on some global setting is not
thread-safe, and I find such a solution too heavyweight.

I hope that I expressed myself clearly enough. If not, let
me know, and I'll try to clarify my point.
msg54449 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2005-04-18 20:09
Logged In: YES 
user_id=80475

A few thoughts:

* If bisect provided an optional lt argument, it would still
not be thread-safe.  The code for the lt function can call
arbitrary Python code that runs through the eval-loop and is
hence subject to a thread-switch.

* An advantage of the class wrapper approach is that  the
prefix function gets computed just once per word.  This is
not a big deal for the specific case of [:3], but it could
be a significant benefit for other functions that are
expensive to compute (because repeated calls to bisect will
access the lt function more than once).

* My own approach to the particular use case would be to map
prefixes to a list of revwords or (revword, flag) pairs:
  d = dict(abb=['abbc'], abc=['abcaa', 'abcdd', 'abcss'],
abd=['abdf'])
This gives O(1) lookup time and limits the calls to the
prefix function to once per word.

Will leave this request open as it may yet be a good idea. 
My concern is that it will clutter the code and the API for
only a small benefit. 

Also, I'm looking at a more general purpose solution that
would make this feature request moot.  This idea is to
create a fast comparison decorator class used like this:

   dwordlist = map(ComparisonDecorator(lambda x: x[:3]),
wordlist)
   lp = bisect_left(dwordlist, given_rev_word)
   rp = bisect_right(dwordlist, given_rev_word)

msg54450 - (view) Author: Jonas Kölker (jonaskoelker) Date: 2005-04-28 20:13
Logged In: YES 
user_id=1153395

In a similar vein, I'd like to see that a `key' keyword
argument was added to bisect.* (and perhaps `reversed'
too)--it's really annoying to sort something by (not __lt__)
and not be able to bsearch it. It got added to min/max and
heapq.n(smallest|largest) in 2.5 fwiw ;)
msg54451 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2005-06-03 01:25
Logged In: YES 
user_id=80475

Overall, I'm -1 on this RFE.

The comparison to nsmallest() and nlargest() is inaccurate.
 They run start-to-finish in one function call.  The other
heapq methods do not use key functions because they have to
leave the original data structure unmolested between calls;
hence, there is no ability to limit the key function calls
to one per record.

Likewise, with this request, the key function calls get
wasted.  The sort() method calls key() for every record and
tosses the result afterwards.  Each subsequect call to
bisect() would need to repeat those calls for a log(N)
subset of the data.  Hence, accepting this request would
create an API that encourages a wasteful design.
msg54452 - (view) Author: Jonas Kölker (jonaskoelker) Date: 2005-06-10 23:44
Logged In: YES 
user_id=1153395

> [...] but it could be a significant benefit for other
functions that are expensive to compute (because repeated
calls to bisect will access the lt function more than once).

> Each subsequect call to bisect() would need to repeat
those calls for a log(N) subset of the data.

Which is exactly *why* I'm suggesting the key argument: to
save those extra calls to the key function.

Since that sounds counter-intuitive, let me explain: one
sorts (origkey(item), item) pairs, the bisects with
key=itemgetter(0), not calling the expensive origkey.
msg54453 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2005-06-11 02:26
Logged In: YES 
user_id=80475

The whole purpose of the key= argument was to avoid having
to build (key, record) tuples.  If those are going to be
built anyway, then there is little point to using key=.

Closing this because:
1) the original use case was better addressed through other
methods
2) key= for bisect does not provide the same benefits as it
does for other functions.
3) the most recent proposed use is far afield from what key=
is supposed to do
4) I think it is bad design and would encourage misguided
approaches.
msg54454 - (view) Author: Andraz Tori (minmax) Date: 2006-11-22 18:50
This really is annoying. If sort() accepts different comparison functions, so should bisect & co.

It makes things that should be easy, very hard.

If you are writting threaded programs, you know that functions passed to sort and bisect have to behave properly.
History
Date User Action Args
2005-04-18 19:26:50mciuracreate