Message 387523 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	mCoding
Recipients	mCoding
Date	2021-02-22.18:12:32
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1614017554.0.0.495039898854.issue43300@roundup.psfhosted.org>
In-reply-to

Content
Currently, the bisect module's functions all assume the user is maintaining a sorted list/sequence in increasing order. From the docs: "This module provides support for maintaining a list in sorted order without having to sort the list after each insertion" However, bisection is not limited to this use case, nor is it limited to increasing sequences. Many algorithms involve bisecting a sequence derived from (potentially immutable) user input, such as the standard longest-increasing-subsequence algorithm. Sometimes these derived sequences are naturally reverse-sorted, as they would be in the longest-decreasing-subsequence algorithm. (I have personally had to work around bisect not working with decreasing sequences in this case). There are other natural reasons for a reverse-sorted list to appear that one might want to bisect. Of course, a monotone decreasing function applied to a sorted list would result in a reverse-sorted one. One may want to use bisect as a root-finding algorithm for a decreasing function that may not be differentiable (or even continuous). Or a user may simply have chosen to reverse-sort using sorted(a, reverse=True). In my mind, the bisect module's purpose should be to support common use cases of bisection, not specifically to maintain a sorted list. So then the question arises, how to support reverse-sorted sequences? I see a few possible routes. 1. Add a "decreasing" parameter to bisect_left, bisect_right, (and perhaps insort_left, insort_right as well). 2. Add dedicated functions bisect_left_decreasing, bisect_right_decreasing, (and perhaps insort_left_decreasing, insort_right_decreasing as well) that operate on reverse-sorted sequences. 3. Add a more general bisect_first_false(a, pred, lo, hi) (equivalent to C++'s std::partition_point) that assumes the sequence to be partitioned with respect to the predicate function and returns the first index such that the predicate is falsey, or hi if no falsey elements are found. Then reverse-sorted lists can be bisected with something like bisect_first_false(a, lambda y: y > x). This way may be too general, but it becomes very obvious what operators will be called on the values and what the post-condition of the function is, similar to the condition of a while loop. 4. Add a "cmp" parameter to bisect_left, bisect_right, (and perhaps insort_left, insort_right as well) that keys are meant to be compared with, so one would pass bisect_left(a, x, cmp=operator.gt). It could be used in conjuction with "key", i.e. "cmp" is not meant to be a replacement for "key" as it was with sorted. 5. Do nothing. People who _really_ want to bisect reverse-sorted lists will write their own bisect_*_decreasing or figure out that if "a" is reverse-sorted, the following monstrosity does the job, making ab(use) of the new "key" parameter in 3.10 and due to the fact that False < True: a = ... # reverse-sorted x = ... # bisect left on reverse-sorted pred = lambda y: not (x < y) idx = bisect_left(a, x, key=pred) # bisect right on reverse-sorted pred = lambda y: y < x idx = bisect_right(a, x, key=pred) Commentary on the choices. 1 or 4 are the most convenient from a user perspective, no extra imports and discoverable behavior. 4 also has the benefit that the element/key class does not need to define a __lt__, which can be useful in cases where the element/key has many comparable attributes and it does not make sense to pick one to be used to define a __lt__. 3 would have the most generality and usefulness to library writers, who would probably be the primary users in need of a reverse-sorted bisection implementation. 2 suffers from combinatorial explosion (many new functions) and extra importing needed, but allows the implementation of existing functions to be unchanged, as well as mildly discouraging users from using it unless they _really_ want to use it. 5 will lead to many incorrect home-grown bisection searches or other hacks being used in the wild. Personally, I am somewhat indifferent between 1 and 4, perhaps slightly learning towards 4 as it makes explicit exactly how elements/keys will be compared. I would like to see 3 implemented as well, but admit that it is likely too general a solution for the proposed issue "support reverse-sorted lists".

Currently, the bisect module's functions all assume the user is maintaining a sorted list/sequence in increasing order. From the docs:

"This module provides support for maintaining a list in sorted order without having to sort the list after each insertion"

However, bisection is not limited to this use case, nor is it limited to increasing sequences. Many algorithms involve bisecting a sequence derived from (potentially immutable) user input, such as the standard longest-increasing-subsequence algorithm. Sometimes these derived sequences are naturally reverse-sorted, as they would be in the longest-decreasing-subsequence algorithm. (I have personally had to work around bisect not working with decreasing sequences in this case). There are other natural reasons for a reverse-sorted list to appear that one might want to bisect. Of course, a monotone decreasing function applied to a sorted list would result in a reverse-sorted one. One may want to use bisect as a root-finding algorithm for a decreasing function that may not be differentiable (or even continuous). Or a user may simply have chosen to reverse-sort using sorted(a, reverse=True).

In my mind, the bisect module's purpose should be to support common use cases of bisection, not specifically to maintain a sorted list.

So then the question arises, how to support reverse-sorted sequences? I see a few possible routes.

1. Add a "decreasing" parameter to bisect_left, bisect_right, (and perhaps insort_left, insort_right as well).

2. Add dedicated functions bisect_left_decreasing, bisect_right_decreasing, (and perhaps insort_left_decreasing, insort_right_decreasing as well) that operate on reverse-sorted sequences.

3. Add a more general bisect_first_false(a, pred, lo, hi) (equivalent to C++'s std::partition_point) that assumes the sequence to be partitioned with respect to the predicate function and returns the first index such that the predicate is falsey, or hi if no falsey elements are found. Then reverse-sorted lists can be bisected with something like bisect_first_false(a, lambda y: y > x). This way may be too general, but it becomes very obvious what operators will be called on the values and what the post-condition of the function is, similar to the condition of a while loop.

4. Add a "cmp" parameter to bisect_left, bisect_right, (and perhaps insort_left, insort_right as well) that keys are meant to be compared with, so one would pass bisect_left(a, x, cmp=operator.gt). It could be used in conjuction with "key", i.e. "cmp" is not meant to be a replacement for "key" as it was with sorted.

5. Do nothing. People who _really_ want to bisect reverse-sorted lists will write their own bisect_*_decreasing or figure out that if "a" is reverse-sorted, the following monstrosity does the job, making ab(use) of the new "key" parameter in 3.10 and due to the fact that False < True:

a = ... # reverse-sorted
x = ...

# bisect left on reverse-sorted
pred = lambda y: not (x < y)
idx = bisect_left(a, x, key=pred)

# bisect right on reverse-sorted
pred = lambda y: y < x
idx = bisect_right(a, x, key=pred)


Commentary on the choices. 1 or 4 are the most convenient from a user perspective, no extra imports and discoverable behavior. 4 also has the benefit that the element/key class does not need to define a __lt__, which can be useful in cases where the element/key has many comparable attributes and it does not make sense to pick one to be used to define a __lt__. 3 would have the most generality and usefulness to library writers, who would probably be the primary users in need of a reverse-sorted bisection implementation. 2 suffers from combinatorial explosion (many new functions) and extra importing needed, but allows the implementation of existing functions to be unchanged, as well as mildly discouraging users from using it unless they _really_ want to use it. 5 will lead to many incorrect home-grown bisection searches or other hacks being used in the wild.

Personally, I am somewhat indifferent between 1 and 4, perhaps slightly learning towards 4 as it makes explicit exactly how elements/keys will be compared. I would like to see 3 implemented as well, but admit that it is likely too general a solution for the proposed issue "support reverse-sorted lists".

History
Date	User	Action	Args
2021-02-22 18:12:34	mCoding	set	recipients: + mCoding
2021-02-22 18:12:33	mCoding	set	messageid: <1614017554.0.0.495039898854.issue43300@roundup.psfhosted.org>
2021-02-22 18:12:33	mCoding	link	issue43300 messages
2021-02-22 18:12:32	mCoding	create