Issue1619060
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2006-12-19 21:14 by timehorse, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
bisect.py | timehorse, 2006-12-19 21:14 | Proposed change to bisect.py with C-Implementation disabled for testing |
Messages (7) | |||
---|---|---|---|
msg54953 - (view) | Author: Jeffrey C. Jacobs (timehorse) | Date: 2006-12-19 21:14 | |
The python and c implementation of bisect do not support custom-sorted lists using the list.sort method. In order to support an arbitrarily sorted list via sort(cmp, key, reverse), I have added 3 corresponding parameters to the bisect methods for bisection and insort (insert-sorted) corresponding to the parameters in sort. This would be useful if a list is initially sorted by its sort method and then the client wishes to maintain the sort order (or reverse-sort order) while inserting an element. In this case, being able to use the same, arbitrary binary function cmp, unary function key and boolean reverse flag to preserve the list order. The change imposes 3 new branch conditions and potential no-op function calls for when key is None. I have here implemented and partially tested the python implementation and if someone besides me would find this useful, I will update the _bisectmodule.c for this change as well. The Heap functions may also find use of an arbitrary predicate function so I may look at that later, but because bisect goes hand in hand with sorting, I wanted to tackle that first. |
|||
msg54954 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2006-12-19 21:43 | |
I'm -1 on this patch. At first blush it would seem nice to progagate sort's notion of a key= function; however, sort() is an all at once operation that can guarantee the function gets called only once per key. In contrast, bisect() is more granualar so consecutive calls may need to invoke the same key= function again and again. This is almost always the the-wrong-way-to-do-it (the key function should be precomputed and either stored separately or follow a decorate-sort pattern). By including custom sorting in bisect's API we would be diverting users away from better approaches. A better idea would be to create a recipe for a SortedList class that performed pre-calculated custom keys upon insertion and maintained an internal, decorated list. |
|||
msg54955 - (view) | Author: Karl Bartel (karlb) | Date: 2007-08-17 15:09 | |
+1 for adding a reverse parameter. |
|||
msg63489 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2008-03-12 22:40 | |
Guido, what do you think about this one? It is easy to do and has been requested several times in various forums. The seeming reasonable basis for the request is that sort and bisect should fit together like a nut and bolt -- it is somewhat odd to be able to sort a list by a key function but not be able to search or insert according to the same key. My main reservation is that the existence of the API would tend to encourage bad design. Elsewhere, the intent of the key function is to be cheaper than a cmp function (such as with sort() which calls the key function once per element instead of many times for cmp). But with successive bisect/insort calls, a key function could be called multiple times for the same value. The same issue also applies to heapq: http://bugs.python.org/issue1158313 and http://bugs.python.org/issue1162363 . |
|||
msg63491 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2008-03-12 23:14 | |
Sorry, I will claim ignorance on this one. I don't recall the last time I've used a bisection, but it was probably around the time bisect.py was first added to the standard library. I do recall using heap sort as a way to compute the top N items. I've always done that by explicitly mapping the raw data to a list of (key, value) tuples. The arguments worrying about bad design make some sense to me; consistency isn't always the panacea that people want it to be... |
|||
msg63503 - (view) | Author: Jeffrey C. Jacobs (timehorse) | Date: 2008-03-13 12:36 | |
Thanks Raymond for checking on the status of this issue and Guido for thinking about it too. I still support the basic concept for the reasons I specified in the original description, namely maintaining a sorted list (or random-access iterable). Thus, although in some ways it seems like syntactic sugar to make bisect and heap match the parameter list of sort, there is good reason to do so. That having been said, the re-computation of the /key/ function for typically the same values at each invocation of an insort_* method can potentially add a lot of otherwise unnecessary calls. So, having thought about this further, it seems to me that, somehow, the bisect/insort routines functions need to cache the results of each evaluation of key in the same way that sort can maintain that information in a local variable. Ideally, IMHO, the keys computed in sort could be cached and stored with the list upon which it was operated, but this is cumbersome, would not work for generic types and would be invalid if for some reason the sort key was different from the bisect key. Typically the later case would not occur but since it might lead to unexpected behavior it should definitely be avoided. Instead, I am now thinking that the only way we can safely maintain caching information is to implement a persistent object to contain it that can be shared between each invocation of bisect/insort so as to not force re-evaluation of the same key argument. This clearly could be done with a class. So I would propose that a bisect class be created that would hold the list, the current key function and the cached key values. Of course, special care would need to be taken to take into account mutable sequences that could be changed between bisect calls. The class could be implements in a number of ways. For instance, it could create a callable instance that accepts a list and other parameters. It will then look at its state, including a check of /is/ with the input and the last list operated upon, and if that test fails, the 'key cache' is invalidated. Next, if a key method is provided, it is checked against the last key function used (/is/) and if it is not, the 'key cache' is invalidated. Then, to prevent mutability problems, the 'key cache' would actually be implemented as a dictionary mapping item to key-value and each time a key needed to be computed, the input would first be checked against the cache and only if it was not there would the key function be called. This should allow for preventing undefined behavior when handling invalid cases. Of course, if the client wishes to use bisect on 2 different lists or 2 different keys, sie could always just create 2 different bisect objects. There are of course other ways to accomplish this, such as binding the bisect object to a list and key at construction that cannot be reset (at least, should not be) once invoked. I am also wary of implementing the 'key cache' as a dictionary as it adds a hash-table lookup which is already potentially expensive. Ideally, if the bisect object could force its associated list to be immutable for the life of the bisect, this would get around the problem of external inserts and deletes that would invalidate the 'key cache'. Obviously, there may be another way to define such a class but I think the only way you can safely make sure the bisect functions don't unnecessarily compute key without modifying the list is to allow for the caching of values between invocations, and the only way you can do that safely is by putting the cache in some helper class. Of course, a class is only needed for when key is used. comp and reverse, AFAIK, are not cached when sorting. So the existing, generic bisect algorithms would still be useful and in fact, you'd want the bisect class to be optional, IMHO. Of course, Guido's suggestion of just mapping your list into a list of (key, value) tuples is a good workaround for just the key case (assuming key, comp and reverse flags are not all used at the same time for sorting). It probably should be a cookbook item that could be documented with the bisect (and heap) libraries. It can't completely solve the problem, but it will solve some conditions in the short term. Long term, I still think we should consider bisect, potentially with a class definition. |
|||
msg64402 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2008-03-24 08:31 | |
Thanks for the thoughtful reply. Am going to close for the reasons agreed by most of the respondants. See additional explanation in the closing of issue 1451588 . That all-at-once action of sort() allows it to take advantage of a single key-function call per element; in contrast, successive calls to bisect functions are more finely grained and the key-function doesn't make sense anymore. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:21 | admin | set | github: 44363 |
2008-03-24 08:31:04 | rhettinger | set | status: open -> closed resolution: rejected messages: + msg64402 |
2008-03-13 12:36:25 | timehorse | set | messages: + msg63503 |
2008-03-12 23:14:52 | gvanrossum | set | assignee: gvanrossum -> rhettinger messages: + msg63491 |
2008-03-12 22:40:55 | rhettinger | set | assignee: rhettinger -> gvanrossum messages: + msg63489 nosy: + gvanrossum |
2006-12-19 21:14:15 | timehorse | create |