Message63503
Thanks Raymond for checking on the status of this issue and Guido for
thinking about it too. I still support the basic concept for the
reasons I specified in the original description, namely maintaining a
sorted list (or random-access iterable). Thus, although in some ways it
seems like syntactic sugar to make bisect and heap match the parameter
list of sort, there is good reason to do so.
That having been said, the re-computation of the /key/ function for
typically the same values at each invocation of an insort_* method can
potentially add a lot of otherwise unnecessary calls.
So, having thought about this further, it seems to me that, somehow, the
bisect/insort routines functions need to cache the results of each
evaluation of key in the same way that sort can maintain that
information in a local variable. Ideally, IMHO, the keys computed in
sort could be cached and stored with the list upon which it was
operated, but this is cumbersome, would not work for generic types and
would be invalid if for some reason the sort key was different from the
bisect key. Typically the later case would not occur but since it might
lead to unexpected behavior it should definitely be avoided.
Instead, I am now thinking that the only way we can safely maintain
caching information is to implement a persistent object to contain it
that can be shared between each invocation of bisect/insort so as to not
force re-evaluation of the same key argument. This clearly could be
done with a class.
So I would propose that a bisect class be created that would hold the
list, the current key function and the cached key values. Of course,
special care would need to be taken to take into account mutable
sequences that could be changed between bisect calls.
The class could be implements in a number of ways. For instance, it
could create a callable instance that accepts a list and other
parameters. It will then look at its state, including a check of /is/
with the input and the last list operated upon, and if that test fails,
the 'key cache' is invalidated. Next, if a key method is provided, it
is checked against the last key function used (/is/) and if it is not,
the 'key cache' is invalidated. Then, to prevent mutability problems,
the 'key cache' would actually be implemented as a dictionary mapping
item to key-value and each time a key needed to be computed, the input
would first be checked against the cache and only if it was not there
would the key function be called. This should allow for preventing
undefined behavior when handling invalid cases. Of course, if the
client wishes to use bisect on 2 different lists or 2 different keys,
sie could always just create 2 different bisect objects. There are of
course other ways to accomplish this, such as binding the bisect object
to a list and key at construction that cannot be reset (at least, should
not be) once invoked. I am also wary of implementing the 'key cache' as
a dictionary as it adds a hash-table lookup which is already potentially
expensive. Ideally, if the bisect object could force its associated
list to be immutable for the life of the bisect, this would get around
the problem of external inserts and deletes that would invalidate the
'key cache'.
Obviously, there may be another way to define such a class but I think
the only way you can safely make sure the bisect functions don't unnecessarily compute key without modifying the list is to allow for the
caching of values between invocations, and the only way you can do that
safely is by putting the cache in some helper class. Of course, a class
is only needed for when key is used. comp and reverse, AFAIK, are not
cached when sorting. So the existing, generic bisect algorithms would
still be useful and in fact, you'd want the bisect class to be optional,
IMHO.
Of course, Guido's suggestion of just mapping your list into a list of
(key, value) tuples is a good workaround for just the key case (assuming
key, comp and reverse flags are not all used at the same time for
sorting). It probably should be a cookbook item that could be
documented with the bisect (and heap) libraries. It can't completely
solve the problem, but it will solve some conditions in the short term.
Long term, I still think we should consider bisect, potentially with a
class definition. |
|
Date |
User |
Action |
Args |
2008-03-13 12:36:26 | timehorse | set | spambayes_score: 0.00241025 -> 0.0024102472 recipients:
+ timehorse, gvanrossum, rhettinger, karlb |
2008-03-13 12:36:25 | timehorse | set | spambayes_score: 0.00241025 -> 0.00241025 messageid: <1205411785.77.0.785708243016.issue1619060@psf.upfronthosting.co.za> |
2008-03-13 12:36:25 | timehorse | link | issue1619060 messages |
2008-03-13 12:36:22 | timehorse | create | |
|