Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "key" argument to "bisect" module functions #48606

Closed
tebeka mannequin opened this issue Nov 19, 2008 · 43 comments
Closed

Add "key" argument to "bisect" module functions #48606

tebeka mannequin opened this issue Nov 19, 2008 · 43 comments
Assignees
Labels
3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@tebeka
Copy link
Mannequin

tebeka mannequin commented Nov 19, 2008

BPO 4356
Nosy @rhettinger, @terryjreedy, @gpshead, @tebeka, @mdickinson, @alex, @vadmium, @dimaqq, @1st1, @NeilGirdhar, @MojoVampire, @wumpus, @pablogsal, @miss-islington, @remilapeyre, @GPery
PRs
  • bpo-4356: Add key parameter to functions in bisect module #11781
  • bpo-4356: Add key parameter to functions in bisect module #11781
  • bpo-4356: Add key function support to the bisect module #20556
  • bpo-4356: Mention the new key arguments for the bisect module APIs in the 3.10 What's new #28339
  • [3.10] bpo-4356: Mention the new key arguments for the bisect module APIs in the 3.10 What's new (GH-28339) #28340
  • Files
  • bisect-trunk.patch: bisect for trunk
  • bisect-py3k.patch: patch for py3k
  • bench_bisect_key.py: benchmark
  • SortedCollection.py: Draft for a Sorted Collection class
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2020-10-20.05:04:39.386>
    created_at = <Date 2008-11-19.17:17:27.762>
    labels = ['type-feature', 'library', '3.10']
    title = 'Add "key" argument to "bisect" module functions'
    updated_at = <Date 2021-10-04.19:18:43.811>
    user = 'https://github.com/tebeka'

    bugs.python.org fields:

    activity = <Date 2021-10-04.19:18:43.811>
    actor = 'pablogsal'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2020-10-20.05:04:39.386>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2008-11-19.17:17:27.762>
    creator = 'tebeka'
    dependencies = []
    files = ['15248', '15249', '15250', '16944']
    hgrepos = []
    issue_num = 4356
    keywords = ['patch']
    message_count = 43.0
    messages = ['76060', '76073', '76084', '76092', '76098', '76099', '76203', '76230', '94836', '103143', '103145', '103155', '103158', '103180', '103294', '113222', '115176', '115189', '115288', '152967', '152969', '152983', '208020', '237408', '242267', '246371', '246384', '261007', '305518', '335841', '337737', '337741', '343529', '343544', '343548', '343549', '345220', '345227', '345228', '379086', '401796', '401798', '403173']
    nosy_count = 22.0
    nosy_names = ['rhettinger', 'terry.reedy', 'gregory.p.smith', 'jafo', 'tebeka', 'mark.dickinson', 'alex', 'milko.krachounov', 'dmtr', 'bls', 'martin.panter', 'Dima.Tisnek', 'yselivanov', 'NeilGirdhar', 'josh.r', 'ericreynolds', 'wumpus', 'pablogsal', 'miss-islington', 'remi.lapeyre', 'gpery', 'flyingosprey']
    pr_nums = ['11781', '11781', '20556', '28339', '28340']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue4356'
    versions = ['Python 3.10']

    @tebeka
    Copy link
    Mannequin Author

    tebeka mannequin commented Nov 19, 2008

    It'd be helpful of the functions in the "bisect" modules will have a
    "key" argument just like "sort".

    @tebeka tebeka mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Nov 19, 2008
    @mdickinson
    Copy link
    Member

    This request has come up repeatedly (and been rejected) in the past. See issues 2954, 3374, 1185383, 1462228, 1451588, 1619060.

    Could you perhaps explain your particular use case for this? A few truly
    convincing use-cases might increase the chances of this getting accepted.

    @rhettinger
    Copy link
    Contributor

    Miki, the issue is that bisect calls tend to be made repeatedly, so the
    key function can be called over and over again for the same argument.
    It is almost always a better design to simply decorate the list so the
    key function never gets called more than once per element in the list.
    If we added key= to bisect, it would encourage bad design and steer
    people after from better solutions.

    @mdickinson
    Copy link
    Member

    What about cases where performance is unimportant, or where the key
    function is fast (e.g. an attribute access)? Then something like

    bisect(a, x, key=attrgetter('size'))

    is easy to write and read. Mightn't this be considered good design,
    from some perspectives?

    Another thought: if your list is a list of user-defined objects then a
    natural way to do the 'decorate' step of DSU might be to add a 'key'
    attribute to each object, rather than the usual method of constructing
    pairs. (This has the advantage that you might not have to bother with the
    'undecorate' step.) With a key argument, bisect could make use of this
    technique too.

    Disclaimer: I haven't personally had any need for a key argument on
    bisect, so all this is hypothetical. That's why I'm asking for real use-
    cases.

    @rhettinger
    Copy link
    Contributor

    I had said "almost always". Sure, if you don't care about performance
    or scalability, a key= argument would be a net win.

    We're responsible for creating an API that steers most programmers in
    the right direction (Tim sez "we read Knuth so you don't have to").
    Algorithmically, the bisect functions are at the wrong level of
    granularity for applying a key function.

    For user-defined objects, there is no need for a key-attribute since can
    just supply a custom comparison method:

    class UserDefined:
      . . .
      def cmp(self, other):
          return cmp(self.key, other.key)

    @mdickinson
    Copy link
    Member

    One case I've been thinking about is that of maintaining a list of Decimal
    objects that are sorted by absolute value. For this, having to create a
    list of (abs(x), x) pairs just seems clumsy compared to using a key
    argument to bisect.

    Perhaps this is a contrived use case, but it doesn't seem totally
    implausible.

    @tebeka
    Copy link
    Mannequin Author

    tebeka mannequin commented Nov 21, 2008

    I agree you can get around this with defining __cmp__, however same goes
    to "sort" and it was added anyway.

    My take on it is that sometimes I need to find something in a big list
    of objects, and I don't like to do DSU and not add __cmp__ to the
    objects (since some of the lists might be sorted by different attributes

    • say one list for time and one line for price).

    I'd prefer if we do implement "key" and add a warning in the docs it
    might slow you down. Which what will happen in the case of __cmp__ anyway.

    I don't see why the "key" function should be called all the time on
    inserted item, it's very easy to cache this value

    def bisect(a, x, lo=0, hi=None, key=lambda x: x):
        assert low >= 0, "low must be non-negative"
        hi = hi or len(a)
    
        x_key = key(x) 
        while lo < hi:
            mid = (lo+hi)//2
            if x_key < key(a[mid]): hi = mid
            else: lo = mid+1
        return lo

    (I'd also wish for "identity" built in, however this is another subject :)

    @terryjreedy
    Copy link
    Member

    Just a reminder that __cmp__ is gone in 3.0.
    I presume bisect, like sort, only requires __lt__ and perhaps __eq__,
    though I can find no doc of either.

    @milkokrachounov
    Copy link
    Mannequin

    milkokrachounov mannequin commented Nov 2, 2009

    I've been bugged by the lack of key= argument for bisect for some time
    now, and today I got to read this and the previous issues about the
    matter. I still fail to understand the reasons for the rejections. It
    might encourage bad design in which expensive key functions will get
    called more than once for the same element. However:

    1. Not all key= functions are computationally expensive, in fact a quick
      search shows that most key= uses come are with itemgetter, attrgetter or
      some cousin.
    2. Even for a computationally expensive function precached key isn't
      necessarily the right answer. And if the keys are huge, a key= function
      can actually be useful in implementing intelligent caching.
    3. The reason for rejection is merely a performance issue which *can* be
      fixed without any significant changes to the design of the app should it
      prove to be a real issue.

    I was writing a module that keeps a sorted list of items in which items
    can be added, removed, or changed (they might get reordered). I used
    bisect to avoid useless slow linear searches, which would be one
    obstacle for the app to scale well. However, the sort key can be changed
    at any time, so implementing __lt__ wasn't an option. Keeping a second
    list with the keys is both memory and computationally expensive as
    inserting into a list is slow. It's not a shiny example of a bisect
    use-case as the only thing I've used my module so far is to display the
    files in a directory, which means the lists are small and the changes
    are seldom, so any implementation is good enough. But bisect with key=
    would have been best if it was available. What's worse, I have a buggy
    unreadable ad hoc implementation of bisect in my code right now. ;)

    I see the following as reasons why key= would provide benefit:

    1. If you have a sorted list, bisect is a net win. Having a key= would
      enable you to utilize it without refactoring anything. The lack of key
      may as well encourage you to continue using linear searches, or other
      sub-optimal solutions.
    2. Using key= is more readable and less error-prone than keeping two
      lists in simple cases where defining a class with __le__ is an overkill.
      Two examples I had where this was the case:
      a) A class I implemented to pick numbers from a certain discrete random
      distribution with bisect. Oh, but yeah, the implementation without key=
      is twice faster.
      b) I gave a hand to someone who was implementing a dictionary looked up
      the words as you typed them with bisect.
    3. Insort. Insort is already slow enough as it is, a key= wouldn't slow
      it down much if at all. In fact, if you are keeping two lists, key=
      would speed it up. Insort with key= is a net win.

    I've implemented key= and quickly hacked a benchmark to show what
    performance hit it might have, and to put things into perspective. The
    results:

    1. Cheap key function, integers and abs(), 1000000 items, 5000 bisects
      or insorts:
      a) Bisect with a key: 0.02205014s
      b) Bisect with a second list: 0.01328707s
      c) Insort with a key: 5.83688211s
      d) Bisect with a second list, and two inserts: 16.17302299s

    2. Cheap key function, ~4000 char bytestrings and len(), 100000 items,
      5000 bisects or insorts:
      a) Bisect with a key: 0.01829195s
      b) Bisect with a second list: 0.00945401s
      c) Insort with a key: 0.25511408s
      d) Bisect with a second list, and two inserts: 0.49303603s

    3. Expensive key function, ~4000 char bytestrings and str.lower(),
      100000 (500 MB) items, 5000 bisects or insorts:
      a) Bisect with a key: 1.26837015s
      b) Bisect with a second list: 0.08390594s
      c) Insort with a key: 1.50406289s
      d) Bisect with a second list, and two inserts: 0.57737398s

    4. Expensive key function, ~4000 char bytestrings and str.lower(),
      500000 (2.5 GB) items, 5000 bisects or insorts:
      a) Bisect with a key: 1.46136308s
      b) Bisect with a second list: 0.08768606s
      c) Insort with a key: 3.05218720s
      d) Bisect with a second list, and two inserts: 6.43227386s

    5. Expensive key function, ~4000 char bytestrings and str.strip(),
      100000 (500 MB) items, 5000 bisects or insorts:
      a) Bisect with a key: 0.03311396s
      b) Bisect with a second list: 0.01374602s
      c) Insort with a key: 0.27423000s
      d) Bisect with a second list, and two inserts: 0.49585080s

    6. Expensive key function, ~4000 char bytestrings and str.strip(),
      500000 (2.5 GB) items, 5000 bisects or insorts:
      a) Bisect with a key: 0.04530501
      b) Bisect with a second list: 0.01912594
      c) Insort with a key: 1.62209797
      d) Bisect with a second list, and two inserts: 5.91734695

    Also, I tried to bench linear searches, but as they had to run in Python
    code they aren't representative of anything. In the integer test they
    went for about 250 seconds without recalculating the key, and for about
    500 with. Also, replacing them with .index() gave about 60 seconds if I
    ensured there's high probability that the number is in the list, and for
    500 if I didn't.

    In short, key= for bisect would be convenient and neat, really useful in
    rare cases, leading to more readable code in the most common cases, and
    I'm unconvinced of the perceived harm that it would cause.

    I attach a patch implementing it for trunk, py3k, and the benchmark
    script I used to test it (on trunk).

    @rhettinger rhettinger self-assigned this Nov 2, 2009
    @bls
    Copy link
    Mannequin

    bls mannequin commented Apr 14, 2010

    This was closed over a year ago, but since mark.dickinson was asking for convincing use-cases: I'm breaking up a file into line-delimited chunks. These chunks are non-overlapping, contiguous, and tend to be fairly large, so I'm just recording the start line of each chunk in a 2-ple:

    mapping = [
      (10, 'first chunk'),
      (50, 'second chunk'),
      (60, 'third chunk')
    ]

    Lines 10-49 are in the first chunk, lines 50-59 are in the second, lines 60+ are in the third. So:

    def CategorizeLine(line, mapping):
       loc = bisect.bisect([m[0] for m in mapping], line)
       if loc == 0:
          return None # before first chunk
       return mapping[loc-1][1]

    It Would Be Nice if I could write the second line as:

       loc = bisect.bisect(mapping, line, key=lambda m:m[0])

    The bisect documentation suggests pre-computing the key list, but it seems messy and error-prone to keep a redundant data structure in sync with its source. I could also rewrite my "mapping" data structure to be two parallel lists instead of one list of 2-ples, but this data structure is more readable and extensible and handles empty lists more naturally.

    @rhettinger
    Copy link
    Contributor

    I'll take another look at this one.

    @rhettinger rhettinger reopened this Apr 14, 2010
    @bls
    Copy link
    Mannequin

    bls mannequin commented Apr 14, 2010

    For what it's worth, after I posted my comment, I realized I could use tuple comparison semantics:

       loc = bisect.bisect(mapping, (line,))

    since my key happens to be at index 0. "key=" would still be nice.

    @rhettinger
    Copy link
    Contributor

    What would you guys think about adding a class that uses bisect internally so that the keyfunc() never gets called more than once per key? This could be added to the docs as a cut-and-pastable recipe or it could be added to the module directly:

    sd = SortedCollection(iterable, key=keyfunc)
    s[i]                  --> getitem at position i
    len(s)                --> length
    sd.insert_left(item)  --> None
    sd.insert_right(item) --> None
    sd.find_left(key)     --> item
    sd.find_right(key)    --> item
    sd.keyfunc            --> the key function 
    list(sd)              --> sorted list of items
    list(reversed(sd))    --> reverse sorted list of items
    s.clear()

    The implementation would follow this pattern:

     def insert_left(self, item):
       key = self._keyfunc(item)
       i = bisect_left(self._keys, key)
       self._keys.insert(i, key)
       self._items.insert(i, item)
    
     def find_left(self, key):
       i = bisect_left(self._keys, key)
       return self._items[i]
    
     def __iter__(self):
       return iter(self._items)
    
     def __getitem__(self, i):
       return self._items[i]

    @rhettinger
    Copy link
    Contributor

    Attaching a draft of a sorted collection class.

    Any comments?

    @alex
    Copy link
    Member

    alex commented Apr 16, 2010

    Looks nice to me, however I wonder if there isn't some overlap with the requests to add a key= kwarg to heapq methods (and the discussion about adding a Heap class there).

    @rhettinger
    Copy link
    Contributor

    Added a link to the a SortedCollections recipe, added example of how to do searches, and noted the O(n) performance of the insort() functions.

    See r83786

    @jafo
    Copy link
    Mannequin

    jafo mannequin commented Aug 29, 2010

    This issue came up on #python IRC, and that combined with the number of times this has been duplicated makes me think that maybe the mention of the SortedCollection recipe should be a little more prominent.

    Perhaps either moved up by the method list, or something like: "Note: No key argument is available because of performance issues. Please consider either the SortedCollection recipe or altering the objects comparison methods.

    ? It just seems like it keeps coming up, and in this case came up after the recipe reference was added.

    @rhettinger
    Copy link
    Contributor

    maybe the mention of the SortedCollection recipe should
    be a little more prominent.

    Thanks for the suggestion. Will look at moving the note higher on the page.

    @rhettinger rhettinger added docs Documentation in the Doc dir and removed stdlib Python modules in the Lib dir labels Aug 29, 2010
    @rhettinger rhettinger reopened this Aug 29, 2010
    @rhettinger
    Copy link
    Contributor

    Fixed in r84383.

    @terryjreedy
    Copy link
    Member

    On python-ideas, thread "Optional key to bisect's functions?" Guido wrote "Bingo. That clinches it. We need to add key=." 'That' being the fact that values that have keys may not be comparable themselves (in py3), so that comparing (key,value) pairs may raise TypeError. This follows a reply by him yesterday saying that he has wanted this feature occasionally. I am re-opening this rather than a new issue because there is already a patch with tests.

    @dimaqq
    Copy link
    Mannequin

    dimaqq mannequin commented Jan 13, 2014

    I've worked around this in 2.6/2.7 like this:

    class Arr:
        def __getitem__(self, i):
            return foo(i)  # your key function
        def __len__(self):
            return 10000000  # your max index value

    bisect.bisect(Arr(), value, ...)

    @dmtr
    Copy link
    Mannequin

    dmtr mannequin commented Mar 7, 2015

    Use case: a custom immutable array with a large number of items and indirect key field access. For example ctypes.array, memoryview or ctypes.pointer or any other custom container.

    1. I'm not sure how anyone can consider a precached key array as a right ans scalable answer. It is just a ridiculuos idea. Precashing key array is a O(N) operation. While bisect is O(log(N)).

    2. @Raymond There is a statement that "adding 'key()' would encourage bad design and steer people after from better solutions." Well, right now, the design that is being encouraged results in O(N) code. Because lazy developers are just 'pre-cacaching' (copying) the keys!

    3. There is a statement that key() have to be called multiple times per item. What? Why?

    4. There is a statement that one can always add a _cmp_ function to an object. This is ridiculuous. The object could be immutable. There should be no need to modify the object/array when all that you need to do is to bisect it.

    @ericreynolds
    Copy link
    Mannequin

    ericreynolds mannequin commented Apr 30, 2015

    In the meantime here is a workaround

    https://gist.github.com/ericremoreynolds/2d80300dabc70eebc790

    @mdickinson
    Copy link
    Member

    I've just encountered another case where the lack of a key on bisect has led to more complicated and error-prone code than necessary.

    Quick summary: we've got a list containing an ordered collection of non-overlapping open intervals on the real line. (In case anyone wants to know, the intervals represent the depths of chunks of interest in a rock core sample.) There's then code for splitting an interval into two pieces at a given point, and for merging two adjacent intervals. Splitting involves (i) finding the relevant interval using a bisect search based on the left endpoint of each interval, then (ii) replacing that interval with two new intervals in the list.

    The fact that the list is being modified after every bisect search makes it messy to cache the left endpoints, since that cache has to be updated along with the list at every stage. The operations are also relatively rare, which makes it feel inefficient to be computing *all* the left endpoints of the intervals in the first place.

    Adding comparisons to our interval class is doable, but invasive and unnatural, since the actual class carries other pieces of data and there are many possible meanings for < with respect to that class: it doesn't make sense to hard-code the comparison with respect to depths in that class's __lt__ method.

    So I think there's a strong case for a key argument in some future version of Python.

    [Of course, for our app we're on Python 2.7, so this issue won't help us directly. We're probably going to go with either reimplementing bisect or using a proxy array like the one suggested by Eric Reynolds.]

    @rhettinger
    Copy link
    Contributor

    I'll add a key= variant for Python 3.6.

    @NeilGirdhar
    Copy link
    Mannequin

    NeilGirdhar mannequin commented Feb 29, 2016

    I'm also looking for bisect with a key argument (numpy array of records, want to search on one element). However, I don't see bisect in the what's new: https://docs.python.org/3.6/whatsnew/3.6.html ? Any luck with the implementation?

    @gpshead
    Copy link
    Member

    gpshead commented Nov 3, 2017

    obviously didn't make it in 3.6 but this still seems desirable. I just saw someone at work propose a trivial port of golang's sort.Search - https://golang.org/pkg/sort/#Search - in Python which caused me to hunt for an issue on bisect key= support.

    @gpshead gpshead added the 3.7 (EOL) end of life label Nov 3, 2017
    @remilapeyre remilapeyre mannequin added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Feb 8, 2019
    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented Feb 18, 2019

    Hi everybody, I opened PR 11781 to add a key argument to functions in the bisect
    module. I agree with @dmtr's points that this addition is not a bad design.

    As far as I can tell, the key function is at called at most once per item as this
    example where an assertion would break shows:

        import bisect
        from collections import defaultdict
    
    
        class Test:
            def __init__(self, value):
                self.value = value
    
    
        cache = defaultdict(int)
    
        def key(e):
            cache[e] += 1
            assert cache[e] <= 1
            return e.value
    
    
        l = [Test(i) for i in range(10000)]
    bisect.bisect(l, Test(25), key=key)
    
        ➜  cpython git:(add-key-argument-to-bisect) ./python.exe
        Python 3.8.0a1+ (heads/add-key-argument-to-bisect:b7aaa1adad, Feb  7 2019, 17:33:24) 
        [Clang 10.0.0 (clang-1000.10.44.4)] on darwin
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import bisect
        >>> from collections import defaultdict
        >>> 
        >>> 
        >>> class Test:
        ...     def __init__(self, value):
        ...         self.value = value
        ... 
        >>> 
        >>> cache = defaultdict(int)
        >>> 
        >>> def key(e):
        ...     cache[e] += 1
        ...     assert cache[e] <= 1
        ...     return e.value
        ... 
        >>> 
        >>> l = [Test(i) for i in range(10000)]
        >>> 
        >>> bisect.bisect(l, Test(25), key=key)
        26

    This argument can be used where the objects are immutable and I have not been able
    to see changes in bisect speed using Victor Stinner perf module:

    (cpython-venv) ➜ cpython git:(add-key-argument-to-bisect) ✗ make distclean && ./configure --with-pydebug --with-openssl=$(brew --prefix openssl) && make -sj && python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25)" -o $(git rev-parse --short HEAD).json
    (cpython-venv) ➜ cpython git:(add-key-argument-to-bisect) ✗ git checkout cd90f6a
    (cpython-venv) ➜ cpython git:(cd90f6a) ✗ make distclean && ./configure --with-pydebug --with-openssl=$(brew --prefix openssl) && make -sj && python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25)" -o $(git rev-parse --short HEAD).json
    (cpython-venv) ➜ cpython git:(cd90f6a) ✗ python -m perf compare_to cd90f6a.json b7aaa1a.json
    Mean +- std dev: [cd90f6a] 36.2 us +- 1.0 us -> [b7aaa1a] 35.7 us +- 0.5 us: 1.01x faster (-1%)

    (cd90f6a was somtime faster than b7aaa1a, sometime slower but they always
    were less than one std dev from one another)

    As I wrote in the discussion of the PR, I suspect the branch predictor to predict
    reliably the branching in the hot path (though I don't know much about that and
    would love some input).

    For the record, here is the performance when a key function is given:

    (cpython-venv) ➜ cpython git:(add-key-argument-to-bisect) ✗ python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25, key=lambda e: e)"

    .........................................
    Mean +- std dev: 59.3 us +- 1.0 us

    It seems to me that adding the key parameter is the best solution possible.

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented Mar 12, 2019

    Hi, there has been renewed interest from @alexchamberlain and @f6v on GitHub for this feature. Would it be possible to get the patch reviewed so we can add it in 3.8?

    @rhettinger
    Copy link
    Contributor

    I'll look at it this weekend.

    @flyingosprey
    Copy link
    Mannequin

    flyingosprey mannequin commented May 26, 2019

    Can anyone add "reverse" support? Key and reverse support are both functional requirement.

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented May 26, 2019

    I think it could be done with key=lambda item: -item if the key argument is added.

    @NeilGirdhar
    Copy link
    Mannequin

    NeilGirdhar mannequin commented May 26, 2019

    The problem with key=lambda item: -item is that item cannot always be easily negated. For example, tuples are often used as keys.

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented May 26, 2019

    Can you not use functools.cmp_to_key() for this?

    Here's an example:

    >>> import bisect, functools
    >>> l = [('f', 5), ('d', 3), ('c', 2), ('b', 1), ('a', 0)]
    >>> def cmp(a, b):
    ...     if a > b: return -1
    ...     if a < b: return 1
    ...     return 0
    ... 
    >>> bisect.bisect(l, ('e', 4), key=functools.cmp_to_key(cmp))
    1
    >>> l
    [('f', 5), ('d', 3), ('c', 2), ('b', 1), ('a', 0)]
    >>> bisect.insort(l, ('e', 4), key=functools.cmp_to_key(cmp))
    >>> l
    [('f', 5), ('e', 4), ('d', 3), ('c', 2), ('b', 1), ('a', 0)]

    @remilapeyre remilapeyre mannequin added the 3.9 only security fixes label May 26, 2019
    @GPery
    Copy link
    Mannequin

    GPery mannequin commented Jun 11, 2019

    I opened a relevant PR, #11781.

    I believe a key parameter is inferior to a comparison callback. The former is a specific case of the latter, and in my use case would force me to create another class to serve as a comparator for my objects, at which point I might as well wrap them and add __lt__.

    Furthermore, I believe this is more in-line with similar standard functions in other languages such as C++ (std::sort), Java (PriorityQueue) or Rust (slice.sort_by).

    @mdickinson
    Copy link
    Member

    I opened a relevant PR, #11781.

    Did you mean #13970 ?

    @GPery
    Copy link
    Mannequin

    GPery mannequin commented Jun 11, 2019

    I did, thanks!

    @rhettinger rhettinger removed the 3.8 only security fixes label Jun 11, 2019
    @gpshead gpshead added 3.10 only security fixes and removed 3.9 only security fixes labels Aug 21, 2020
    @rhettinger
    Copy link
    Contributor

    New changeset 871934d by Raymond Hettinger in branch 'master':
    bpo-4356: Add key function support to the bisect module (GH-20556)
    871934d

    @pablogsal
    Copy link
    Member

    New changeset 1aaa859 by Pablo Galindo Salgado in branch 'main':
    bpo-4356: Mention the new key arguments for the bisect module APIs in the 3.10 What's new (GH-28339)
    1aaa859

    @pablogsal
    Copy link
    Member

    New changeset dda5ff2 by Miss Islington (bot) in branch '3.10':
    bpo-4356: Mention the new key arguments for the bisect module APIs in the 3.10 What's new (GH-28339) (GH-28340)
    dda5ff2

    @pablogsal
    Copy link
    Member

    New changeset 7128864 by Pablo Galindo (Miss Islington (bot)) in branch '3.10':
    bpo-4356: Mention the new key arguments for the bisect module APIs in the 3.10 What's new (GH-28339) (GH-28340)
    7128864

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants