Message335841
Hi everybody, I opened PR 11781 to add a key argument to functions in the bisect
module. I agree with @dmtr's points that this addition is not a bad design.
As far as I can tell, the key function is at called at most once per item as this
example where an assertion would break shows:
import bisect
from collections import defaultdict
class Test:
def __init__(self, value):
self.value = value
cache = defaultdict(int)
def key(e):
cache[e] += 1
assert cache[e] <= 1
return e.value
l = [Test(i) for i in range(10000)]
bisect.bisect(l, Test(25), key=key)
➜ cpython git:(add-key-argument-to-bisect) ./python.exe
Python 3.8.0a1+ (heads/add-key-argument-to-bisect:b7aaa1adad, Feb 7 2019, 17:33:24)
[Clang 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bisect
>>> from collections import defaultdict
>>>
>>>
>>> class Test:
... def __init__(self, value):
... self.value = value
...
>>>
>>> cache = defaultdict(int)
>>>
>>> def key(e):
... cache[e] += 1
... assert cache[e] <= 1
... return e.value
...
>>>
>>> l = [Test(i) for i in range(10000)]
>>>
>>> bisect.bisect(l, Test(25), key=key)
26
This argument can be used where the objects are immutable and I have not been able
to see changes in bisect speed using Victor Stinner perf module:
(cpython-venv) ➜ cpython git:(add-key-argument-to-bisect) ✗ make distclean && ./configure --with-pydebug --with-openssl=$(brew --prefix openssl) && make -sj && python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25)" -o $(git rev-parse --short HEAD).json
(cpython-venv) ➜ cpython git:(add-key-argument-to-bisect) ✗ git checkout cd90f6a369
(cpython-venv) ➜ cpython git:(cd90f6a369) ✗ make distclean && ./configure --with-pydebug --with-openssl=$(brew --prefix openssl) && make -sj && python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25)" -o $(git rev-parse --short HEAD).json
(cpython-venv) ➜ cpython git:(cd90f6a369) ✗ python -m perf compare_to cd90f6a369.json b7aaa1adad.json
Mean +- std dev: [cd90f6a369] 36.2 us +- 1.0 us -> [b7aaa1adad] 35.7 us +- 0.5 us: 1.01x faster (-1%)
(cd90f6a369 was somtime faster than b7aaa1adad, sometime slower but they always
were less than one std dev from one another)
As I wrote in the discussion of the PR, I suspect the branch predictor to predict
reliably the branching in the hot path (though I don't know much about that and
would love some input).
For the record, here is the performance when a key function is given:
(cpython-venv) ➜ cpython git:(add-key-argument-to-bisect) ✗ python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25, key=lambda e: e)"
.........................................
Mean +- std dev: 59.3 us +- 1.0 us
It seems to me that adding the key parameter is the best solution possible. |
|
Date |
User |
Action |
Args |
2019-02-18 15:13:33 | remi.lapeyre | set | recipients:
+ remi.lapeyre, rhettinger, terry.reedy, gregory.p.smith, jafo, tebeka, mark.dickinson, alex, milko.krachounov, dmtr, bls, martin.panter, Dima.Tisnek, yselivanov, NeilGirdhar, josh.r, ericreynolds, wumpus |
2019-02-18 15:13:33 | remi.lapeyre | set | messageid: <1550502813.23.0.362995259727.issue4356@roundup.psfhosted.org> |
2019-02-18 15:13:33 | remi.lapeyre | link | issue4356 messages |
2019-02-18 15:13:32 | remi.lapeyre | create | |
|