Issue 38278: Need a more efficient way to perform dict.get(key, default)

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/82459

classification

Title:	Need a more efficient way to perform dict.get(key, default)
Type:	enhancement	Stage:
Components:	Interpreter Core	Versions:	Python 3.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Mark.Shannon, benspiller, gvanrossum, iritkatriel, methane, serhiy.storchaka, ta1hia, vstinner
Priority:	normal	Keywords:

Created on 2019-09-25 14:35 by benspiller, last changed 2022-04-11 14:59 by admin.

Messages (8)
msg353206 - (view)	Author: Ben Spiller (benspiller) *	Date: 2019-09-25 14:35
In performance-critical python code, it's quite common to need to get an item from a dictionary, falling back on a default (e.g. None, 0 etc) if it doesn't yet exist. The obvious way to do this based on the documentation is to call the dict.get() method: > python -m timeit -s "d={'abc':123}" "x=d.get('abc',None)" 5000000 loops, best of 5: 74.6 nsec per loop ... however the performance of natural approach is very poor (2.2 times slower!) compared to the time really needed to look up the value: >python -m timeit -s "d={'abc':123}" "x=d['abc']" 5000000 loops, best of 5: 33 nsec per loop There are various ways to do this more efficiently, but they all have significant performance or functional drawbacks: custom dict subclass with __missing__() override: promising approach, but use of a custom class instead of dict seems to increase [] cost significantly: >python -m timeit -s "class mydict(dict):" -s " def __missing__(self,key):return None" -s "d = mydict({'abc':123})" "x=d['abc']" 5000000 loops, best of 5: 60.4 nsec per loop get() with caching of function lookup - somewhat better but not great: >python -m timeit -s "d={'abc':123}; G=d.get" "x=G('abc',None)" 5000000 loops, best of 5: 59.8 nsec per loop [] and "in" (inevitably a bit slow due to needing to do the lookup twice): >python -m timeit -s "d={'abc':123}" "x=d['abc'] if 'abc' in d else None" 5000000 loops, best of 5: 53.9 nsec per loop try/except approach: quickest solution if it exists, but clunky syntax, and VERY slow if it doesn't exist >python -m timeit -s "d={'abc':123}" "try:" " x=d['abc']" "except KeyError: pass" 5000000 loops, best of 5: 41.8 nsec per loop >python -m timeit -s "d={'abc':123}" "try:" " x=d['XXX']" "except KeyError: pass" 1000000 loops, best of 5: 174 nsec per loop collections.defaultdict: reasonable performance if key exists, but unwanted behaviour of adding the key if missing (which if used with an unbounded universe of keys could produce a memory leak): >python -m timeit -s "import collections; d=collections.defaultdict(lambda: None); d['abc']=123; " "x=d['XXX']" 5000000 loops, best of 5: 34.3 nsec per loop I bet we can do better! Lots of solutions are possible - maybe some kind of peephole optimization to make dict.get() itself perform similarly to the [] operator, or if that's challenging perhaps providing a class or option that behaves like defaultdict but without the auto-adding behaviour and with comparable [] performance to the "dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even some kind of new syntax (e.g. dict['key', default=None]). Which option would be easiest/nicest?
msg353207 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-09-25 14:46
dict.get() is a method call wheras "key in dict" and "dict[key]" are operators. Method calls are still slower than operators, even after past optimizations. For example, when dict.get was converted to METH_FASTCALL, it was around 20 ns faster: https://vstinner.github.io/fastcall-microbenchmarks.html See also closed bpo-17170 "string method lookup is too slow".
msg353208 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-09-25 14:49
>python -m timeit -s "import collections; d=collections.defaultdict(lambda: None); d['abc']=123; " "x=d['XXX']" 5000000 loops, best of 5: 34.3 nsec per loop This benchmark is not a fair comparison: the 'XXX' key is created at the first access. In short, this benchmark measure the performance of a dict lookup: >python -m timeit -s "d={'abc':123}" "x=d['abc']" 5000000 loops, best of 5: 33 nsec per loop
msg353209 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-09-25 14:52
> Lots of solutions are possible - maybe some kind of peephole optimization to make dict.get() itself perform similarly to the [] operator, or if that's challenging perhaps providing a class or option that behaves like defaultdict but without the auto-adding behaviour and with comparable [] performance to the "dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even some kind of new syntax (e.g. dict['key', default=None]). Which option would be easiest/nicest? This issue doesn't propose any concrete solution, but discuss ideas. I suggest you to open a thread on the python-ideas mailing list instead. I suggest to close this issue. I bet that defaultdict is not faster once you will manage to write a fair micro-benchmark.
msg353210 - (view)	Author: Ben Spiller (benspiller) *	Date: 2019-09-25 14:52
Thanks... yep I realise method calls are slower than operators, am hoping we can still find a cunning way to speed up this use case nonetheless. :D For example by having a configuration option on dict (or making a new subclass) that gives the (speedy!) [] operator the same no-exception semantics you'd get from calling get(). As you can see from my timeit benchmarks none of the current workarounds are very appealing for this use case, and a 2.2x slowdown for this common operation is a shame.
msg353211 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-09-25 14:56
I also suggest you to not focus on such micro-benchmarks. It's not relevant to make a whole application faster. Use PyPy if you care about performances. PyPy has a very efficient implementation for dictionary and it's JIT compiler can go way further than CPython. In some cases, PyPy can even replace hash table lookup with array access: https://twitter.com/cfbolz/status/1175493837516627968
msg353333 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-09-26 19:06
Was LOAD_METHOD optimized for builtin methods?
msg397450 - (view)	Author: Irit Katriel (iritkatriel) *	Date: 2021-07-13 22:56
> Was LOAD_METHOD optimized for builtin methods? Maybe this can be done with specialization.

History
Date	User	Action	Args
2022-04-11 14:59:20	admin	set	github: 82459
2021-07-14 02:31:58	methane	set	nosy: + methane
2021-07-13 22:56:44	iritkatriel	set	nosy: + gvanrossum, Mark.Shannon, iritkatriel messages: + msg397450
2019-09-26 19:06:32	serhiy.storchaka	set	messages: + msg353333
2019-09-26 17:25:19	ta1hia	set	nosy: + ta1hia
2019-09-25 14:56:19	vstinner	set	messages: + msg353211
2019-09-25 14:52:30	benspiller	set	messages: + msg353210
2019-09-25 14:52:09	vstinner	set	messages: + msg353209
2019-09-25 14:49:32	vstinner	set	messages: + msg353208
2019-09-25 14:46:22	vstinner	set	nosy: + serhiy.storchaka, vstinner messages: + msg353207
2019-09-25 14:35:17	benspiller	create