Message 353206 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	benspiller
Recipients	benspiller
Date	2019-09-25.14:35:16
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1569422117.26.0.730288338211.issue38278@roundup.psfhosted.org>
In-reply-to

Content
In performance-critical python code, it's quite common to need to get an item from a dictionary, falling back on a default (e.g. None, 0 etc) if it doesn't yet exist. The obvious way to do this based on the documentation is to call the dict.get() method: > python -m timeit -s "d={'abc':123}" "x=d.get('abc',None)" 5000000 loops, best of 5: 74.6 nsec per loop ... however the performance of natural approach is very poor (2.2 times slower!) compared to the time really needed to look up the value: >python -m timeit -s "d={'abc':123}" "x=d['abc']" 5000000 loops, best of 5: 33 nsec per loop There are various ways to do this more efficiently, but they all have significant performance or functional drawbacks: custom dict subclass with __missing__() override: promising approach, but use of a custom class instead of dict seems to increase [] cost significantly: >python -m timeit -s "class mydict(dict):" -s " def __missing__(self,key):return None" -s "d = mydict({'abc':123})" "x=d['abc']" 5000000 loops, best of 5: 60.4 nsec per loop get() with caching of function lookup - somewhat better but not great: >python -m timeit -s "d={'abc':123}; G=d.get" "x=G('abc',None)" 5000000 loops, best of 5: 59.8 nsec per loop [] and "in" (inevitably a bit slow due to needing to do the lookup twice): >python -m timeit -s "d={'abc':123}" "x=d['abc'] if 'abc' in d else None" 5000000 loops, best of 5: 53.9 nsec per loop try/except approach: quickest solution if it exists, but clunky syntax, and VERY slow if it doesn't exist >python -m timeit -s "d={'abc':123}" "try:" " x=d['abc']" "except KeyError: pass" 5000000 loops, best of 5: 41.8 nsec per loop >python -m timeit -s "d={'abc':123}" "try:" " x=d['XXX']" "except KeyError: pass" 1000000 loops, best of 5: 174 nsec per loop collections.defaultdict: reasonable performance if key exists, but unwanted behaviour of adding the key if missing (which if used with an unbounded universe of keys could produce a memory leak): >python -m timeit -s "import collections; d=collections.defaultdict(lambda: None); d['abc']=123; " "x=d['XXX']" 5000000 loops, best of 5: 34.3 nsec per loop I bet we can do better! Lots of solutions are possible - maybe some kind of peephole optimization to make dict.get() itself perform similarly to the [] operator, or if that's challenging perhaps providing a class or option that behaves like defaultdict but without the auto-adding behaviour and with comparable [] performance to the "dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even some kind of new syntax (e.g. dict['key', default=None]). Which option would be easiest/nicest?

In performance-critical python code, it's quite common to need to get an item from a dictionary, falling back on a default (e.g. None, 0 etc) if it doesn't yet exist. The obvious way to do this based on the documentation is to call the dict.get() method:

> python -m timeit -s "d={'abc':123}" "x=d.get('abc',None)"
5000000 loops, best of 5: 74.6 nsec per loop

... however the performance of natural approach is very poor (2.2 times slower!) compared to the time really needed to look up the value:
>python -m timeit -s "d={'abc':123}" "x=d['abc']"
5000000 loops, best of 5: 33 nsec per loop

There are various ways to do this more efficiently, but they all have significant performance or functional drawbacks:

custom dict subclass with __missing__() override: promising approach, but use of a custom class instead of dict seems to increase [] cost significantly:
>python -m timeit -s "class mydict(dict):" -s "  def __missing__(self,key):return None" -s "d = mydict({'abc':123})" "x=d['abc']"
5000000 loops, best of 5: 60.4 nsec per loop

get() with caching of function lookup - somewhat better but not great:
>python -m timeit -s "d={'abc':123}; G=d.get" "x=G('abc',None)"
5000000 loops, best of 5: 59.8 nsec per loop

[] and "in" (inevitably a bit slow due to needing to do the lookup twice):
>python -m timeit -s "d={'abc':123}" "x=d['abc'] if 'abc' in d else None"
5000000 loops, best of 5: 53.9 nsec per loop

try/except approach: quickest solution if it exists, but clunky syntax, and VERY slow if it doesn't exist
>python -m timeit -s "d={'abc':123}" "try:" "   x=d['abc']" "except KeyError: pass"
5000000 loops, best of 5: 41.8 nsec per loop
>python -m timeit -s "d={'abc':123}" "try:" "   x=d['XXX']" "except KeyError: pass"
1000000 loops, best of 5: 174 nsec per loop

collections.defaultdict: reasonable performance if key exists, but unwanted behaviour of adding the key if missing (which if used with an unbounded universe of keys could produce a memory leak):
>python -m timeit -s "import collections; d=collections.defaultdict(lambda: None); d['abc']=123; " "x=d['XXX']"
5000000 loops, best of 5: 34.3 nsec per loop

I bet we can do better! 

Lots of solutions are possible - maybe some kind of peephole optimization to make dict.get() itself perform similarly to the [] operator, or if that's challenging perhaps providing a class or option that behaves like defaultdict but without the auto-adding behaviour and with comparable [] performance to the "dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even some kind of new syntax (e.g. dict['key', default=None]). Which option would be easiest/nicest?

History
Date	User	Action	Args
2019-09-25 14:35:17	benspiller	set	recipients: + benspiller
2019-09-25 14:35:17	benspiller	set	messageid: <1569422117.26.0.730288338211.issue38278@roundup.psfhosted.org>
2019-09-25 14:35:17	benspiller	link	issue38278 messages
2019-09-25 14:35:16	benspiller	create