classification
Title: Seemingly unnecessary complexification of foo(**kw)
Type: Stage:
Components: Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Mark.Shannon Nosy List: Mark.Shannon, brandtbucher, josh.r, serhiy.storchaka, xmorel
Priority: normal Keywords:

Created on 2020-10-14 10:48 by xmorel, last changed 2020-10-16 04:39 by josh.r.

Messages (4)
msg378613 - Author: Xavier Morel (xmorel) * Date: 2020-10-14 10:48
Following bpo-39320, the highly specialised bytecodes for vararg calls were replaced by simpler ones, but there seems to be at least one area where the generated bytecode regressed, possibly for no reason?

In Python 3.8, foo(**var) compiles to:

0 LOAD_GLOBAL              0 (foo)
2 BUILD_TUPLE              0
4 LOAD_FAST                2 (var)
6 CALL_FUNCTION_EX         1

In Python 3.9, it compiles to:

0 LOAD_GLOBAL              0 (foo)
2 BUILD_TUPLE              0
4 BUILD_MAP                0
6 LOAD_FAST                2 (var)
8 DICT_MERGE               1
10 CALL_FUNCTION_EX         1

PR 18141 does not seem to change the implementation of CALL_FUNCTION_EX, so I would expect that if it was fine with taking the arbitrary mapping `var` before, it still is now, and that the two extra opcodes (and the creation of a dict) are unnecessary?
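
For anyone wanting to reproduce the comparison, here is a minimal sketch that prints the opcodes generated for a `foo(**var)` call on whichever interpreter runs it. The names `foo` and `caller` are placeholders, not taken from any real code base, and the exact opcode list depends on the Python version, so no expected output is given.

```python
import dis
import sys


def foo(**kwargs):
    # Placeholder callee: simply hands back the keyword arguments it received.
    return kwargs


def caller(var):
    # The call form discussed in this issue.
    return foo(**var)


# Collect the opcode names so they can be compared across interpreter versions.
opnames = [ins.opname for ins in dis.get_instructions(caller)]
print(sys.version_info[:2], opnames)
```

Running the same file under 3.8 and 3.9 should show the difference described above.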
msg378667 - Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-10-15 09:25
Have you observed any slowdown or incorrect behaviour?

The 3.8 bytecode looks incorrect to me.
The C-API documentation doesn't prohibit callables from mutating the dictionary they receive.
Unless a copy is made, then a callee could mutate `var`.

https://docs.python.org/3/c-api/call.html
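
As a point of comparison, here is a small sketch of what happens when the callee is a pure-Python function declared with `**kw`. It always receives a fresh dict, so mutating it cannot reach the caller's `var`. The concern above is about callables implemented through the C API, which this sketch does not exercise; `mutating_callee` is a hypothetical name used only for illustration.

```python
def mutating_callee(**kw):
    # A Python-level **kw parameter is always a new dict built for this call.
    kw["injected"] = True
    return kw


var = {"a": 1}
received = mutating_callee(**var)

# The caller's mapping is untouched; only the callee's own dict was mutated.
print(var)
print(received is var)
```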
msg378674 - Author: Xavier Morel (xmorel) * Date: 2020-10-15 11:07
I have not noticed anything; I was just looking at the bytecode changes and stumbled upon this oddity. I would expect a small slowdown though, as every fn(**kw) would now incur an extra dict copy, unless there’s something in call_function_ex which copies the input dict iff its ref count is not one?

For whatever that’s worth, the 3.8 bytecode has been there since call_function_ex was added in 3.6, and before that call_function_kw looks identical (load_global foo, load_local var, call_function_kw).
msg378686 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-15 15:17
$ python3.8 -m timeit -s "a = {'a': 1}" "dict(**a)"
2000000 loops, best of 5: 113 nsec per loop
$ python3.9 -m timeit -s "a = {'a': 1}" "dict(**a)"
2000000 loops, best of 5: 181 nsec per loop
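
The same measurement can be scripted with the `timeit` module, which makes it easier to run under several interpreters. This is only a sketch: the loop count below is an arbitrary choice (smaller than the command-line default so it finishes quickly), and the absolute timings will differ by machine and build.

```python
import timeit

a = {"a": 1}
number = 200_000  # arbitrary loop count, chosen to keep the run short

# Best of 5 repeats, mirroring what `python -m timeit` reports.
best = min(timeit.repeat("dict(**a)", globals={"a": a}, repeat=5, number=number))
per_loop_ns = best / number * 1e9
print(f"{number} loops, best of 5: {per_loop_ns:.0f} nsec per loop")
```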
History
Date                 User              Action  Args
2020-10-16 04:39:07  josh.r            set     nosy: + josh.r
2020-10-15 15:17:29  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg378686
2020-10-15 15:13:14  brandtbucher      set     nosy: + brandtbucher
2020-10-15 11:07:54  xmorel            set     messages: + msg378674
2020-10-15 09:25:58  Mark.Shannon      set     messages: + msg378667
2020-10-15 08:06:35  rhettinger        set     assignee: Mark.Shannon
2020-10-14 10:48:31  xmorel            create