classification
Title: Speed up dict vectorcall creation using keywords
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.10
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Marco Sulla, Mark.Shannon, methane, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-09-22 12:56 by Marco Sulla, last changed 2020-11-02 14:13 by methane. This issue is now closed.

Files
File name  Uploaded  Description
pr23106.txt methane, 2020-11-02 13:15 long result
bench_kwcall.py methane, 2020-11-02 13:25
Pull Requests
URL  Status  Linked
PR 22346 closed Marco Sulla, 2020-09-22 12:56
PR 22909 closed methane, 2020-10-23 05:36
PR 23106 closed methane, 2020-11-02 11:58
Messages (25)
msg377318 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2020-09-22 12:56
I've made a PR that speeds up the vectorcall creation of a dict using keyword arguments. In practice, the PR adds insertdict_init(), a specialized version of insertdict(). I quote the function's comment:

Same as insertdict, but specialized for inserting without resizing, and for dicts that are populated in a loop and were empty before (see the empty arg).
Note that resizing must be done before calling this function. If that is not
possible, use insertdict(). Furthermore, ma_version_tag is left unchanged; you have to update it after calling this function (probably at the end of a loop).
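The contract above can be modeled in runnable Python. This is only an illustration of the C fast path (the real change lives in Objects/dictobject.c); `dict_from_keywords` and its helpers are hypothetical names, not the PR's actual code.

```python
# Runnable Python model of the proposed fast path (illustration only).
def dict_from_keywords(kwnames, values):
    assert len(kwnames) == len(values)
    d = {}
    # In C: one dictresize() up front, sized for len(kwnames) entries,
    # so no per-insert resize check is needed inside the loop.
    for key, val in zip(kwnames, values):
        # In C: insertdict_init() skips the resize and duplicate-key
        # checks, since keyword names in a vectorcall are guaranteed
        # to be distinct interned strings.
        d[key] = val
    # In C: ma_version_tag is bumped once here, after the loop.
    return d

print(dict_from_keywords(("a", "b"), (1, 2)))  # → {'a': 1, 'b': 2}
```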

This change speeds up the code by up to 30%. Tested with:

python -m timeit -n 2000 --setup "from uuid import uuid4; o = {str(uuid4()).replace('-', ''): str(uuid4()).replace('-', '') for i in range(10000)}" "dict(**o)"
msg377359 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-09-23 05:19
I have a Linux desktop machine for benchmarking and profiling in my office, but the machine is offline and I have been working from home for several weeks.
So please wait several weeks until I can verify your branch.

> This change speeds up the code up to a 30%. Tested with:
>
>  python -m timeit -n 2000  --setup "from uuid import uuid4 ; o =
>  {str(uuid4()).replace('-', '') : str(uuid4()).replace('-', '') for i
>  in range(10000)}" "dict(**o)"

`dict(**o)` is not a common use case. Could you provide some other benchmarks?
msg377397 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2020-09-23 15:36
> `dict(**o)` is not a common use case. Could you provide some other benchmarks?

You can do

python -m timeit -n 2000000 "dict(key1=1, key2=2, key3=3, key4=4, key5=5, key6=6, key7=7, key8=8, key9=9, key10=10)"

or with pyperf. In this case, since the dict is small, I observed a speedup of 25%.
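For reference, the same kind of measurement can also be scripted with the stdlib `timeit` module (the statement mirrors the command above; absolute numbers are machine-dependent, so no expected result is shown):

```python
import timeit

# Time the keyword-argument dict constructor, as in the command above.
stmt = "dict(key1=1, key2=2, key3=3, key4=4, key5=5)"
n = 2_000_000
total = timeit.timeit(stmt, number=n)
print(f"{total / n * 1e9:.0f} ns per call")
```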
msg379290 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2020-10-22 12:19
Another bench:

python -m pyperf timeit --rigorous "dict(ihinvdono='doononon', gowwondwon='nwog', bdjbodbob='nidnnpn', nwonwno='vndononon', dooodbob='iohiwipwgpw', doidonooq='ndwnnpnpnp', fndionqinqn='ndjboqoqjb', nonoeoqgoqb='bdboboqbgoqeb', jdnvonvoddo='nvdjnvndvonoq', njnvodnoo='hiehgieba', nvdnvwnnp='wghgihpa', nvfnwnnq='nvdknnnqkm', ndonvnipnq='fndjnaobobvob', fjafosboab='ndjnodvobvojb', nownwnojwjw='nvknnndnow', niownviwnwnwi='nownvwinvwnwnwj')"

Result without pull:
Mean +- std dev: 486 ns +- 8 ns

Result with pull:
Mean +- std dev: 328 ns +- 4 ns

I compiled both with optimizations and LTO.

Some arch info:

python -VV
Python 3.10.0a1+ (heads/master-dirty:dde91b1953, Oct 22 2020, 14:00:51) 
[GCC 10.1.1 20200718]

uname -a
Linux buzz 4.15.0-118-generic #119-Ubuntu SMP Tue Sep 8 12:30:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
msg379409 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-23 05:47
Ok. Performance improvement comes from:

a. Presizing
b. Bypassing some checks in PyDict_SetItem
c. Avoiding duplication check.

(b) is relatively small, so I tried to focus on (a) and (c). See GH-22909.
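Optimization (c) relies on a guarantee that is visible from pure Python: duplicate keyword names are rejected at the call site, so the callee never sees them. A quick demonstration (`f` is just an illustrative function):

```python
# Duplicate keyword names are caught before the callee runs, which is
# why a per-key duplicate check inside dict's keyword fast path is
# redundant: the interpreter has already rejected the call.
def f(**kwargs):
    return kwargs

try:
    f(**{"a": 1}, **{"a": 2})   # same key from two unpackings
except TypeError as e:
    print(e)  # "... got multiple values for keyword argument 'a'"
```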

In the case of simple keyword arguments, it is 10% faster than GH-22346:

```
$ ./python -m pyperf timeit --compare-to ./python-speedup_kw "dict(ihinvdono='doononon', gowwondwon='nwog', bdjbodbob='nidnnpn', nwonwno='vndononon', dooodbob='iohiwipwgpw', doidonooq='ndwnnpnpnp', fndionqinqn='ndjboqoqjb', nonoeoqgoqb='bdboboqbgoqeb', jdnvonvoddo='nvdjnvndvonoq', njnvodnoo='hiehgieba', nvdnvwnnp='wghgihpa', nvfnwnnq='nvdknnnqkm', ndonvnipnq='fndjnaobobvob', fjafosboab='ndjnodvobvojb', nownwnojwjw='nvknnndnow', niownviwnwnwi='nownvwinvwnwnwj')"
python-speedup_kw: ..................... 357 ns +- 10 ns
python: ..................... 323 ns +- 4 ns

Mean +- std dev: [python-speedup_kw] 357 ns +- 10 ns -> [python] 323 ns +- 4 ns: 1.11x faster (-10%)
```

In the `dict(d, key=val)` case, it is 8% slower than GH-22346, but still 8% faster than master.

```
$ ./python -m pyperf timeit --compare-to ./python-speedup_kw -s 'd={"foo":"bar"}' "dict(d, ihinvdono='doononon', gowwondwon='nwog', bdjbodbob='nidnnpn', nwonwno='vndononon', dooodbob='iohiwipwgpw', doidonooq='ndwnnpnpnp', fndionqinqn='ndjboqoqjb', nonoeoqgoqb='bdboboqbgoqeb', jdnvonvoddo='nvdjnvndvonoq', njnvodnoo='hiehgieba', nvdnvwnnp='wghgihpa', nvfnwnnq='nvdknnnqkm', ndonvnipnq='fndjnaobobvob', fjafosboab='ndjnodvobvojb', nownwnojwjw='nvknnndnow', niownviwnwnwi='nownvwinvwnwnwj')"
python-speedup_kw: ..................... 505 ns +- 15 ns
python: ..................... 546 ns +- 17 ns

Mean +- std dev: [python-speedup_kw] 505 ns +- 15 ns -> [python] 546 ns +- 17 ns: 1.08x slower (+8%)

$ ./python -m pyperf timeit --compare-to ./python-master -s 'd={"foo":"bar"}' "dict(d, ihinvdono='doononon', gowwondwon='nwog', bdjbodbob='nidnnpn', nwonwno='vndononon', dooodbob='iohiwipwgpw', doidonooq='ndwnnpnpnp', fndionqinqn='ndjboqoqjb', nonoeoqgoqb='bdboboqbgoqeb', jdnvonvoddo='nvdjnvndvonoq', njnvodnoo='hiehgieba', nvdnvwnnp='wghgihpa', nvfnwnnq='nvdknnnqkm', ndonvnipnq='fndjnaobobvob', fjafosboab='ndjnodvobvojb', nownwnojwjw='nvknnndnow', niownviwnwnwi='nownvwinvwnwnwj')"
python-master: ..................... 598 ns +- 10 ns
python: ..................... 549 ns +- 19 ns

Mean +- std dev: [python-master] 598 ns +- 10 ns -> [python] 549 ns +- 19 ns: 1.09x faster (-8%)
```

Additionally, I expect we can reuse this new code to optimize BUILD_CONST_KEY_MAP.
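BUILD_CONST_KEY_MAP is the opcode CPython (roughly 3.6–3.10) emits for a dict display whose keys are all constants; it can be inspected with the stdlib `dis` module. Newer interpreter versions may compile dict displays differently, so the exact opcodes are version-dependent.

```python
import dis

def make(x, y):
    # All keys are constants, so CPython ~3.6-3.10 loads the values,
    # loads the key tuple ('a', 'b') as a single constant, and emits
    # BUILD_CONST_KEY_MAP; other versions may emit different opcodes.
    return {"a": x, "b": y}

dis.dis(make)
print(make(1, 2))  # → {'a': 1, 'b': 2}
```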
msg379435 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-23 14:06
@Marco Sulla Please take a look at GH-22909. It is a simplified version of your PR, and I wrote another optimization based on it: see bpo-42126.
msg379436 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-10-23 14:06
Could we get a pyperformance benchmark run on this please?
msg379458 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2020-10-23 17:42
@methane: well, to be honest, I don't see much difference between the two PRs. The major difference is that you merged insertdict_init into dict_merge_init.

But I kept insertdict_init separate on purpose, because this function can be used by other future dedicated functions that run at creation time only. Furthermore, it's simpler to maintain, since it's nearly identical to insertdict.
msg379475 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2020-10-23 20:06
@Mark.Shannon I tried to run pyperformance, but the wheel package does not work with Python 3.10. I get this error:

AssertionError: would build wheel with unsupported tag ('cp310', 'cp310', 'linux_x86_64')
msg379523 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-24 07:21
@Mark.Shannon I had seen some speedup in the tornado benchmark when I didn't use PGO+LTO, but it was noise.

Now I use PGO+LTO. master vs PR-22909:

$ ./python -m pyperf compare_to master-opt.json speedup_kw-opt.json -G --min-speed=1
Slower (11):
- spectral_norm: 147 ms +- 1 ms -> 153 ms +- 2 ms: 1.04x slower (+4%)
- pickle_dict: 28.6 us +- 0.1 us -> 29.5 us +- 0.6 us: 1.03x slower (+3%)
- regex_compile: 199 ms +- 1 ms -> 204 ms +- 4 ms: 1.03x slower (+3%)
- chameleon: 9.75 ms +- 0.10 ms -> 9.99 ms +- 0.09 ms: 1.02x slower (+2%)
- logging_format: 10.9 us +- 0.2 us -> 11.1 us +- 0.2 us: 1.02x slower (+2%)
- sqlite_synth: 3.29 us +- 0.05 us -> 3.36 us +- 0.05 us: 1.02x slower (+2%)
- regex_v8: 26.1 ms +- 0.1 ms -> 26.5 ms +- 0.3 ms: 1.02x slower (+2%)
- json_dumps: 14.6 ms +- 0.1 ms -> 14.8 ms +- 0.1 ms: 1.02x slower (+2%)
- logging_simple: 9.88 us +- 0.18 us -> 10.0 us +- 0.2 us: 1.02x slower (+2%)
- nqueens: 105 ms +- 1 ms -> 107 ms +- 2 ms: 1.01x slower (+1%)
- raytrace: 511 ms +- 5 ms -> 517 ms +- 6 ms: 1.01x slower (+1%)

Faster (10):
- regex_dna: 233 ms +- 1 ms -> 229 ms +- 1 ms: 1.02x faster (-2%)
- unpickle: 14.7 us +- 0.1 us -> 14.5 us +- 0.2 us: 1.02x faster (-1%)
- deltablue: 8.17 ms +- 0.29 ms -> 8.06 ms +- 0.17 ms: 1.01x faster (-1%)
- mako: 16.8 ms +- 0.2 ms -> 16.6 ms +- 0.1 ms: 1.01x faster (-1%)
- xml_etree_iterparse: 117 ms +- 1 ms -> 116 ms +- 1 ms: 1.01x faster (-1%)
- scimark_monte_carlo: 117 ms +- 2 ms -> 115 ms +- 1 ms: 1.01x faster (-1%)
- xml_etree_parse: 164 ms +- 3 ms -> 162 ms +- 1 ms: 1.01x faster (-1%)
- unpack_sequence: 62.7 ns +- 0.7 ns -> 62.0 ns +- 0.7 ns: 1.01x faster (-1%)
- regex_effbot: 3.43 ms +- 0.01 ms -> 3.39 ms +- 0.02 ms: 1.01x faster (-1%)
- scimark_fft: 405 ms +- 4 ms -> 401 ms +- 1 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (39)
msg379524 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-24 07:36
@Marco Sulla

> @methane: well, to be honest, I don't see much difference between the two PRs. The major difference is that you merged insertdict_init into dict_merge_init.

Not only that, but also some simplifications which make it 10% faster than GH-22346.

> But I kept insertdict_init separate on purpose, because this function can be used by other future dedicated functions that run at creation time only.

Where do you expect to use it? Would you implement some more optimization based on your PR to demonstrate your idea?

I confirmed that GH-22909 can be used to optimize BUILD_CONST_KEY_MAP (GH-22911). That's why I merged the two functions.

> AssertionError: would build wheel with unsupported tag ('cp310', 'cp310', 'linux_x86_64')

Try `pip install pyperformance==1.0.0`.
msg379525 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-24 09:10
I confirmed _PyDict_FromItems() can be used to optimize _PyStack_AsDict() too.
See https://github.com/methane/cpython/pull/25

But I could not confirm a significant performance gain from it either.
msg379528 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2020-10-24 12:47
I commented out sqlalchemy in the requirements.txt of the pyperformance source code, and it worked. I also had to skip tornado:

pyperformance run -r -b,-sqlalchemy_declarative,-sqlalchemy_imperative,-tornado_http -o ../perf_master.json

This is my result:

pyperformance compare perf_master.json perf_dict_init.json -O table | grep Significant
| 2to3                    | 356 ms           | 348 ms              | 1.02x faster | Significant (t=7.28)   |
| fannkuch                | 485 ms           | 468 ms              | 1.04x faster | Significant (t=9.68)   |
| pathlib                 | 22.5 ms          | 22.1 ms             | 1.02x faster | Significant (t=13.02)  |
| pickle_dict             | 29.0 us          | 30.3 us             | 1.05x slower | Significant (t=-92.36) |
| pickle_list             | 4.55 us          | 4.64 us             | 1.02x slower | Significant (t=-10.87) |
| pyflate                 | 735 ms           | 702 ms              | 1.05x faster | Significant (t=6.67)   |
| regex_compile           | 197 ms           | 193 ms              | 1.02x faster | Significant (t=2.81)   |
| regex_v8                | 24.5 ms          | 23.9 ms             | 1.02x faster | Significant (t=17.63)  |
| scimark_fft             | 376 ms           | 386 ms              | 1.03x slower | Significant (t=-15.07) |
| scimark_lu              | 154 ms           | 158 ms              | 1.03x slower | Significant (t=-12.94) |
| sqlite_synth            | 3.35 us          | 3.21 us             | 1.04x faster | Significant (t=17.65)  |
| telco                   | 6.54 ms          | 7.14 ms             | 1.09x slower | Significant (t=-8.51)  |
| unpack_sequence         | 58.8 ns          | 61.5 ns             | 1.04x slower | Significant (t=-19.66) |

It's strange that some benchmarks are slower, since the patch only adds two extra checks to dict_vectorcall. Maybe they use many small dicts?

@methane:
> Would you implement some more optimization based on your PR to demonstrate your idea?

I have already done them; I'll open a PR.
msg380000 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2020-10-30 21:49
Well, following your example, since split dicts seem to no longer be supported, I decided to be more drastic. If you look at the last push in PR 22346, I no longer check but always resize, so the dict is always combined. This seems to be especially good for the "unpack_sequence" bench, even though I do not know what it measures:

| chaos                   | 132 ms           | 136 ms         | 1.03x slower | Significant (t=-18.09) |
| crypto_pyaes            | 136 ms           | 141 ms         | 1.03x slower | Significant (t=-11.60) |
| float                   | 133 ms           | 137 ms         | 1.03x slower | Significant (t=-16.94) |
| go                      | 276 ms           | 282 ms         | 1.02x slower | Significant (t=-11.58) |
| logging_format          | 12.3 us          | 12.6 us        | 1.02x slower | Significant (t=-9.75)  |
| logging_silent          | 194 ns           | 203 ns         | 1.05x slower | Significant (t=-9.00)  |
| logging_simple          | 11.3 us          | 11.6 us        | 1.02x slower | Significant (t=-12.56) |
| mako                    | 16.5 ms          | 17.4 ms        | 1.05x slower | Significant (t=-17.34) |
| meteor_contest          | 116 ms           | 120 ms         | 1.04x slower | Significant (t=-25.59) |
| nbody                   | 158 ms           | 166 ms         | 1.05x slower | Significant (t=-12.73) |
| nqueens                 | 107 ms           | 111 ms         | 1.03x slower | Significant (t=-11.39) |
| pickle_pure_python      | 631 us           | 619 us         | 1.02x faster | Significant (t=6.28)   |
| regex_compile           | 206 ms           | 214 ms         | 1.04x slower | Significant (t=-24.24) |
| regex_v8                | 28.4 ms          | 26.7 ms        | 1.06x faster | Significant (t=10.92)  |
| richards                | 87.8 ms          | 90.3 ms        | 1.03x slower | Significant (t=-10.91) |
| scimark_lu              | 165 ms           | 162 ms         | 1.02x faster | Significant (t=4.55)   |
| scimark_sor             | 210 ms           | 215 ms         | 1.02x slower | Significant (t=-10.14) |
| scimark_sparse_mat_mult | 6.45 ms          | 6.64 ms        | 1.03x slower | Significant (t=-6.66)  |
| spectral_norm           | 158 ms           | 171 ms         | 1.08x slower | Significant (t=-29.11) |
| sympy_expand            | 599 ms           | 619 ms         | 1.03x slower | Significant (t=-21.93) |
| sympy_str               | 376 ms           | 389 ms         | 1.04x slower | Significant (t=-23.80) |
| sympy_sum               | 233 ms           | 239 ms         | 1.02x slower | Significant (t=-14.70) |
| telco                   | 7.40 ms          | 7.61 ms        | 1.03x slower | Significant (t=-10.08) |
| unpack_sequence         | 70.0 ns          | 56.1 ns        | 1.25x faster | Significant (t=10.62)  |
| xml_etree_generate      | 108 ms           | 106 ms         | 1.02x faster | Significant (t=5.52)   |
| xml_etree_iterparse     | 133 ms           | 130 ms         | 1.02x faster | Significant (t=11.33)  |
| xml_etree_parse         | 208 ms           | 204 ms         | 1.02x faster | Significant (t=9.19)   |
msg380033 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-31 01:26
unpack_sequence is a very sensitive benchmark. Speed changes dramatically with code alignment. PGO+LTO reduces the noise, but we always see some.

I believe there is no significant performance change in macro benchmarks when optimizing this part.

Not being significant in macro benchmarks doesn't mean we must reject the optimization, because pyperformance doesn't cover every application in the world.
But it does mean that we must be conservative about the optimization.
msg380051 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-31 10:29
Both changes add a significant amount of code (100 and 85 lines respectively). Even if they speed up a particular case of the dict constructor, it is not a common use case.

I think it would be better to reject these changes. They make maintenance harder, the benefit seems insignificant, and there is always a danger that new code can slow down other code. The dict object is performance-critical for Python, so it is better not to touch its code without need.
msg380060 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-31 13:01
> Both changes add a significant amount of code (100 and 85 lines respectively). Even if they speed up a particular case of the dict constructor, it is not a common use case.

You are right, but please wait.

Marco is a new contributor and he can write correct C code.
So I am searching for parts which can be optimized by his code before rejecting it.

* bpo-42126, GH-22911: I can make dict displays (aka dict literals) 50% faster. But it introduces additional complexity to the compiler and ceval. So I will reject it unless I find real-world code using dict displays in a performance-critical part.

* _PyStack_AsDict (https://github.com/methane/cpython/pull/25): I thought this was a performance-critical function, but I could not see a significant performance gain in pyperformance.

* _PyEval_EvalCode (https://github.com/python/cpython/blob/master/Python/ceval.c#L4465): I am still not sure we can assume there are no duplicated keyword arguments here. If we can assume that, we can optimize calls to functions that receive a **kwds argument.

These three parts are all I found. I will reject this issue if I fail to optimize _PyEval_EvalCode.
msg380066 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-31 14:04
Do not overestimate the importance of _PyStack_AsDict(). Most calls (~90-95% or so) use positional-only arguments, and most functions do not have a var-keyword parameter. So efforts in recent years were spent on optimizing the common cases, in particular avoiding creation of a dict when it is not needed. _PyStack_AsDict() can affect perhaps 1% of code, or less, and those functions are usually not performance-critical.
msg380078 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2020-10-31 16:34
Well, actually Serhiy is right; the macro benchmarks do not seem to show anything significant. Maybe the code can be used in other parts of CPython, for example in _pickle, where dicts are loaded. But that would also require exposing, maybe internally only, dictresize() and DICT_NEXT_VERSION(). I'm not sure that's desirable.

There's something I do not understand: the speedup of unpack_sequence. I checked the pyperformance code, and it's a microbenchmark for:

a, b = some_sequence

It should *not* be affected by the change. Anyway, I ran the bench another 10 times, and the lowest value without the PR is never below 67.7 ns. With the PR, it reaches 53.5 ns. And I do not understand why. Maybe it affects the creation of the dicts holding the local and global vars?
msg380110 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-11-01 00:49
> It should *not* be affected by the change. Anyway, I ran the bench another 10 times, and the lowest value without the PR is never below 67.7 ns. With the PR, it reaches 53.5 ns. And I do not understand why.

The benchmark is very affected by code placement.
Even adding a dead function affects speed. Read vstinner's blog post and presentation:

* https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
* https://speakerdeck.com/haypo/how-to-run-a-stable-benchmark?slide=9

That's why we recommend a PGO+LTO build for benchmarking.
msg380127 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2020-11-01 12:12
I did PGO+LTO... --enable-optimizations --with-lto
msg380214 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-11-02 12:02
> I did PGO+LTO... --enable-optimizations --with-lto

I'm sorry about that. PGO+LTO *reduces* noise, but there is still some. And unpack_sequence is very fragile.
I tried your branch again, and unpack_sequence is 10% *slower* than the master branch.

I am running pyperformance with PR-23106, which simplifies your function and uses it from _PyStack_AsDict() and _PyEval_EvalCode().
msg380218 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-11-02 13:15
Short result (minspeed=2):

Slower (4):
- unpack_sequence: 65.2 ns +- 1.3 ns -> 69.2 ns +- 0.4 ns: 1.06x slower (+6%)
- unpickle_list: 5.21 us +- 0.04 us -> 5.44 us +- 0.02 us: 1.04x slower (+4%)
- chameleon: 9.80 ms +- 0.08 ms -> 10.0 ms +- 0.1 ms: 1.02x slower (+2%)
- logging_silent: 202 ns +- 5 ns -> 206 ns +- 5 ns: 1.02x slower (+2%)

Faster (9):
- pickle_dict: 30.7 us +- 0.1 us -> 29.0 us +- 0.1 us: 1.06x faster (-5%)
- scimark_lu: 169 ms +- 3 ms -> 163 ms +- 3 ms: 1.04x faster (-4%)
- sympy_str: 396 ms +- 8 ms -> 383 ms +- 5 ms: 1.04x faster (-3%)
- sqlite_synth: 3.46 us +- 0.08 us -> 3.34 us +- 0.04 us: 1.03x faster (-3%)
- scimark_fft: 415 ms +- 3 ms -> 405 ms +- 3 ms: 1.03x faster (-3%)
- pickle_list: 4.91 us +- 0.07 us -> 4.79 us +- 0.04 us: 1.03x faster (-3%)
- dulwich_log: 82.4 ms +- 0.8 ms -> 80.4 ms +- 0.8 ms: 1.02x faster (-2%)
- scimark_sparse_mat_mult: 5.49 ms +- 0.03 ms -> 5.37 ms +- 0.02 ms: 1.02x faster (-2%)
- spectral_norm: 157 ms +- 1 ms -> 153 ms +- 4 ms: 1.02x faster (-2%)

Benchmark hidden because not significant (47): ...

Geometric mean: 1.00 (faster)

Long result is attached.
msg380219 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-11-02 13:25
And bench_kwcall.py is a microbenchmark for _PyEval_EvalCode.

$ cpython/release/python -m pyperf compare_to master.json kwcall-nodup.json

kwcall-3: Mean +- std dev: [master] 192 us +- 2 us -> [kwcall-nodup] 175 us +- 1 us: 1.09x faster (-9%)
kwcall-6: Mean +- std dev: [master] 327 us +- 6 us -> [kwcall-nodup] 291 us +- 4 us: 1.12x faster (-11%)
kwcall-9: Mean +- std dev: [master] 436 us +- 10 us -> [kwcall-nodup] 373 us +- 5 us: 1.17x faster (-14%)

Geometric mean: 0.89 (faster)
msg380222 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-11-02 14:13
While this is an interesting optimization, the gain is not enough.
I am closing this issue for now.

@Marco Sulla
Optimizing dict is a hard job. If you want to continue, I have an idea:
`dict(zip(keys, row))` is a common use case. It is used by asdict() in dataclasses, _asdict() in namedtuple, and csv.DictReader.
Sniffing the zip object and presizing the dict may be an interesting optimization.

But note that this idea has a low chance of being accepted too. We have tried many ideas like this and rejected them ourselves, even without creating a pull request.
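The `dict(zip(keys, row))` pattern looks like this in practice (the column names and row values here are invented for illustration):

```python
import csv
import io

keys = ("id", "name", "email")
row = (1, "alice", "alice@example.com")

# The pattern methane suggests optimizing: dict() cannot currently
# presize here, because a zip iterator does not report its length.
record = dict(zip(keys, row))
print(record)  # → {'id': 1, 'name': 'alice', 'email': 'alice@example.com'}

# csv.DictReader builds one such mapping per row internally
# (a plain dict in Python 3.8+).
reader = csv.DictReader(io.StringIO("a,b\n1,2\n"))
print(dict(next(reader)))  # → {'a': '1', 'b': '2'}
```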
History
Date                 User              Action  Args
2020-11-02 14:13:06  methane           set     status: open -> closed; resolution: rejected; messages: + msg380222; stage: resolved
2020-11-02 13:25:13  methane           set     files: + bench_kwcall.py; messages: + msg380219
2020-11-02 13:15:24  methane           set     files: + pr23106.txt; messages: + msg380218
2020-11-02 12:02:39  methane           set     messages: + msg380214; stage: patch review -> (no value)
2020-11-02 11:58:07  methane           set     stage: patch review; pull_requests: + pull_request22023
2020-11-01 12:12:14  Marco Sulla       set     messages: + msg380127
2020-11-01 00:49:59  methane           set     messages: + msg380110
2020-10-31 16:34:41  Marco Sulla       set     messages: + msg380078
2020-10-31 14:04:52  serhiy.storchaka  set     messages: + msg380066
2020-10-31 13:01:10  methane           set     messages: + msg380060
2020-10-31 10:29:17  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg380051
2020-10-31 01:26:46  methane           set     messages: + msg380033
2020-10-30 21:49:45  Marco Sulla       set     messages: + msg380000
2020-10-24 12:47:25  Marco Sulla       set     messages: + msg379528
2020-10-24 09:10:19  methane           set     messages: + msg379525
2020-10-24 07:36:55  methane           set     messages: + msg379524
2020-10-24 07:22:00  methane           set     messages: + msg379523
2020-10-23 20:06:17  Marco Sulla       set     messages: + msg379475
2020-10-23 17:42:50  Marco Sulla       set     messages: + msg379458
2020-10-23 14:06:42  Mark.Shannon      set     nosy: + Mark.Shannon; messages: + msg379436
2020-10-23 14:06:11  methane           set     messages: + msg379435
2020-10-23 05:47:40  methane           set     messages: + msg379409; stage: patch review -> (no value)
2020-10-23 05:36:15  methane           set     keywords: + patch; stage: patch review; pull_requests: + pull_request21840
2020-10-22 12:19:20  Marco Sulla       set     messages: + msg379290
2020-09-23 15:36:59  Marco Sulla       set     messages: + msg377397
2020-09-23 05:19:26  methane           set     messages: + msg377359
2020-09-22 12:56:44  Marco Sulla       create