Issue 29735: Optimize functools.partial() for positional arguments


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/73921

classification

Title:	Optimize functools.partial() for positional arguments
Type:	performance	Stage:	resolved
Components:	Extension Modules	Versions:	Python 3.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	methane, ncoghlan, rhettinger, serhiy.storchaka, vstinner, yselivanov
Priority:	normal	Keywords:

Created on 2017-03-06 13:22 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
bench_fastcall_partial.py	vstinner, 2017-03-14 12:08
partial_stack_usage.py	vstinner, 2017-03-14 15:02

Pull Requests
URL	Status	Linked	Edit
PR 516	merged	vstinner, 2017-03-06 13:29

Messages (10)
msg289100 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-03-06 13:22
The pull request makes functools.partial() faster for positional arguments. It avoids the creation of a tuple for positional arguments. It allocates a small buffer for up to 5 parameters. But it seems that even when the small buffer is not used, it's still faster.

Small buffer used, total: 2 positional arguments.

haypo@smithers$ ./python -m perf timeit -s 'from functools import partial; f = lambda x, y: None; g = partial(f, 1)' 'g(2)' --duplicate=100 --compare-to ../master-ref/python --python-names=ref:patch --python-names=ref:patch
ref: ..................... 138 ns +- 1 ns
patch: ..................... 121 ns +- 1 ns

Median +- std dev: [ref] 138 ns +- 1 ns -> [patch] 121 ns +- 1 ns: 1.14x faster (-12%)

Small buffer not used, total: 6 positional arguments.

haypo@smithers$ ./python -m perf timeit -s 'from functools import partial; f = lambda a1, a2, a3, a4, a5, a6: None; g = partial(f, 1, 2, 3, 4, 5)' 'g(6)' --duplicate=100 --compare-to ../master-ref/python --python-names=ref:patch --python-names=ref:patch
ref: ..................... 156 ns +- 1 ns
patch: ..................... 136 ns +- 0 ns

Median +- std dev: [ref] 156 ns +- 1 ns -> [patch] 136 ns +- 0 ns: 1.15x faster (-13%)

Another benchmark with 10 positional arguments:

haypo@smithers$ ./python -m perf timeit -s 'from functools import partial; f = lambda a1, a2, a3, a4, a5, a6, a7, a8, a9, a10: None; g = partial(f, 1, 2, 3, 4, 5)' 'g(6, 7, 8, 9, 10)' --duplicate=100 --compare-to ../master-ref/python --python-names=ref:patch --python-names=ref:patch
ref: ..................... 193 ns +- 1 ns
patch: ..................... 166 ns +- 2 ns

Median +- std dev: [ref] 193 ns +- 1 ns -> [patch] 166 ns +- 2 ns: 1.17x faster (-14%)
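As a rough illustration of what happens at call time, here is a pure-Python model of partial() restricted to positional arguments. SimplePartial and add3 are hypothetical names used only for this sketch; this is not CPython's implementation. It only shows the "bound arguments + call arguments" merge, which in this model builds a new tuple on every call, the kind of temporary the message above says the C patch avoids.

```python
class SimplePartial:
    """Illustrative model of functools.partial for positional arguments only."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args  # positional arguments bound at creation time

    def __call__(self, *args):
        # A new tuple (bound arguments + call-time arguments) is built
        # on every call before the underlying function is invoked.
        merged = self.args + args
        return self.func(*merged)


def add3(a, b, c):
    return a + b + c


g = SimplePartial(add3, 1, 2)  # "2+1 arg" in the benchmark naming used later
result = g(3)
```

The model ignores keyword arguments and everything else functools.partial supports; it is only meant to make the merge step visible.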
msg289103 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-03-06 13:32
functools.partial() is commonly used in the asyncio module. The asyncio documentation suggests using it, because of deliberate limitations of the asyncio API.
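A minimal sketch of the asyncio pattern referred to here, assuming the limitation in question is that loop.call_soon() forwards positional arguments only. The record callback and the tag keyword are hypothetical names chosen for the example.

```python
import asyncio
from functools import partial

received = []


def record(value, *, tag):
    # 'tag' is keyword-only, so call_soon() cannot pass it directly.
    received.append((tag, value))


async def main():
    loop = asyncio.get_running_loop()
    # Bind the keyword argument with partial(); the positional argument
    # is still passed through call_soon() itself.
    loop.call_soon(partial(record, tag="demo"), 42)
    await asyncio.sleep(0)  # yield once so the scheduled callback runs


asyncio.run(main())
```

Every callback scheduled this way goes through a partial object call, which is why the per-call overhead matters for asyncio-heavy code.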
msg289112 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-03-06 14:58
What about C stack consumption? Doesn't this increase it?

Since nested partial() objects are collapsed, you need to interleave them with another wrapper for testing.

    from functools import partial

    def decorator(f):
        def wrapper(*args):
            return f(*args)
        return wrapper

    def f(*args): pass

    for i in range(n):  # n: the nesting depth to test
        f = partial(f)
        f = decorator(f)

    f(1, 2)
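The collapsing of nested partial objects mentioned above can be observed from Python through the public func and args attributes. This is a small sketch; f and decorator are illustrative stand-ins modelled on the snippet above.

```python
from functools import partial


def f(*args):
    return args


# Directly nested partials are flattened into a single partial object.
nested = partial(partial(f, 1), 2)


def decorator(func):
    def wrapper(*args):
        return func(*args)
    return wrapper


# A plain function wrapper between the two layers prevents the flattening,
# so each partial layer is really entered at call time.
wrapped = partial(decorator(partial(f, 1)), 2)
```

Here nested.func is f with nested.args equal to (1, 2), while wrapped.func is the wrapper function and wrapped.args is only (2,). This is why a stack-depth test has to interleave another callable between the partial layers.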
msg289120 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-03-06 16:52
If the underlying function doesn't support fast call, and either args or pto->args is empty, partial_call() makes two unnecessary copies: arguments are copied from a tuple to the raw array, and from the array to a new tuple. This is what the current code does, but this can be avoided.

If the underlying function doesn't support fast call, and both args and pto->args are non-empty, the patched partial_call() makes one unnecessary copy: arguments are copied from the tuples to the raw array, and from the array to the new tuple. Only one copy is needed (from the tuples to the new tuple).
msg289578 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-03-14 10:35
> If the underlying function doesn't support fast call, and both args and pto->args are non-empty, the patched partial_call() makes one unnecessary copy.

The simple workaround is to revert the changes using FASTCALL in partial_call(). But for best performance, it seems we need two code paths, depending on whether the function supports FASTCALL or not. I will try to write a patch for that.
msg289579 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-03-14 12:08
bench_fastcall_partial.py: more complete microbenchmark.

I rewrote my patch:

* I added _PyObject_HasFastCall(callable): returns 1 if callable supports the FASTCALL calling convention for positional arguments
* I split partial_call() into 2 subfunctions: partial_fastcall() is specialized for FASTCALL, partial_call_impl() uses PyObject_Call() with a tuple for positional arguments

The patch fixes the performance regression for VARARGS and optimizes FASTCALL:

haypo@smithers$ ./python -m perf compare_to ref.json patch.json --table
+-----------------------------+---------+------------------------------+
| Benchmark                   | ref     | patch                        |
+=============================+=========+==============================+
| partial Python, 1+1 arg     | 135 ns  | 118 ns: 1.15x faster (-13%)  |
+-----------------------------+---------+------------------------------+
| partial Python, 2+0 arg     | 114 ns  | 91.4 ns: 1.25x faster (-20%) |
+-----------------------------+---------+------------------------------+
| partial Python, 5+1 arg     | 151 ns  | 135 ns: 1.12x faster (-11%)  |
+-----------------------------+---------+------------------------------+
| partial Python, 5+5 arg     | 192 ns  | 168 ns: 1.15x faster (-13%)  |
+-----------------------------+---------+------------------------------+
| partial C VARARGS, 2+0 arg  | 153 ns  | 127 ns: 1.20x faster (-17%)  |
+-----------------------------+---------+------------------------------+
| partial C FASTCALL, 1+1 arg | 111 ns  | 93.7 ns: 1.18x faster (-15%) |
+-----------------------------+---------+------------------------------+
| partial C FASTCALL, 2+0 arg | 63.9 ns | 64.6 ns: 1.01x slower (+1%)  |
+-----------------------------+---------+------------------------------+

Not significant (1): partial C VARARGS, 1+1 arg
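The attached bench_fastcall_partial.py is not reproduced on this page. As a stand-in, the following sketch times the "partial Python, N+M arg" cases with the standard library's timeit instead of the perf module used above. The target function, the bench helper and the chosen cases are assumptions made for this example, and the timing includes the overhead of a lambda wrapper, so the numbers are not comparable with the table.

```python
import timeit
from functools import partial


def target(*args):
    # Pure-Python callable standing in for the benchmarked function.
    return None


def bench(bound, extra, number=50_000, repeat=5):
    """Return the best per-call time in nanoseconds for a bound+extra call."""
    g = partial(target, *range(bound))   # 'bound' arguments stored in the partial
    call_args = tuple(range(extra))      # 'extra' arguments passed at call time
    timer = timeit.Timer(lambda: g(*call_args))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


cases = [(1, 1), (2, 0), (5, 1), (5, 5)]
results = {f"{b}+{e} arg": bench(b, e) for b, e in cases}

for name, ns in results.items():
    print(f"partial Python, {name}: {ns:.1f} ns")
```

Running the same script under a reference build and a patched build gives a rough before/after comparison; for stable results, a dedicated tool such as the perf module used in the messages above is the better choice.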
msg289580 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-03-14 12:10
> What about C stack consumption? Doesn't this increase it?

Yes, my optimization consumes more C stack: small_stack allocates 80 bytes on the stack (for 5 positional arguments). Is it an issue?
msg289582 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-03-14 13:25
Nice results. You did great work on decreasing C stack consumption. It would be sad to lose it without good reason. Could you please compare the two variants, with and without the small stack?
msg289594 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-03-14 15:02
I measured that my patch (pull request) increases the stack usage by 64 bytes per partial_call() call. I consider that acceptable for a speedup between 1.12x and 1.25x.

The attached partial_stack_usage.py requires testcapi_stack_pointer.patch from issue #28870.

Original:

f(): [1000 calls] 624.0 B per call
f2(): [1000 calls] 624.0 B per call

Patched:

f(): [1000 calls] 688.0 B per call (+64 B)
f2(): [1000 calls] 688.0 B per call (+64 B)
msg290183 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-03-24 22:19
New changeset 0f7b0b397e12514ee213bc727c9939b66585cbe2 by Victor Stinner in branch 'master':
bpo-29735: Optimize partial_call(): avoid tuple (#516)
https://github.com/python/cpython/commit/0f7b0b397e12514ee213bc727c9939b66585cbe2

History
Date	User	Action	Args
2022-04-11 14:58:43	admin	set	github: 73921
2017-03-24 22:19:27	vstinner	set	messages: + msg290183
2017-03-14 20:42:37	vstinner	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2017-03-14 15:02:03	vstinner	set	files: + partial_stack_usage.py messages: + msg289594
2017-03-14 13:25:02	serhiy.storchaka	set	messages: + msg289582
2017-03-14 12:10:04	vstinner	set	messages: + msg289580
2017-03-14 12:08:12	vstinner	set	files: + bench_fastcall_partial.py messages: + msg289579
2017-03-14 10:35:05	vstinner	set	messages: + msg289578
2017-03-06 16:52:11	serhiy.storchaka	set	messages: + msg289120
2017-03-06 14:58:01	serhiy.storchaka	set	messages: + msg289112 components: + Extension Modules stage: patch review
2017-03-06 13:32:55	vstinner	set	nosy: + rhettinger, ncoghlan
2017-03-06 13:32:22	vstinner	set	nosy: + methane, serhiy.storchaka, yselivanov messages: + msg289103
2017-03-06 13:29:15	vstinner	set	pull_requests: + pull_request425
2017-03-06 13:22:45	vstinner	create