Issue 36616: Optimize thread state handling in function call code


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/80797

classification

Title:	Optimize thread state handling in function call code
Type:	performance	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.9, Python 3.8

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	Mark.Shannon, jdemeyer, petr.viktorin, vstinner
Priority:	normal	Keywords:	patch

Created on 2019-04-12 16:25 by jdemeyer, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 12839	closed	jdemeyer, 2019-04-15 13:24

Messages (8)
msg340078	Author: Jeroen Demeyer (jdemeyer) *	Date: 2019-04-12 16:25
The bytecode interpreter uses an inline function call_function() to handle most function calls. To check for profiling, call_function() needs to call PyThreadState_GET(). In the reference implementation of PEP 590, I saw that we can remove these PyThreadState_GET() calls by passing the thread state from the main eval loop to call_function(). I suggest applying this optimization now, because it makes sense independently of PEP 580 and PEP 590, and because it would give a better baseline for performance comparisons.
msg340161	Author: Jeroen Demeyer (jdemeyer) *	Date: 2019-04-13 14:39
Mark, Petr, do you agree? I like how the reference implementation of PEP 590 improves the handling of profiling. However, that change really has little to do with PEP 590; it's something we can do independently.
msg340242	Author: Petr Viktorin (petr.viktorin) *	Date: 2019-04-15 08:29
Indeed, this is a good idea.
msg340243	Author: STINNER Victor (vstinner) *	Date: 2019-04-15 08:54
I wanted to do that, but I never measured the overhead of the PyThreadState_GET() calls. Subtle detail: using _PyThreadState_GET() (with the "_Py" prefix) rather than PyThreadState_GET() ensures that you get the optimized macro ;-)
msg340270	Author: Jeroen Demeyer (jdemeyer) *	Date: 2019-04-15 13:29
The gain is small, but it's there. I made some further changes:

- replaced code of the form

      sp = stack_pointer;
      call_function(..., &sp, ...);
      stack_pointer = sp;

  by

      call_function(..., &stack_pointer, ...);

- folded the inline function do_call_core() into the main eval loop (the function had become so small that there was no longer a reason to keep it separate from the main loop)
- removed the pointless check PyMethod_GET_SELF(func) != NULL (methods always have self != NULL)
msg340290	Author: STINNER Victor (vstinner) *	Date: 2019-04-15 16:10
> The gain is small, but it's there.

Do you mean a performance speedup? If so, can you please run a micro-benchmark?
msg340341	Author: STINNER Victor (vstinner) *	Date: 2019-04-16 13:12
Jeroen Demeyer closed his PR 12839, so I am closing the issue as well.
msg340790	Author: STINNER Victor (vstinner) *	Date: 2019-04-24 16:28
See also my PR 12934, which includes a similar change, but for correctness rather than optimization.

History
Date	User	Action	Args
2022-04-11 14:59:13	admin	set	github: 80797
2019-04-24 16:28:20	vstinner	set	messages: + msg340790
2019-04-16 13:12:18	vstinner	set	status: open -> closed resolution: not a bug messages: + msg340341 stage: patch review -> resolved
2019-04-15 16:10:12	vstinner	set	messages: + msg340290
2019-04-15 13:29:22	jdemeyer	set	messages: + msg340270
2019-04-15 13:24:15	jdemeyer	set	keywords: + patch stage: patch review pull_requests: + pull_request12764
2019-04-15 08:54:33	vstinner	set	nosy: + vstinner messages: + msg340243
2019-04-15 08:29:29	petr.viktorin	set	messages: + msg340242
2019-04-13 14:39:08	jdemeyer	set	messages: + msg340161
2019-04-12 16:25:47	jdemeyer	set	type: performance
2019-04-12 16:25:26	jdemeyer	create