Issue 37774: Micro-optimize vectorcall using PY_LIKELY

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/81955

classification

Title:	Micro-optimize vectorcall using PY_LIKELY
Type:		Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.9

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Mark.Shannon, jdemeyer, josh.r, malin, methane, vstinner
Priority:	normal	Keywords:	patch

Created on 2019-08-06 14:00 by jdemeyer, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL	Status	Linked	Edit
PR 15144	open	jdemeyer, 2019-08-06 14:34

Messages (16)
msg349108 - (view)	Author: Jeroen Demeyer (jdemeyer) *	Date: 2019-08-06 14:00
Take the LIKELY/UNLIKELY macros out of Objects/obmalloc.c (renaming them of course). Use them in a few places to micro-optimize vectorcall.
msg349204 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2019-08-07 23:07
Is the intent to make these macros public? If not, shouldn't they be prefixed with an underscore, like _Py_SIZE_ROUND_DOWN/_Py_SIZE_ROUND_UP in the same header?
msg349208 - (view)	Author: Inada Naoki (methane) *	Date: 2019-08-08 01:57
I'm conservative about adding public APIs. But in this case, LIKELY/UNLIKELY macros are battle-tested in the world already. So I think it's safe to make it public. But I'll wait to merge for a week for other core dev's comments.
msg349212 - (view)	Author: Ma Lin (malin) *	Date: 2019-08-08 04:48
How about write a suggestion on when to use them in the comment? For example: > You should use it only in cases when the likeliest branch is > very very very likely, or when the unlikeliest branch is very > very very unlikely. from https://kernelnewbies.org/FAQ/LikelyUnlikely Another discussion about them: https://lwn.net/Articles/420019/
msg349466 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-12 13:13
I dislike adding these macros: # define Py_UNLIKELY(value) __builtin_expect(!!(value), 0) # define Py_LIKELY(value) __builtin_expect(!!(value), 1) It's too easy to introduce a performance regression by mistake. IHMO PGO compilation already defeats the purpose of these macros. You should use PGO build.
msg349467 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-12 13:14
My previous attempt: bpo-29461.
msg349534 - (view)	Author: Jeroen Demeyer (jdemeyer) *	Date: 2019-08-13 09:08
> IHMO PGO compilation already defeats the purpose of these macros. That's certainly true. The question is whether we want to optimize also non-PGO builds.
msg349544 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-13 11:23
> That's certainly true. The question is whether we want to optimize also non-PGO builds. If it's an optimization, can you show a benchmark to validate that it's really faster as expected? I tried for 6 months to run benchmarks without PGO and it was a loss of time: not building with PGO lead to random performances and so can easily lead to bad optimization decisions. See my article: https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html I wrote a serie of articles: https://vstinner.readthedocs.io/benchmark.html#my-articles
msg349552 - (view)	Author: Jeroen Demeyer (jdemeyer) *	Date: 2019-08-13 13:00
> If it's an optimization, can you show a benchmark to validate that it's really faster as expected? Yes, I did test it. I didn't save the results, but I can redo them if you want. If you plan to reject the issue anyway, there is no point. > not building with PGO lead to random performances I would expect that Py_LIKELY/Py_UNLIKELY helps with this. If the compiler doesn't have probability information, it guesses. This can cause unrelated changes to change the heuristics used by the compiler, impacting the performance of some piece of code. With Py_LIKELY/Py_UNLIKELY, the code generated by the compiler should be more stable.
msg349554 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-13 13:35
> I would expect that Py_LIKELY/Py_UNLIKELY helps with this. Well, PGO is made of many optimizations, not only help branch prediction. For example, it detects hot code and tries to put functions which call them each other close in memory to enhance the CPU instruction cache efficiency. I tried to annotate "hot" functions but I failed to get reproducible performance results. commit c6944e7edcacc8960170719623c325aefaf23cac Author: Victor Stinner <victor.stinner@gmail.com> Date: Fri Nov 11 02:13:35 2016 +0100 Issue #28618: Make hot functions using __attribute__((hot)) When Python is not compiled with PGO, the performance of Python on call_simple and call_method microbenchmarks depend highly on the code placement. In the worst case, the performance slowdown can be up to 70%. The GCC __attribute__((hot)) attribute helps to keep hot code close to reduce the risk of such major slowdown. This attribute is ignored when Python is compiled with PGO. The following functions are considered as hot according to statistics collected by perf record/perf report: * _PyEval_EvalFrameDefault() * call_function() * _PyFunction_FastCall() * PyFrame_New() * frame_dealloc() * PyErr_Occurred() But my change didn't work: "Sadly, it seems like I was just lucky when adding __attribute__((hot)) fixed the issue, because call_method is slow again! (...) (+51%)" https://bugs.python.org/issue28618#msg281459 PGO is like black magic for me :-) Note: Since this change, pyperformance evolved: I removed call_method microbenchmark. It was too instable for various reasons.
msg349555 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-13 13:40
Another example to explain my concern. In the Linux kernel, list macros used to prefetch next items: “(...) So the conclusion is: prefetches are absolutely toxic, even if the NULL ones are excluded.” https://lwn.net/Articles/444336/ The lesson is that we should not let developers tune the compiler: let the CPU do that for you :-) For example, CPUs have evolved heuristics to prefetch data for you.
msg349563 - (view)	Author: Jeroen Demeyer (jdemeyer) *	Date: 2019-08-13 14:29
We're not talking about prefetching here. The Py_LIKELY/Py_UNLIKELY macros only affect which of the two code paths in a branch is the "default" one, i.e. the one not involving a jmp.
msg349567 - (view)	Author: Jeroen Demeyer (jdemeyer) *	Date: 2019-08-13 14:44
I claim that adding Py_LIKELY/Py_UNLIKELY will reduce the performance randomness of non-PGO builds. So it would be strange to use that randomness as an argument against this patch.
msg349571 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-13 15:37
> We're not talking about prefetching here. The Py_LIKELY/Py_UNLIKELY macros only affect which of the two code paths in a branch is the "default" one, i.e. the one not involving a jmp. I know. My point about prefetching is that if we provide a way to developers to tune the compiler, they will use it, but it can lead the a performance regression whereas not using these tools would lead to such issue. > I claim that adding Py_LIKELY/Py_UNLIKELY will reduce the performance randomness of non-PGO builds. So it would be strange to use that randomness as an argument against this patch. It's right that Py_LIKELY/Py_UNLIKELY should make performance more reliable. My question is if it is safe to let developers "abuse" it. If these macros are misused, they can lead to a performance regression. I'm not sure that Py_LIKELY/Py_UNLIKELY is always more performant than not using it. For example, if you mark the error path as unlikely: this path will become slower. If the error path is taken "frequently" enough, the slowdown can become significant overall. It's tricky to decide if a code path is likely or not. I would be ok to use Py_UNLIKELY for sanity checks which raise SystemError: these code paths must never be taken in a valid code. I'm more in favor of simply removing these runtime checks in release mode, but that's a different issue :-) (bpo-37406) I'm not sure about judging if the error case raising an exception really deserve a Py_UNLIKELY. Maybe I would be more comfortable if you would split the PR into multiple parts. One PR to add the macros. Then one PR per kind of modified tests. For example, "if (Py_UNLIKELY(tstate->use_tracing))" in ceval.c sounds reasonable, but I would prefer to have the opinion of other core devs. About the macros, I would prefer to make it private to not promote it. It should also have a good explanation why it's better to not use it, except if you really well understand what you do.
msg349576 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-13 15:50
Few more links about likely/__builtin_expect: * GCC documentation says: "In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform." https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html * https://softwareengineering.stackexchange.com/questions/188853/how-much-usage-of-likely-and-unlikely-macros-is-too-much "If you're writing for x86/x64 (and are not using 20-year-old CPUs), the performance gain from using __builtin_expect() will be negligible if any." * http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html "This optimized version [PGO] runs significantly faster (...) than our version that used __builtin_expect()." [When __builtin_expect() is misused] "In this case, unsurprisingly, we made each check run slower (...)" * https://stackoverflow.com/questions/109710/how-do-the-likely-unlikely-macros-in-the-linux-kernel-work-and-what-is-their-ben "Like all such performance optimisations you should only do it after extensive profiling to ensure the code really is in a bottleneck (...)" * https://stackoverflow.com/questions/10922607/gcc-likely-unlikely-macro-usage
msg349685 - (view)	Author: Jeroen Demeyer (jdemeyer) *	Date: 2019-08-14 11:35
> My question is if it is safe to let developers "abuse" it. If these macros are misused, they can lead to a performance regression. I expect people using these macros and PR reviewers to use good judgement when to use these macros. There are many cases where the use of such macros should be uncontroversial, for example when checking for errors. > For example, if you mark the error path as unlikely: this path will become slower. If the error path is taken "frequently" enough, the slowdown can become significant overall. The speedup/slowdown that Py_LIKELY/Py_UNLIKELY gives is quite small, just a few clock cycles. This means that we shouldn't worry about performance regressions if the rest of the code takes much longer anyway. An example of this is raising exceptions, which typically involves PyErr_Format(). So I wouldn't worry too much about error paths getting slower. > About the macros, I would prefer to make it private to not promote it. I don't think it matters whether we call it Py_LIKELY or _Py_LIKELY. People that want to use it will use it anyway. And there is no issue with "provisional API" since this API is well-established in many existing projects.

History
Date	User	Action	Args
2022-04-11 14:59:18	admin	set	github: 81955
2019-08-14 11:35:27	jdemeyer	set	messages: + msg349685
2019-08-13 15:50:21	vstinner	set	messages: + msg349576
2019-08-13 15:37:15	vstinner	set	messages: + msg349571
2019-08-13 14:44:35	jdemeyer	set	messages: + msg349567
2019-08-13 14:29:33	jdemeyer	set	messages: + msg349563
2019-08-13 13:40:18	vstinner	set	messages: + msg349555
2019-08-13 13:35:21	vstinner	set	messages: + msg349554
2019-08-13 13:00:14	jdemeyer	set	messages: + msg349552
2019-08-13 11:23:26	vstinner	set	messages: + msg349544
2019-08-13 09:08:00	jdemeyer	set	messages: + msg349534
2019-08-12 13:14:06	vstinner	set	messages: + msg349467
2019-08-12 13:13:20	vstinner	set	messages: + msg349466
2019-08-08 04:48:11	malin	set	nosy: + malin messages: + msg349212
2019-08-08 01:57:24	methane	set	messages: + msg349208
2019-08-07 23:07:30	josh.r	set	messages: + msg349204
2019-08-07 19:56:31	josh.r	set	nosy: + josh.r
2019-08-06 14:34:39	jdemeyer	set	keywords: + patch stage: patch review pull_requests: + pull_request14881
2019-08-06 14:09:49	serhiy.storchaka	set	nosy: + vstinner
2019-08-06 14:00:15	jdemeyer	create