classification
Title: Micro-optimize vectorcall using PY_LIKELY
Type: Stage: patch review
Components: Interpreter Core Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Ma Lin, Mark.Shannon, inada.naoki, jdemeyer, josh.r, vstinner
Priority: normal Keywords: patch

Created on 2019-08-06 14:00 by jdemeyer, last changed 2019-08-14 11:35 by jdemeyer.

Pull Requests
URL Status Linked Edit
PR 15144 open jdemeyer, 2019-08-06 14:34
Messages (16)
msg349108 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-06 14:00
Take the LIKELY/UNLIKELY macros out of Objects/obmalloc.c (renaming them of course). Use them in a few places to micro-optimize vectorcall.
msg349204 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-08-07 23:07
Is the intent to make these macros public? If not, shouldn't they be prefixed with an underscore, like _Py_SIZE_ROUND_DOWN/_Py_SIZE_ROUND_UP in the same header?
msg349208 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-08-08 01:57
I'm conservative about adding public APIs.
But in this case, LIKELY/UNLIKELY macros are battle-tested in the world already.
So I think it's safe to make it public.

But I'll wait to merge for a week for other core dev's comments.
msg349212 - (view) Author: Ma Lin (Ma Lin) * Date: 2019-08-08 04:48
How about write a suggestion on when to use them in the comment?

For example:
> You should use it only in cases when the likeliest branch is
> very very very likely, or when the unlikeliest branch is very
> very very unlikely. 
from https://kernelnewbies.org/FAQ/LikelyUnlikely

Another discussion about them:
https://lwn.net/Articles/420019/
msg349466 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-12 13:13
I dislike adding these macros:

#  define Py_UNLIKELY(value) __builtin_expect(!!(value), 0)
#  define Py_LIKELY(value) __builtin_expect(!!(value), 1)

It's too easy to introduce a performance regression by mistake. IHMO PGO compilation already defeats the purpose of these macros. You should use PGO build.
msg349467 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-12 13:14
My previous attempt: bpo-29461.
msg349534 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-13 09:08
> IHMO PGO compilation already defeats the purpose of these macros.

That's certainly true. The question is whether we want to optimize also non-PGO builds.
msg349544 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-13 11:23
> That's certainly true. The question is whether we want to optimize also non-PGO builds.

If it's an optimization, can you show a benchmark to validate that it's really faster as expected?

I tried for 6 months to run benchmarks without PGO and it was a loss of time: not building with PGO lead to random performances and so can easily lead to bad optimization decisions.

See my article: https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html

I wrote a serie of articles:
https://vstinner.readthedocs.io/benchmark.html#my-articles
msg349552 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-13 13:00
> If it's an optimization, can you show a benchmark to validate that it's really faster as expected?

Yes, I did test it. I didn't save the results, but I can redo them if you want. If you plan to reject the issue anyway, there is no point.

> not building with PGO lead to random performances

I would expect that Py_LIKELY/Py_UNLIKELY helps with this. If the compiler doesn't have probability information, it guesses. This can cause unrelated changes to change the heuristics used by the compiler, impacting the performance of some piece of code. With Py_LIKELY/Py_UNLIKELY, the code generated by the compiler should be more stable.
msg349554 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-13 13:35
> I would expect that Py_LIKELY/Py_UNLIKELY helps with this.

Well, PGO is made of many optimizations, not only help branch prediction.

For example, it detects hot code and tries to put functions which call them each other close in memory to enhance the CPU instruction cache efficiency. I tried to annotate "hot" functions but I failed to get reproducible performance results.

commit c6944e7edcacc8960170719623c325aefaf23cac
Author: Victor Stinner <victor.stinner@gmail.com>
Date:   Fri Nov 11 02:13:35 2016 +0100

    Issue #28618: Make hot functions using __attribute__((hot))
    
    When Python is not compiled with PGO, the performance of Python on call_simple
    and call_method microbenchmarks depend highly on the code placement. In the
    worst case, the performance slowdown can be up to 70%.
    
    The GCC __attribute__((hot)) attribute helps to keep hot code close to reduce
    the risk of such major slowdown. This attribute is ignored when Python is
    compiled with PGO.
    
    The following functions are considered as hot according to statistics collected
    by perf record/perf report:
    
    * _PyEval_EvalFrameDefault()
    * call_function()
    * _PyFunction_FastCall()
    * PyFrame_New()
    * frame_dealloc()
    * PyErr_Occurred()

But my change didn't work: "Sadly, it seems like I was just lucky when adding __attribute__((hot)) fixed the issue, because call_method is slow again! (...)  (+51%)"

https://bugs.python.org/issue28618#msg281459

PGO is like black magic for me :-)

Note: Since this change, pyperformance evolved: I removed call_method microbenchmark. It was too instable for various reasons.
msg349555 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-13 13:40
Another example to explain my concern. In the Linux kernel, list macros used to prefetch next items: “(...) So the conclusion is: prefetches are absolutely toxic, even if the NULL ones are excluded.”

https://lwn.net/Articles/444336/

The lesson is that we should not let developers tune the compiler: let the CPU do that for you :-) For example, CPUs have evolved heuristics to prefetch data for you.
msg349563 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-13 14:29
We're not talking about prefetching here. The Py_LIKELY/Py_UNLIKELY macros only affect which of the two code paths in a branch is the "default" one, i.e. the one not involving a jmp.
msg349567 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-13 14:44
I claim that adding Py_LIKELY/Py_UNLIKELY will reduce the performance randomness of non-PGO builds. So it would be strange to use that randomness as an argument *against* this patch.
msg349571 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-13 15:37
> We're not talking about prefetching here. The Py_LIKELY/Py_UNLIKELY macros only affect which of the two code paths in a branch is the "default" one, i.e. the one not involving a jmp.

I know. My point about prefetching is that if we provide a way to developers to tune the compiler, they will use it, but it can lead the a performance regression whereas not using these tools would lead to such issue.

> I claim that adding Py_LIKELY/Py_UNLIKELY will reduce the performance randomness of non-PGO builds. So it would be strange to use that randomness as an argument *against* this patch.

It's right that Py_LIKELY/Py_UNLIKELY should make performance more reliable. My question is if it is safe to let developers "abuse" it. If these macros are misused, they can lead to a performance regression.

I'm not sure that Py_LIKELY/Py_UNLIKELY is always more performant than not using it. For example, if you mark the error path as unlikely: this path will become slower. If the error path is taken "frequently" enough, the slowdown can become significant overall.

It's tricky to decide if a code path is likely or not.

I would be ok to use Py_UNLIKELY for sanity checks which raise SystemError: these code paths must never be taken in a valid code. I'm more in favor of simply removing these runtime checks in release mode, but that's a different issue :-) (bpo-37406)

I'm not sure about judging if the error case raising an exception really deserve a Py_UNLIKELY.

Maybe I would be more comfortable if you would split the PR into multiple parts. One PR to add the macros. Then one PR per kind of modified tests.

For example, "if (Py_UNLIKELY(tstate->use_tracing))" in ceval.c sounds reasonable, but I would prefer to have the opinion of other core devs.

About the macros, I would prefer to make it private to not promote it. It should also have a good explanation why it's better to not use it, except if you really well understand what you do.
msg349576 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-13 15:50
Few more links about likely/__builtin_expect:

* GCC documentation says: "In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform."

  https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

* https://softwareengineering.stackexchange.com/questions/188853/how-much-usage-of-likely-and-unlikely-macros-is-too-much

  "If you're writing for x86/x64 (and are not using 20-year-old CPUs), the performance gain from using __builtin_expect() will be negligible if any."

* http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html

  "This optimized version [PGO] runs significantly faster (...) than our version that used __builtin_expect()."

  [When __builtin_expect() is misused] "In this case, unsurprisingly, we made each check run slower (...)"

* https://stackoverflow.com/questions/109710/how-do-the-likely-unlikely-macros-in-the-linux-kernel-work-and-what-is-their-ben

  "Like all such performance optimisations you should only do it after extensive profiling to ensure the code really is in a bottleneck (...)"

* https://stackoverflow.com/questions/10922607/gcc-likely-unlikely-macro-usage
msg349685 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-14 11:35
> My question is if it is safe to let developers "abuse" it. If these macros are misused, they can lead to a performance regression.

I expect people using these macros and PR reviewers to use good judgement when to use these macros. There are many cases where the use of such macros should be uncontroversial, for example when checking for errors.

> For example, if you mark the error path as unlikely: this path will become slower. If the error path is taken "frequently" enough, the slowdown can become significant overall.

The speedup/slowdown that Py_LIKELY/Py_UNLIKELY gives is quite small, just a few clock cycles. This means that we shouldn't worry about performance regressions if the rest of the code takes much longer anyway. 

An example of this is raising exceptions, which typically involves PyErr_Format(). So I wouldn't worry too much about error paths getting slower.

> About the macros, I would prefer to make it private to not promote it.

I don't think it matters whether we call it Py_LIKELY or _Py_LIKELY. People that want to use it will use it anyway. And there is no issue with "provisional API" since this API is well-established in many existing projects.
History
Date User Action Args
2019-08-14 11:35:27jdemeyersetmessages: + msg349685
2019-08-13 15:50:21vstinnersetmessages: + msg349576
2019-08-13 15:37:15vstinnersetmessages: + msg349571
2019-08-13 14:44:35jdemeyersetmessages: + msg349567
2019-08-13 14:29:33jdemeyersetmessages: + msg349563
2019-08-13 13:40:18vstinnersetmessages: + msg349555
2019-08-13 13:35:21vstinnersetmessages: + msg349554
2019-08-13 13:00:14jdemeyersetmessages: + msg349552
2019-08-13 11:23:26vstinnersetmessages: + msg349544
2019-08-13 09:08:00jdemeyersetmessages: + msg349534
2019-08-12 13:14:06vstinnersetmessages: + msg349467
2019-08-12 13:13:20vstinnersetmessages: + msg349466
2019-08-08 04:48:11Ma Linsetnosy: + Ma Lin
messages: + msg349212
2019-08-08 01:57:24inada.naokisetmessages: + msg349208
2019-08-07 23:07:30josh.rsetmessages: + msg349204
2019-08-07 19:56:31josh.rsetnosy: + josh.r
2019-08-06 14:34:39jdemeyersetkeywords: + patch
stage: patch review
pull_requests: + pull_request14881
2019-08-06 14:09:49serhiy.storchakasetnosy: + vstinner
2019-08-06 14:00:15jdemeyercreate