This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: add Py_ASSUME() macro for __builtin_unreachable()
Type: Stage:
Components: Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: sir-sigurd, vstinner
Priority: normal Keywords:

Created on 2019-09-12 19:34 by sir-sigurd, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description Edit
disasm.txt sir-sigurd, 2019-09-19 15:27
Messages (6)
msg352228 - (view) Author: Sergey Fedoseev (sir-sigurd) * Date: 2019-09-12 19:34
GCC (along with Clang and ICC) has __builtin_unreachable() and MSVC has __assume() builtins, that can be used to optimize out unreachable conditions.

So we could add macro like this:

#ifdef Py_DEBUG
#   define Py_ASSUME(cond) (assert(cond))
#else
#   if defined(_MSC_VER)
#       define Py_ASSUME(cond) (__assume(cond))
#   elif defined(__GNUC__)
#       define Py_ASSUME(cond) (cond? (void)0: __builtin_unreachable())
#   else
#       define Py_ASSUME(cond) ((void)0);
#   endif
#endif

Here's a pair of really simple examples showing how it can optimize code: https://godbolt.org/z/g9LYXF.

Real world example. _PyLong_Copy() [1] calls _PyLong_New() [2]. _PyLong_New() checks the size, so that overflow does not occur. This check is redundant when _PyLong_New() is called from _PyLong_Copy(). We could add a function that bypass that check, but in LTO build PyObject_MALLOC() is inlined into _PyLong_New() and it also checks the size. Adding Py_ASSUME((size_t)size <= MAX_LONG_DIGITS) allows to bypass both checks. 

[1] https://github.com/python/cpython/blob/3a4f66707e824ef3a8384827590ebaa6ca463dc0/Objects/longobject.c#L287-L309
[2] https://github.com/python/cpython/blob/3a4f66707e824ef3a8384827590ebaa6ca463dc0/Objects/longobject.c#L264-L283
msg352792 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-19 13:43
> Real world example. _PyLong_Copy() [1] calls _PyLong_New() [2]. _PyLong_New() checks the size, so that overflow does not occur. This check is redundant when _PyLong_New() is called from _PyLong_Copy(). We could add a function that bypass that check, but in LTO build PyObject_MALLOC() is inlined into _PyLong_New() and it also checks the size. Adding Py_ASSUME((size_t)size <= MAX_LONG_DIGITS) allows to bypass both checks. 

This sounds like a bad usage of __builtin_unreachable().

_PyLong_New() must always check that size <= MAX_LONG_DIGITS, the check must not be optimized by the compiler.

__builtin_unreachable() must only be used if the code really be reached by design.

For example:

if (...) { Py_FatalError("oops)"; __builtin_unreachable() }

But it's a bad example, since Py_FatalError is decorated with the "noreturn" attribute, so the compiler should already know that Py_FatalError() never returns.
msg352793 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-19 13:44
> This check is redundant when _PyLong_New() is called from _PyLong_Copy().

If you care of _PyLong_Copy() performance, you should somehow manually inline _PyLong_New() inside _PyLong_Copy().
msg352800 - (view) Author: Sergey Fedoseev (sir-sigurd) * Date: 2019-09-19 15:27
> If you care of _PyLong_Copy() performance, you should somehow manually inline _PyLong_New() inside _PyLong_Copy().

It doesn't solve this:

> We could add a function that bypass that check, but in LTO build PyObject_MALLOC() is inlined into _PyLong_New() and it also checks the size. Adding Py_ASSUME((size_t)size <= MAX_LONG_DIGITS) allows to bypass both checks.

Here's example: https://github.com/sir-sigurd/cpython/commit/c8699d0c614a18d558216ae7d432107147c95c28.

I attach some disassembly from this example compiled with LTO, to demonstrate how the proposed macro affects generated code.
msg352803 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-19 15:50
> Here's example: https://github.com/sir-sigurd/cpython/commit/c8699d0c614a18d558216ae7d432107147c95c28.

"_Py_ASSUME((size_t)size <= MAX_LONG_DIGITS);"

Typically, such code use assert() and is removed for release build.

assert() is more for contract base programming: when the error "cannot" happen at runtime (it would be a programming error).

For other cases, I prefer to always emit code to handle the error (the error can happen, for example, the function must check inputs), even in release build.
msg353027 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-23 14:57
See also bpo-38249.
History
Date User Action Args
2022-04-11 14:59:20adminsetgithub: 82328
2019-09-23 14:57:12vstinnersetmessages: + msg353027
2019-09-19 15:51:35vstinnersettitle: add macro for __builtin_unreachable -> add Py_ASSUME() macro for __builtin_unreachable()
2019-09-19 15:50:37vstinnersetmessages: + msg352803
2019-09-19 15:27:43sir-sigurdsetfiles: + disasm.txt

messages: + msg352800
2019-09-19 13:44:07vstinnersetmessages: + msg352793
2019-09-19 13:43:13vstinnersetnosy: + vstinner
messages: + msg352792
2019-09-12 19:34:23sir-sigurdcreate