Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Py_ASSUME() macro for __builtin_unreachable() #82328

Closed
sir-sigurd mannequin opened this issue Sep 12, 2019 · 8 comments
Closed

add Py_ASSUME() macro for __builtin_unreachable() #82328

sir-sigurd mannequin opened this issue Sep 12, 2019 · 8 comments

Comments

@sir-sigurd
Copy link
Mannequin

sir-sigurd mannequin commented Sep 12, 2019

BPO 38147
Nosy @vstinner, @sir-sigurd
Files
  • disasm.txt
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-09-12.19:34:23.478>
    labels = []
    title = 'add Py_ASSUME() macro for __builtin_unreachable()'
    updated_at = <Date 2019-09-23.14:57:12.562>
    user = 'https://github.com/sir-sigurd'

    bugs.python.org fields:

    activity = <Date 2019-09-23.14:57:12.562>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2019-09-12.19:34:23.478>
    creator = 'sir-sigurd'
    dependencies = []
    files = ['48614']
    hgrepos = []
    issue_num = 38147
    keywords = []
    message_count = 6.0
    messages = ['352228', '352792', '352793', '352800', '352803', '353027']
    nosy_count = 2.0
    nosy_names = ['vstinner', 'sir-sigurd']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue38147'
    versions = []

    @sir-sigurd
    Copy link
    Mannequin Author

    sir-sigurd mannequin commented Sep 12, 2019

    GCC (along with Clang and ICC) has __builtin_unreachable() and MSVC has __assume() builtins, that can be used to optimize out unreachable conditions.

    So we could add macro like this:

    #ifdef Py_DEBUG
    #   define Py_ASSUME(cond) (assert(cond))
    #else
    #   if defined(_MSC_VER)
    #       define Py_ASSUME(cond) (__assume(cond))
    #   elif defined(__GNUC__)
    #       define Py_ASSUME(cond) (cond? (void)0: __builtin_unreachable())
    #   else
    #       define Py_ASSUME(cond) ((void)0);
    #   endif
    #endif

    Here's a pair of really simple examples showing how it can optimize code: https://godbolt.org/z/g9LYXF.

    Real world example. _PyLong_Copy() [1] calls _PyLong_New() [2]. _PyLong_New() checks the size, so that overflow does not occur. This check is redundant when _PyLong_New() is called from _PyLong_Copy(). We could add a function that bypass that check, but in LTO build PyObject_MALLOC() is inlined into _PyLong_New() and it also checks the size. Adding Py_ASSUME((size_t)size <= MAX_LONG_DIGITS) allows to bypass both checks.

    [1]

    _PyLong_Copy(PyLongObject *src)
    {
    PyLongObject *result;
    Py_ssize_t i;
    assert(src != NULL);
    i = Py_SIZE(src);
    if (i < 0)
    i = -(i);
    if (i < 2) {
    sdigit ival = MEDIUM_VALUE(src);
    if (IS_SMALL_INT(ival)) {
    return get_small_int(ival);
    }
    }
    result = _PyLong_New(i);
    if (result != NULL) {
    Py_SIZE(result) = Py_SIZE(src);
    while (--i >= 0)
    result->ob_digit[i] = src->ob_digit[i];
    }
    return (PyObject *)result;
    }

    [2]
    _PyLong_New(Py_ssize_t size)
    {
    PyLongObject *result;
    /* Number of bytes needed is: offsetof(PyLongObject, ob_digit) +
    sizeof(digit)*size. Previous incarnations of this code used
    sizeof(PyVarObject) instead of the offsetof, but this risks being
    incorrect in the presence of padding between the PyVarObject header
    and the digits. */
    if (size > (Py_ssize_t)MAX_LONG_DIGITS) {
    PyErr_SetString(PyExc_OverflowError,
    "too many digits in integer");
    return NULL;
    }
    result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) +
    size*sizeof(digit));
    if (!result) {
    PyErr_NoMemory();
    return NULL;
    }
    return (PyLongObject*)PyObject_INIT_VAR(result, &PyLong_Type, size);

    @vstinner
    Copy link
    Member

    Real world example. _PyLong_Copy() [1] calls _PyLong_New() [2]. _PyLong_New() checks the size, so that overflow does not occur. This check is redundant when _PyLong_New() is called from _PyLong_Copy(). We could add a function that bypass that check, but in LTO build PyObject_MALLOC() is inlined into _PyLong_New() and it also checks the size. Adding Py_ASSUME((size_t)size <= MAX_LONG_DIGITS) allows to bypass both checks.

    This sounds like a bad usage of __builtin_unreachable().

    _PyLong_New() must always check that size <= MAX_LONG_DIGITS, the check must not be optimized by the compiler.

    __builtin_unreachable() must only be used if the code really be reached by design.

    For example:

    if (...) { Py_FatalError("oops)"; __builtin_unreachable() }

    But it's a bad example, since Py_FatalError is decorated with the "noreturn" attribute, so the compiler should already know that Py_FatalError() never returns.

    @vstinner
    Copy link
    Member

    This check is redundant when _PyLong_New() is called from _PyLong_Copy().

    If you care of _PyLong_Copy() performance, you should somehow manually inline _PyLong_New() inside _PyLong_Copy().

    @sir-sigurd
    Copy link
    Mannequin Author

    sir-sigurd mannequin commented Sep 19, 2019

    If you care of _PyLong_Copy() performance, you should somehow manually inline _PyLong_New() inside _PyLong_Copy().

    It doesn't solve this:

    We could add a function that bypass that check, but in LTO build PyObject_MALLOC() is inlined into _PyLong_New() and it also checks the size. Adding Py_ASSUME((size_t)size <= MAX_LONG_DIGITS) allows to bypass both checks.

    Here's example: sir-sigurd@c8699d0.

    I attach some disassembly from this example compiled with LTO, to demonstrate how the proposed macro affects generated code.

    @vstinner
    Copy link
    Member

    Here's example: sir-sigurd@c8699d0.

    "_Py_ASSUME((size_t)size <= MAX_LONG_DIGITS);"

    Typically, such code use assert() and is removed for release build.

    assert() is more for contract base programming: when the error "cannot" happen at runtime (it would be a programming error).

    For other cases, I prefer to always emit code to handle the error (the error can happen, for example, the function must check inputs), even in release build.

    @vstinner vstinner changed the title add macro for __builtin_unreachable add Py_ASSUME() macro for __builtin_unreachable() Sep 19, 2019
    @vstinner
    Copy link
    Member

    See also bpo-38249.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel
    Copy link
    Member

    Closing as there is no consensus here, so this won't happen without a wider discussion on python-dev (following which this can be reopened).

    @iritkatriel iritkatriel closed this as not planned Won't fix, can't repro, duplicate, stale Nov 27, 2023
    @vstinner
    Copy link
    Member

    The C API has the Py_UNREACHABLE() macro which does basically implement requested feature:

    GCC (along with Clang and ICC) has __builtin_unreachable() and MSVC has __assume() builtins, that can be used to optimize out unreachable conditions.

    Extract of the implementation:

    (...)
    #elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
    #  define Py_UNREACHABLE() __builtin_unreachable()
    #elif defined(__clang__) || defined(__INTEL_COMPILER)
    #  define Py_UNREACHABLE() __builtin_unreachable()
    #elif defined(_MSC_VER)
    #  define Py_UNREACHABLE() __assume(0)
    #else
    #  define Py_UNREACHABLE() \
        Py_FatalError("Unreachable C code path reached")
    #endif

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants