Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Py_INCREF() and PyObject_INIT() to inlined functions #79240

Closed
vstinner opened this issue Oct 24, 2018 · 55 comments
Closed

Convert Py_INCREF() and PyObject_INIT() to inlined functions #79240

vstinner opened this issue Oct 24, 2018 · 55 comments
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

BPO 35059
Nosy @mdickinson, @vstinner, @benjaminp, @ericsnowcurrently, @serhiy-storchaka, @zooba, @srinivasreddy, @aaronchall, @pablogsal, @miss-islington
PRs
  • bpo-35059: Convert _Py_NewReference() to function #10076
  • bpo-35059: Convert PyObject_INIT() to function #10077
  • bpo-35059: Convert Py_INCREF() to static inline function #10079
  • bpo-35059: Add Py_STATIC_INLINE() macro #10093
  • bpo-35059 : Add /Ob1 flag when building pythoncore in debug mode #10094
  • Revert "bpo-35059 : Add /Ob1 flag when building pythoncore in debug mode" #10127
  • bpo-35059, libmpdec: Add missing EXTINLINE in mpdecimal.h #10128
  • [3.6] bpo-35059, libmpdec: Add missing EXTINLINE in mpdecimal.h (GH-10128) #10134
  • [3.7] bpo-35059, libmpdec: Add missing EXTINLINE in mpdecimal.h (GH-10128) #10135
  • bpo-35059: Remove Py_STATIC_INLINE() macro #10216
  • bpo-35059: Convert _Py_Dealloc() to static inline function #10223
  • bpo-35059: Convert Py_XINCREF() to static inline function #10224
  • WIP: bpo-35059: _PyThreadState_GET() checks that the GIL is hold #10278
  • bpo-35059: Enhance _PyObject_AssertFailed() #10642
  • bpo-35059: Enhance _PyObject_AssertFailed() #10642
  • bpo-35059: Enhance _PyObject_AssertFailed() #10642
  • bpo-35059: Convert _PyObject_GC_TRACK() to inline function #10643
  • bpo-35059: Convert _PyObject_GC_TRACK() to inline function #10643
  • bpo-35059: Convert _PyObject_GC_TRACK() to inline function #10643
  • bpo-35059: Add _PyObject_CAST() macro #10645
  • bpo-35059: Cleanup load_extension() of _pickle.c #10647
  • bpo-35059: Cleanup usage of Python macros #10648
  • [WIP] bpo-35059: Add assertion to _PyObject_CAST() #10649
  • bpo-35059: Cast void* to PyObject* #10650
  • WIP: bpo-35059: Replace inline with macros #10669
  • bpo-35059: Add NEWS entry #10671
  • bpo-35059: PyObject_INIT() casts to PyObject* #10674
  • bpo-35059: Enhance _PyObject_GC_TRACK() macros #20931
  • Files
  • 2018-11-22_17-38-master-3bb183d7fb83-patch-10669.json.gz
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-11-23.15:26:57.489>
    created_at = <Date 2018-10-24.14:31:56.305>
    labels = ['interpreter-core', '3.8']
    title = 'Convert Py_INCREF() and PyObject_INIT() to inlined functions'
    updated_at = <Date 2021-09-03.13:27:37.824>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-09-03.13:27:37.824>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-11-23.15:26:57.489>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2018-10-24.14:31:56.305>
    creator = 'vstinner'
    dependencies = []
    files = ['47943']
    hgrepos = []
    issue_num = 35059
    keywords = ['patch']
    message_count = 55.0
    messages = ['328367', '328376', '328408', '328416', '328418', '328420', '328421', '328423', '328424', '328426', '328427', '328450', '328514', '328529', '328542', '328543', '328552', '328557', '328570', '328577', '328578', '328622', '328746', '328757', '328759', '328760', '328823', '328829', '328830', '328855', '328857', '328919', '329527', '329842', '330225', '330226', '330229', '330230', '330239', '330302', '330305', '330310', '330314', '330316', '330320', '330321', '330323', '330329', '330330', '330338', '330353', '330363', '330365', '371733', '400991']
    nosy_count = 10.0
    nosy_names = ['mark.dickinson', 'vstinner', 'benjamin.peterson', 'eric.snow', 'serhiy.storchaka', 'steve.dower', 'thatiparthy', 'Aaron Hall', 'pablogsal', 'miss-islington']
    pr_nums = ['10076', '10077', '10079', '10093', '10094', '10127', '10128', '10134', '10135', '10216', '10223', '10224', '10278', '10642', '10642', '10642', '10643', '10643', '10643', '10645', '10647', '10648', '10649', '10650', '10669', '10671', '10674', '20931']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue35059'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    CPython has been created in 1990. In 1990, it made sense to use C macros. But nowadays, inlined functions can be used instead:

    "Python versions greater than or equal to 3.6 use C89 with several select C99 features: (...) static inline functions"
    https://www.python.org/dev/peps/pep-0007/#c-dialect

    I propose to convert 4 macros to inlined functions:

    • PyObject_INIT(), PyObject_INIT_VAR()
    • _Py_NewReference(), _Py_ForgetReference()

    Advantages:

    • Functions use regular C syntax
    • No more corner cases ("traps") of macros
    • Function arguments have a type

    Drawbacks:

    • Require a specific type can introduce compiler warnings if the caller doesn't pass the proper type (PyObject* or PyVarObject*). _Py_NewReference() and _Py_ForgetReference() seem to be properly used, but not PyObject_INIT() and PyObject_INIT_VAR().

    The two attached PRs implements these changes.

    @vstinner vstinner added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 24, 2018
    @vstinner vstinner changed the title Convert PyObject_INIT() and _Py_NewReference() to inlined functions Convert Py_INCREF() and PyObject_INIT() to inlined functions Oct 24, 2018
    @vstinner
    Copy link
    Member Author

    Context: I was unhappy with _Py_NewReference() macro implementation when I had to modify it to fix a bug, bpo-35053. That's why I would like to convert it to a static inline function.

    @benjaminp
    Copy link
    Contributor

    Does this slow down debug builds at all? It probably will not end will if Py_INCREF is ever not inlined.

    @vstinner
    Copy link
    Member Author

    I tested PR 10079 using gdb on Fedora 28 with GCC 8.1.1 to check if Py_INCREF/Py_DECREF functions are inlined.

    I understand that "static inline void Py_INCREF()" is *not* inline by gcc -O0, but it *is* inlined using gcc -Og which is the *default* optimization level of ./configure --with-debug.

    To develop on Python, I force -O0, because the compilation time matters more than runtime performance for me :-) Compilation on my laptop using MAKEFLAGS=-j9:

    • -O0: 21s
    • -0g: 36s (1.7x slower)

    ... But Py_INCREF/DECREF are always inlined, even with -O0, when using __attribute__((always_inline))! I will work on a change to use that.

    == gcc -O0 ==

    (gdb) disassemble Py_IncRef
    Dump of assembler code for function Py_IncRef:
    ...
    0x000000000047276a <+20>: cmpq $0x0,-0x8(%rbp)
    0x000000000047276f <+25>: je 0x47277d <Py_IncRef+39>
    0x0000000000472771 <+27>: mov -0x8(%rbp),%rax
    0x0000000000472775 <+31>: mov %rax,%rdi
    0x0000000000472778 <+34>: callq 0x472523 <_Py_INCREF>
    0x000000000047277d <+39>: nop
    0x000000000047277e <+40>: leaveq
    0x000000000047277f <+41>: retq

    (gdb) disassemble Py_DecRef
    Dump of assembler code for function Py_DecRef:
    ...
    0x0000000000472794 <+20>: cmpq $0x0,-0x8(%rbp)
    0x0000000000472799 <+25>: je 0x4727b1 <Py_DecRef+49>
    0x000000000047279b <+27>: mov -0x8(%rbp),%rax
    0x000000000047279f <+31>: mov %rax,%rdx
    0x00000000004727a2 <+34>: mov $0xe1,%esi
    0x00000000004727a7 <+39>: mov $0x65c550,%edi
    0x00000000004727ac <+44>: callq 0x472554 <_Py_DECREF>
    0x00000000004727b1 <+49>: nop
    0x00000000004727b2 <+50>: leaveq
    0x00000000004727b3 <+51>: retq

    == gcc -Og ==

    (gdb) disassemble Py_IncRef
    Dump of assembler code for function Py_IncRef:
    0x0000000000462de2 <+0>: test %rdi,%rdi
    0x0000000000462de5 <+3>: je 0x462dfb <Py_IncRef+25>
    0x0000000000462de7 <+5>: addq $0x1,0x4bfc09(%rip) # 0x9229f8 <_Py_RefTotal>
    0x0000000000462def <+13>: mov 0x10(%rdi),%rax
    0x0000000000462df3 <+17>: add $0x1,%rax
    0x0000000000462df7 <+21>: mov %rax,0x10(%rdi)
    0x0000000000462dfb <+25>: retq

    (gdb) disassemble Py_DecRef
    Dump of assembler code for function Py_DecRef:
    0x0000000000463b2e <+0>: test %rdi,%rdi
    0x0000000000463b31 <+3>: je 0x463b6f <Py_DecRef+65>
    0x0000000000463b33 <+5>: sub $0x8,%rsp
    0x0000000000463b37 <+9>: subq $0x1,0x4beeb9(%rip) # 0x9229f8 <_Py_RefTotal>
    0x0000000000463b3f <+17>: mov 0x10(%rdi),%rax
    0x0000000000463b43 <+21>: sub $0x1,%rax
    0x0000000000463b47 <+25>: mov %rax,0x10(%rdi)
    0x0000000000463b4b <+29>: je 0x463b68 <Py_DecRef+58>
    0x0000000000463b4d <+31>: js 0x463b54 <Py_DecRef+38>
    0x0000000000463b4f <+33>: add $0x8,%rsp
    0x0000000000463b53 <+37>: retq
    0x0000000000463b54 <+38>: mov %rdi,%rdx
    0x0000000000463b57 <+41>: mov $0xe1,%esi
    0x0000000000463b5c <+46>: mov $0x5e1120,%edi
    0x0000000000463b61 <+51>: callq 0x462da4 <_Py_NegativeRefcount>
    0x0000000000463b66 <+56>: jmp 0x463b4f <Py_DecRef+33>
    0x0000000000463b68 <+58>: callq 0x463b0c <_Py_Dealloc>
    0x0000000000463b6d <+63>: jmp 0x463b4f <Py_DecRef+33>
    0x0000000000463b6f <+65>: retq

    @vstinner
    Copy link
    Member Author

    I modified PR 10079 to add a Py_STATIC_INLINE(TYPE) macro:

    • Use __attribute__((always_inline)) with GCC and clang
    • Use __forceinline with MSVC

    Tests on Linux, example:

    "./configure --with-pydebug CC=clang CFLAGS="-O0" && make clean && make platform"

    • Linux, gcc -O0: inlined
    • Linux, clang -O0: inlined

    Test done on Fedora 28 with GCC 8.1.1 and clang 6.0.1.

    @vstinner
    Copy link
    Member Author

    On Windows, a Debug build doesn't inline Py_INCREF/DECREF even if it uses __forceinline. I looked at the Py_IncRef() and Py_DecRef() assembly in Visual Studio using a breakpoint.

    Using /Ob1, Py_INCREF/DECREF are inlined as expected. I set this option in the pythoncore project.

    Do you think that I should modify the 38 other projects of the pcbuild solution to enable /Ob1 in debug build?

    Documentations.

    Inline Functions (C++):
    https://docs.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=vs-2017

    -Od: disable optimization ("d" stands for Debug)
    https://msdn.microsoft.com/en-us/library/aafb762y.aspx

    /Ob (Inline Function Expansion):
    https://msdn.microsoft.com/en-us/library/47238hez.aspx

    @vstinner
    Copy link
    Member Author

    Ok, I confirm that Py_XINCREF() is properly inlined in Py_IncRef() with the latest version of my PR 10079.

    I tested:

    • gcc -O0
    • clang -O0
    • MSVC: x64 build in Debug mode

    @serhiy-storchaka
    Copy link
    Member

    Is it guarantied that static inline functions will be inlined and will be not called as external functions from the Python library? The latter would break binary compatibility of extensions compiled with newer Python headers with older binary Python libraries.

    @vstinner
    Copy link
    Member Author

    Is it guarantied that static inline functions will be inlined and will be not called as external functions from the Python library? The latter would break binary compatibility of extensions compiled with newer Python headers with older binary Python libraries.

    First, I'm not sure that the "stable ABI" really works in practice, there are many issues:
    https://pythoncapi.readthedocs.io/

    Converting the macro to a static inline function is a minor step in the direction of a more stable API and ABI.

    To come back to your question: depending on the compiler and compiler options, Py_INCREF/DECREF() can generate a *function call*. I tuned Py_STATIC_INLINE() to always inline Py_INCREF/DECREF() for GCC, clang and MSVC which are the major compilers used by Python on Windows, Linux, macOS and FreeBSD.

    "static inline" is still something new to me. Maybe someone will come with a compiler with which it doesn't work as expected. IMHO we have to go through these issues, and it's only be testing for real that we will see these issues.

    I mean, we *have to* get ride of these ugly macros used in Python header files. They are causing subtle bugs and are hard to maintain. See my first message for advantages of static inline functions.

    @vstinner
    Copy link
    Member Author

    Is it guarantied that static inline functions will be inlined and will be not called as external functions from the Python library? The latter would break binary compatibility of extensions compiled with newer Python headers with older binary Python libraries.

    Ok, I made more checks. In short, PR 10079 has no impact on the ABI compatibility.

    I modified Py_STATIC_INLINE() to remove __attribute__((always_inline)).

    In this case, the ./python binary contains multiple "instances" (what's the correct name for that?) of the "_Py_INCREF" inline function:
    ---
    vstinner@apu$ readelf -sW ./python | c++filt -t |grep -E '(INC|DEC)REF'
    42: 000000000041f908 49 FUNC LOCAL DEFAULT 13 _Py_INCREF
    43: 000000000041f939 121 FUNC LOCAL DEFAULT 13 _Py_DECREF
    109: 00000000004234cb 49 FUNC LOCAL DEFAULT 13 _Py_INCREF
    ...
    5639: a486d 49 FUNC LOCAL DEFAULT 13 _Py_INCREF
    5786: a7abc 49 FUNC LOCAL DEFAULT 13 _Py_INCREF
    5801: a7f0f 49 FUNC LOCAL DEFAULT 13 _Py_INCREF
    ...
    8126: 00000000006011c5 49 FUNC LOCAL DEFAULT 13 _Py_INCREF
    8127: 00000000006011f6 121 FUNC LOCAL DEFAULT 13 _Py_DECREF
    8140: 0000000000601971 49 FUNC LOCAL DEFAULT 13 _Py_INCREF
    8141: 00000000006019a2 121 FUNC LOCAL DEFAULT 13 _Py_DECREF
    ...
    ---

    These functions are *LOCAL*, I understand that they are not exported.

    I also checked the _struct module:

    vstinner@apu$ readelf -sW build/lib.linux-x86_64-3.8-pydebug/_struct.cpython-38dm-x86_64-linux-gnu.so | c++filt -t |grep -E '(INC|DEC)REF'
    40: 0000000000002e99 55 FUNC LOCAL DEFAULT 11 _Py_INCREF
    41: 0000000000002ed0 127 FUNC LOCAL DEFAULT 11 _Py_DECREF

    Again, these functions are "LOCAL", not imported nor exported.

    I undertand that _struct.so doesn't depend on libpython for INCREF/DECREF: it contains its own "implementation".

    @vstinner
    Copy link
    Member Author

    Interesting article on inline/static inline and symbols:
    https://gist.github.com/htfy96/50308afc11678d2e3766a36aa60d5f75

    @vstinner
    Copy link
    Member Author

    New changeset 18618e6 by Victor Stinner in branch 'master':
    bpo-35059: Add Py_STATIC_INLINE() macro (GH-10093)
    18618e6

    @benjaminp
    Copy link
    Contributor

    Why do we need this Py_STATIC_INLINE macro? If you want to have one for enabling always inline, that's fine with me, since it's compiler-specific. But the current Py_STATIC_INLINE macro seems to conflate linkage with always-inline behavior.

    @vstinner
    Copy link
    Member Author

    Benjamin:

    Why do we need this Py_STATIC_INLINE macro? If you want to have one for enabling always inline, that's fine with me, since it's compiler-specific.

    For about the name, there is already Py_LOCAL_INLINE which also uses "static inline", but has a very different usage: "Py_LOCAL can be used instead of static to get the fastest possible calling convention for functions that are local to a given module." So I chise "Py_STATIC_INLINE" name, it describes the basic implementation ("static inline TYPE").

    Py_STATIC_INLINE() is designed to replace a preprocessor macro with a function, when you care that the code is "always inlined". (Maybe the name is not perfect ;-))

    Honestly, I'm not sure that it really matters that the function is "always" inlined. The main issue is that "static" requires to duplicate the function in each file which uses the macro, when the function cannot/is not inlined.

    Previously, you asked me:

    Does this slow down debug builds at all? It probably will not end will if Py_INCREF is ever not inlined.

    That's why I wrote Py_STATIC_INLINE() to "force" inlining and PR 10094 to enable inlining for Debug build on MSVC.

    But the current Py_STATIC_INLINE macro seems to conflate linkage with always-inline behavior.

    I'm not sure that I get it. Do you talk about "static" inside the macro? bpo-33407 modifies Py_DEPRECATED() to support Visual Stuido, but it requires to modify how the macro is used: it now must be used at the start, rather than at the end:
    https://github.com/python/cpython/pull/8980/files

    I chose to put "static" and "inline" in the same macro. We already have many other similar macros like PyAPI_FUNC(TYPE) and Py_LOCAL(TYPE). I would like to have a common way to "declare a function behaving as a macro".

    Please see also the discussion on the PR itself, Neil discuss what's the best way to declare an inline function:
    #10079 (comment)

    By the way, was it you who required "static inline" support in PEP-7? :-)

    @vstinner
    Copy link
    Member Author

    New changeset b4435e2 by Victor Stinner in branch 'master':
    bpo-35059: Convert PyObject_INIT() to function (GH-10077)
    b4435e2

    @vstinner
    Copy link
    Member Author

    New changeset a05bef4 by Victor Stinner in branch 'master':
    bpo-35059, PCbuild: Expand inline funcs in Debug (GH-10094)
    a05bef4

    @vstinner
    Copy link
    Member Author

    bpo-35059, PCbuild: Expand inline funcs in Debug (GH-10094)
    a05bef4

    Too bad: this change broke the compilation on AMD64 Windows7 SP1 3.x, AMD64 Windows8 3.x and AMD64 Windows10 3.x:
    #10094 (comment)

    I don't understand why: it works well on my Windows 10 VM with my up to date Visual Studio Community 2017 (version 15.8.8). The compilation also worked on AppVeyor.

    For me, the only explanation is that buildbots use older compiler versions.

    Note: I checked that _decimal is compiled on my VM.

    @vstinner
    Copy link
    Member Author

    New changeset 3b1cba3 by Victor Stinner in branch 'master':
    bpo-35059, libmpdec: Add missing EXTINLINE in mpdecimal.h (GH-10128)
    3b1cba3

    @vstinner
    Copy link
    Member Author

    Too bad: this change broke the compilation on AMD64 Windows7 SP1 3.x, AMD64 Windows8 3.x and AMD64 Windows10 3.x: (...)

    I checked buildbots: my PR 10128 fixed all Windows buildbots, good.

    @miss-islington
    Copy link
    Contributor

    New changeset 95cfb81 by Miss Islington (bot) in branch '3.7':
    bpo-35059, libmpdec: Add missing EXTINLINE in mpdecimal.h (GH-10128)
    95cfb81

    @miss-islington
    Copy link
    Contributor

    New changeset 7eac88a by Miss Islington (bot) in branch '3.6':
    bpo-35059, libmpdec: Add missing EXTINLINE in mpdecimal.h (GH-10128)
    7eac88a

    @vstinner
    Copy link
    Member Author

    See also bpo-35081: "Rename Include/internals/ to Include/pycore/" which is linked to this issue.

    @benjaminp
    Copy link
    Contributor

    On Fri, Oct 26, 2018, at 03:18, STINNER Victor wrote:

    STINNER Victor <vstinner@redhat.com> added the comment:

    Benjamin:
    > Why do we need this Py_STATIC_INLINE macro? If you want to have one for enabling always inline, that's fine with me, since it's compiler-specific.

    For about the name, there is already Py_LOCAL_INLINE which also uses
    "static inline", but has a very different usage: "Py_LOCAL can be used
    instead of static to get the fastest possible calling convention for
    functions that are local to a given module." So I chise
    "Py_STATIC_INLINE" name, it describes the basic implementation ("static
    inline TYPE").

    I would like to see Py_LOCAL_INLINE removed, too, fwiw.

    Py_STATIC_INLINE() is designed to replace a preprocessor macro with a
    function, when you care that the code is "always inlined". (Maybe the
    name is not perfect ;-))

    "always inline" is different from "static inline". So, it's not appropriate to make a macro named the latter that has the former former.

    Honestly, I'm not sure that it really matters that the function is
    "always" inlined. The main issue is that "static" requires to duplicate
    the function in each file which uses the macro, when the function
    cannot/is not inlined.

    Previously, you asked me:

    > Does this slow down debug builds at all? It probably will not end will if Py_INCREF is ever not inlined.

    That's why I wrote Py_STATIC_INLINE() to "force" inlining and PR 10094
    to enable inlining for Debug build on MSVC.

    > But the current Py_STATIC_INLINE macro seems to conflate linkage with always-inline behavior.

    I'm not sure that I get it. Do you talk about "static" inside the macro?
    bpo-33407 modifies Py_DEPRECATED() to support Visual Stuido, but it
    requires to modify how the macro is used: it now must be used at the
    start, rather than at the end:
    https://github.com/python/cpython/pull/8980/files

    I chose to put "static" and "inline" in the same macro. We already have
    many other similar macros like PyAPI_FUNC(TYPE) and Py_LOCAL(TYPE). I
    would like to have a common way to "declare a function behaving as a
    macro".

    We don't want functions that behave like macros... otherwise, we would be writing macros. If we want a function to always be inlined, we should explicitly request that from the compiler.

    Please see also the discussion on the PR itself, Neil discuss what's the
    best way to declare an inline function:
    #10079 (comment)

    By the way, was it you who required "static inline" support in PEP-7? :-)

    Yes, which is why we shouldn't need a macro to write it.

    @vstinner
    Copy link
    Member Author

    I would like to see Py_LOCAL_INLINE removed, too, fwiw.

    Oh. Why? Do you want to directly use "static" and "static inline"?

    I guess that Py_LOCAL and Py_LOCAL_INLINE have been added to use __fastcall with MSVC.

    ... __fastcall is mostly interesting in x86 (32-bit), but x86-64 calling convention is "fast" by default no? __fastcall pass the first two arguments in registers, but x86-64 already pass the first four arguments in registers...

    Do you mean that __fastcall is no longer revelant?

    @vstinner
    Copy link
    Member Author

    > Py_STATIC_INLINE() is designed to replace a preprocessor macro with a
    > function, when you care that the code is "always inlined". (Maybe the
    > name is not perfect ;-))

    "always inline" is different from "static inline". So, it's not appropriate to make a macro named the latter that has the former former.

    Oh ok. So if we decide to keep it, it should be renamed to Py_STATIC_ALMOST_ALWAYS_INLINE() or something like that :-)

    We don't want functions that behave like macros... otherwise, we would be writing macros. If we want a function to always be inlined, we should explicitly request that from the compiler.

    Oh. It seems like I misunderstood you.

    I understood that you required to have zero impact on performance on debug build.

    *I* want to use functions because it's the regular C language: regular scope rules, no preprocessor magic, the compiler detects errors if the function is misused, etc.

    But I'm not sure about the drawbacks of converting a macro to a function. I don't want to be the only one responsible to regressions :-) If you support the change, we can drop "__attribute__((always_inline))" and use "regular" "static inline" functions :-)

    > By the way, was it you who required "static inline" support in PEP-7? :-)

    Yes, which is why we shouldn't need a macro to write it.

    Ok ok.

    I updated my PR 10079 to simply use "static inline".

    @vstinner
    Copy link
    Member Author

    If my PR 10079 is merged without Py_STATIC_INLINE(), I will remove the macro.

    I'm not sure if we need an "always inline" macro or not.

    Note: I'm in favor of moving closer to the C language and not abusing __attribute__(...) :-)

    @vstinner
    Copy link
    Member Author

    New changeset 2aaf0c1 by Victor Stinner in branch 'master':
    bpo-35059: Convert Py_INCREF() to static inline function (GH-10079)
    2aaf0c1

    @vstinner
    Copy link
    Member Author

    bpo-35059: Remove Py_STATIC_INLINE() macro (GH-10216)

    Here you have Benjamin, it's gone :-)

    @vstinner
    Copy link
    Member Author

    TODO: Convert _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() macros to static inline functions, but there are inter-dependencies issues: see bpo-35081 and
    https://mail.python.org/pipermail/python-dev/2018-October/155592.html

    @vstinner
    Copy link
    Member Author

    New changeset 541497e by Victor Stinner in branch 'master':
    bpo-35059: Convert Py_XINCREF() to static inline function (GH-10224)
    541497e

    @vstinner
    Copy link
    Member Author

    New changeset 3c09dca by Victor Stinner in branch 'master':
    bpo-35059: Convert _Py_Dealloc() to static inline function (GH-10223)
    3c09dca

    @pablogsal
    Copy link
    Member

    I am collecting information on how different compilers behave with the static inline. I am checking the following compilers:

    ARM GCC
    AVR GCC
    CLANG X86_64
    CLANG RISC-V
    ELLCC
    GCC X86_64
    ICC x86_64
    MIPS GCC
    MSP GCC
    POWERPC GCC
    AIX Compiler (xlc)
    SunPro Compilers

    will report back when I have enough information on these.

    @vstinner
    Copy link
    Member Author

    See also bpo-35230: "Remove _Py_REF_DEBUG_COMMA".

    @vstinner
    Copy link
    Member Author

    New changeset f1d002c by Victor Stinner in branch 'master':
    bpo-35059: Enhance _PyObject_AssertFailed() (GH-10642)
    f1d002c

    @vstinner
    Copy link
    Member Author

    New changeset 271753a by Victor Stinner in branch 'master':
    bpo-35059: Convert _PyObject_GC_TRACK() to inline function (GH-10643)
    271753a

    @vstinner
    Copy link
    Member Author

    New changeset 2ff8fb7 by Victor Stinner in branch 'master':
    bpo-35059: Add _PyObject_CAST() macro (GH-10645)
    2ff8fb7

    @vstinner
    Copy link
    Member Author

    New changeset b37672d by Victor Stinner in branch 'master':
    bpo-35059: Cleanup usage of Python macros (GH-10648)
    b37672d

    @vstinner
    Copy link
    Member Author

    New changeset a42de74 by Victor Stinner in branch 'master':
    bpo-35059: Cast void* to PyObject* (GH-10650)
    a42de74

    @vstinner
    Copy link
    Member Author

    I wrote a pull request to replace static inline functions with C macros: PR 10669. I ran a benchmark on speed.python.org server using the "performance" benchmark suite:
    http://pyperformance.readthedocs.io/

    I understand that from the benchmark results that converting macros to static inline functions has no significant impact on performances. Two benchmarks are 1.06x and 1.07x faster but it can be explained by the PGO compilation which is not reliable. One benchmark is way slower, but it seems like the benchmark has an issue. If you look at the 3 latest run on speed.python.org, I see:

    • 38.2 us (Sept 24)
    • 43.5 us (Nov 21)
    • 43.7 us (Nov 22)

    I don't think that any change in _pickle or pickle explains this significant slowdown. IMHO it's just that the benchmark is not reliable :-/ We have a performance timeline on the last 5 years, and this benchmark doesn't have a straight line, we can see that the result is a little bit random :-/

    --

    speed.python.org runs Ubuntu 16.04 with gcc 5.4.0.

    The result are the two attached (compressed) JSON files:

    • 2018-11-22_17-38-master-3bb183d7fb83-patch-10669.json.gz: reference, Python using C macros
    • 2018-11-22_17-38-master-3bb183d7fb83.json.gz: static inline, current master branch

    Comparison ignoring difference smaller than -5% and +5%, macros are the reference:

    vstinner@apu$ python3 -m perf compare_to --table -G --min-speed=5 macros.json.gz inline.json.gz
    +--------------+---------+------------------------------+
    | Benchmark | macros | inline |
    +==============+=========+==============================+
    | regex_dna | 288 ms | 269 ms: 1.07x faster (-7%) |
    +--------------+---------+------------------------------+
    | crypto_pyaes | 236 ms | 223 ms: 1.06x faster (-5%) |
    +--------------+---------+------------------------------+
    | pickle_dict | 37.8 us | 43.7 us: 1.16x slower (+16%) |
    +--------------+---------+------------------------------+

    Not significant (54): (...)

    Raw comparison, full data, macros are the reference:

    vstinner@apu$ python3 -m perf compare_to --table -G macros.json.gz inline.json.gz
    +-------------------------+----------+------------------------------+
    | Benchmark | macros | inline |
    +=========================+==========+==============================+
    | regex_dna | 288 ms | 269 ms: 1.07x faster (-7%) |
    +-------------------------+----------+------------------------------+
    | crypto_pyaes | 236 ms | 223 ms: 1.06x faster (-5%) |
    +-------------------------+----------+------------------------------+
    | nqueens | 199 ms | 195 ms: 1.02x faster (-2%) |
    +-------------------------+----------+------------------------------+
    | raytrace | 1.01 sec | 984 ms: 1.02x faster (-2%) |
    +-------------------------+----------+------------------------------+
    | chaos | 224 ms | 221 ms: 1.02x faster (-2%) |
    +-------------------------+----------+------------------------------+
    | logging_simple | 21.2 us | 20.9 us: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | unpack_sequence | 117 ns | 116 ns: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | spectral_norm | 247 ms | 244 ms: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | regex_v8 | 41.8 ms | 41.4 ms: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | pickle | 19.9 us | 19.7 us: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | logging_format | 24.0 us | 23.8 us: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | scimark_sparse_mat_mult | 9.06 ms | 8.99 ms: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | pathlib | 42.1 ms | 41.8 ms: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | meteor_contest | 188 ms | 187 ms: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | unpickle_pure_python | 704 us | 700 us: 1.01x faster (-1%) |
    +-------------------------+----------+------------------------------+
    | python_startup | 12.0 ms | 12.0 ms: 1.00x faster (-0%) |
    +-------------------------+----------+------------------------------+
    | python_startup_no_site | 7.91 ms | 7.94 ms: 1.00x slower (+0%) |
    +-------------------------+----------+------------------------------+
    | sympy_integrate | 35.0 ms | 35.3 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | pidigits | 283 ms | 285 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | hexiom | 17.3 ms | 17.4 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | sympy_str | 403 ms | 407 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | mako | 31.9 ms | 32.2 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | django_template | 293 ms | 296 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | tornado_http | 368 ms | 372 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | sqlalchemy_declarative | 287 ms | 290 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | html5lib | 190 ms | 192 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | xml_etree_iterparse | 181 ms | 183 ms: 1.01x slower (+1%) |
    +-------------------------+----------+------------------------------+
    | 2to3 | 601 ms | 610 ms: 1.02x slower (+2%) |
    +-------------------------+----------+------------------------------+
    | unpickle_list | 6.49 us | 6.60 us: 1.02x slower (+2%) |
    +-------------------------+----------+------------------------------+
    | scimark_monte_carlo | 210 ms | 214 ms: 1.02x slower (+2%) |
    +-------------------------+----------+------------------------------+
    | sympy_sum | 198 ms | 202 ms: 1.02x slower (+2%) |
    +-------------------------+----------+------------------------------+
    | telco | 14.8 ms | 15.1 ms: 1.02x slower (+2%) |
    +-------------------------+----------+------------------------------+
    | fannkuch | 855 ms | 874 ms: 1.02x slower (+2%) |
    +-------------------------+----------+------------------------------+
    | sympy_expand | 857 ms | 883 ms: 1.03x slower (+3%) |
    +-------------------------+----------+------------------------------+
    | scimark_lu | 298 ms | 307 ms: 1.03x slower (+3%) |
    +-------------------------+----------+------------------------------+
    | xml_etree_generate | 207 ms | 213 ms: 1.03x slower (+3%) |
    +-------------------------+----------+------------------------------+
    | float | 210 ms | 216 ms: 1.03x slower (+3%) |
    +-------------------------+----------+------------------------------+
    | xml_etree_process | 173 ms | 180 ms: 1.04x slower (+4%) |
    +-------------------------+----------+------------------------------+
    | scimark_sor | 357 ms | 373 ms: 1.05x slower (+5%) |
    +-------------------------+----------+------------------------------+
    | json_dumps | 25.8 ms | 27.0 ms: 1.05x slower (+5%) |
    +-------------------------+----------+------------------------------+
    | pickle_dict | 37.8 us | 43.7 us: 1.16x slower (+16%) |
    +-------------------------+----------+------------------------------+

    Not significant (16): regex_effbot; go; json_loads; xml_etree_parse; dulwich_log; nbody; regex_compile; scimark_fft; pickle_pure_python; unpickle; pickle_list; sqlite_synth; richards; logging_silent; deltablue; sqlalchemy_imperative

    @vstinner
    Copy link
    Member Author

    New changeset e89607c by Victor Stinner in branch 'master':
    bpo-35059: NEWS entry for macros converted to inline funcs (GH-10671)
    e89607c

    @vstinner
    Copy link
    Member Author

    I ran a coarse benchmark on the debug mode using PR 10669: I ran the full Python test suite.

    => I don't see any significant impact on performance.

    git clean -fdx # remove *all* untracked files
    ./configure --with-pydebug && make
    ./python -m test -j0 -r --randseed=2411717

    Result:

    • C macros: 3 min 22 sec
    • static inline functions: 3 min 26 sec (+4 sec, +2%)

    I used my laptop to run the benchmark: Fedora 29 with GCC 8.2.1. I was still using the laptop to run other tasks.

    @vstinner
    Copy link
    Member Author

    Unexpected cool effect of static inline functions (copy of my email):
    https://mail.python.org/pipermail/python-dev/2018-November/155747.html

    Le jeu. 15 nov. 2018 à 01:06, Gregory P. Smith <greg at krypto.org> a écrit :

    I expect the largest visible impact may be that a profiler may now attribute CPU cycles takes by these code snippets to the function from the .h file rather than directly to the functions the macro expanded in in the past due to additional debug symbol info attribution. Just more data. Consider that a win.

    Oh. That's very interesting.

    I just tried gdb and I confirm that gdb understands well inlined
    function. When I debug Python, gdb moves into Py_INCREF() or
    Py_DECREF() when I use "next".

    I also tried perf record/perf report: if I annotate a function
    (assembler code of the function), perf shows me the C code of inlined
    Py_INCREF and Py_DECREF!

    That's nice!

    Victor

    @vstinner
    Copy link
    Member Author

    Stefan Behnel wrote:
    https://mail.python.org/pipermail/python-dev/2018-November/155759.html

    """
    It's also slower to compile, given that function inlining happens at a much
    later point in the compiler pipeline than macro expansion. The C compiler
    won't even get to see macros in fact, whereas whether to inline a function
    or not is a dedicated decision during the optimisation phase based on
    metrics collected in earlier stages. For something as ubiquitous as
    Py_INCREF/Py_DECREF, it might even be visible in the compilation times.
    """

    I ran a benchmark on the compilation time using PR 10669: there is no significant slowdown (+4 seconds, 6% slower, in the worst case).

    I ran a benchmark on my laptop:

    • Fedora 29
    • Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz: 4 physical cores (8 threads)
    • GCC 8.2.1
    • MAKEFLAGS=-j9

    Result in release mode:

    • git clean -fdx; ./configure; time make # -03
    • C macros: 1m12,158s
    • static inline functions: 1m16,294s (+4.136s, +6%)

    Result in debug mode:

    • git clean -fdx; ./configure --with-pydebug; time make # -Og
    • C macros: 0m39,727s
    • static inline functions: 0m40,423s (+0.696s, +2%)

    I only used "real" time (I ignored user and sys times).

    @vstinner
    Copy link
    Member Author

    "PyObject* PyObject_INIT(PyObject *op, PyTypeObject *typeobj)"
    and
    "PyVarObject* PyObject_INIT_VAR(PyVarObject *op, PyTypeObject *typeobj, Py_ssize_t size)"

    Don't cast their argument to PyObject*/PyVarObject* which can introduce new warnings when upgrading from Python 3.7 to Python 3.8.

    @vstinner
    Copy link
    Member Author

    I ran my benchmarks, I close the PR 10669 (replace static inline functions with macros). You should still be able to use it as patch:

    https://github.com/python/cpython/pull/10669.patch

    @vstinner
    Copy link
    Member Author

    New changeset b509d52 by Victor Stinner in branch 'master':
    bpo-35059: PyObject_INIT() casts to PyObject* (GH-10674)
    b509d52

    @vstinner
    Copy link
    Member Author

    Drawbacks: Require a specific type can introduce compiler warnings if the caller doesn't pass the proper type (PyObject* or PyVarObject*). _Py_NewReference() and _Py_ForgetReference() seem to be properly used, but not PyObject_INIT() and PyObject_INIT_VAR().

    I worked around this issue by adding a macro to cast the argument and declare the static inline function with a different name. Example:

    static inline void _Py_INCREF(PyObject *op)
    {
        _Py_INC_REFTOTAL;
        op->ob_refcnt++;
    }
    
    #define Py_INCREF(op) _Py_INCREF(_PyObject_CAST(op))

    @vstinner
    Copy link
    Member Author

    I close the issue, it seems like all subtasks have been completed.

    Summary of the issue:

    • The following macros have been converted to static inline functions:

      • Py_INCREF(), Py_DECREF()
      • Py_XINCREF(), Py_XDECREF()
      • PyObject_INIT(), PyObject_INIT_VAR()
      • _Py_NewReference(), _Py_ForgetReference()
      • _Py_Dealloc()
      • _PyObject_GC_TRACK(), _PyObject_GC_UNTRACK()
    • There is no significant impact on performance:

      • I ran performance benchmark on Python compiled in release mode
      • I ran the Python test suite on Python compiled in debug mode
      • I measured the compilation time in release an debug mode
    • It has been decided to use "static inline" to declare inline function (write directly "static inline", no macro).

    • It has been decided to no use __attribute__((always_inline)) nor __forceinline (ask the compiler to always inline).

    --

    Benjamin Peterson would "like to see Py_LOCAL_INLINE removed, too, fwiw", but it's part of the public C API. It would require a deprecation period. I'm not interested to touch the public C API.

    Benjamin: please open a new issue if you still want to remove it :-)

    @vstinner
    Copy link
    Member Author

    Advantages of inline functions over C macros:

    https://mail.python.org/pipermail/python-dev/2018-November/155805.html

    There are multiple advantages:

    • Better development and debugging experience: tools understand
      inlined functions much better than C macros: gdb, Linux perf, etc.

    • Better API: arguments now have a type and the function has a return
      type. In practice, some macros still cast their argument to PyObject*
      to not introduce new compiler warnings in Python 3.8. For example,
      even if Py_INCREF() is documented () as a function expecting
      PyObject
      , it accepts any pointer type (PyTupleObject*,
      PyUnicodeObject*, etc.). Technically, it also accepts PyObject** which
      is a bug, but that's a different story ;-)

    • Much better code, just plain regular C. C macros are ugly: "do { ...
      } while (0)" workaround, additional parenthesis around each argument,
      strange "expr1, expr2" syntax of "macro expression" which returns a
      value (inline function just uses regular "return" and ";" at the end
      of instructions), strange indentation, etc.

    • No more "macro pitfals":
      https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html

    • Local variables no longer need a magic name to avoid risk of name
      conflict, and have a clearly defined scope. Py_DECREF() and
      _Py_XINCREF() no longer need a local variable since it's argument
      already has a clearly defined type: PyObject*. I introduced a new
      variable in _Py_Dealloc() to fix a possible race condition.
      Previously, the variable was probably avoided because it's tricky use
      variables in macros.

    • #ifdef can now be used inside the inline function: it makes the code
      easier to understand.

    • etc.

    Are you aware that Python had macros like:

    #define _Py_REF_DEBUG_COMMA ,
    #define _Py_CHECK_REFCNT(OP) /* a semicolon */;

    I let you judge the quality of this macro:

    #define _Py_NewReference(op) (                          \
        _Py_INC_TPALLOCS(op) _Py_COUNT_ALLOCS_COMMA         \
        _Py_INC_REFTOTAL  _Py_REF_DEBUG_COMMA               \
        Py_REFCNT(op) = 1)

    Is it an expression? Can it be used in "if (test)
    _Py_NewReference(op);"? It doesn't use the "do { ... } while (0)"
    protection against macro pitfals.

    (*) Py_INCREF doc:
    https://docs.python.org/dev/c-api/refcounting.html#c.Py_INCREF

    @zooba
    Copy link
    Member

    zooba commented Nov 23, 2018

    Could we have used function overloading to handle the different types? Rather than reintroducing the macro for the sake of the cast?

    @vstinner
    Copy link
    Member Author

    Could we have used function overloading to handle the different types?
    Rather than reintroducing the macro for the sake of the cast?

    Sorry, I don't know what is function overloading. Is it a C++ thing?
    Py_INCREF() must accept any type based on PyObject.

    At least, this issue shouldn't make the situation worse :-)

    Please open a new issue if you have a solution for this problem. I am now
    curious since I tried many things and I failed to find anything working for
    all cases.

    @zooba
    Copy link
    Member

    zooba commented Nov 23, 2018

    Ah, you're right, it's only C++. My bad.

    @vstinner
    Copy link
    Member Author

    New changeset 07923f3 by Victor Stinner in branch 'master':
    bpo-35059: Enhance _PyObject_GC_TRACK() macros (GH-20931)
    07923f3

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 3, 2021

    See bpo-45094: "Consider using __forceinline and __attribute__((always_inline)) on static inline functions (Py_INCREF, Py_TYPE) for debug builds".

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants