Title: Disable inlining of _PyStack_AsTuple() to reduce the stack consumption
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, python-dev, serhiy.storchaka, steve.dower
Priority: normal Keywords: patch

Created on 2017-01-11 00:05 by haypo, last changed 2017-02-01 17:21 by serhiy.storchaka. This issue is now closed.

File name Uploaded Description Edit
noinline_msvs.patch haypo, 2017-01-11 00:26 review
Messages (8)
msg285168 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-11 00:05
While working on the issue #28870, I noticed that _PyStack_AsTuple() is inlined. Compared to Python 3.5 which didn't have this function, it seems like _PyStack_AsTuple() is responsible for an increase of the stack consumption.
msg285170 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-11 00:26
noinline_msvs.patch: implement _Py_NO_INLINE for Microsoft Visual Studio

#elif defined(_MSC_VER)
#  define _Py_NO_INLINE __declspec(noinline)

I didn't push this change because I don't have access to a Windows VM yet.

@Steve: Does noinline_msvs.patch look good to you?

I decided to make the macro private because I don't know if _Py_NO_INLINE can be combined with PyAPI_FUNC() which also uses __declspec(): __declspec(dllexport).

Is it possible to use __declspec() multiple times per function declaration?
msg285172 - (view) Author: Roundup Robot (python-dev) Date: 2017-01-11 00:28
New changeset 6478e6d0476f by Victor Stinner in branch 'default':
Disable _PyStack_AsTuple() inlining
msg285182 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-01-11 02:27
I'm fairly sure dllexport implies noinline, and it certainly does for the exported function (but maybe PGO will inline it within the module? hard to know). In any case, I agree with keeping it private as it's purely about optimization.

Do you have a test that can run against the released build? That will be the easiest way to tell whether it matters at all.
msg285191 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-11 06:48
What is a stack usage effect of disabling inlining _PyStack_AsTuple()? Does it impact performance? Should inlining _PyStack_AsDict() be disabled too? Is it worth to extract slow paths of _PyObject_FastCallDict() and _PyObject_FastCallKeywords() into separate non-inlined functions?
msg285201 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-11 08:04
Full commit message:
Disable _PyStack_AsTuple() inlining

Issue #29234: Inlining _PyStack_AsTuple() into callers increases their stack
consumption, Disable inlining to optimize the stack consumption.

Add _Py_NO_INLINE: use __attribute__((noinline)) of GCC and Clang.

It reduces the stack consumption, bytes per call, before => after:

test_python_call: 1040 => 976 (-64 B)
test_python_getitem: 976 => 912 (-64 B)
test_python_iterator: 1120 => 1056 (-64 B)

=> total: 3136 => 2944 (- 192 B)

Serhiy Storchaka: "What is a stack usage effect of disabling inlining _PyStack_AsTuple()?"

The total effect on the 3 tests is to reduce the stack consumption by 192 bytes/call, or 64 bytes/call (8 CPU words) for each test.

> Does it impact performance?

I ran a benchmark on 3 changes at once. The effect is a speedup, not a slowdown:

I don't expect any significant performance impact for the change 6478e6d0476f.

> Should inlining _PyStack_AsDict() be disabled too?

Good question. I didn't try to write a benchmark calling this function. It would help to have numbers to take a decision.

I tried to push the fewer changes which have the largest impact on the stack consumption. There is still room to reduce it even further.

> Is it worth to extract slow paths of _PyObject_FastCallDict() and _PyObject_FastCallKeywords() into separate non-inlined functions?

Do you mean for performance or stack consumption? I don't know. If you would like to know, you should run a benchmark to measure that.
msg286656 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-01 17:03
The initial issue has been fixed.

With the issue #28870 (msg285169): "The default branch is now as good as Python 3.4, in term of stack consumption, and Python 3.4 was the Python version which used the least stack memory according to my tests."

Serhiy: As I wrote, for _PyStack_AsDict(), we need numbers to see if this function is inlined or not, and check its usage of the stack.

Since I consider that the stack usage is now low enough, I'm not interested to continue to investigate the stack usage. Feel free to continue the effort, but please open a new issue in this case.

I close the issue.
msg286663 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-01 17:21
Thank you Victor. Your work on reducing the stack consumption is great.
Date User Action Args
2017-02-01 17:21:59serhiy.storchakasetmessages: + msg286663
2017-02-01 17:03:48hayposetstatus: open -> closed
resolution: fixed
messages: + msg286656
2017-01-11 08:04:34hayposetmessages: + msg285201
2017-01-11 06:48:46serhiy.storchakasetmessages: + msg285191
2017-01-11 02:27:09steve.dowersetmessages: + msg285182
2017-01-11 00:28:06python-devsetnosy: + python-dev
messages: + msg285172
2017-01-11 00:26:08hayposetfiles: + noinline_msvs.patch

nosy: + steve.dower
messages: + msg285170

keywords: + patch
2017-01-11 00:05:11haypocreate