classification
Title: Fastcall uses more C stack
Type: Stage:
Components: Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, python-dev, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-12-02 07:54 by haypo, last changed 2016-12-05 17:45 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
stack_overflow.py haypo, 2016-12-02 07:54
stack_overflow.py serhiy.storchaka, 2016-12-04 22:49
Messages (10)
msg282228 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-12-02 07:54
Serhiy Storchaka reported that Python 3.6 crashs earlier than Python 3.5 on calling json.dumps() when sys.setrecursionlimit() is increased.

I tested the script he wrote. Results on Python built in release mode:

Python 3.7:

...
58100 116204
Segmentation fault (core dumped)

Python 3.6:

...
74800 149604
Segmentation fault (core dumped)

Python 3.5:

...
74700 149404
Segmentation fault (core dumped)

Oh, it seems like Python 3.7 does crash earlier.

But to be clear, it's hard to control the usage of the C stack.
msg282247 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-12-02 16:10
Oh, I didn't understand that the regression was introduced by the revision b9c9691c72c5. The purpose of this revision was to *reduce* the memory usage of the C stack!?

It seems like _PyObject_CallArg1() uses more stack memory than PyObject_CallFunctionObjArgs(). PyObject_CallFunctionObjArgs() allocates 4O bytes (5*sizeof(PyObject*)) on the stack.

At least, I can say that when the crash occurs, _PyObject_FastCallDict() is not the gdb backtrace.
msg282252 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-02 19:53
Yes, that is why I asked you to revert your changes.

In additional, they introduced compiler warnings.
msg282370 - (view) Author: Roundup Robot (python-dev) Date: 2016-12-04 22:00
New changeset d35fc6e58a70 by Victor Stinner in branch 'default':
Backed out changeset b9c9691c72c5
https://hg.python.org/cpython/rev/d35fc6e58a70
msg282371 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-12-04 22:13
> Serhiy Storchaka reported that Python 3.6 crashs earlier than Python 3.5 on calling json.dumps() when sys.setrecursionlimit() is increased.

Reference: http://bugs.python.org/issue23507#msg282190 (issue #23507).


Serhiy Storchaka: "Yes, that is why I asked you to revert your changes."

Sorry, I misunderstood your comments. So yes, my change b9c9691c72c5 introduced a regression. Sorry, I didn't have time before now to revert my  change. I just pushed the change d35fc6e58a70 which reverts b9c9691c72c5.

The question is how replacing PyObject_CallFunctionObjArgs() with _PyObject_CallArg1() increases the usage of the C stack. I wrote my change to reduce the usage of the C stack.

PyObject_CallFunctionObjArgs() allocates 5 "PyObject *", so 40 bytes, on the C stack. Maybe using _PyObject_CallArg1() increases the usage of C stack in the *caller*.


> In additional, they introduced compiler warnings.

This one was fixed by Benjamin Peterson in the issue #28855 (change 96245d4af0ca).
msg282373 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-04 22:49
Thanks Victor.

Following script includes several examples of achieving a stack overflow (most are real world examples or can be used in real world examples). It measures maximal deep for every type of recursion.
msg282377 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-12-04 23:11
When I wrote the _PyObject_CallArg1(), it looks as a cool hack:

#define _PyObject_CallArg1(func, arg) \
    _PyObject_FastCall((func), (PyObject **)&(arg), 1)

It hacks the declaration of an explicit "stack" like:

   PyObject *stack[1];
   stack[0] = arg;
   res = _PyObject_FastCall(func, stack, 1);

And I expected that the C compiler magically computes the memory address of the argument. But it seems like requesting the memory address of an argument allocates something on the C stack.

On x86_64, first function arguments are passed with CPU registers. Maybe requesting the memory address of an argument requires to allocate a local variable, copy the register into the variable, to get the address of the local variable?

So, I suggest to *remove* the _PyObject_CallArg1() macro, and use existing functions like PyObject_CallFunctionObjArgs().

What do you think Serhiy?
msg282385 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-05 06:06
That was my initial preference. Mainly because this doesn't add code churn.

But I don't understand how PyObject_CallFunctionObjArgs() that uses _PyObject_CallArg1() and has many local variables can consume less stack than _PyObject_CallArg1() itself.
msg282426 - (view) Author: Roundup Robot (python-dev) Date: 2016-12-05 16:13
New changeset 4171cc26272c by Victor Stinner in branch 'default':
Issue #28858: Remove _PyObject_CallArg1() macro
https://hg.python.org/cpython/rev/4171cc26272c
msg282447 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-12-05 17:45
The changeset 4171cc26272c "Remove _PyObject_CallArg1() macro" fixed the initial bug report, so I now close the issue.

Serhiy: If you see further enhancements, please open a new issue. Your second stack_overflow.py script is interesting, but I don't see any obvious possible changes to enhance results.

Thanks Serhiy for digging into this issue ;-)
History
Date User Action Args
2016-12-05 17:45:34hayposetstatus: open -> closed
resolution: fixed
messages: + msg282447
2016-12-05 16:13:21python-devsetmessages: + msg282426
2016-12-05 06:06:17serhiy.storchakasetmessages: + msg282385
2016-12-04 23:11:19hayposetmessages: + msg282377
2016-12-04 22:49:30serhiy.storchakasetfiles: + stack_overflow.py

messages: + msg282373
2016-12-04 22:13:53hayposetmessages: + msg282371
2016-12-04 22:00:17python-devsetnosy: + python-dev
messages: + msg282370
2016-12-02 19:53:47serhiy.storchakasetmessages: + msg282252
2016-12-02 16:10:15hayposetnosy: + serhiy.storchaka
messages: + msg282247
2016-12-02 07:54:27haypocreate