Issue 28870: Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/73056

classification

Title:	Reduce stack consumption of PyObject_CallFunctionObjArgs() and like
Type:	enhancement	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	python-dev, serhiy.storchaka, vstinner, xiang.zhang
Priority:	normal	Keywords:	patch

Created on 2016-12-04 22:50 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description
PyObject_CallFunctionObjArgs.patch	serhiy.storchaka, 2016-12-04 22:50
less_stack.patch	vstinner, 2016-12-15 13:30
alloca.patch	vstinner, 2016-12-15 13:45
subfunc.patch	vstinner, 2016-12-15 13:49
testcapi_stacksize.patch	vstinner, 2017-01-03 01:31
no_small_stack.patch	vstinner, 2017-01-03 01:40
stack_overflow_28870.py	vstinner, 2017-01-09 17:10
testcapi_stack_pointer.patch	vstinner, 2017-01-10 11:40
stack_overflow_28870-sp.py	vstinner, 2017-01-10 11:45
no_small_stack-2.patch	vstinner, 2017-01-10 11:55
bench_recursion-2.py	vstinner, 2017-01-11 00:50

Messages (35)
msg282374 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-12-04 22:50
I wrote the attached patch in an attempt to decrease the stack consumption of PyObject_CallFunctionObjArgs(), PyObject_CallMethodObjArgs() and _PyObject_CallMethodIdObjArgs(). But it doesn't affect the stack consumption. I haven't yet measured what effect it has on performance. It does seem to make the code a little cleaner.
msg282379 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-12-04 23:30
What do you think of using alloca() instead of an "PyObject *small_stack[5];" which has a fixed size? Note: About your patch, try to avoid _PyObject_CallArg1() if you care of the usage of the C stack, see issue #28858.
msg282380 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-12-04 23:38
> But it doesn't affect the stack consumption.

How do you check the stack consumption of PyObject_CallFunctionObjArgs()?
msg282384 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-12-05 05:52
> What do you think of using alloca() instead of an "PyObject *small_stack[5];" which has a fixed size?

alloca() is not in POSIX.1. I'm afraid it would make CPython less portable.

> Note: About your patch, try to avoid _PyObject_CallArg1() if you care of the usage of the C stack, see issue #28858.

I don't understand how I can avoid it.

> How do you check the stack consumption of PyObject_CallFunctionObjArgs()?

Using a script from issue28858.
msg283287 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-12-15 11:55
New changeset 71876e4abce4 by Victor Stinner in branch 'default':
Add _PY_FASTCALL_SMALL_STACK constant
https://hg.python.org/cpython/rev/71876e4abce4
msg283297 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-12-15 13:30
I reworked abstract.c to prepare work for this issue:

* change 455169e87bb3: Add _PyObject_CallFunctionVa() helper
* change 6e748eb79038: Add _PyObject_VaCallFunctionObjArgs() private function
* change 71876e4abce4: Add _PY_FASTCALL_SMALL_STACK constant

I wrote a _testcapi function to measure the stack consumption of the C code. I was surprised by the results: calling PyObject_CallFunctionObjArgs(func, arg1, arg2, NULL) consumes 560 bytes! I measured on a Python compiled in release mode.

The attached less_stack.patch rewrites _PyObject_VaCallFunctionObjArgs() and reduces the stack consumption from 560 bytes to 384 bytes (-176 bytes!).

Changes:

* Remove the "va_list countva" variable: the va_list variable itself, va_copy(), etc. consume stack memory. First I tried to move code to a subfunction, which helps. With my patch, it's even simpler.
* Reduce _PY_FASTCALL_SMALL_STACK from 5 to 3. Stack usage is not directly _PY_FASTCALL_SMALL_STACK * sizeof(PyObject*); it's much more, probably because of complex memory alignment rules.
* Use Py_LOCAL_INLINE(). Depending on the size of the object_vacall() function body, the function is inlined or not. If it's not inlined, the stack usage increases from 384 bytes to 544 bytes!? Use Py_LOCAL_INLINE() to force inlining.

Effect of _PY_FASTCALL_SMALL_STACK:

* 1: 368 bytes
* 2: 384 bytes
* 3: 384 bytes -- value chosen in my patch
* 4: 400 bytes
* 5: 416 bytes
msg283298 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-12-15 13:33
I don't propose to add _testcapi.pyobjectl_callfunctionobjargs_stacksize(). It's just to test the patch. I'm using it with:

$ ./python -c 'import _testcapi; n=100; print(_testcapi.pyobjectl_callfunctionobjargs_stacksize(n) / (n+1))'
384.0

The value of n has no impact on the stack: it gives the same value with n=0.
msg283302 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-12-15 13:45
I also tried to use alloca(): see attached alloca.patch. But the result is quite bad: 528 bytes of stack memory per call. I only attach the patch to discuss the issue, but I now dislike the option: the result is bad, it's less portable and more dangerous.
msg283303 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-12-15 13:49
I also tried Serhiy's approach, split the function into subfunctions, but the result is not as good as expected: 496 bytes. See attached subfunc.patch.
msg283309 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-12-15 14:26
For comparison, Python 3.5 (before fast calls) uses 448 bytes of C stack per call. Python 3.5 uses a tuple allocated in the heap memory.
msg283336 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-12-15 16:28
I have tested all three patches with the stack_overflow.py script. The only tests affected are the recursive Python implementations of __call__, __getitem__ and __iter__.

                      unpatched  less_stack  alloca  subfunc
test_python_call           9696        9876    9880     9876
test_python_getitem        9884       10264    9880    10688
test_python_iterator       7812        8052    8312     8872
msg284524 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-03 01:31
testcapi_stacksize.patch: add _testcapi.pyobjectl_callfunctionobjargs_stacksize(), function used to measure the stack consumption.
msg284527 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-03 01:40
no_small_stack.patch: And now something completely different, a patch to remove the "small stack" allocated on the C stack and always use the heap memory. FYI I created no_small_stack.patch from less_stack.patch.

As expected, the stack usage is lower:

* less_stack.patch: 384 bytes/call
* no_small_stack.patch: 368 bytes/call

I didn't check the performance of no_small_stack.patch yet.
msg284528 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-03 01:42
In Python 3.5, PyObject_CallFunctionObjArgs() calls objargs_mktuple() which uses Py_VA_COPY(countva, va) and creates a tuple. The tuple constructor uses a free list to reduce the cost of heap memory allocations.
msg285055 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-09 17:10
I modified Serhiy's stack_overflow.py from #28858:

* re-run each test 10 times and show the maximum depth
* only test: ['test_python_call', 'test_python_getitem', 'test_python_iterator']

Maximum number of Python calls before a crash.

(*) Reference (unpatched): 560 bytes/call

test_python_call 7172
test_python_getitem 6232
test_python_iterator 5344
=> total: 18 748

(1) no_small_stack.patch: 368 bytes/call

test_python_call 7172 (=)
test_python_getitem 6544 (+312)
test_python_iterator 5572 (+228)
=> total: 19 288

(2) less_stack.patch: 384 bytes/call

test_python_call 7272 (+100)
test_python_getitem 6384 (+152)
test_python_iterator 5456 (+112)
=> total: 19 112

(3) subfunc.patch: 496 bytes/call

test_python_call 7272 (+100)
test_python_getitem 6712 (+480)
test_python_iterator 6020 (+676)
=> total: 20 004

(4) alloca.patch: 528 bytes/call

test_python_call 7272 (+100)
test_python_getitem 6464 (+232)
test_python_iterator 5752 (+408)
=> total: 19 488

Patches sorted by bytes/call, from best to worst: no_small_stack.patch (368) > less_stack.patch (384) > subfunc.patch (496) > alloca.patch (528) > reference (560).

Patches sorted by number of calls before crash: subfunc.patch (20 004) > alloca.patch (19 488) > no_small_stack.patch (19 288) > less_stack.patch (19 112) > reference (18 748).

I expected a correlation between the bytes/call measured by testcapi_stacksize.patch and the number of calls before a crash, but I fail to see an obvious correlation :-/

Maybe the compiler is smarter than I would expect and emits efficient code to use less stack memory? Maybe the Linux kernel does weird things which make the behaviour on stack overflow non-obvious :-)

At least, I would expect no_small_stack.patch to be the clear winner, since it has the smallest usage of C stack.
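
The weak correlation can be checked directly from the figures in this message. The following is a minimal sketch, not one of the attached scripts: it recomputes each total from the three per-test depths and derives a Spearman rank correlation between bytes/call and total calls. The variable names and the choice of Spearman's formula are illustrative assumptions.

```python
# Figures copied from the message above: bytes/call measured by
# testcapi_stacksize.patch, then the maximum depth reached by
# test_python_call, test_python_getitem and test_python_iterator.
results = {
    "reference": (560, (7172, 6232, 5344)),
    "no_small_stack.patch": (368, (7172, 6544, 5572)),
    "less_stack.patch": (384, (7272, 6384, 5456)),
    "subfunc.patch": (496, (7272, 6712, 6020)),
    "alloca.patch": (528, (7272, 6464, 5752)),
}

totals = {name: sum(depths) for name, (_, depths) in results.items()}


def ranks(values, reverse=False):
    """Map each key to its 1-based rank (this data set has no ties)."""
    ordered = sorted(values, key=values.get, reverse=reverse)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Rank 1 is "best" in both orderings: fewest bytes/call, most calls.
by_bytes = ranks({name: size for name, (size, _) in results.items()})
by_calls = ranks(totals, reverse=True)

# Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
n = len(results)
d_squared = sum((by_bytes[name] - by_calls[name]) ** 2 for name in results)
rho = 1 - 6 * d_squared / (n * (n * n - 1))

for name in sorted(totals, key=totals.get, reverse=True):
    print(f"{name:22} {results[name][0]:4} bytes/call {totals[name]:6} calls")
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near 1 would mean that a smaller bytes/call figure reliably predicts a deeper recursion. With only five data points the result is indicative at best.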
msg285057 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-09 17:28
Impact of the _PY_FASTCALL_SMALL_STACK constant:

* _PY_FASTCALL_SMALL_STACK=1: 528 bytes/call

test_python_call 7376
test_python_getitem 6544
test_python_iterator 5572
=> total: 19 492

* _PY_FASTCALL_SMALL_STACK=3: 528 bytes/call

test_python_call 7272
test_python_getitem 6464
test_python_iterator 5512
=> total: 19 248

* _PY_FASTCALL_SMALL_STACK=5 (current value): 560 bytes/call

test_python_call 7172
test_python_getitem 6232
test_python_iterator 5344
=> total: 18 748

* _PY_FASTCALL_SMALL_STACK=10: 592 bytes/call

test_python_call 6984
test_python_getitem 5952
test_python_iterator 5132
=> total: 18 068

Increasing _PY_FASTCALL_SMALL_STACK has a clear effect on the total: the total decreases when _PY_FASTCALL_SMALL_STACK increases.

---

no_small_stack.patch with _PY_FASTCALL_SMALL_STACK=3: 368 bytes/call

test_python_call 7272
test_python_getitem 6628
test_python_iterator 5632
=> total: 19 532
msg285060 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-09 17:45
I'm not sure that the result of pyobjectl_callfunctionobjargs_stacksize() is directly related to the stack consumption in test_python_call, test_python_getitem and test_python_iterator. Try to measure the stack consumption in these cases. This can be done with a _testcapi helper that just returns the value of the stack pointer. Run all three tests with a fixed level of recursion and measure the difference between stack pointers.

It would also be nice to measure the performance effect of the patches.
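
The measurement proposed here can be sketched as follows. This is a hedged illustration only: `stack_pointer` stands for any callable that returns the current C stack pointer as an integer, and `fake_stack_pointer` is a hypothetical stand-in that charges an arbitrary fixed cost per Python frame so that the example runs on a stock interpreter. None of these names come from the attached patches.

```python
import sys


def bytes_per_call(stack_pointer, depth):
    """Average stack bytes consumed per recursion level.

    The stack pointer is sampled at recursion level 0 and at level
    `depth` through the same code path, so fixed overhead cancels out.
    """
    def recurse(level):
        if level == 0:
            return stack_pointer()
        return recurse(level - 1)

    shallow = recurse(0)
    deep = recurse(depth)
    # Assumption: the stack grows toward lower addresses (true on x86-64).
    return (shallow - deep) / depth


# Hypothetical stand-in for a real stack pointer probe: it pretends that
# every Python frame costs exactly FAKE_FRAME_BYTES of C stack.
FAKE_STACK_TOP = 0x7FFF_0000_0000
FAKE_FRAME_BYTES = 1024  # arbitrary value chosen for the demonstration


def fake_stack_pointer():
    frames = 0
    frame = sys._getframe()  # CPython-specific frame access
    while frame is not None:
        frames += 1
        frame = frame.f_back
    return FAKE_STACK_TOP - frames * FAKE_FRAME_BYTES


measured = bytes_per_call(fake_stack_pointer, depth=100)
print(measured)
```

With a real probe in place of the stand-in, the same subtraction would be applied to each of the three recursive tests at a fixed depth.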
msg285105 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 11:40
testcapi_stack_pointer.patch: add _testcapi.stack_pointer() function.
msg285106 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 11:45
stack_overflow_28870-sp.py: script using testcapi_stack_pointer.patch to compute the usage of the C stack. Results of this script:

(*) Reference

test_python_call: 7175 calls before crash, stack: 1168 bytes/call
test_python_getitem: 6235 calls before crash, stack: 1344 bytes/call
test_python_iterator: 5344 calls before crash, stack: 1568 bytes/call
=> total: 18754 calls, 4080 bytes

(1) no_small_stack.patch

test_python_call: 7175 calls before crash, stack: 1168 bytes/call
test_python_getitem: 6547 calls before crash, stack: 1280 bytes/call
test_python_iterator: 5572 calls before crash, stack: 1504 bytes/call
=> total: 19294 calls, 3952 bytes

test_python_call is clearly not impacted by no_small_stack.patch. test_python_call loops on method_call():

method_call()
=> _PyObject_Call_Prepend()
=> _PyObject_FastCallDict()
=> _PyFunction_FastCallDict()
=> _PyEval_EvalCodeWithName()
=> PyEval_EvalFrameEx()
=> _PyEval_EvalFrameDefault()
=> call_function()
=> _PyObject_FastCallKeywords()
=> slot_tp_call()
=> PyObject_Call()
=> method_call()
=> (...)

_PyObject_Call_Prepend() is in the middle of the chain. This function uses a "small stack" of _PY_FASTCALL_SMALL_STACK "PyObject*" items. We can clearly see the impact of modifying _PY_FASTCALL_SMALL_STACK on the maximum number of test_python_call calls before crash in msg285057.
msg285107 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 11:55
no_small_stack-2.patch: Remove all "small_stack" buffers.

Reference

test_python_call: 7175 calls before crash, stack: 1168 bytes/call
test_python_getitem: 6235 calls before crash, stack: 1344 bytes/call
test_python_iterator: 5344 calls before crash, stack: 1568 bytes/call
=> total: 18754 calls, 4080 bytes

no_small_stack.patch

test_python_call: 7482 calls (+307) before crash, stack: 1120 bytes/call (-48)
test_python_getitem: 6715 calls (+480) before crash, stack: 1248 bytes/call (-96)
test_python_iterator: 5693 calls (+349) before crash, stack: 1472 bytes/call (-96)
=> total: 19890 calls (+1136), 3840 bytes (-240)

The total gain is the removal of 5 small buffers of 48 bytes: 240 bytes.
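
One plausible reading of the 48-byte figure, offered as an assumption rather than something stated in the thread: a buffer of 5 pointers needs 40 bytes on a 64-bit build, and the compiler rounds it up to the 16-byte stack alignment used on x86-64. The per-test gains are then whole multiples of that padded size. A small sketch of the arithmetic, with hypothetical names:

```python
POINTER_SIZE = 8       # assumption: 64-bit build, sizeof(PyObject*) == 8
STACK_ALIGNMENT = 16   # assumption: x86-64 System V stack alignment
SMALL_STACK_ITEMS = 5  # _PY_FASTCALL_SMALL_STACK at the time of this message


def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


raw_size = SMALL_STACK_ITEMS * POINTER_SIZE
padded_size = align_up(raw_size, STACK_ALIGNMENT)

# Per-test reduction in bytes/call reported above for no_small_stack-2.patch.
gains = {
    "test_python_call": 48,
    "test_python_getitem": 96,
    "test_python_iterator": 96,
}
buffers_removed = {test: gain // padded_size for test, gain in gains.items()}

print(raw_size, padded_size)
print(buffers_removed, sum(gains.values()))
```

Under these assumptions, one buffer sits in the test_python_call chain and two in each of the other chains, which adds up to the 5 buffers mentioned above.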
msg285108 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 12:08
> no_small_stack.patch:

Oops, you should read no_small_stack-2.patch in my previous message ;-)
msg285109 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 12:15
Python 3.5 (revision 8125d9a8152b), before all fastcall changes:

test_python_call: 8314 calls before crash, stack: 1008 bytes/call
test_python_getitem: 7483 calls before crash, stack: 1120 bytes/call
test_python_iterator: 6802 calls before crash, stack: 1232 bytes/call
=> total: 22599 calls, 3360 bytes
msg285110 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-10 12:23
What are results with 3.4? There were several issues about stack overflow in 3.5 (issue25222, issue28179, issue28913).
msg285113 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 14:09
Python 3.4 (rev 6340c9fcc111):

test_python_call: 9700 calls before crash, stack: 864 bytes/call
test_python_getitem: 8314 calls before crash, stack: 1008 bytes/call
test_python_iterator: 7818 calls before crash, stack: 1072 bytes/call
=> total: 25832 calls, 2944 bytes

Python 2.7 (rev 0d4e0a736688):

test_python_call: 6162 calls before crash, stack: 1360 bytes/call
test_python_getitem: 5952 calls before crash, stack: 1408 bytes/call
test_python_iterator: 5885 calls before crash, stack: 1424 bytes/call
=> total: 17999 calls, 4192 bytes

Nice. At least, Python 3.7 is better than Python 2.7 (4080 bytes < 4192 bytes) :-) Python 3.4 stack usage was very low, and lower than Python 3.5.
msg285122 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 15:03
no_small_stack-2.patch has a very bad impact on performance:

haypo@speed-python$ python3 -m perf compare_to 2017-01-04_12-02-default-ee1390c9b585.json no_small_stack-2_refee1390c9b585.json -G --min-speed=5

Slower (59):
- telco: 15.7 ms +- 0.5 ms -> 23.4 ms +- 0.3 ms: 1.49x slower (+49%)
- scimark_sor: 393 ms +- 6 ms -> 579 ms +- 10 ms: 1.47x slower (+47%)
- json_loads: 56.9 us +- 0.9 us -> 83.1 us +- 2.4 us: 1.46x slower (+46%)
- unpickle_pure_python: 698 us +- 10 us -> 984 us +- 10 us: 1.41x slower (+41%)
- scimark_lu: 424 ms +- 22 ms -> 585 ms +- 33 ms: 1.38x slower (+38%)
- chameleon: 22.4 ms +- 0.2 ms -> 30.8 ms +- 0.3 ms: 1.38x slower (+38%)
- xml_etree_generate: 212 ms +- 3 ms -> 291 ms +- 4 ms: 1.37x slower (+37%)
- xml_etree_process: 177 ms +- 3 ms -> 240 ms +- 3 ms: 1.35x slower (+35%)
- raytrace: 1.04 sec +- 0.01 sec -> 1.40 sec +- 0.02 sec: 1.35x slower (+35%)
- logging_simple: 27.9 us +- 0.4 us -> 37.4 us +- 0.5 us: 1.34x slower (+34%)
- pickle_pure_python: 1.02 ms +- 0.01 ms -> 1.37 ms +- 0.02 ms: 1.34x slower (+34%)
- logging_format: 33.3 us +- 0.4 us -> 44.5 us +- 0.7 us: 1.34x slower (+34%)
- xml_etree_iterparse: 195 ms +- 5 ms -> 259 ms +- 7 ms: 1.32x slower (+32%)
- chaos: 236 ms +- 3 ms -> 306 ms +- 3 ms: 1.30x slower (+30%)
- regex_compile: 380 ms +- 3 ms -> 494 ms +- 5 ms: 1.30x slower (+30%)
- pathlib: 42.3 ms +- 0.5 ms -> 55.0 ms +- 0.6 ms: 1.30x slower (+30%)
- django_template: 364 ms +- 5 ms -> 471 ms +- 4 ms: 1.29x slower (+29%)
- call_method: 11.2 ms +- 0.2 ms -> 14.4 ms +- 0.2 ms: 1.29x slower (+29%)
- hexiom: 18.4 ms +- 0.2 ms -> 23.7 ms +- 0.2 ms: 1.29x slower (+29%)
- call_method_slots: 11.0 ms +- 0.3 ms -> 14.1 ms +- 0.1 ms: 1.28x slower (+28%)
- richards: 147 ms +- 4 ms -> 188 ms +- 5 ms: 1.28x slower (+28%)
- html5lib: 207 ms +- 7 ms -> 262 ms +- 6 ms: 1.27x slower (+27%)
- genshi_text: 71.5 ms +- 1.3 ms -> 90.3 ms +- 1.1 ms: 1.26x slower (+26%)
- deltablue: 14.2 ms +- 0.2 ms -> 17.9 ms +- 0.4 ms: 1.26x slower (+26%)
- genshi_xml: 164 ms +- 2 ms -> 207 ms +- 3 ms: 1.26x slower (+26%)
- sympy_str: 429 ms +- 5 ms -> 539 ms +- 4 ms: 1.25x slower (+25%)
- go: 493 ms +- 5 ms -> 619 ms +- 7 ms: 1.25x slower (+25%)
- mako: 35.4 ms +- 1.5 ms -> 44.2 ms +- 1.2 ms: 1.25x slower (+25%)
- sympy_expand: 959 ms +- 10 ms -> 1.19 sec +- 0.01 sec: 1.24x slower (+24%)
- nqueens: 215 ms +- 2 ms -> 268 ms +- 1 ms: 1.24x slower (+24%)
(...)

Benchmark ran on speed-python with PGO+LTO, Linux configured for benchmarks using python3 -m perf system tune.
msg285123 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-10 15:06
Thus Python 3.6 stack usage is about 20% larger than Python 3.5 and about 40% larger than Python 3.4. This is significant. :-( no_small_stack-2.patch decreases it only by 6% (with possible performance loss).
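
The percentages quoted here follow from the totals measured earlier in the thread. A minimal sketch of the arithmetic, assuming that "Python 3.6" refers to the unpatched development branch measured at 4080 bytes; the dictionary keys and the helper name are illustrative:

```python
# Total bytes/call over the three tests, as reported earlier in the thread.
stack_total = {
    "2.7": 4192,
    "3.4": 2944,
    "3.5": 3360,
    "default": 4080,                 # unpatched development branch
    "no_small_stack-2.patch": 3840,
}


def percent_change(new, old):
    """Relative change of new versus old, in percent."""
    return (new - old) / old * 100


vs_35 = percent_change(stack_total["default"], stack_total["3.5"])
vs_34 = percent_change(stack_total["default"], stack_total["3.4"])
patch_gain = percent_change(stack_total["no_small_stack-2.patch"],
                            stack_total["default"])

print(f"default vs 3.5: {vs_35:+.1f}%")
print(f"default vs 3.4: {vs_34:+.1f}%")
print(f"no_small_stack-2.patch vs default: {patch_gain:+.1f}%")
```

The three results round to the "about 20%", "about 40%" and "6%" figures given in the message.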
msg285124 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 15:09
> no_small_stack-2.patch decreases it only by 6% (with possible performance loss).

Yeah, if we want to come back to Python 3.4 efficiency, we need to find the other functions which now use more stack memory ;-)

The discussed "small stack" buffers are only responsible for 96 bytes, not a big deal compared to the total of 4080 bytes.
msg285128 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 16:02
Stack used by each C function of test_python_call.

3.4:

(a) method_call: 64
(b) PyObject_Call: 48
(b) function_call: 160
(b) PyEval_EvalCodeEx: 176
(c) PyEval_EvalFrameEx: 256
(c) call_function: 0
(c) do_call: 0
(c) PyObject_Call: 48
(d) slot_tp_call: 64
(d) PyObject_Call: 48
=> total: 864

default:

(a) method_call: 80
(b) _PyObject_FastCallDict: 64
(b) _PyFunction_FastCallDict: 208
(b) _PyEval_EvalCodeWithName: 176
(c) _PyEval_EvalFrameDefault: 320
(c) call_function: 80
(c) _PyObject_FastCallKeywords: 80
(d) slot_tp_call: 64
(d) PyObject_Call: 48
=> total: 1120

Groups of functions, 3.4 => default:

(a) 64 => 80 (+16)
(b) 384 => 448 (+64)
(c) 304 => 480 (+176)
(d) 112 => 112 (=)

I used gdb:

(gdb) set $last=0
(gdb) define size
> print $last - (uintptr_t)$rsp
> set $last = (uintptr_t)$rsp
> down
> end
(gdb) up
(gdb) up
(gdb) up
(... until a first method_call ...)
(gdb) size
(gdb) size
...
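
The group subtotals can be recomputed from the per-function figures. This is a small sketch with the gdb measurements copied into two lists; the list and helper names are illustrative and not part of any attached script:

```python
# (group, function, bytes) in call order, copied from the measurements above.
frames_34 = [
    ("a", "method_call", 64),
    ("b", "PyObject_Call", 48),
    ("b", "function_call", 160),
    ("b", "PyEval_EvalCodeEx", 176),
    ("c", "PyEval_EvalFrameEx", 256),
    ("c", "call_function", 0),
    ("c", "do_call", 0),
    ("c", "PyObject_Call", 48),
    ("d", "slot_tp_call", 64),
    ("d", "PyObject_Call", 48),
]
frames_default = [
    ("a", "method_call", 80),
    ("b", "_PyObject_FastCallDict", 64),
    ("b", "_PyFunction_FastCallDict", 208),
    ("b", "_PyEval_EvalCodeWithName", 176),
    ("c", "_PyEval_EvalFrameDefault", 320),
    ("c", "call_function", 80),
    ("c", "_PyObject_FastCallKeywords", 80),
    ("d", "slot_tp_call", 64),
    ("d", "PyObject_Call", 48),
]


def group_totals(frames):
    """Sum the stack bytes of every function belonging to each group."""
    totals = {}
    for group, _function, size in frames:
        totals[group] = totals.get(group, 0) + size
    return totals


old = group_totals(frames_34)
new = group_totals(frames_default)
deltas = {group: new[group] - old[group] for group in old}

for group in sorted(old):
    print(f"({group}) {old[group]} => {new[group]} ({deltas[group]:+d})")
print("totals:", sum(old.values()), "=>", sum(new.values()))
```

Group (c), the evaluation loop and its call helpers, accounts for most of the growth between the two versions.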
msg285136 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 17:51
I created the issue #29227 "Reduce C stack consumption in function calls" which contains a first simple patch with a significant effect on the C stack.
msg285137 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-10 17:57
It seems like subfunc.patch approach using the "no inline" attribute helps.
msg285169 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-11 00:20
I pushed 3 changes:

* rev b9404639a18c: Issue #29233: call_method() now uses _PyObject_FastCall()
* rev 8481c379e2da: Issue #29227: inline call_function()
* rev 6478e6d0476f: Issue #29234: disable _PyStack_AsTuple() inlining

Before (rev a30cdf366c02):

test_python_call: 7175 calls before crash, stack: 1168 bytes/call
test_python_getitem: 6235 calls before crash, stack: 1344 bytes/call
test_python_iterator: 5344 calls before crash, stack: 1568 bytes/call
=> total: 18754 calls, 4080 bytes

With these 3 changes (rev 6478e6d0476f):

test_python_call: 8587 calls before crash, stack: 976 bytes/call
test_python_getitem: 9189 calls before crash, stack: 912 bytes/call
test_python_iterator: 7936 calls before crash, stack: 1056 bytes/call
=> total: 25712 calls, 2944 bytes

The default branch is now as good as Python 3.4, in term of stack consumption, and Python 3.4 was the Python version which used the least stack memory according to my tests.

I didn't touch _PY_FASTCALL_SMALL_STACK value, it's still 5 arguments (40 bytes). So my changes should not impact performances.
msg285173 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-11 00:50
Result of attached bench_recursion-2.py comparing before/after the 3 changes reducing the stack consumption:

test_python_call: Median +- std dev: [a30cdf366c02] 512 us +- 12 us -> [6478e6d0476f] 467 us +- 21 us: 1.10x faster (-9%)
test_python_getitem: Median +- std dev: [a30cdf366c02] 485 us +- 26 us -> [6478e6d0476f] 437 us +- 18 us: 1.11x faster (-10%)
test_python_iterator: Median +- std dev: [a30cdf366c02] 1.15 ms +- 0.04 ms -> [6478e6d0476f] 1.03 ms +- 0.06 ms: 1.12x faster (-10%)

At least, it doesn't seem to be slower. Maybe the speedup comes from call_function() inlining. This function was probably already inlined when using a PGO build.

The script was written by Serhiy in issue #29227; I modified it to use the Runner.timeit() API for convenience.
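
For readers unfamiliar with the "1.10x faster (-9%)" notation, the two numbers are the ratio of the medians and the relative change of the new median. The helper below is a sketch that reproduces the wording from the rounded medians shown above; it is not the perf module's implementation, which works from the unrounded samples, and the function name is hypothetical.

```python
def describe(before, after):
    """Format a timing change in the style of the results above.

    `before` and `after` are medians in the same unit.
    """
    ratio = max(before, after) / min(before, after)
    percent = (after - before) / before * 100
    direction = "faster" if after < before else "slower"
    return f"{ratio:.2f}x {direction} ({percent:+.0f}%)"


# Medians taken from the message above (us, us, ms).
print("test_python_call:", describe(512, 467))
print("test_python_getitem:", describe(485, 437))
print("test_python_iterator:", describe(1.15, 1.03))
```

The same helper applied to a regression, for example `describe(15.7, 23.4)` for the telco benchmark quoted earlier in the thread, yields the "slower" form.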
msg285192 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-11 06:51
Awesome! You are great Victor!
msg285200 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-11 08:04
I also ran the reliable performance benchmark suite with LTO+PGO. There is no significant performance change on these benchmarks: https://speed.python.org/changes/?rev=b9404639a18c&exe=5&env=speed-python

The largest change is on scimark_lu (-13%), but there was a hiccup on the previous change which is probably a small instability in the benchmark. It's not a speedup from these changes.

The second largest change is on spectral_norm: +9%. But this benchmark is known to be unstable, and there was already a small peak previously. Again, I don't think that it's related to the changes.
msg286657 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-02-01 17:05
"The default branch is now as good as Python 3.4, in term of stack consumption, and Python 3.4 was the Python version which used the least stack memory according to my tests."

I consider that the initial issue is now fixed, so I close the issue.

Thanks Serhiy for the tests, reviews, ideas and obviously the bug report ;-) I never looked at the stack usage before.

History
Date	User	Action	Args
2022-04-11 14:58:40	admin	set	github: 73056
2017-02-06 15:00:00	vstinner	set	status: open -> closed resolution: fixed stage: resolved
2017-02-01 17:05:28	vstinner	set	messages: + msg286657
2017-01-11 08:04:18	vstinner	set	messages: + msg285200
2017-01-11 06:51:48	serhiy.storchaka	set	messages: + msg285192
2017-01-11 00:50:29	vstinner	set	files: + bench_recursion-2.py messages: + msg285173
2017-01-11 00:20:39	vstinner	set	messages: + msg285169
2017-01-10 17:57:16	vstinner	set	messages: + msg285137
2017-01-10 17:51:18	vstinner	set	messages: + msg285136
2017-01-10 16:02:07	vstinner	set	messages: + msg285128
2017-01-10 15:09:44	vstinner	set	messages: + msg285124
2017-01-10 15:06:19	serhiy.storchaka	set	messages: + msg285123
2017-01-10 15:03:59	vstinner	set	messages: + msg285122
2017-01-10 14:09:34	vstinner	set	messages: + msg285113
2017-01-10 12:23:56	serhiy.storchaka	set	messages: + msg285110
2017-01-10 12:15:10	vstinner	set	messages: + msg285109
2017-01-10 12:08:38	vstinner	set	messages: + msg285108
2017-01-10 11:55:07	vstinner	set	files: + no_small_stack-2.patch messages: + msg285107
2017-01-10 11:45:51	vstinner	set	files: + stack_overflow_28870-sp.py messages: + msg285106
2017-01-10 11:40:26	vstinner	set	files: + testcapi_stack_pointer.patch messages: + msg285105
2017-01-09 17:45:03	serhiy.storchaka	set	messages: + msg285060
2017-01-09 17:28:49	vstinner	set	messages: + msg285057
2017-01-09 17:10:22	vstinner	set	files: + stack_overflow_28870.py messages: + msg285055
2017-01-09 10:56:13	xiang.zhang	set	nosy: + xiang.zhang
2017-01-03 01:42:39	vstinner	set	messages: + msg284528
2017-01-03 01:40:46	vstinner	set	files: + no_small_stack.patch messages: + msg284527
2017-01-03 01:31:35	vstinner	set	files: + testcapi_stacksize.patch messages: + msg284524
2016-12-15 16:28:33	serhiy.storchaka	set	messages: + msg283336
2016-12-15 14:26:23	vstinner	set	messages: + msg283309
2016-12-15 13:49:04	vstinner	set	files: + subfunc.patch messages: + msg283303
2016-12-15 13:45:37	vstinner	set	files: + alloca.patch messages: + msg283302
2016-12-15 13:33:59	vstinner	set	messages: + msg283298
2016-12-15 13:31:26	vstinner	set	title: Refactor PyObject_CallFunctionObjArgs() and like -> Reduce stack consumption of PyObject_CallFunctionObjArgs() and like
2016-12-15 13:30:48	vstinner	set	files: + less_stack.patch messages: + msg283297
2016-12-15 11:55:10	python-dev	set	nosy: + python-dev messages: + msg283287
2016-12-05 05:52:58	serhiy.storchaka	set	messages: + msg282384
2016-12-04 23:38:10	vstinner	set	messages: + msg282380
2016-12-04 23:30:50	vstinner	set	messages: + msg282379
2016-12-04 22:50:13	serhiy.storchaka	create