Reduce C stack consumption in function calls #73413

Closed
vstinner opened this issue Jan 10, 2017 · 7 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@vstinner
Member

BPO 29227
Nosy @vstinner, @serhiy-storchaka
Files
  • less_stack.patch
  • bench_recursion.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-02-01.17:10:46.538>
    created_at = <Date 2017-01-10.17:50:46.698>
    labels = ['interpreter-core', '3.7', 'performance']
    title = 'Reduce C stack consumption in function calls'
    updated_at = <Date 2017-02-01.17:10:46.536>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2017-02-01.17:10:46.536>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-02-01.17:10:46.538>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2017-01-10.17:50:46.698>
    creator = 'vstinner'
    dependencies = []
    files = ['46242', '46243']
    hgrepos = []
    issue_num = 29227
    keywords = ['patch']
    message_count = 7.0
    messages = ['285135', '285147', '285160', '285163', '285164', '285171', '286658']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue29227'
    versions = ['Python 3.7']

    @vstinner
    Member Author

    The attached patch reduces C stack consumption in function calls. It is a follow-up to issue bpo-28870.

    Reference (rev a30cdf366c02):

    test_python_call: 7175 calls before crash, stack: 1168 bytes/call
    test_python_getitem: 6235 calls before crash, stack: 1344 bytes/call
    test_python_iterator: 5344 calls before crash, stack: 1568 bytes/call

    => total: 18754 calls, 4080 bytes

    With "Inline call_function() in ceval.c":

    test_python_call: 7936 calls before crash, stack: 1056 bytes/call
    test_python_getitem: 6387 calls before crash, stack: 1312 bytes/call
    test_python_iterator: 5755 calls before crash, stack: 1456 bytes/call

    => total: 20078 calls, 3824 bytes

    With inline and "_PY_FASTCALL_SMALL_STACK: 5 arg (40 B) => 3 arg (24 B)":

    test_python_call: 8058 calls before crash, stack: 1040 bytes/call
    test_python_getitem: 6630 calls before crash, stack: 1264 bytes/call
    test_python_iterator: 5952 calls before crash, stack: 1408 bytes/call

    => total: 20640 calls, 3712 bytes

    I applied testcapi_stack_pointer.patch and ran stack_overflow_28870-sp.py from issue bpo-28870 to produce these statistics.

    With the patch, Python 3.7 is still not as good as Python 3.5 (msg285109), but it's a first improvement.
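
As a rough sanity check on the figures above, multiplying the number of calls before the crash by the per-call stack usage gives nearly the same product in every run, about 8.38 million bytes. That is just under 8 MiB, which would be consistent with a common default stack limit on Linux. The 8 MiB limit is an assumption made for this sketch; the issue does not state it. The snippet uses only the numbers quoted above:

```python
# Figures copied from the measurements above: (calls before crash, bytes/call).
MEASUREMENTS = {
    "reference": {
        "test_python_call": (7175, 1168),
        "test_python_getitem": (6235, 1344),
        "test_python_iterator": (5344, 1568),
    },
    "inline": {
        "test_python_call": (7936, 1056),
        "test_python_getitem": (6387, 1312),
        "test_python_iterator": (5755, 1456),
    },
    "inline + small stack 3": {
        "test_python_call": (8058, 1040),
        "test_python_getitem": (6630, 1264),
        "test_python_iterator": (5952, 1408),
    },
}

# Assumption (not stated in the issue): an 8 MiB C stack limit.
ASSUMED_STACK_LIMIT = 8 * 1024 * 1024


def implied_stack_bytes(calls, bytes_per_call):
    """Stack consumed when the recursion crashed, as implied by the figures."""
    return calls * bytes_per_call


def totals(run):
    """Sum calls and bytes/call across the three tests of one run."""
    calls = sum(c for c, _ in run.values())
    size = sum(b for _, b in run.values())
    return calls, size


for label, run in MEASUREMENTS.items():
    calls, size = totals(run)
    print(f"{label}: total {calls} calls, {size} bytes")
    for test, (c, b) in run.items():
        used = implied_stack_bytes(c, b)
        share = used / ASSUMED_STACK_LIMIT
        print(f"  {test}: {used} bytes ({share:.1%} of the assumed limit)")
```

Read this way, the "calls before crash" column is simply the fixed stack budget divided by the bytes used per call, so every byte saved per call translates directly into more recursion depth.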

    @vstinner vstinner added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jan 10, 2017
    @serhiy-storchaka
    Member

    $ ./python -m perf timeit -s "from bench_recursion import test_python_call as test" -- "test(1000)"
    Python 2.7:  5.10 ms +- 0.37 ms
    Python 3.4:  4.38 ms +- 0.28 ms
    Python 3.5:  4.19 ms +- 0.26 ms
    Python 3.6:  3.93 ms +- 0.32 ms
    Python 3.7:  3.26 ms +- 0.27 ms
    
    $ ./python -m perf timeit -s "from bench_recursion import test_python_getitem as test" -- "test(1000)"
    Python 2.7:  4.09 ms +- 0.26 ms
    Python 3.4:  4.60 ms +- 0.23 ms
    Python 3.5:  4.35 ms +- 0.28 ms
    Python 3.6:  4.05 ms +- 0.34 ms
    Python 3.7:  3.23 ms +- 0.23 ms
    
    $ ./python -m perf timeit -s "from bench_recursion import test_python_iterator as test" -- "test(1000)"
    Python 2.7:  7.85 ms +- 0.66 ms
    Python 3.4:  9.31 ms +- 0.55 ms
    Python 3.5:  9.83 ms +- 0.71 ms
    Python 3.6:  8.99 ms +- 0.66 ms
    Python 3.7:  8.58 ms +- 0.73 ms

    @vstinner
    Member Author

    Oh wow! I'm impressed that Python 3 gets better with each release! On two of the tests, Python 3.7 is faster than Python 2.7, but on test_python_iterator Python 3.7 is still slower. It seems that this specific test became much slower (+19%) in Python 3.4 compared to 2.7.

    I guess that you ran your benchmark on unpatched Python.

    I didn't expect less_stack.patch to have an impact on performance, but I ran the benchmarks anyway because I'm curious. It seems to be a little bit faster. At least, it's not slower ;-)

    test_python_call: Median +- std dev: [ref] 509 us +- 11 us -> [patch] 453 us +- 49 us: 1.12x faster (-11%)
    test_python_getitem: Median +- std dev: [ref] 485 us +- 13 us -> [patch] 470 us +- 23 us: 1.03x faster (-3%)
    test_python_iterator: Median +- std dev: [ref] 1.15 ms +- 0.05 ms -> [patch] 1.12 ms +- 0.07 ms: 1.03x faster (-3%)
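
For reference, the "1.12x faster (-11%)" style figures, and the "+19%" quoted earlier for test_python_iterator on 3.4 versus 2.7, can be reproduced from the medians with two small formulas. This is a minimal sketch; the `compare` helper is a name made up here and is not part of the perf module.

```python
def compare(ref, patched):
    """Return (speedup factor, relative change in percent) for two timings.

    Hypothetical helper for illustration; not part of the perf module.
    """
    factor = ref / patched
    change = (patched - ref) / ref * 100.0
    return factor, change


# Medians quoted above, in microseconds: (ref, patch).
RESULTS_US = {
    "test_python_call": (509, 453),
    "test_python_getitem": (485, 470),
    "test_python_iterator": (1150, 1120),
}

for name, (ref, patched) in RESULTS_US.items():
    factor, change = compare(ref, patched)
    print(f"{name}: {factor:.2f}x faster ({change:+.0f}%)")

# test_python_iterator, Python 2.7 (7.85 ms) versus Python 3.4 (9.31 ms).
_, slowdown = compare(7.85, 9.31)
print(f"3.4 vs 2.7 on test_python_iterator: {slowdown:+.0f}%")
```

Note that the standard deviations quoted above (for example 49 us on the patched test_python_call run) are of the same order as some of the differences, so the smaller gains should be read with caution.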

    @serhiy-storchaka
    Member

    I didn't provide results with less_stack.patch because they were almost the same, just 1-3% faster. That might just be random noise or a compiler artifact, but it may also be an effect of inlining call_function().

    Could you run the full Python benchmark suite? Decreasing the size of the small stack doesn't affect performance in these cases, but it may affect the performance of calls with a larger number of arguments. AFAIK the size of some small stacks was already decreased from 8 to 5.

    @vstinner
    Member Author

    I plan to run a benchmark when all my patches to reduce stack consumption are ready. I'm still trying the various options to reduce stack consumption, while trying to avoid hacks and keep the number of changes small. On my local branch, I'm already better than Python 2.7 and 3.5.

    @python-dev
    Mannequin

    python-dev mannequin commented Jan 11, 2017

    New changeset 8481c379e2da by Victor Stinner in branch 'default':
    Inline call_function()
    https://hg.python.org/cpython/rev/8481c379e2da

    @vstinner
    Member Author

    vstinner commented Feb 1, 2017

    Victor: "I plan to run a benchmark when all my patches to reduce the stack consumption will be ready."

    msg285200 of issue bpo-28870: "I also ran the reliable performance benchmark suite with LTO+PGO. There is no significant performance change on these benchmarks (...)"

    less_stack.patch:

    -#define _PY_FASTCALL_SMALL_STACK 5
    +#define _PY_FASTCALL_SMALL_STACK 3

    With issue bpo-28870, reducing the _PY_FASTCALL_SMALL_STACK value is no longer needed. A larger _PY_FASTCALL_SMALL_STACK means better performance, so I prefer to keep the value at 5 (arguments).
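
To put numbers on that trade-off: going from 5 to 3 slots shrinks each small-stack buffer from 40 to 24 bytes on a 64-bit build, a saving of 16 bytes. The per-call stack figures measured earlier in this issue dropped by 16, 48 and 48 bytes between the "inline" run and the "inline + small stack" run, all exact multiples of 16. The sketch below does that arithmetic. Reading the multiples as "number of small-stack buffers on the call path" is an inference, not something the issue states, and the 8-byte pointer size is an assumption chosen to match the "5 arg (40 B)" figure.

```python
# Assumption: 8-byte pointers (64-bit build), matching "5 arg (40 B)".
POINTER_SIZE = 8


def small_stack_bytes(slots, pointer_size=POINTER_SIZE):
    """Bytes taken by one small-stack buffer holding `slots` pointers."""
    return slots * pointer_size


saving_per_buffer = small_stack_bytes(5) - small_stack_bytes(3)

# Per-call stack usage from this issue: (inline only, inline + small stack 3).
PER_CALL_BYTES = {
    "test_python_call": (1056, 1040),
    "test_python_getitem": (1312, 1264),
    "test_python_iterator": (1456, 1408),
}

for name, (before, after) in PER_CALL_BYTES.items():
    drop = before - after
    # Inference only: how many 16-byte savings would explain the drop.
    buffers = drop // saving_per_buffer
    print(f"{name}: {drop} bytes saved per call (~{buffers} x {saving_per_buffer} B)")
```

Against roughly 1000 to 1500 bytes per call, a saving of 16 to 48 bytes is small, which fits the decision to keep the larger value.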

    The main change, inlining call_function(), was merged, so I'm closing the issue.

    @vstinner vstinner closed this as completed Feb 1, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022