Python built with clang -O0 allocates 10x more stack memory than clang -O3 on a Python function call #90758

Closed
vstinner opened this issue Feb 1, 2022 · 16 comments
Labels
3.11 (only security fixes), build (The build process and cross-build), pending (The issue will be closed if no feedback is provided), performance (Performance or resource usage)

Comments

@vstinner
Member

vstinner commented Feb 1, 2022

BPO 46600
Nosy @vstinner, @methane, @corona10, @pablogsal
PRs
  • bpo-46600: ./configure --with-pydebug uses -Og with clang #31052
  • bpo-46600: Fix test_gdb.test_pycfunction() for clang -Og #31058
Files
  • stack_overflow-4.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2022-02-01.13:06:00.875>
    labels = ['3.11', 'build', 'performance']
    title = 'Python built with clang -O0 allocates 10x more stack memory than clang -O3 on a Python function call'
    updated_at = <Date 2022-02-03.00:00:28.611>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2022-02-03.00:00:28.611>
    actor = 'methane'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Build']
    creation = <Date 2022-02-01.13:06:00.875>
    creator = 'vstinner'
    dependencies = []
    files = ['50600']
    hgrepos = []
    issue_num = 46600
    keywords = ['patch']
    message_count = 14.0
    messages = ['412252', '412253', '412254', '412255', '412256', '412258', '412260', '412261', '412278', '412286', '412294', '412334', '412348', '412407']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'methane', 'corona10', 'pablogsal']
    pr_nums = ['31052', '31058']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue46600'
    versions = ['Python 3.11']

    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    Measure using this script on the main branch (commit 108e66b):
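    ---
    import _testcapi

    # Read the C stack pointer from inside a generator frame.
    def f(): yield _testcapi.stack_pointer()

    # Difference between the caller's stack pointer and the generator's.
    print(_testcapi.stack_pointer() - next(f()))
    ---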

    Stack usage depending on the compiler and compiler optimization level:

    • clang -O0: 9,104 bytes
    • clang -Og: 736 bytes
    • gcc -O0: 6,784 bytes
    • gcc -Og: 624 bytes

    -O0 allocates around 10x more memory.
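
The ratios behind the "around 10x" figure can be checked directly from the four measurements above. This is only a minimal sketch: the dictionary layout and the helper name `o0_over_og` are illustrative, while the numbers are the ones reported in this comment.

```python
# Stack usage (bytes) of one generator call, as reported above.
measurements = {
    "clang": {"-O0": 9104, "-Og": 736},
    "gcc": {"-O0": 6784, "-Og": 624},
}


def o0_over_og(compiler: str) -> float:
    """Return how many times more stack -O0 uses than -Og."""
    usage = measurements[compiler]
    return usage["-O0"] / usage["-Og"]


for compiler in measurements:
    print(f"{compiler}: -O0 uses {o0_over_og(compiler):.1f}x the stack of -Og")
```

This gives roughly 12.4x for clang and 10.9x for gcc.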

    Moreover, "./configure --with-pydebug CC=clang" uses -O0 in CFLAGS, because "clang --help" output doesn't contain "-Og". I'm working on a configure change to use -Og with clang versions that support it.
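
Searching `--help` output is fragile, since a compiler can accept a flag that it does not list there. One alternative is to try compiling a trivial program with the flag and check the exit status. The following is only a sketch of that idea, not the code used by `configure`; the helper name `compiler_accepts_flag` and its injectable `run` parameter are assumptions made for illustration.

```python
import os
import subprocess
import tempfile


def compiler_accepts_flag(cc, flag, run=subprocess.run):
    """Return True if `cc` compiles a trivial C file when given `flag`.

    -Werror is passed so that a flag the compiler merely warns about
    is not reported as supported.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.c")
        obj = os.path.join(tmp, "conftest.o")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        try:
            proc = run(
                [cc, "-Werror", flag, "-c", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # Compiler not installed or not executable.
            return False
        return proc.returncode == 0


if __name__ == "__main__":
    for cc in ("clang", "gcc"):
        print(cc, "-Og supported:", compiler_accepts_flag(cc, "-Og"))
```

The `run` parameter exists only so the check can be exercised without a real compiler; in normal use the default `subprocess.run` is enough.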

    @vstinner vstinner added the 3.11 (only security fixes), build (The build process and cross-build), and performance (Performance or resource usage) labels Feb 1, 2022
    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    PR 31052 enables -Og when using clang with ./configure --with-pydebug, so the example uses 736 bytes instead of 9,104 bytes.

    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    This issue is a follow-up of bpo-46542 "test_json and test_lib2to3 crash on s390x Fedora Clang 3.x buildbot".

    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    Previous issues about stack memory usage, work done in 2017:

    • bpo-28870: Reduce stack consumption of PyObject_CallFunctionObjArgs() and like
    • bpo-29227: Reduce C stack consumption in function calls
    • bpo-29465: Modify _PyObject_FastCall() to reduce stack consumption
      29464

    I summarized the results in the "Stack consumption" section of my article: https://vstinner.github.io/contrib-cpython-2017q1.html

    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    See also bpo-30866: "Add _testcapi.stack_pointer() to measure the C stack consumption".

    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    stack_overflow-4.py: Updated script from bpo-30866 to measure stack memory usage before Python crashes or raises a RecursionError.

    I had to modify the script since calling a Python function from a Python function no longer allocates (additional) memory on the stack! See bpo-45256 "Remove the usage of the C stack in Python to Python calls".

    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    stack_overflow-4.py output depending on the compiler and compiler flags.

    gcc -O3 (./configure):
    ---
    test_python_call: 11904 calls before crash, stack: 704.1 bytes/call
    test_python_iterator: 17460 calls before crash, stack: 480.0 bytes/call
    test_python_getitem: 245760 calls before recursion error, stack: 0.2 bytes/call

    => total: 275124 calls, 1184.3 bytes per call
    ---

    It's better than stack memory usage in 2017: https://bugs.python.org/issue30866#msg297826

    clang -O3 (./configure CC=clang):
    ---
    test_python_call: 10270 calls before crash, stack: 816.1 bytes/call
    test_python_iterator: 14155 calls before crash, stack: 592.0 bytes/call
    test_python_getitem: 245760 calls before recursion error, stack: 0.3 bytes/call

    => total: 270185 calls, 1408.4 bytes per call
    ---

    clang allocates a little bit more memory on the stack than gcc.
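
The per-test figures in the two outputs above can be compared mechanically. The sketch below parses lines in the format shown in this comment; the regular expression and the function name `parse_report` are assumptions for illustration and are not part of the attached script.

```python
import re

# Matches lines such as:
#   test_python_call: 11904 calls before crash, stack: 704.1 bytes/call
LINE_RE = re.compile(
    r"(?P<test>\w+): (?P<calls>\d+) calls before "
    r"(?:crash|recursion error), stack: (?P<bytes>[\d.]+) bytes/call"
)


def parse_report(text):
    """Map each test name to (calls, bytes_per_call)."""
    return {
        m["test"]: (int(m["calls"]), float(m["bytes"]))
        for m in LINE_RE.finditer(text)
    }


GCC_O3 = """\
test_python_call: 11904 calls before crash, stack: 704.1 bytes/call
test_python_iterator: 17460 calls before crash, stack: 480.0 bytes/call
test_python_getitem: 245760 calls before recursion error, stack: 0.2 bytes/call
"""

CLANG_O3 = """\
test_python_call: 10270 calls before crash, stack: 816.1 bytes/call
test_python_iterator: 14155 calls before crash, stack: 592.0 bytes/call
test_python_getitem: 245760 calls before recursion error, stack: 0.3 bytes/call
"""

gcc, clang = parse_report(GCC_O3), parse_report(CLANG_O3)
for test in gcc:
    extra = clang[test][1] - gcc[test][1]
    print(f"{test}: clang uses {extra:+.1f} bytes/call compared to gcc")
```

For both tests that end in a crash, the difference is the same 112 bytes per call.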

    I didn't try PGO or LTO yet.

    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    New changeset 0515eaf by Victor Stinner in branch 'main':
    bpo-46600: ./configure --with-pydebug uses -Og with clang (GH-31052)
    0515eaf

    @pablogsal
    Member

    PR 31052 seems to have broken a bunch of buildbots. If no fix is provided in 24 hours, we will need to revert :(

    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    > PR 31052 seems to have broken a bunch of buildbots. If no fix is provided in 24 hours, we will need to revert :(

    test_gdb fails if Python is built with clang -Og. I don't think that it's a regression. It's just that previously, buildbots using clang only built Python with -O0 or -O3.

    I'm investigating the test_gdb issue: it's easy to reproduce on Linux (clang 13.0.0). I may skip test_gdb if Python is built with clang -Og.

    @vstinner
    Member Author

    vstinner commented Feb 1, 2022

    New changeset bebaa95 by Victor Stinner in branch 'main':
    bpo-46600: Fix test_gdb.test_pycfunction() for clang -Og (GH-31058)
    bebaa95

    @methane
    Member

    methane commented Feb 2, 2022

    FWIW, it seems -O0 doesn't merge local variables that live in different code paths or have different lifetimes.

    For example, see _Py_abspath

        if (path[0] == '\0' || !wcscmp(path, L".")) {
           wchar_t cwd[MAXPATHLEN + 1];
           //(snip)
        }
        //(snip)
        wchar_t cwd[MAXPATHLEN + 1];
    

    wchar_t is 4 bytes and MAXPATHLEN is 4096 on Linux, so each cwd is 16388 bytes.
    -O0 allocates 32856 bytes for it and -Og allocates 16440 bytes for it.
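
The buffer size quoted above follows from the two constants, and the two reported frame sizes line up with "two separate buffers" versus "one shared buffer". Here is a quick check of the arithmetic, assuming the Linux values stated in this comment (4-byte wchar_t, MAXPATHLEN of 4096); the variable names are illustrative.

```python
SIZEOF_WCHAR_T = 4  # bytes, Linux value quoted above
MAXPATHLEN = 4096   # Linux value quoted above

# Size of one `wchar_t cwd[MAXPATHLEN + 1]` buffer.
cwd_size = (MAXPATHLEN + 1) * SIZEOF_WCHAR_T
print(cwd_size)  # 16388

# Reported frame sizes for _Py_abspath from this comment.
for frame in (32856, 16440):
    buffers, remainder = divmod(frame, cwd_size)
    print(f"{frame} bytes = {buffers} x cwd + {remainder} bytes for everything else")
```

The larger frame holds two full buffers plus 80 bytes, the smaller one holds a single buffer plus 52 bytes, which is consistent with the two `cwd` arrays sharing one slot in the smaller frame.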

    I don't know which specific optimization in -Og merges local variables, but I think -Og is very important for _PyEval_EvalFrameDefault() since it has many local variables in huge switch-case statements.
    -Og allocates 312 bytes for it and -O0 allocates 8280 bytes for it.

    By the way, clang 13 has the -fstack-usage option like gcc, but clang 12 doesn't have it.
    Since Ubuntu 20.04 has only clang 12, I use -fstack-size-section and https://github.com/mvanotti/stack-sizes to get the stack size.
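
For compilers that do support -fstack-usage, the per-function numbers end up in `.su` files next to the object files. The sketch below assumes the tab-separated layout that gcc documents (function, size in bytes, qualifiers), with the first field optionally prefixed by `file:line:col:`; whether clang 13 emits exactly the same layout is an assumption here. The sample text is hypothetical: the sizes are taken from this comment, but the file names and positions are placeholders.

```python
def parse_su(text):
    """Parse -fstack-usage output into {function: (bytes, qualifiers)}."""
    usage = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        location, size, qualifiers = fields
        # The first field may be "file:line:col:function" or just the name.
        # Splitting on the last ":" is fine for C, but not for C++ names.
        function = location.rsplit(":", 1)[-1]
        usage[function] = (int(size), qualifiers)
    return usage


# Hypothetical sample: positions "1:1" are placeholders, not real locations.
SAMPLE = (
    "ceval.c:1:1:_PyEval_EvalFrameDefault\t8280\tstatic\n"
    "fileutils.c:1:1:_Py_abspath\t32856\tstatic\n"
)

# Print the largest frames first.
for name, (size, kind) in sorted(parse_su(SAMPLE).items(), key=lambda kv: -kv[1][0]):
    print(f"{size:>8} {kind:<8} {name}")
```

Sorting by size makes it easy to spot which functions dominate stack usage in a given build.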

    @vstinner
    Member Author

    vstinner commented Feb 2, 2022

    > For example, see _Py_abspath

    For functions which are commonly called in Python at runtime, it may be worth manually merging large local variables to save a few bytes on the stack when Python is built with -O0. _Py_abspath() is only called at startup, if I recall correctly, so it should not be a big issue in practice.

    @methane
    Member

    methane commented Feb 3, 2022

    I didn't mean that _Py_abspath is a problem. I just used it to illustrate why -O0 and -Og are so different.

    We can reduce its stack usage easily, but it is less of a problem than _PyEval_EvalFrameDefault.
    It is difficult to reduce the stack usage of _PyEval_EvalFrameDefault with -O0.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel
    Member

    Is there anything left to do here?

    @iritkatriel iritkatriel added the pending (The issue will be closed if no feedback is provided) label Sep 12, 2022
    @vstinner
    Member Author

    There is always room for enhancement :-) But for now, IMO merged changes are enough to make the issue less complicated.

    The main change is to use -Og when Python is built in debug mode by clang.
