Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insufficient sigaltstack size used by CPython prevents extensions from using new ISA #91124

Closed
oleksandr-pavlyk mannequin opened this issue Mar 9, 2022 · 21 comments
Closed
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@oleksandr-pavlyk
Copy link
Mannequin

oleksandr-pavlyk mannequin commented Mar 9, 2022

BPO 46968
Nosy @vstinner, @ambv, @pablogsal, @miss-islington, @oleksandr-pavlyk
PRs
  • bpo-46968: Query altstack size dynamically on Linux kernel >=5.14 #31789
  • bpo-46968: Add os.sysconf_names['SC_MINSIGSTKSZ'] #31824
  • [3.10] bpo-46968: Fix faulthandler for Sapphire Rapids Xeon (GH-31789) #31830
  • [3.9] bpo-46968: Fix faulthandler for Sapphire Rapids Xeon (GH-31789) #31831
  • bpo-46968: Check for 'sys/auxv.h' in the configure script #31961
  • [3.10] bpo-46968: Check for 'sys/auxv.h' in the configure script (GH-1961) #31974
  • [3.9] bpo-46968: Check for 'sys/auxv.h' in the configure script (GH-31961) #31975
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2022-03-18.13:33:30.380>
    created_at = <Date 2022-03-09.19:07:02.730>
    labels = ['extension-modules', 'type-bug', '3.9', '3.10', '3.11']
    title = 'Insufficient sigaltstack size used by CPython prevents extensions from using new ISA'
    updated_at = <Date 2022-03-18.14:15:32.541>
    user = 'https://github.com/oleksandr-pavlyk'

    bugs.python.org fields:

    activity = <Date 2022-03-18.14:15:32.541>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-03-18.13:33:30.380>
    closer = 'pablogsal'
    components = ['Extension Modules']
    creation = <Date 2022-03-09.19:07:02.730>
    creator = 'oleksandr-pavlyk'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46968
    keywords = ['patch']
    message_count = 21.0
    messages = ['414809', '414938', '414939', '414941', '414943', '414949', '414950', '414957', '414960', '414965', '415402', '415403', '415404', '415405', '415406', '415407', '415483', '415487', '415496', '415497', '415500']
    nosy_count = 6.0
    nosy_names = ['vstinner', 'lukasz.langa', 'lkollar', 'pablogsal', 'miss-islington', 'oleksandr-pavlyk']
    pr_nums = ['31789', '31824', '31830', '31831', '31961', '31974', '31975']
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue46968'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @oleksandr-pavlyk
    Copy link
    Mannequin Author

    oleksandr-pavlyk mannequin commented Mar 9, 2022

    The following snippet illustrates request by an extension to use AMX registers:

    // no increase sigaltstack size fix will not wark till we fix python
    void enable_amx_no_fix()
    {
    
        unsigned long bitmask;
        long rc;
    
        rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA);
        if (rc) {
            printf("The kernel rejects the AMX use.\n");
            printf("errno %d\n",errno);
        } else {
            printf("The kernel allows to use AMX.\n");
        }
    
    
        rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask);
    
        if (rc) {
            printf("rc error\n");
        } else  {
            if (( bitmask & XFEATURE_MASK_XTILEDATA) == 0){
                printf("verify AMX permission faild bitmask %ld\n",bitmask);
            }
    
        }
    }
    

    This request fails on the account of too small a size for sigaltstack used by CPython allocated in Modules/faulthandler.c

    The stack size used is 2*SIGSTKSZ, and does not take hardware capabilities into account.

    Linux kernel 5.14 adds support to query minimum size of sigaltstack dynamically via getauxval(AT_MINSIGSTKSZ).

    AMX support is added in Linux kernel 5.16

    CPython should make use of this when built against more recent Linux kernels.

    @oleksandr-pavlyk oleksandr-pavlyk mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.10 only security fixes 3.11 only security fixes 3.9 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Mar 9, 2022
    @vstinner
    Copy link
    Member

    New changeset dc374ac by Victor Stinner in branch 'main':
    bpo-46968: Add os.sysconf_names['SC_MINSIGSTKSZ'] (GH-31824)
    dc374ac

    @vstinner
    Copy link
    Member

    New changeset 3b128c0 by Oleksandr Pavlyk in branch 'main':
    bpo-46968: Fix faulthandler for Sapphire Rapids Xeon (GH-31789)
    3b128c0

    @vstinner
    Copy link
    Member

    The latest faulthandler bug on sigaltstack() was bpo-21131 about FPU state stored by "XSAVE" and a Linux kernel change to fix a security vulnerability:
    https://bugs.python.org/issue21131#msg349688

    @vstinner
    Copy link
    Member

    New changeset 393e2bf by Victor Stinner in branch '3.10':
    bpo-46968: Fix faulthandler for Sapphire Rapids Xeon (GH-31789) (GH-31830)
    393e2bf

    @vstinner
    Copy link
    Member

    New changeset ba2b795 by Victor Stinner in branch '3.9':
    bpo-46968: Fix faulthandler for Sapphire Rapids Xeon (GH-31789) (GH-31831)
    ba2b795

    @vstinner
    Copy link
    Member

    Thanks Oleksandr Pavlyk for your nice enhancement!

    I merged your change to 3.9, 3.10 and main branches! I also added os.sysconf('SC_MINSIGSTKSZ') to Python 3.11.

    @vstinner vstinner removed 3.7 (EOL) end of life 3.8 only security fixes labels Mar 11, 2022
    @vstinner vstinner removed 3.7 (EOL) end of life 3.8 only security fixes labels Mar 11, 2022
    @vstinner
    Copy link
    Member

    On my x86-64 Fedora 35 with the CPU "Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz", I get:

    • SIGSTKSZ = 8192
    • getauxval(AT_MINSIGSTKSZ) = 2032
    • faulthandler stack.ss_size = 10224
    • os.sysconf('SC_MINSIGSTKSZ') = 2032

    In C, sysconf(_SC_MINSIGSTKSZ) is similar to getauxval(AT_MINSIGSTKSZ), but the glibc sysconf(_SC_MINSIGSTKSZ) is more generic, whereas IMO getauxval(AT_MINSIGSTKSZ) is more the low-level Linux API.

    @oleksandr-pavlyk
    Copy link
    Mannequin Author

    oleksandr-pavlyk mannequin commented Mar 12, 2022

    So where getauxval(AT_MINSIGSTKSZ) < SIGSTKSZ the merged changes actually resulted in decrease of the allocated signal deliver stack.

    On Sapphire Rapids due to tile registers size we have getauxval(AT_MINSIGSTKSZ) > SIGSTKSZ.

    This is why the initial proposal was to use SIGSTKSZ + max(getauxval(AT_MINSIGSTKSZ), SIGSTKSZ).

    I suppose, ultimately I am saying that we should check that bpo-21131 does not regress.

    @vstinner
    Copy link
    Member

    So where getauxval(AT_MINSIGSTKSZ) < SIGSTKSZ the merged changes actually resulted in decrease of the allocated signal deliver stack.

    faulthandler stack is only used in the least likely case: on a fatal error. It should reduce its memory footprint, so it's good that it uses less memory.

    To fix bpo-21131, I chose to use SIGSTKSZ * 2 because:

    • the fix is trivial: add "* 2"
    • it "just works"
    • there was no API to query how many bytes Linux uses on the stack

    Previously, faulthandler had between 0 and SIGSTKSZ bytes depending on which CPU extension was used or not on the process.

    If I understood correctly, on Linux 5.14 with the change, faulthandler now always has SIGSTKSZ bytes for its own usage, and the other bytes are used by the Linux kernel. So it's more reliable for faulthandler, to always have the same amount of memory for its personal use.

    @pablogsal
    Copy link
    Member

    Commit 393e2bf has broken CPython in RedHat 6:

    [2022-03-16T18:49:20.608Z] 2022/03/16 14:48:55 ERROR /tmp/python3.10-3.10.3-0/Modules/faulthandler.c:28:12: fatal error: sys/auxv.h: No such file or directory
    # include <sys/auxv.h>
    ^~~~~~~~~~~~
    compilation terminated.

    @pablogsal pablogsal reopened this Mar 17, 2022
    @pablogsal pablogsal reopened this Mar 17, 2022
    @pablogsal
    Copy link
    Member

    This is problematic because this has been backported to stable releases.

    @pablogsal
    Copy link
    Member

    We may need to revert these commits and do another release.... sigh :(

    @pablogsal
    Copy link
    Member

    The configure file is checking for "linux/auxvec.h"

    checking linux/auxvec.h usability... yes

    but the code is including "sys/auxvec.h"

    @pablogsal
    Copy link
    Member

    The code is assuming that if linux/auxvec.h then sys/auxv.h will be available, which is wrong.

    @pablogsal
    Copy link
    Member

    This may be quite bad, because this means that 3.10 and 3.9 doesn't build in CentOS 6, which is used for manylinux2010 wheels

    @vstinner
    Copy link
    Member

    Commit 393e2bf has broken CPython in RedHat 6

    Too bad that we want to support RHEL 6 but have no buildbot for that.

    @miss-islington
    Copy link
    Contributor

    New changeset 8e3fde7 by Pablo Galindo Salgado in branch 'main':
    bpo-46968: Check for 'sys/auxv.h' in the configure script (GH-31961)
    8e3fde7

    @pablogsal
    Copy link
    Member

    New changeset a12ef81 by Pablo Galindo Salgado in branch '3.9':
    [3.9] bpo-46968: Check for 'sys/auxv.h' in the configure script (GH-31961). (GH-31975)
    a12ef81

    @pablogsal
    Copy link
    Member

    New changeset 6fd9737 by Pablo Galindo Salgado in branch '3.10':
    [3.10] bpo-46968: Check for 'sys/auxv.h' in the configure script (GH-31961). (GH-31974)
    6fd9737

    @vstinner
    Copy link
    Member

    Thanks for the fix Pablo.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants