python3 resource.setrlimit strange behaviour under macOS #78783

Closed
marche147 mannequin opened this issue Sep 7, 2018 · 23 comments
Labels
3.9 (only security fixes), 3.10 (only security fixes), 3.11 (only security fixes), OS-mac, stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@marche147
Mannequin

marche147 mannequin commented Sep 7, 2018

BPO 34602
Nosy @ronaldoussoren, @ned-deily, @methane, @0-wiz-0, @ambv, @miss-islington, @vladima, @marche147
PRs
  • bpo-34602: Avoid failures setting macOS stack resource limit #13011
  • [3.7] bpo-34602: Avoid failures setting macOS stack resource limit (GH-13011) #13013
  • [3.6] bpo-34602: Avoid failures setting macOS stack resource limit (GH-13011) #13014
  • bpo-34602: Avoid failures setting macOS stack resource limit #14546
  • [3.8] bpo-34602: Avoid failures setting macOS stack resource limit (GH-14546) #14547
  • [3.7] bpo-34602: Avoid failures setting macOS stack resource limit (GH-14546) #14548
  • [3.6] bpo-34602: Avoid failures setting macOS stack resource limit (GH-14546) #14549
  • bpo-34602: Quadruple stack size on macOS when compiling with UBSAN #27309
  • [3.10] bpo-34602: Quadruple stack size on macOS when compiling with UBSAN (GH-27309) #28280
  • bpo-34602: Fix unportable test(1) operator in configure script #30490
  • [3.10] bpo-34602: Fix unportable test(1) operator in configure script (GH-30490) #30491
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-07-02.07:50:14.073>
    created_at = <Date 2018-09-07.10:34:46.904>
    labels = ['OS-mac', 'type-bug', '3.9', '3.10', '3.11', 'library']
    title = 'python3 resource.setrlimit strange behaviour under macOS'
    updated_at = <Date 2022-01-09.01:08:28.643>
    user = 'https://github.com/marche147'

    bugs.python.org fields:

    activity = <Date 2022-01-09.01:08:28.643>
    actor = 'ned.deily'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-07-02.07:50:14.073>
    closer = 'ned.deily'
    components = ['Library (Lib)', 'macOS']
    creation = <Date 2018-09-07.10:34:46.904>
    creator = 'marche147'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34602
    keywords = ['patch']
    message_count = 22.0
    messages = ['324731', '324737', '324818', '324824', '338868', '339197', '341112', '341114', '341118', '341120', '347110', '347113', '347115', '347117', '347119', '347167', '400984', '401879', '409803', '409879', '410032', '410126']
    nosy_count = 9.0
    nosy_names = ['ronaldoussoren', 'exarkun', 'ned.deily', 'methane', 'wiz', 'lukasz.langa', 'miss-islington', 'v2m', 'marche147']
    pr_nums = ['13011', '13013', '13014', '14546', '14547', '14548', '14549', '27309', '28280', '30490', '30491']
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue34602'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @marche147
    Mannequin Author

    marche147 mannequin commented Sep 7, 2018

    Consider the following code:

    import resource
    s, h = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (h, h))
    

    Running this under macOS with python 3.6.5 gives the following exception:

    bash-3.2$ uname -a
    Darwin arch-osx 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
    bash-3.2$ cat test.py
    import resource
    s, h = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (h, h))
    bash-3.2$ python3 test.py
    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        resource.setrlimit(resource.RLIMIT_STACK, (h, h))
    ValueError: current limit exceeds maximum limit
    

    Nevertheless, with Python 2.7.10 in the same environment, this code runs without raising an exception. Neither of the following operations fails under the same circumstances either:

    bash-3.2$ cat test.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <errno.h>
    #include <sys/resource.h>
    
    int main() {
      struct rlimit rl;
      if(getrlimit(RLIMIT_STACK, &rl) < 0) {
        perror("getrlimit");
        exit(1);
      }
    
      rl.rlim_cur = rl.rlim_max;
      if(setrlimit(RLIMIT_STACK, &rl) < 0) {
        perror("setrlimit");
        exit(1);
      }
      return 0;
    }
    bash-3.2$ gcc -o test test.c
    bash-3.2$ ./test
    
    bash-3.2$ ulimit -s -H
    65532
    bash-3.2$ ulimit -s
    8192
    bash-3.2$ ulimit -s 65532
    bash-3.2$ ulimit -s
    65532
    

    I also ran the Python script above on Linux; it raises no exception on either Python 2 (2.7.10) or Python 3 (3.6.5).
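For scripts that have to run on both platforms, a defensive variant of the snippet above can at least keep going when the call is rejected. This is only a minimal sketch: the helper name `try_raise_stack_limit` is made up for illustration, and it does not fix the macOS failure, it just reports it.

```python
import resource


def try_raise_stack_limit():
    """Try to raise the soft stack limit to the hard limit.

    Returns (succeeded, soft_limit_now, error_text).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    try:
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    except (ValueError, OSError) as exc:
        # ValueError is what this report shows on macOS; OSError is caught
        # as well in case the platform rejects the call in another way.
        return False, soft, str(exc)
    new_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
    return True, new_soft, ""


if __name__ == "__main__":
    ok, soft_now, error = try_raise_stack_limit()
    print("raised" if ok else "not raised: " + error, soft_now)
```

On an affected macOS interpreter the `not raised` branch is expected, with the soft limit left unchanged.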

    @marche147 marche147 mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Sep 7, 2018
    @ronaldoussoren
    Contributor

    I get the same error, also with python3.7. Both for homebrew and a python.org installer.

    @vladima
    Mannequin

    vladima mannequin commented Sep 8, 2018

    I can reproduce it with the given sample file when it is linked with an explicit stack size:

    vladima-mbp $ cat test.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <errno.h>
    #include <sys/resource.h>
    
    int main() {
      struct rlimit rl;
      if(getrlimit(RLIMIT_STACK, &rl) < 0) {
        perror("getrlimit");
        exit(1);
      }
    
      rl.rlim_cur = rl.rlim_max;
      if(setrlimit(RLIMIT_STACK, &rl) < 0) {
        perror("setrlimit");
        exit(1);
      }
      return 0;
    }
    vladima-mbp $ gcc -Wl,-stack_size,1000000 -o test test.c
    vladima-mbp $ ./test
    setrlimit: Invalid argument
    

    Similar settings were added to Python in 335ab5b66f4
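To check whether a particular interpreter build carries such a linker setting, one option is to look for the flag in the build configuration. The sketch below assumes the flag shows up in the `LDFLAGS` or `LINKFORSHARED` config variables, which may not hold for every build; the helper names are hypothetical. The value is parsed as hexadecimal because that is how `ld` reads the `-stack_size` argument.

```python
import re
import sysconfig

# Matches "-stack_size,<hex>" as it appears inside "-Wl,-stack_size,1000000".
_STACK_SIZE_RE = re.compile(r"-stack_size,(?:0x)?([0-9a-fA-F]+)")


def parse_stack_size(flags):
    """Return the -stack_size value in bytes, or None if the flag is absent."""
    match = _STACK_SIZE_RE.search(flags or "")
    # ld interprets the -stack_size argument as a hexadecimal number.
    return int(match.group(1), 16) if match else None


def interpreter_stack_size():
    """Look for a -stack_size linker flag in this interpreter's build config."""
    for name in ("LDFLAGS", "LINKFORSHARED"):  # assumed locations of the flag
        size = parse_stack_size(sysconfig.get_config_var(name))
        if size is not None:
            return size
    return None


if __name__ == "__main__":
    print(parse_stack_size("-Wl,-stack_size,1000000"))  # 16777216 (16 MiB)
    print(interpreter_stack_size())
```

The 16 MiB result for `1000000` agrees with the 16.0M main-thread stack that vmmap reports later in this thread for the binary linked with that flag.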

    @marche147
    Mannequin Author

    marche147 mannequin commented Sep 8, 2018

    Thanks for the repro! It did help for pinpointing the issue.

    So I took a little spare time and dug into the XNU kernel code. Here is my hypothesis based on what I found. (N.B.: it comes from a simple experiment and a skim of the source code lasting 15 minutes or less, so it could be seriously wrong; I'm not an expert on the XNU kernel, and I currently don't have the time to build and debug it.)

    In bsd/kern/kern_resource.c, there's a function dosetrlimit which handles the setrlimit request, and here's part of it:

      case RLIMIT_STACK:
        // ...
        if (limp->rlim_cur > alimp->rlim_cur) {
          user_addr_t addr;
          user_size_t size;
    
            /* grow stack */
            size = round_page_64(limp->rlim_cur);
            size -= round_page_64(alimp->rlim_cur);
    
          addr = p->user_stack - round_page_64(limp->rlim_cur);
          kr = mach_vm_protect(current_map(),
                   addr, size,
                   FALSE, VM_PROT_DEFAULT);
          if (kr != KERN_SUCCESS) {
            error =  EINVAL;
            goto out;
          }
        } // ...
    
    

    As we can see, the kernel tries to change the protection of the memory just below the current stack to VM_PROT_DEFAULT (presumably read and write). I then used vmmap to compare two binaries compiled with different commands. The results are:

    1. Binary compiled without default stack size:
    - Before calling setrlimit
    
    ...
    STACK GUARD            00007ffee76d9000-00007ffeeaed9000 [ 56.0M     0K     0K     0K] ---/rwx SM=NUL          stack guard for thread 0
    ...
    Stack                  00007ffeeaed9000-00007ffeeb6d9000 [ 8192K    20K    20K     0K] rw-/rwx SM=PRV          thread 0
    ...
                                    VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
    REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
    ===========                     ======= ========    =====  ======= ========   ======    =====  =======
    Kernel Alloc Once                    8K       4K       4K       0K       0K       0K       0K        2
    MALLOC guard page                   16K       0K       0K       0K       0K       0K       0K        5
    MALLOC metadata                     60K      60K      60K       0K       0K       0K       0K        6
    MALLOC_SMALL                      16.0M      16K      16K       0K       0K       0K       0K        3         see MALLOC ZONE table below
    MALLOC_TINY                       2048K      32K      32K       0K       0K       0K       0K        3         see MALLOC ZONE table below
    STACK GUARD                       56.0M       0K       0K       0K       0K       0K       0K        2
    Stack                             8192K      20K      20K       0K       0K       0K       0K        2
    __DATA                            2324K    1192K     208K       0K       0K       0K       0K       43
    __LINKEDIT                       192.7M    21.7M       0K       0K       0K       0K       0K        4
    __TEXT                            9448K    8224K       0K       0K       0K       0K       0K       48
    shared memory                        8K       8K       8K       0K       0K       0K       0K        3
    ===========                     ======= ========    =====  ======= ========   ======    =====  =======
    TOTAL                            286.3M    31.0M     348K       0K       0K       0K       0K      110
    ...
    
    - After calling setrlimit
    
    ...
    STACK GUARD            00007ffee76d9000-00007ffee76da000 [    4K     0K     0K     0K] ---/rwx SM=NUL          stack guard for thread 0
    ...
    Stack                  00007ffee76da000-00007ffeeaed9000 [ 56.0M     0K     0K     0K] rw-/rwx SM=NUL          thread 0
    Stack                  00007ffeeaed9000-00007ffeeb6d9000 [ 8192K    20K    20K     0K] rw-/rwx SM=PRV          thread 0
    ...
                                    VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
    REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
    ===========                     ======= ========    =====  ======= ========   ======    =====  =======
    Kernel Alloc Once                    8K       4K       4K       0K       0K       0K       0K        2
    MALLOC guard page                   16K       0K       0K       0K       0K       0K       0K        5
    MALLOC metadata                     60K      60K      60K       0K       0K       0K       0K        6
    MALLOC_SMALL                      16.0M      20K      20K       0K       0K       0K       0K        3         see MALLOC ZONE table below
    MALLOC_TINY                       2048K      32K      32K       0K       0K       0K       0K        3         see MALLOC ZONE table below
    STACK GUARD                          4K       0K       0K       0K       0K       0K       0K        2
    Stack                             64.0M      20K      20K       0K       0K       0K       0K        3
    __DATA                            2324K    1192K     208K       0K       0K       0K       0K       43
    __LINKEDIT                       192.7M    21.7M       0K       0K       0K       0K       0K        4
    __TEXT                            9448K    8224K       0K       0K       0K       0K       0K       48
    shared memory                        8K       8K       8K       0K       0K       0K       0K        3
    ===========                     ======= ========    =====  ======= ========   ======    =====  =======
    TOTAL                            286.3M    31.0M     352K       0K       0K       0K       0K      111
    ...
    
    2. Binary compiled with default stack size:
    - Before calling setrlimit
    ...
    STACK GUARD            00007ffee09c2000-00007ffee09c3000 [    4K     0K     0K     0K] ---/rwx SM=NUL          stack guard for thread 0
    ...
    Stack                  00007ffee09c3000-00007ffee19c3000 [ 16.0M    20K    20K     0K] rw-/rwx SM=PRV          thread 0
    ...
                                    VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
    REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
    ===========                     ======= ========    =====  ======= ========   ======    =====  =======
    Kernel Alloc Once                    8K       4K       4K       0K       0K       0K       0K        2
    MALLOC guard page                   16K       0K       0K       0K       0K       0K       0K        5
    MALLOC metadata                     60K      60K      60K       0K       0K       0K       0K        6
    MALLOC_SMALL                      8192K      12K      12K       0K       0K       0K       0K        2         see MALLOC ZONE table below
    MALLOC_TINY                       1024K      20K      20K       0K       0K       0K       0K        2         see MALLOC ZONE table below
    STACK GUARD                          4K       0K       0K       0K       0K       0K       0K        2
    Stack                             16.0M      20K      20K       0K       0K       0K       0K        2
    __DATA                            2324K    1192K     208K       0K       0K       0K       0K       43
    __LINKEDIT                       192.7M    22.3M       0K       0K       0K       0K       0K        4
    __TEXT                            9448K    8232K       0K       0K       0K       0K       0K       48
    shared memory                        8K       8K       8K       0K       0K       0K       0K        3
    ===========                     ======= ========    =====  ======= ========   ======    =====  =======
    TOTAL                            229.3M    31.7M     332K       0K       0K       0K       0K      108
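
The addresses in the first pair of listings line up with the arithmetic in the `dosetrlimit` excerpt. The following sketch replays that arithmetic with the numbers reported in this thread (stack top `0x7ffeeb6d9000` from vmmap, soft limit 8192 KB and hard limit 65532 KB from `ulimit`); the 4 KiB page size is an assumption for x86_64, and the function names are only illustrative.

```python
PAGE_SIZE = 4096  # assumed page size on x86_64 macOS


def round_page(n):
    """Round n up to the next multiple of PAGE_SIZE."""
    return (n + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def grow_region(user_stack, old_cur, new_cur):
    """Mirror the 'grow stack' branch: return (addr, size) to unprotect."""
    size = round_page(new_cur) - round_page(old_cur)
    addr = user_stack - round_page(new_cur)
    return addr, size


if __name__ == "__main__":
    addr, size = grow_region(0x7FFEEB6D9000, 8192 * 1024, 65532 * 1024)
    # Prints: 0x7ffee76da000 0x7ffeeaed9000
    print(hex(addr), hex(addr + size))
```

The computed range matches the new 56.0M `Stack` entry in the "after calling setrlimit" listing, and it lies entirely inside the former `STACK GUARD` region.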
    

    As we can see, it seems that the kernel tried to mprotect (or, we could say, allocate) memory from the "STACK GUARD" region. So where does this "STACK GUARD" region come from? Let's see this:

    bsd/kern/kern_exec.c, in create_unix_stack function (where the kernel creates the stack for a new task, I assume) :

    ...
    #define unix_stack_size(p)  (p->p_rlimit[RLIMIT_STACK].rlim_cur)
    ...
        if (load_result->user_stack_size == 0) {
          load_result->user_stack_size = unix_stack_size(p);
          prot_size = mach_vm_trunc_page(size - load_result->user_stack_size);
        } else {
          prot_size = PAGE_SIZE;
        }
    
        prot_addr = addr;
        kr = mach_vm_protect(map,
                 prot_addr,
                 prot_size,
                 FALSE,
                 VM_PROT_NONE);
      ...
    

    So here is my conclusion: if the binary has a specified default stack size, load_result->user_stack_size is non-zero (it should be set somewhere inside the Mach-O parser/loader, I guess), so the kernel maps only a single page for the "STACK GUARD" region. Otherwise the kernel uses the current soft stack limit (inherited from the parent) as user_stack_size and calculates a prot_size, which should be (rlim_max - rlim_cur). And of course, the python3 binary was built with a default stack size, so the kernel does not provide a "STACK GUARD" region large enough for the setrlimit syscall to allot more stack space than the default stack size.
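
The guard-size decision in that excerpt can be modelled in a few lines. This is a rough sketch rather than the kernel's logic verbatim: `reserved` stands in for the kernel's `size` variable and is assumed to be 64 MiB, which is what the guard plus stack regions add up to in the first vmmap listing; the 4 KiB page size is likewise an assumption.

```python
PAGE_SIZE = 4096  # assumed page size on x86_64 macOS


def trunc_page(n):
    """Round n down to a multiple of PAGE_SIZE."""
    return n & ~(PAGE_SIZE - 1)


def guard_size(reserved, soft_limit, linker_stack_size=0):
    """Model the prot_size choice from the create_unix_stack excerpt.

    reserved:          total stack reservation (assumed, see lead-in)
    soft_limit:        RLIMIT_STACK soft limit inherited from the parent
    linker_stack_size: value baked in with -Wl,-stack_size (0 if none)
    """
    if linker_stack_size == 0:
        # No explicit size: everything above the soft limit becomes guard,
        # which leaves room for a later setrlimit to grow into.
        return trunc_page(reserved - soft_limit)
    # Explicit size: only a single guard page sits below the stack.
    return PAGE_SIZE


if __name__ == "__main__":
    print(guard_size(64 << 20, 8 << 20) >> 20)        # 56 (MiB)
    print(guard_size(64 << 20, 8 << 20, 0x1000000))   # 4096 (bytes)
```

Both results agree with the `STACK GUARD` sizes shown by vmmap above: 56.0M without the linker flag, 4K with it.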

    @methane
    Member

    methane commented Mar 26, 2019

    https://bugs.python.org/issue34602 may be relating to this.

    @yan12125
    Mannequin

    yan12125 mannequin commented Mar 30, 2019

    I guess Inada Naoki meant https://bugs.python.org/issue36432 in the last comment.

    @ned-deily
    Member

    New changeset 883dfc6 by Ned Deily in branch 'master':
    bpo-34602: Avoid failures setting macOS stack resource limit (GH-13011)
    883dfc6

    @miss-islington
    Contributor

    New changeset 52a5b71 by Miss Islington (bot) in branch '3.7':
    bpo-34602: Avoid failures setting macOS stack resource limit (GH-13011)
    52a5b71

    @ned-deily
    Member

    New changeset fbe2a13 by Ned Deily (Miss Islington (bot)) in branch '3.6':
    bpo-34602: Avoid failures setting macOS stack resource limit (GH-13011) (GH-13014)
    fbe2a13

    @ned-deily
    Member

    Thanks for the analyses, everyone! Also see the discussion in duplicate bpo-36432. I was never able to reproduce the failure on earlier versions of macOS, but then it seemed to become a hard failure with the release of 10.14.4. I haven't gone back and tried running the tests on all supported older versions with this reversion in place, but those I have tried did not exhibit any new failures. So we should keep an eye open for reports of segfaults running tests as originally reported in bpo-18075. But better that than not being able to run any tests. "Fixed" in 3.8.0a4, 3.7.4, and 3.6.9 (to allow tests to be run on macOS) by reverting the change for bpo-18075.

    @ned-deily ned-deily added the 3.8 only security fixes label Apr 29, 2019
    @ned-deily
    Member

    So we should keep an eye open for reports of segfaults running tests as originally reported in bpo-18075.

    Now there are such reports for 3.7.4rc1, at least. Release builds seem to be OK, but at least four tests now fail on macOS 10.14.5 when built --with-pydebug: test_exceptions test_json test_logging test_sys test_traceback. Plus, changing the interpreter's stack size can inhibit use of dtrace.

    There are really two attempts at dealing with macOS's small default stack size. The original workaround, dating back to 2002-12-02 (bb48465 !!), added the runtime calls in regrtest to change RLIMIT_STACK. Years later, a different approach was taken in the original change for bpo-18075 (335ab5b) by increasing the default stack size when building the interpreter rather than at runtime. So, I *think* that means that the original regrtest runtime workaround is really no longer needed. So I think we should take the opposite approach to what I originally merged back in April: we should go back to building the interpreter with the increased stack size and remove the ancient regrtest workaround attempt, which is what was failing here anyway. The only place in the source base where resource.RLIMIT_STACK is used, outside of its functional test, is the old regrtest workaround. It's still a bit of a mystery what has changed in 3.8 that seems to keep it from hitting the stack size limit, but these kinds of test failures on macOS have always been dependent on other factors, including compiler version and options. I'd like to get this into 3.8.0b2 and 3.7.4rc2, so I'm temporarily making it a "release blocker"; the PR will follow momentarily.

    @ned-deily ned-deily added 3.9 only security fixes release-blocker labels Jul 2, 2019
    @ned-deily ned-deily reopened this Jul 2, 2019
    @ned-deily
    Member

    New changeset 5bbbc73 by Ned Deily in branch 'master':
    bpo-34602: Avoid failures setting macOS stack resource limit (GH-14546)
    5bbbc73

    @miss-islington
    Contributor

    New changeset bd92b94 by Miss Islington (bot) in branch '3.8':
    bpo-34602: Avoid failures setting macOS stack resource limit (GH-14546)
    bd92b94

    @miss-islington
    Contributor

    New changeset bf82cd3 by Miss Islington (bot) in branch '3.7':
    bpo-34602: Avoid failures setting macOS stack resource limit (GH-14546)
    bf82cd3

    @ned-deily
    Member

    New changeset 782854f by Ned Deily (Miss Islington (bot)) in branch '3.6':
    bpo-34602: Avoid failures setting macOS stack resource limit (GH-14546) (GH-14549)
    782854f

    @ned-deily
    Member

    New changeset dcc0eb3 by Ned Deily (Miss Islington (bot)) in branch '3.7':
    bpo-34602: Avoid failures setting macOS stack resource limit (GH-14546)
    dcc0eb3

    @ambv
    Contributor

    ambv commented Sep 3, 2021

    New changeset be9de87 by Łukasz Langa in branch 'main':
    bpo-34602: Quadruple stack size on macOS when compiling with UBSAN (GH-27309)
    be9de87

    @ambv
    Contributor

    ambv commented Sep 15, 2021

    New changeset 2563dd2 by Łukasz Langa in branch '3.10':
    [3.10] bpo-34602: Quadruple stack size on macOS when compiling with UBSAN (GH-27309) (GH-28280)
    2563dd2

    @exarkun
    Mannequin

    exarkun mannequin commented Jan 5, 2022

    My understanding of the resolution of this ticket is that it is still not possible to use setrlimit with RLIMIT_STACK to raise the soft stack limit. Is that correct?

    In that case, the original bug report still seems valid and unresolved (and indeed, I am porting a project from Python 2.7 to Python 3.9 and on macOS it fails because it cannot raise the stack limit).

    @ronaldoussoren
    Contributor

    My understanding of the resolution of this ticket is that it is still not possible to use setrlimit with RLIMIT_STACK to raise the soft stack limit. Is that correct?

    Yes, the code in msg324731 still fails. We're still using the mechanism described in msg324818, but with a larger stack size for some builds.

    It is rather annoying that -Wl,-stack_size,NNNN sets a hard limit on the stack size, rather than overriding the soft limit.

    I guess we could change the startup code for the interpreter executable (Py_Main or related code) to set the RLIMIT_STACK to a larger value when it is too small, that way applications can still pick a different (and in particular larger) value.
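
The policy suggested here (raise the soft limit only when it is too small, never past the hard limit) could look roughly like the sketch below. It is written in Python purely to illustrate the rule; a real change would live in the interpreter's C startup code, and both the function name and the 16 MiB minimum used in the example are hypothetical.

```python
import resource


def target_soft_limit(soft, hard, minimum, infinity=resource.RLIM_INFINITY):
    """Return the soft limit to request: at least `minimum`, capped by `hard`.

    An already sufficient (or unlimited) soft limit is left untouched so
    that applications can still pick a different, larger value later.
    """
    if soft == infinity or soft >= minimum:
        return soft
    if hard == infinity:
        return minimum
    return min(minimum, hard)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    wanted = target_soft_limit(soft, hard, 16 << 20)  # hypothetical minimum
    print(soft, hard, wanted)
```

The sketch only computes the value and deliberately does not call `setrlimit`; per the analysis earlier in this thread, that call would still fail on macOS for a binary linked with `-stack_size` whenever the requested value is larger than the linked stack size.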

    @ned.deily, @lukasz.langa: reopen this issue or open a new one?

    @ronaldoussoren ronaldoussoren added 3.10 only security fixes 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Jan 6, 2022
    @ned-deily
    Member

    @ned.deily, @lukasz.langa: reopen this issue or open a new one?

    Since there are so many iterations on this issue already, I think a new issue would be better.

    @ned-deily
    Member

    New changeset b962544 by Miss Islington (bot) in branch '3.10':
    bpo-34602: Fix unportable test(1) operator in configure script (GH-30490) (GH-30491)
    b962544

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @ChannyClaus

    Hi! Sorry if I'm being blind, but is the new issue linked somewhere? I just ran into the same issue on Python 3.12 😭

    $ cat test.py
    import resource
    s, h = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (h, h))
    chan.kang@Chans-MacBook-Pro ~/test/resource - 
    $ python test.py
    Traceback (most recent call last):
      File "/Users/chan.kang/test/resource/test.py", line 3, in <module>
        resource.setrlimit(resource.RLIMIT_STACK, (h, h))
    ValueError: current limit exceeds maximum limit
    chan.kang@Chans-MacBook-Pro ~/test/resource - 
    $ python --version
    Python 3.12.1
    chan.kang@Chans-MacBook-Pro ~/test/resource - 
    $ uname -a
    Darwin Chans-MacBook-Pro.local 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:21:53 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6020 arm64
    
