Author marche147
Recipients marche147, ned.deily, ronaldoussoren, v2m
Date 2018-09-08.09:05:27
Message-id <1536397528.28.0.56676864532.issue34602@psf.upfronthosting.co.za>
In-reply-to
Content
Thanks for the repro! It really helped pinpoint the issue.

So I took a little spare time and dived into the XNU kernel code. Here is my assumption based on what I found (N.B.: it comes from a simple experiment and a brief skim of the source code lasting 15 minutes or less, so it could be seriously wrong. I'm not an expert on the XNU kernel, and I currently don't have the time to build and debug it):

In bsd/kern/kern_resource.c, there's a function `dosetrlimit` which handles the `setrlimit` request, and here's part of it:

```
  case RLIMIT_STACK:
    // ...
    if (limp->rlim_cur > alimp->rlim_cur) {
      user_addr_t addr;
      user_size_t size;

      /* grow stack */
      size = round_page_64(limp->rlim_cur);
      size -= round_page_64(alimp->rlim_cur);

      addr = p->user_stack - round_page_64(limp->rlim_cur);
      kr = mach_vm_protect(current_map(),
               addr, size,
               FALSE, VM_PROT_DEFAULT);
      if (kr != KERN_SUCCESS) {
        error =  EINVAL;
        goto out;
      }
    } // ...

```

As we can see, the kernel will try to `mprotect` the memory preceding the stack to `VM_PROT_DEFAULT` (presumably read & write). I then used `vmmap` to compare two binaries compiled with different commands. The results are:

1. Binary compiled without default stack size:

```
- Before calling setrlimit

...
STACK GUARD            00007ffee76d9000-00007ffeeaed9000 [ 56.0M     0K     0K     0K] ---/rwx SM=NUL          stack guard for thread 0
...
Stack                  00007ffeeaed9000-00007ffeeb6d9000 [ 8192K    20K    20K     0K] rw-/rwx SM=PRV          thread 0
...
                                VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========                     ======= ========    =====  ======= ========   ======    =====  =======
Kernel Alloc Once                    8K       4K       4K       0K       0K       0K       0K        2
MALLOC guard page                   16K       0K       0K       0K       0K       0K       0K        5
MALLOC metadata                     60K      60K      60K       0K       0K       0K       0K        6
MALLOC_SMALL                      16.0M      16K      16K       0K       0K       0K       0K        3         see MALLOC ZONE table below
MALLOC_TINY                       2048K      32K      32K       0K       0K       0K       0K        3         see MALLOC ZONE table below
STACK GUARD                       56.0M       0K       0K       0K       0K       0K       0K        2
Stack                             8192K      20K      20K       0K       0K       0K       0K        2
__DATA                            2324K    1192K     208K       0K       0K       0K       0K       43
__LINKEDIT                       192.7M    21.7M       0K       0K       0K       0K       0K        4
__TEXT                            9448K    8224K       0K       0K       0K       0K       0K       48
shared memory                        8K       8K       8K       0K       0K       0K       0K        3
===========                     ======= ========    =====  ======= ========   ======    =====  =======
TOTAL                            286.3M    31.0M     348K       0K       0K       0K       0K      110
...

- After calling setrlimit

...
STACK GUARD            00007ffee76d9000-00007ffee76da000 [    4K     0K     0K     0K] ---/rwx SM=NUL          stack guard for thread 0
...
Stack                  00007ffee76da000-00007ffeeaed9000 [ 56.0M     0K     0K     0K] rw-/rwx SM=NUL          thread 0
Stack                  00007ffeeaed9000-00007ffeeb6d9000 [ 8192K    20K    20K     0K] rw-/rwx SM=PRV          thread 0
...
                                VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========                     ======= ========    =====  ======= ========   ======    =====  =======
Kernel Alloc Once                    8K       4K       4K       0K       0K       0K       0K        2
MALLOC guard page                   16K       0K       0K       0K       0K       0K       0K        5
MALLOC metadata                     60K      60K      60K       0K       0K       0K       0K        6
MALLOC_SMALL                      16.0M      20K      20K       0K       0K       0K       0K        3         see MALLOC ZONE table below
MALLOC_TINY                       2048K      32K      32K       0K       0K       0K       0K        3         see MALLOC ZONE table below
STACK GUARD                          4K       0K       0K       0K       0K       0K       0K        2
Stack                             64.0M      20K      20K       0K       0K       0K       0K        3
__DATA                            2324K    1192K     208K       0K       0K       0K       0K       43
__LINKEDIT                       192.7M    21.7M       0K       0K       0K       0K       0K        4
__TEXT                            9448K    8224K       0K       0K       0K       0K       0K       48
shared memory                        8K       8K       8K       0K       0K       0K       0K        3
===========                     ======= ========    =====  ======= ========   ======    =====  =======
TOTAL                            286.3M    31.0M     352K       0K       0K       0K       0K      111
...
```

2. Binary compiled with default stack size:

```
- Before calling setrlimit
...
STACK GUARD            00007ffee09c2000-00007ffee09c3000 [    4K     0K     0K     0K] ---/rwx SM=NUL          stack guard for thread 0
...
Stack                  00007ffee09c3000-00007ffee19c3000 [ 16.0M    20K    20K     0K] rw-/rwx SM=PRV          thread 0
...
                                VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========                     ======= ========    =====  ======= ========   ======    =====  =======
Kernel Alloc Once                    8K       4K       4K       0K       0K       0K       0K        2
MALLOC guard page                   16K       0K       0K       0K       0K       0K       0K        5
MALLOC metadata                     60K      60K      60K       0K       0K       0K       0K        6
MALLOC_SMALL                      8192K      12K      12K       0K       0K       0K       0K        2         see MALLOC ZONE table below
MALLOC_TINY                       1024K      20K      20K       0K       0K       0K       0K        2         see MALLOC ZONE table below
STACK GUARD                          4K       0K       0K       0K       0K       0K       0K        2
Stack                             16.0M      20K      20K       0K       0K       0K       0K        2
__DATA                            2324K    1192K     208K       0K       0K       0K       0K       43
__LINKEDIT                       192.7M    22.3M       0K       0K       0K       0K       0K        4
__TEXT                            9448K    8232K       0K       0K       0K       0K       0K       48
shared memory                        8K       8K       8K       0K       0K       0K       0K        3
===========                     ======= ========    =====  ======= ========   ======    =====  =======
TOTAL                            229.3M    31.7M     332K       0K       0K       0K       0K      108
```
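To connect the `dosetrlimit` excerpt with the first pair of listings, here is a minimal user-space sketch of the same address arithmetic. It is not kernel code: `sketch_round_page`, `sketch_grow_range` and `struct grow_range` are hypothetical names, and the 4 KiB page size is an assumption taken from the 4K guard entries in the `vmmap` output.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as suggested by the 4K guard entries above. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round up to a page boundary (stand-in for the kernel's round_page_64). */
static uint64_t sketch_round_page(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Range that the grow path would try to make readable/writable. */
struct grow_range {
    uint64_t addr; /* lowest address of the range */
    uint64_t size; /* length of the range in bytes */
};

/*
 * Mirror the arithmetic of the RLIMIT_STACK grow branch:
 *   size = round_page(new_cur) - round_page(old_cur)
 *   addr = user_stack - round_page(new_cur)
 * user_stack is the top (highest address) of the stack.
 */
static struct grow_range sketch_grow_range(uint64_t user_stack,
                                           uint64_t old_cur,
                                           uint64_t new_cur)
{
    struct grow_range r;
    r.size = sketch_round_page(new_cur) - sketch_round_page(old_cur);
    r.addr = user_stack - sketch_round_page(new_cur);
    return r;
}
```

Taking `user_stack = 0x7ffeeb6d9000` and an old limit of 8 MiB from the first listing, and a hypothetical new limit of `0x3FFF000` bytes (64 MiB minus one page; the value actually passed in the experiment is not stated, this one is picked only because it lines up with the output), the range starts at `0x7ffee76da000` and ends at `0x7ffeeaed9000`, which is where the extra "Stack" entry shows up after `setrlimit`.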

As we can see, it seems that the kernel tried to `mprotect` (or, we could say, allocate) from the "STACK GUARD" region. So where does this "STACK GUARD" region come from? Let's see this:

bsd/kern/kern_exec.c, in the `create_unix_stack` function (where the kernel creates the stack for a new task, I assume):

```
...
#define unix_stack_size(p)  (p->p_rlimit[RLIMIT_STACK].rlim_cur)
...
    if (load_result->user_stack_size == 0) {
      load_result->user_stack_size = unix_stack_size(p);
      prot_size = mach_vm_trunc_page(size - load_result->user_stack_size);
    } else {
      prot_size = PAGE_SIZE;
    }

    prot_addr = addr;
    kr = mach_vm_protect(map,
             prot_addr,
             prot_size,
             FALSE,
             VM_PROT_NONE);
  ...
```
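The branch that picks `prot_size` can be modeled outside the kernel to see how the two guard sizes in the listings come about. This is only a sketch under assumptions: `sketch_trunc_page` and `sketch_guard_size` are hypothetical names, the page size is assumed to be 4 KiB, and `size` stands for the total stack reservation, whose real value is not shown in the excerpt.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, matching the 4K guard seen in vmmap. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round down to a page boundary (stand-in for mach_vm_trunc_page). */
static uint64_t sketch_trunc_page(uint64_t x)
{
    return x & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * size            : total bytes reserved for the stack mapping (assumed input)
 * user_stack_size : stack size requested by the binary, 0 if none
 * rlim_cur        : current RLIMIT_STACK soft limit
 * Returns the number of bytes that end up protected as the guard region.
 */
static uint64_t sketch_guard_size(uint64_t size,
                                  uint64_t user_stack_size,
                                  uint64_t rlim_cur)
{
    if (user_stack_size == 0) {
        /* No size in the binary: fall back to the soft limit and
         * guard everything in the reservation above it. */
        user_stack_size = rlim_cur;
        return sketch_trunc_page(size - user_stack_size);
    }
    /* Binary specifies its own stack size: one guard page only. */
    return SKETCH_PAGE_SIZE;
}
```

If the reservation is 64 MiB (an inference from the 56.0M guard plus 8192K stack in the first listing, not something the excerpt states), `sketch_guard_size(64 MiB, 0, 8 MiB)` gives 56 MiB, while any non-zero `user_stack_size`, such as the 16 MiB of the second binary, gives a single 4 KiB page.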

So here is my conclusion. If the binary has a specified default stack size, `load_result->user_stack_size` will not be zero (it should be set somewhere inside the Mach-O parser/loader, I guess), so the kernel protects only a single page as the "STACK GUARD" region. Otherwise, the kernel uses the current stack size soft limit (inherited from the parent) as `user_stack_size` and calculates `prot_size` as `size - user_stack_size`, which should be (rlim_max - rlim_cur); this is consistent with the 56.0M guard sitting next to the 8192K stack in the first listing. And of course, the python3 binary was built with a default stack size, so the kernel does not provide a large enough "STACK GUARD" region for the `setrlimit` syscall to allot more stack space than that default.
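For anyone who wants to check this from user space, a small probe along the following lines might help. It is a sketch, not the original repro: `try_raise_stack_limit` is a hypothetical helper that uses only the standard `getrlimit`/`setrlimit` calls and reports the `errno` value on failure. If the analysis above is right, a binary linked with a fixed stack size would be expected to get `EINVAL` back on macOS when it asks for more than that size.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Try to raise the RLIMIT_STACK soft limit to new_cur, clamped to the
 * hard limit. Returns 0 on success (or if there is nothing to raise),
 * otherwise the errno value reported by getrlimit/setrlimit.
 */
static int try_raise_stack_limit(rlim_t new_cur)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return errno;

    /* Never ask for more than the hard limit allows. */
    if (new_cur > rl.rlim_max)
        new_cur = rl.rlim_max;

    /* Only the grow path is of interest here. */
    if (new_cur <= rl.rlim_cur)
        return 0;

    rl.rlim_cur = new_cur;
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        return errno;

    return 0;
}
```

Calling it with a value above the current soft limit, once from a binary built with a fixed stack size and once from one built without, and comparing the return values with the `vmmap` output before and after, should show whether the guard region is the limiting factor.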