Message324824
Thanks for the repro! It helped pinpoint the issue.
I took a little spare time and dove into the XNU kernel code. Here is my assumption based on what I found. (N.B.: it comes from a simple experiment and a skim of the source code lasting 15 minutes or less, so it could be seriously wrong. I'm not an expert on the XNU kernel, and I currently don't have time to build and debug it.)
In bsd/kern/kern_resource.c there is a function, `dosetrlimit`, which handles the `setrlimit` request. Here is part of it:
```
case RLIMIT_STACK:
    // ...
    if (limp->rlim_cur > alimp->rlim_cur) {
        user_addr_t addr;
        user_size_t size;
        /* grow stack */
        size = round_page_64(limp->rlim_cur);
        size -= round_page_64(alimp->rlim_cur);
        addr = p->user_stack - round_page_64(limp->rlim_cur);
        kr = mach_vm_protect(current_map(),
            addr, size,
            FALSE, VM_PROT_DEFAULT);
        if (kr != KERN_SUCCESS) {
            error = EINVAL;
            goto out;
        }
    } // ...
```
As we can see, the kernel tries to `mprotect` the memory just below the current stack to `VM_PROT_DEFAULT` (presumably read and write). I then used `vmmap` to compare two binaries compiled with different commands. The results are:
1. Binary compiled without a specified default stack size:
```
- Before calling setrlimit
...
STACK GUARD 00007ffee76d9000-00007ffeeaed9000 [ 56.0M 0K 0K 0K] ---/rwx SM=NUL stack guard for thread 0
...
Stack 00007ffeeaed9000-00007ffeeb6d9000 [ 8192K 20K 20K 0K] rw-/rwx SM=PRV thread 0
...
                      VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE              SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========           ======= ========    =====  ======= ========   ======    =====  =======
Kernel Alloc Once          8K       4K       4K       0K       0K       0K       0K        2
MALLOC guard page         16K       0K       0K       0K       0K       0K       0K        5
MALLOC metadata           60K      60K      60K       0K       0K       0K       0K        6
MALLOC_SMALL            16.0M      16K      16K       0K       0K       0K       0K        3  see MALLOC ZONE table below
MALLOC_TINY             2048K      32K      32K       0K       0K       0K       0K        3  see MALLOC ZONE table below
STACK GUARD             56.0M       0K       0K       0K       0K       0K       0K        2
Stack                   8192K      20K      20K       0K       0K       0K       0K        2
__DATA                  2324K    1192K     208K       0K       0K       0K       0K       43
__LINKEDIT             192.7M    21.7M       0K       0K       0K       0K       0K        4
__TEXT                  9448K    8224K       0K       0K       0K       0K       0K       48
shared memory              8K       8K       8K       0K       0K       0K       0K        3
===========           ======= ========    =====  ======= ========   ======    =====  =======
TOTAL                  286.3M    31.0M     348K       0K       0K       0K       0K      110
...
- After calling setrlimit
...
STACK GUARD 00007ffee76d9000-00007ffee76da000 [ 4K 0K 0K 0K] ---/rwx SM=NUL stack guard for thread 0
...
Stack 00007ffee76da000-00007ffeeaed9000 [ 56.0M 0K 0K 0K] rw-/rwx SM=NUL thread 0
Stack 00007ffeeaed9000-00007ffeeb6d9000 [ 8192K 20K 20K 0K] rw-/rwx SM=PRV thread 0
...
                      VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE              SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========           ======= ========    =====  ======= ========   ======    =====  =======
Kernel Alloc Once          8K       4K       4K       0K       0K       0K       0K        2
MALLOC guard page         16K       0K       0K       0K       0K       0K       0K        5
MALLOC metadata           60K      60K      60K       0K       0K       0K       0K        6
MALLOC_SMALL            16.0M      20K      20K       0K       0K       0K       0K        3  see MALLOC ZONE table below
MALLOC_TINY             2048K      32K      32K       0K       0K       0K       0K        3  see MALLOC ZONE table below
STACK GUARD                4K       0K       0K       0K       0K       0K       0K        2
Stack                   64.0M      20K      20K       0K       0K       0K       0K        3
__DATA                  2324K    1192K     208K       0K       0K       0K       0K       43
__LINKEDIT             192.7M    21.7M       0K       0K       0K       0K       0K        4
__TEXT                  9448K    8224K       0K       0K       0K       0K       0K       48
shared memory              8K       8K       8K       0K       0K       0K       0K        3
===========           ======= ========    =====  ======= ========   ======    =====  =======
TOTAL                  286.3M    31.0M     352K       0K       0K       0K       0K      111
...
```
2. Binary compiled with a specified default stack size:
```
- Before calling setrlimit
...
STACK GUARD 00007ffee09c2000-00007ffee09c3000 [ 4K 0K 0K 0K] ---/rwx SM=NUL stack guard for thread 0
...
Stack 00007ffee09c3000-00007ffee19c3000 [ 16.0M 20K 20K 0K] rw-/rwx SM=PRV thread 0
...
                      VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE              SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========           ======= ========    =====  ======= ========   ======    =====  =======
Kernel Alloc Once          8K       4K       4K       0K       0K       0K       0K        2
MALLOC guard page         16K       0K       0K       0K       0K       0K       0K        5
MALLOC metadata           60K      60K      60K       0K       0K       0K       0K        6
MALLOC_SMALL            8192K      12K      12K       0K       0K       0K       0K        2  see MALLOC ZONE table below
MALLOC_TINY             1024K      20K      20K       0K       0K       0K       0K        2  see MALLOC ZONE table below
STACK GUARD                4K       0K       0K       0K       0K       0K       0K        2
Stack                   16.0M      20K      20K       0K       0K       0K       0K        2
__DATA                  2324K    1192K     208K       0K       0K       0K       0K       43
__LINKEDIT             192.7M    22.3M       0K       0K       0K       0K       0K        4
__TEXT                  9448K    8232K       0K       0K       0K       0K       0K       48
shared memory              8K       8K       8K       0K       0K       0K       0K        3
===========           ======= ========    =====  ======= ========   ======    =====  =======
TOTAL                  229.3M    31.7M     332K       0K       0K       0K       0K      108
```
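The grow-stack arithmetic quoted from `dosetrlimit` can be checked against the first `vmmap` output above. This is only a user-space sketch, not kernel code: `sketch_round_page`, `sketch_grow_range`, and the 4 KiB page size are stand-ins, and it assumes that `p->user_stack` is the top of the main thread's stack (0x00007ffeeb6d9000 in that output).

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as the 4K guard page in the vmmap output suggests. */
#define SKETCH_PAGE_SIZE 4096ULL

/* User-space stand-in for the kernel's round_page_64(): round up to a page. */
static uint64_t sketch_round_page(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* The address range that the "grow stack" branch would pass to mach_vm_protect(). */
struct grow_range {
    uint64_t addr; /* lowest address of the newly usable stack */
    uint64_t size; /* number of bytes to make readable and writable */
};

/* Mirrors the size/addr computation in the quoted dosetrlimit excerpt. */
static struct grow_range sketch_grow_range(uint64_t user_stack,
                                           uint64_t old_cur,
                                           uint64_t new_cur)
{
    struct grow_range r;
    r.size = sketch_round_page(new_cur) - sketch_round_page(old_cur);
    r.addr = user_stack - sketch_round_page(new_cur);
    return r;
}
```

With `user_stack = 0x00007ffeeb6d9000`, an old soft limit of 8192 KiB, and a new soft limit of 65532 KiB (an assumed value; the limit the repro actually requested is not shown here), this gives `addr = 0x00007ffee76da000` and `size = 0x37ff000`. That is exactly the new 56.0M `Stack` region at 00007ffee76da000-00007ffeeaed9000, with a single 4K guard page left below it.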
As we can see, the kernel appears to `mprotect` (or, we could say, allocate) memory out of the "STACK GUARD" region. So where does this "STACK GUARD" region come from? Let's look at this:
In bsd/kern/kern_exec.c, in the `create_unix_stack` function (where the kernel creates the stack for a new task, I assume):
```
...
#define unix_stack_size(p) (p->p_rlimit[RLIMIT_STACK].rlim_cur)
...
    if (load_result->user_stack_size == 0) {
        load_result->user_stack_size = unix_stack_size(p);
        prot_size = mach_vm_trunc_page(size - load_result->user_stack_size);
    } else {
        prot_size = PAGE_SIZE;
    }
    prot_addr = addr;
    kr = mach_vm_protect(map,
        prot_addr,
        prot_size,
        FALSE,
        VM_PROT_NONE);
...
```
So here is my conclusion. If the binary specifies a default stack size, `load_result->user_stack_size` is nonzero (presumably it is set somewhere inside the Mach-O parser/loader), so the kernel protects only a single page as the "STACK GUARD" region. Otherwise the kernel uses the current soft stack limit (inherited from the parent) as `user_stack_size` and computes `prot_size`, which should be about (rlim_max - rlim_cur); that matches the 56.0M guard region below the 8192K stack in the first `vmmap` output.
And of course the python3 binary was built with a specified default stack size, so the kernel does not provide a "STACK GUARD" region large enough for the `setrlimit` syscall to grow the stack beyond that default size.
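The guard-size decision described above can be modelled in a few lines. This is a hedged user-space sketch of the quoted branch only: `model_trunc_page` and `model_guard_size` are hypothetical names, the 4 KiB page size is assumed, and `alloc_size` stands in for the kernel's `size` variable, whose computation is not part of the excerpt.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096ULL

/* User-space stand-in for mach_vm_trunc_page(): round down to a page. */
static uint64_t model_trunc_page(uint64_t x)
{
    return x & ~(MODEL_PAGE_SIZE - 1);
}

/*
 * Size of the inaccessible "STACK GUARD" region, following the quoted branch.
 *   alloc_size        - total bytes reserved for the stack (the kernel's `size`)
 *   binary_stack_size - stack size recorded in the binary, 0 if none
 *   rlim_cur          - soft RLIMIT_STACK inherited from the parent
 */
static uint64_t model_guard_size(uint64_t alloc_size,
                                 uint64_t binary_stack_size,
                                 uint64_t rlim_cur)
{
    if (binary_stack_size == 0) {
        /* No size in the binary: everything above the soft limit is guarded. */
        return model_trunc_page(alloc_size - rlim_cur);
    }
    /* The binary specifies a size: only one page is guarded. */
    return MODEL_PAGE_SIZE;
}
```

Assuming a 64 MiB reservation and an 8 MiB soft limit, a binary without a specified stack size gets a 56 MiB guard region for later `setrlimit` calls to grow into, while a binary that specifies 16 MiB gets just 4 KiB, which lines up with the 56.0M and 4K "STACK GUARD" entries in the two `vmmap` outputs.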
Date | User | Action | Args
2018-09-08 09:05:28 | marche147 | set | recipients: + marche147, ronaldoussoren, ned.deily, v2m
2018-09-08 09:05:28 | marche147 | set | messageid: <1536397528.28.0.56676864532.issue34602@psf.upfronthosting.co.za>
2018-09-08 09:05:28 | marche147 | link | issue34602 messages
2018-09-08 09:05:27 | marche147 | create |