classification
Title: Segmentation fault with invalid Unicode command-line arguments in embedded Python
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, itaibn, ncoghlan, vstinner
Priority: normal Keywords:

Created on 2015-11-16 01:06 by itaibn, last changed 2019-09-26 00:53 by vstinner. This issue is now closed.

Messages (3)
msg254703 - Author: Itai Bar-Natan (itaibn) Date: 2015-11-16 01:06
The following embedded application, which calls Py_Main with a "-W X" argument where X is not a valid Unicode string, crashes with a segmentation fault:

```
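#include "Python.h"

int main(void) {
    /* One code point past the Unicode maximum (U+10FFFF), plus a terminator. */
    wchar_t *invalid_str = malloc(2 * sizeof(wchar_t));
    if (invalid_str == NULL) {
        return 1;
    }
    invalid_str[0] = 0x110000;
    invalid_str[1] = 0;
    wchar_t *argv[4] = {L"embedded-python", L"-W", invalid_str, NULL};

    /* Segfaults on the affected versions while processing the -W argument. */
    return Py_Main(3, argv);
}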
```

This segmentation fault is present in Python 3.4, 3.5, and the latest development branch I downloaded, but is not present in Python 3.2. This program is obviously invalid and it may be reasonable to emit a fatal error in this situation, but it should not give a segmentation fault.

I believe the issue is that this code leads to an exception being raised before exceptions are initialized; more specifically, a call to PyExceptionClass_Check() within PyErr_SetObject() reads a NULL pointer. I haven't tested this, but I expect that this problem would not appear when running Python directly, since Python sanitizes the command-line arguments from main(). Nonetheless, even here the possibility of other exceptions being raised early in the initialization sequence remains a potential problem.
msg254739 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-11-16 16:32
The interpreter isn't initialized, so calling PyErr_Format in a release build segfaults when it tries to dereference a NULL PyThreadState. OTOH, a debug build should call PyThreadState_Get, which in this case calls Py_FatalError and aborts the process. Unfortunately 3.5.0+ debug builds don't call PyThreadState_Get due to the fix for issue 25150.

> the possibility of other exceptions being raised early in the 
> initialization sequence remains a potential problem.

PEP 432 proposes a pre-initialization phase that sets a valid Python thread state.
msg353246 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-26 00:53
This bug has been fixed by the implementation of PEP 587.

$ gcc x.c -l python3.8 $(pkg-config python-3.8 --cflags --libs) -o x
$ ./x
Fatal Python error: pyinit_main: can't finish initializing sys
ValueError: character U+110000 is not in range [U+0000; U+10ffff]

Current thread 0x00007fd23364e6c0 (most recent call first):

=> It doesn't crash anymore; it writes an error message and exits.
History
Date User Action Args
2019-09-26 00:53:26 vstinner set status: open -> closed
    versions: + Python 3.8, Python 3.9, - Python 3.4, Python 3.5, Python 3.6
    nosy: + vstinner
    messages: + msg353246
    resolution: fixed
    stage: resolved
2015-11-16 16:32:54 eryksun set nosy: + ncoghlan, eryksun
    messages: + msg254739
2015-11-16 01:06:05 itaibn create