This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: segfault on PyUnicode_DecodeFSDefaultAndSize for uninitialized Py
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: anthony shaw, vstinner
Priority: normal Keywords:

Created on 2019-03-21 03:52 by anthony shaw, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg338521 - (view) Author: anthony shaw (anthony shaw) Date: 2019-03-21 03:52
If for whatever reason, Py_Initialize() has not been run or failed to run, any call to Py_CompileStringFlags will call PyUnicode_DecodeFSDefault and the reference to interp will be NULL.

There is currently no null reference check in PyUnicode_DecodeFSDefaultAndSize which causes a segfault.

https://github.com/python/cpython/blob/master/Objects/unicodeobject.c#L3736-L3737 is the offending line.

It might be better to catch the null pointer and raise an unrecoverable error there?

Error: signal 11:
0   ceval-prof                          0x00000001066310f3 handler + 35
1   libsystem_platform.dylib            0x00007fff6adddb3d _sigtramp + 29
2   ???                                 0x0000000000000000 0x0 + 0
3   ceval-prof                          0x0000000106734536 PyUnicode_DecodeFSDefault + 38
4   ceval-prof                          0x0000000106879514 Py_CompileStringExFlags + 36
5   ceval-prof                          0x0000000106631280 main + 320
6   libdyld.dylib                       0x00007fff6abf2ed9 start + 1
msg338528 - (view) Author: anthony shaw (anthony shaw) Date: 2019-03-21 07:47
This applies to PyUnicode_EncodeFSDefault as well, it has the same issue
msg338529 - (view) Author: anthony shaw (anthony shaw) Date: 2019-03-21 07:48
I'm expecting a "this is not a bug, why would the interpreter not be initialized", but it would be nice to get a friendly error message since this is a public API.
IF so, am also happy to submit a PR with a fix
msg338532 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-21 09:19
> I'm expecting a "this is not a bug, why would the interpreter not be initialized",

this is not a bug, why would the interpreter not be initialized, as documented at:
https://docs.python.org/dev/c-api/init.html

> but it would be nice to get a friendly error message since this is a public API.
IF so, am also happy to submit a PR with a fix

Python has a *very large* C API. It doesn't seem worth it to me to modify every single Python function to detect when the API is misused.

There is maybe room for some clever tricks for some specific cases. For example, modify PyMem_Malloc and PyObject_Malloc default allocators to have an implementation with calls Py_FatalError() with a clear error message like "Py_Initialize must be called before using the Python C API". I modified Python internals to only use PyMem_RawMalloc during early Python initialization, and we can reconfigure PyMem_Malloc and PyObject_Malloc during Py_Initialize().

But... according to your backtrace, it wouldn't help for your exact use case: call PyUnicode_DecodeFSDefault() before Py_Initialize(). The crash occurs before PyMem_Malloc or PyObject_Malloc is called. I don't see any other clever tricks which would have no impact on performance when the API is properly used.


... I suggest to close this issue.
msg338533 - (view) Author: anthony shaw (anthony shaw) Date: 2019-03-21 09:28
This is because PyUnicode_DecodeFSDefaultAndSize calls _PyInterpreterState_GET_UNSAFE(), which already documents the potential NULL return value. 

/* Get the current interpreter state.

   The macro is unsafe: it does not check for error and it can return NULL.

   The caller must hold the GIL.

   See also _PyInterpreterState_Get()
   and _PyGILState_GetInterpreterStateUnsafe(). */
#define _PyInterpreterState_GET_UNSAFE() (_PyThreadState_GET()->interp)

> Python has a *very large* C API. It doesn't seem worth it to me to modify every single Python function to detect when the API is misused.

Understood, happy for this to be closed. Aware that I was misusing the API :-)
msg338534 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-21 09:32
> This is because PyUnicode_DecodeFSDefaultAndSize calls _PyInterpreterState_GET_UNSAFE(), which already documents the potential NULL return value. 

_PyInterpreterState_GET_UNSAFE() is preferred over other functions getting the interpreter for best performances. _PyInterpreterState_GET_UNSAFE() is the most efficient way to access the interpreter. That's why I was talking about the cost of additional checks (to detect API misuage) at runtime.

> Understood, happy for this to be closed. Aware that I was misusing the API :-)

I close the issue.
History
Date User Action Args
2022-04-11 14:59:12adminsetgithub: 80567
2019-03-21 09:32:26vstinnersetstatus: open -> closed
resolution: not a bug
messages: + msg338534

stage: resolved
2019-03-21 09:28:42anthony shawsetmessages: + msg338533
2019-03-21 09:19:43vstinnersetmessages: + msg338532
2019-03-21 08:09:15SilentGhostsetnosy: + vstinner
type: crash
2019-03-21 07:48:32anthony shawsetmessages: + msg338529
2019-03-21 07:47:37anthony shawsetmessages: + msg338528
2019-03-21 03:52:10anthony shawcreate