classification
Title: Trashcan mechanism segfault during interpreter finalization in Python 2.7.4
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: Stian Lode, amaury.forgeotdarc, benjamin.peterson, flex.plexico, larry, lemburg, m_python, pitrou, python-dev
Priority: release blocker Keywords: patch

Created on 2013-04-12 13:43 by lemburg, last changed 2015-08-13 17:33 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
tstate_trashcan.patch pitrou, 2013-04-12 18:00
bt.txt Stian Lode, 2015-08-12 11:38 Stacktrace of segfault.
Messages (21)
msg186627 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 13:43
A user reported a segfault when using our mxURL extension with Python 2.7.4. Previous Python versions do not show this behavior, so it's new in Python 2.7.4.

The extension uses an Py_AtExit() function to clean up among other things a reference to a dictionary from another module. The dictionary deallocation then causes the segfault:

$ gdb /usr/local/bin/python2.7
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html
>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/bin/python2.7...done.
(gdb) r
Starting program: /usr/local/bin/python2.7
[Thread debugging using libthread_db enabled]
Python 2.7.4 (default, Apr  8 2013, 15:51:19)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mx.URL
>>> import sys
>>> sys.exit()

Program received signal SIGSEGV, Segmentation fault.
0x08091201 in dict_dealloc (mp=0xb7b2813c) at Objects/dictobject.c:1005
1005        Py_TRASHCAN_SAFE_BEGIN(mp)
(gdb) bt
#0  0x08091201 in dict_dealloc (mp=0xb7b2813c) at Objects/dictobject.c:1005
#1  0xb7875928 in mxURLModule_Cleanup () at mx/URL/mxURL/mxURL.c:2789
#2  0x0810553f in call_ll_exitfuncs () at Python/pythonrun.c:1763
#3  Py_Finalize () at Python/pythonrun.c:554
#4  0x08104bac in Py_Exit () at Python/pythonrun.c:1772
#5  handle_system_exit () at Python/pythonrun.c:1146
#6  0x0810517d in PyErr_PrintEx (set_sys_last_vars=<value optimized out>)
at Python/pythonrun.c:1156
#7  0x081058dd in PyRun_InteractiveOneFlags (fp=0xb7f8b420,
filename=0x815f9c0 "<stdin>", flags=0xbffffc4c) at Python/pythonrun.c:855
#8  0x08105a68 in PyRun_InteractiveLoopFlags (fp=0xb7f8b420,
filename=0x815f9c0 "<stdin>", flags=0xbffffc4c) at Python/pythonrun.c:772
#9  0x081062f2 in PyRun_AnyFileExFlags (fp=0xb7f8b420, filename=0x815f9c0
"<stdin>", closeit=0, flags=0xbffffc4c) at Python/pythonrun.c:741
#10 0x0805bb59 in Py_Main (argc=1, argv=0xbffffd34) at Modules/main.c:640
#11 0x0805abeb in main (argc=1, argv=0xbffffd34) at ./Modules/python.c:23
(gdb)
msg186628 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 13:45
After a closer look at recent checkins, I found this checking for the trash can mechanism: 5a2ef447b80d (ticket #13992).

This appears to be the cause:

    1.20  #define Py_TRASHCAN_SAFE_BEGIN(op) \
    1.21 -    if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL) { \
    1.22 -        ++_PyTrash_delete_nesting;
    1.23 -        /* The body of the deallocator is here. */
    1.24 +    do { \
    1.25 +        PyThreadState *_tstate = PyThreadState_GET(); \
    1.26 +        if (_tstate->trash_delete_nesting < PyTrash_UNWIND_LEVEL) { \
    1.27 +            ++_tstate->trash_delete_nesting;
    1.28 +            /* The body of the deallocator is here. */

At the time the Py_AtExit functions are called, the thread state
is NULL, so the if (_tstate->...) segfaults.
msg186629 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 13:47
I think a solution to the problem would be to check _tstate for NULL and only use it if it is non-NULL - without threads you don't need to keep track of them ;-)
msg186630 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-04-12 14:00
> At the time the Py_AtExit functions are called, the thread state is NULL

I'd say this is the root cause. It's a bad thing to call Py_DECREF without a thread state.
Was it the case in previous versions?
msg186636 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 14:23
On 12.04.2013 16:00, Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc added the comment:
> 
>> At the time the Py_AtExit functions are called, the thread state is NULL
> 
> I'd say this is the root cause. It's a bad thing to call Py_DECREF without a thread state.
> Was it the case in previous versions?

The extension has been using this ever since it was written many
years ago and without any problems in all Python versions up, but not
including 2.7.4. It's the only way to do module cleanup in Python 2.x.

So far, we've only seen the problem for dictionaries that
are DECREFed.

I know that things may go wrong when DECREF'ing objects this late in
the finalization process, but still expect Python to handle errors
gracefully without segfaulting, as it has been the case for all
previous Python versions.
msg186644 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 15:32
Judging by the fact that the Py_AtExit funcs are called at the very end of Py_Finalize (after absolutely everything has been cleaned up), I think you shouldn't use any Python facilities at this point.
msg186645 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 15:39
By the way, why do you try to clean up the dictionary at this point? Any non-trivial deallocation code (e.g. __del__ method) will fail running.
msg186646 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 15:41
On 12.04.2013 17:32, Antoine Pitrou wrote:
> 
> Judging by the fact that the Py_AtExit funcs are called at the very end of Py_Finalize (after absolutely everything has been cleaned up), I think you shouldn't use any Python facilities at this point.

As mentioned earlier: this is the only way to cleanup extension modules
in Python 2.x and the trash can patch broke this.

> By the way, why do you try to clean up the dictionary at this point? Any non-trivial deallocation
code (e.g. __del__ method) will fail running.

The dictionaries just contain simple Python types that don't
have __del__ methods, so deallocation works fine and it prevents
memory leaks when embedding the Python interpreter and restarting
it several times.
msg186647 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 15:48
> As mentioned earlier: this is the only way to cleanup extension
> modules
> in Python 2.x and the trash can patch broke this.

Well, the doc clearly says
"Since Python’s internal finalization will have completed before
the cleanup function, no Python APIs should be called by func".

Why wouldn't you clean up the dict (if it exists) at module
initialization instead? You will have a working Python
runtime at this point.

That said, we could hack a 2.7-specific hack to the patch, in
order to disable the trashcan mechanism at shutdown.
msg186656 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 18:00
Marc-André, does this patch work for you?
msg186915 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-14 11:59
On 12.04.2013 20:00, Antoine Pitrou wrote:
> 
> Marc-André, does this patch work for you?
> 
> Added file: http://bugs.python.org/file29791/tstate_trashcan.patch

Thanks, Antoine. I can try this tomorrow.
msg186919 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-04-14 13:26
Yes, let's not break thing in point releases.
msg186975 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-15 11:20
Checked the patch: it fixes the problem. Thanks.

Will this go into Python 2.7.5 ?

I'm asking because we need to issue a patch level release of egenix-mx-base and if Python 2.7.5 will fix the problem, we'll just add the work-around for Python 2.7.4.

Regarding a better cleanup logic: This is available in Python 3.x with the new module init logic and we'll use it there.

PS: The approach with doing the cleanup at module startup time won't work, because the still existing objects will reference parts of the already cleaned up interpreter instance.
msg187015 - (view) Author: Roundup Robot (python-dev) Date: 2013-04-15 19:20
New changeset e814fbd470bf by Antoine Pitrou in branch '2.7':
Issue #17703: Fix a regression where an illegal use of Py_DECREF() after interpreter finalization can cause a crash.
http://hg.python.org/cpython/rev/e814fbd470bf
msg187016 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-15 19:21
Committed!
msg187020 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-15 19:26
On 15.04.2013 21:21, Antoine Pitrou wrote:
> 
> Committed!

Cool, thanks for the quick turnaround.
msg224511 - (view) Author: Guillaume Matte (flex.plexico) Date: 2014-08-01 17:42
It does still cause a Fatal Error in debug build as PyThreadState_GET() error out if _PyThreadState_Current is NULL
msg241412 - (view) Author: Daniel (m_python) Date: 2015-04-18 12:06
Guillaume already mentioned this, its still causing a Fatal Error. To fix this PyThreadState_GET() in Py_TRASHCAN_SAFE_BEGIN must be replaced with _PyThreadState_Current

#define Py_TRASHCAN_SAFE_BEGIN(op) \
    do { \
        PyThreadState *_tstate = _PyThreadState_Current; \
msg248453 - (view) Author: Stian Lode (Stian Lode) Date: 2015-08-12 11:38
I'm seeing this bug in Python 3.4.2 as well, and the patch here (tstate_trashcan.patch) appears to fix it. I'm using boost 1.57.0, and Python was compiled on a vanilla rhel6 system.
msg248473 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-08-12 20:46
Can anyone else confirm this bug in 3.4?
msg248531 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-08-13 17:33
> I'm seeing this bug in Python 3.4.2 as well, and the patch here (tstate_trashcan.patch) appears to fix it.

What is the context? Some specific C code?
History
Date User Action Args
2015-08-13 17:33:34pitrousetmessages: + msg248531
2015-08-12 20:46:15larrysetmessages: + msg248473
2015-08-12 11:38:39Stian Lodesetfiles: + bt.txt
versions: + Python 3.4, - Python 2.7
nosy: + Stian Lode, larry

messages: + msg248453
2015-04-18 12:06:50m_pythonsetnosy: + m_python
messages: + msg241412
2014-08-01 17:42:14flex.plexicosetnosy: + flex.plexico
messages: + msg224511
2013-04-15 19:26:41lemburgsetmessages: + msg187020
2013-04-15 19:21:03pitrousetstatus: open -> closed
resolution: fixed
messages: + msg187016

stage: commit review -> resolved
2013-04-15 19:20:26python-devsetnosy: + python-dev
messages: + msg187015
2013-04-15 19:13:19pitrousettype: crash
stage: commit review
2013-04-15 19:13:11pitrousettitle: Trash can mechanism segfault during interpreter finalization in Python 2.7.4 -> Trashcan mechanism segfault during interpreter finalization in Python 2.7.4
2013-04-15 11:20:45lemburgsetmessages: + msg186975
2013-04-14 13:26:43benjamin.petersonsetmessages: + msg186919
2013-04-14 11:59:43lemburgsetmessages: + msg186915
2013-04-14 11:40:55pitrousetpriority: normal -> release blocker
nosy: + benjamin.peterson
2013-04-12 18:00:15pitrousetfiles: + tstate_trashcan.patch
keywords: + patch
messages: + msg186656
2013-04-12 15:48:44pitrousetmessages: + msg186647
2013-04-12 15:41:58lemburgsetmessages: + msg186646
2013-04-12 15:39:27pitrousetmessages: + msg186645
2013-04-12 15:32:09pitrousetmessages: + msg186644
2013-04-12 14:23:23lemburgsetmessages: + msg186636
2013-04-12 14:00:40amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg186630
2013-04-12 13:47:32lemburgsetmessages: + msg186629
2013-04-12 13:45:37lemburgsetmessages: + msg186628
2013-04-12 13:43:12lemburgcreate