msg186627 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2013-04-12 13:43 |
A user reported a segfault when using our mxURL extension with Python 2.7.4. Previous Python versions do not show this behavior, so it's new in Python 2.7.4.
The extension uses an Py_AtExit() function to clean up among other things a reference to a dictionary from another module. The dictionary deallocation then causes the segfault:
$ gdb /usr/local/bin/python2.7
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html
>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/bin/python2.7...done.
(gdb) r
Starting program: /usr/local/bin/python2.7
[Thread debugging using libthread_db enabled]
Python 2.7.4 (default, Apr 8 2013, 15:51:19)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mx.URL
>>> import sys
>>> sys.exit()
Program received signal SIGSEGV, Segmentation fault.
0x08091201 in dict_dealloc (mp=0xb7b2813c) at Objects/dictobject.c:1005
1005 Py_TRASHCAN_SAFE_BEGIN(mp)
(gdb) bt
#0 0x08091201 in dict_dealloc (mp=0xb7b2813c) at Objects/dictobject.c:1005
#1 0xb7875928 in mxURLModule_Cleanup () at mx/URL/mxURL/mxURL.c:2789
#2 0x0810553f in call_ll_exitfuncs () at Python/pythonrun.c:1763
#3 Py_Finalize () at Python/pythonrun.c:554
#4 0x08104bac in Py_Exit () at Python/pythonrun.c:1772
#5 handle_system_exit () at Python/pythonrun.c:1146
#6 0x0810517d in PyErr_PrintEx (set_sys_last_vars=<value optimized out>)
at Python/pythonrun.c:1156
#7 0x081058dd in PyRun_InteractiveOneFlags (fp=0xb7f8b420,
filename=0x815f9c0 "<stdin>", flags=0xbffffc4c) at Python/pythonrun.c:855
#8 0x08105a68 in PyRun_InteractiveLoopFlags (fp=0xb7f8b420,
filename=0x815f9c0 "<stdin>", flags=0xbffffc4c) at Python/pythonrun.c:772
#9 0x081062f2 in PyRun_AnyFileExFlags (fp=0xb7f8b420, filename=0x815f9c0
"<stdin>", closeit=0, flags=0xbffffc4c) at Python/pythonrun.c:741
#10 0x0805bb59 in Py_Main (argc=1, argv=0xbffffd34) at Modules/main.c:640
#11 0x0805abeb in main (argc=1, argv=0xbffffd34) at ./Modules/python.c:23
(gdb)
|
msg186628 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2013-04-12 13:45 |
After a closer look at recent checkins, I found this checking for the trash can mechanism: 5a2ef447b80d (ticket #13992).
This appears to be the cause:
1.20 #define Py_TRASHCAN_SAFE_BEGIN(op) \
1.21 - if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL) { \
1.22 - ++_PyTrash_delete_nesting;
1.23 - /* The body of the deallocator is here. */
1.24 + do { \
1.25 + PyThreadState *_tstate = PyThreadState_GET(); \
1.26 + if (_tstate->trash_delete_nesting < PyTrash_UNWIND_LEVEL) { \
1.27 + ++_tstate->trash_delete_nesting;
1.28 + /* The body of the deallocator is here. */
At the time the Py_AtExit functions are called, the thread state
is NULL, so the if (_tstate->...) segfaults.
|
msg186629 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2013-04-12 13:47 |
I think a solution to the problem would be to check _tstate for NULL and only use it if it is non-NULL - without threads you don't need to keep track of them ;-)
|
msg186630 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2013-04-12 14:00 |
> At the time the Py_AtExit functions are called, the thread state is NULL
I'd say this is the root cause. It's a bad thing to call Py_DECREF without a thread state.
Was it the case in previous versions?
|
msg186636 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2013-04-12 14:23 |
On 12.04.2013 16:00, Amaury Forgeot d'Arc wrote:
>
> Amaury Forgeot d'Arc added the comment:
>
>> At the time the Py_AtExit functions are called, the thread state is NULL
>
> I'd say this is the root cause. It's a bad thing to call Py_DECREF without a thread state.
> Was it the case in previous versions?
The extension has been using this ever since it was written many
years ago and without any problems in all Python versions up, but not
including 2.7.4. It's the only way to do module cleanup in Python 2.x.
So far, we've only seen the problem for dictionaries that
are DECREFed.
I know that things may go wrong when DECREF'ing objects this late in
the finalization process, but still expect Python to handle errors
gracefully without segfaulting, as it has been the case for all
previous Python versions.
|
msg186644 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-04-12 15:32 |
Judging by the fact that the Py_AtExit funcs are called at the very end of Py_Finalize (after absolutely everything has been cleaned up), I think you shouldn't use any Python facilities at this point.
|
msg186645 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-04-12 15:39 |
By the way, why do you try to clean up the dictionary at this point? Any non-trivial deallocation code (e.g. __del__ method) will fail running.
|
msg186646 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2013-04-12 15:41 |
On 12.04.2013 17:32, Antoine Pitrou wrote:
>
> Judging by the fact that the Py_AtExit funcs are called at the very end of Py_Finalize (after absolutely everything has been cleaned up), I think you shouldn't use any Python facilities at this point.
As mentioned earlier: this is the only way to cleanup extension modules
in Python 2.x and the trash can patch broke this.
> By the way, why do you try to clean up the dictionary at this point? Any non-trivial deallocation
code (e.g. __del__ method) will fail running.
The dictionaries just contain simple Python types that don't
have __del__ methods, so deallocation works fine and it prevents
memory leaks when embedding the Python interpreter and restarting
it several times.
|
msg186647 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-04-12 15:48 |
> As mentioned earlier: this is the only way to cleanup extension
> modules
> in Python 2.x and the trash can patch broke this.
Well, the doc clearly says
"Since Python’s internal finalization will have completed before
the cleanup function, no Python APIs should be called by func".
Why wouldn't you clean up the dict (if it exists) at module
initialization instead? You will have a working Python
runtime at this point.
That said, we could hack a 2.7-specific hack to the patch, in
order to disable the trashcan mechanism at shutdown.
|
msg186656 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-04-12 18:00 |
Marc-André, does this patch work for you?
|
msg186915 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2013-04-14 11:59 |
On 12.04.2013 20:00, Antoine Pitrou wrote:
>
> Marc-André, does this patch work for you?
>
> Added file: http://bugs.python.org/file29791/tstate_trashcan.patch
Thanks, Antoine. I can try this tomorrow.
|
msg186919 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2013-04-14 13:26 |
Yes, let's not break thing in point releases.
|
msg186975 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2013-04-15 11:20 |
Checked the patch: it fixes the problem. Thanks.
Will this go into Python 2.7.5 ?
I'm asking because we need to issue a patch level release of egenix-mx-base and if Python 2.7.5 will fix the problem, we'll just add the work-around for Python 2.7.4.
Regarding a better cleanup logic: This is available in Python 3.x with the new module init logic and we'll use it there.
PS: The approach with doing the cleanup at module startup time won't work, because the still existing objects will reference parts of the already cleaned up interpreter instance.
|
msg187015 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-04-15 19:20 |
New changeset e814fbd470bf by Antoine Pitrou in branch '2.7':
Issue #17703: Fix a regression where an illegal use of Py_DECREF() after interpreter finalization can cause a crash.
http://hg.python.org/cpython/rev/e814fbd470bf
|
msg187016 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-04-15 19:21 |
Committed!
|
msg187020 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2013-04-15 19:26 |
On 15.04.2013 21:21, Antoine Pitrou wrote:
>
> Committed!
Cool, thanks for the quick turnaround.
|
msg224511 - (view) |
Author: Guillaume Matte (flex.plexico) |
Date: 2014-08-01 17:42 |
It does still cause a Fatal Error in debug build as PyThreadState_GET() error out if _PyThreadState_Current is NULL
|
msg241412 - (view) |
Author: Daniel (m_python) |
Date: 2015-04-18 12:06 |
Guillaume already mentioned this, its still causing a Fatal Error. To fix this PyThreadState_GET() in Py_TRASHCAN_SAFE_BEGIN must be replaced with _PyThreadState_Current
#define Py_TRASHCAN_SAFE_BEGIN(op) \
do { \
PyThreadState *_tstate = _PyThreadState_Current; \
|
msg248453 - (view) |
Author: Stian Lode (Stian Lode) |
Date: 2015-08-12 11:38 |
I'm seeing this bug in Python 3.4.2 as well, and the patch here (tstate_trashcan.patch) appears to fix it. I'm using boost 1.57.0, and Python was compiled on a vanilla rhel6 system.
|
msg248473 - (view) |
Author: Larry Hastings (larry) *  |
Date: 2015-08-12 20:46 |
Can anyone else confirm this bug in 3.4?
|
msg248531 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2015-08-13 17:33 |
> I'm seeing this bug in Python 3.4.2 as well, and the patch here (tstate_trashcan.patch) appears to fix it.
What is the context? Some specific C code?
|
msg332500 - (view) |
Author: Trevor Joynson (trevorj) |
Date: 2018-12-25 00:46 |
I can verify this bug does still in fact exist in all 3.x versions I've tested (3.4-3.7).
It's been the cause of many long-standing segfaults we've been having at exit in applications that use OpenRAVE and/or boost::python::numeric.
|
msg332501 - (view) |
Author: Trevor Joynson (trevorj) |
Date: 2018-12-25 00:51 |
I can also confirm that the included patch does "workaround" the issue.
A big pain point here for me is in determining what exit handler is causing this.
GDB has been it's usual great help in determining information about the exact segfault, but the problem code itself is rather elusive since it's ran so much earlier and just causes this issue downstream later in execution. Unfortunately the extensions I'm dealing with also happen to take a lifetime to compile, exacerbated by the fact that they are dependent upon one another.
Can we please apply this to 3.x's tree?
|
msg333146 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-01-07 12:09 |
This issue is closed. Can you please open a new issue? Try to describe the bug that you have and what you attemped to fix it.
|
msg335478 - (view) |
Author: Jeroen Demeyer (jdemeyer) *  |
Date: 2019-02-13 21:32 |
See also https://bugs.python.org/issue35983 for another trashcan-related issue.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:44 | admin | set | github: 61903 |
2019-02-13 21:32:31 | jdemeyer | set | nosy:
+ jdemeyer messages:
+ msg335478
|
2019-01-07 12:09:18 | vstinner | set | messages:
+ msg333146 |
2018-12-25 00:51:35 | trevorj | set | messages:
+ msg332501 |
2018-12-25 00:46:46 | trevorj | set | nosy:
+ trevorj, ned.deily
messages:
+ msg332500 versions:
+ Python 3.6, - Python 3.4 |
2017-04-21 16:25:36 | vstinner | set | nosy:
+ vstinner
|
2015-08-13 17:33:34 | pitrou | set | messages:
+ msg248531 |
2015-08-12 20:46:15 | larry | set | messages:
+ msg248473 |
2015-08-12 11:38:39 | Stian Lode | set | files:
+ bt.txt versions:
+ Python 3.4, - Python 2.7 nosy:
+ Stian Lode, larry
messages:
+ msg248453
|
2015-04-18 12:06:50 | m_python | set | nosy:
+ m_python messages:
+ msg241412
|
2014-08-01 17:42:14 | flex.plexico | set | nosy:
+ flex.plexico messages:
+ msg224511
|
2013-04-15 19:26:41 | lemburg | set | messages:
+ msg187020 |
2013-04-15 19:21:03 | pitrou | set | status: open -> closed resolution: fixed messages:
+ msg187016
stage: commit review -> resolved |
2013-04-15 19:20:26 | python-dev | set | nosy:
+ python-dev messages:
+ msg187015
|
2013-04-15 19:13:19 | pitrou | set | type: crash stage: commit review |
2013-04-15 19:13:11 | pitrou | set | title: Trash can mechanism segfault during interpreter finalization in Python 2.7.4 -> Trashcan mechanism segfault during interpreter finalization in Python 2.7.4 |
2013-04-15 11:20:45 | lemburg | set | messages:
+ msg186975 |
2013-04-14 13:26:43 | benjamin.peterson | set | messages:
+ msg186919 |
2013-04-14 11:59:43 | lemburg | set | messages:
+ msg186915 |
2013-04-14 11:40:55 | pitrou | set | priority: normal -> release blocker nosy:
+ benjamin.peterson
|
2013-04-12 18:00:15 | pitrou | set | files:
+ tstate_trashcan.patch keywords:
+ patch messages:
+ msg186656
|
2013-04-12 15:48:44 | pitrou | set | messages:
+ msg186647 |
2013-04-12 15:41:58 | lemburg | set | messages:
+ msg186646 |
2013-04-12 15:39:27 | pitrou | set | messages:
+ msg186645 |
2013-04-12 15:32:09 | pitrou | set | messages:
+ msg186644 |
2013-04-12 14:23:23 | lemburg | set | messages:
+ msg186636 |
2013-04-12 14:00:40 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg186630
|
2013-04-12 13:47:32 | lemburg | set | messages:
+ msg186629 |
2013-04-12 13:45:37 | lemburg | set | messages:
+ msg186628 |
2013-04-12 13:43:12 | lemburg | create | |