classification
Title: Crash in clear_weakref
Type: crash
Stage: resolved
Components: Interpreter Core
Versions: Python 2.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, benjamin.peterson, dabeaz, jsafrane, pitrou, vstinner
Priority: normal
Keywords:

Created on 2013-05-07 07:38 by jsafrane, last changed 2019-10-22 23:09 by vstinner. This issue is now closed.

Files
File name    Uploaded                      Description
full-bt.txt  jsafrane, 2013-05-07 07:37    Full backtrace of the crash
full-bt.txt  jsafrane, 2013-05-07 10:50    Full backtrace of the crash, now with --with-pydebug
Messages (20)
msg188629 - Author: Jan Safranek (jsafrane) Date: 2013-05-07 07:37
I have Python 2.7.4 running on Fedora Rawhide and I get a segmentation fault with the following backtrace:

#0  0x00007f73f69ca5f1 in clear_weakref (self=0x7f73ff515c00) at Objects/weakrefobject.c:56
#1  weakref_dealloc (self=0x7f73ff515c00) at Objects/weakrefobject.c:106
#2  0x00007f73f698ea27 in PyList_SetItem (op=<optimized out>, i=<optimized out>, newitem=<optimized out>) at Objects/listobject.c:218
#3  0x00007f73f69ba9db in add_subclass (type=type@entry=0x7f73e00456b0, base=<optimized out>) at Objects/typeobject.c:4155
#4  0x00007f73f69c440e in PyType_Ready (type=type@entry=0x7f73e00456b0) at Objects/typeobject.c:4120
#5  0x00007f73f69c6d4b in type_new (metatype=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Objects/typeobject.c:2467
#6  0x00007f73f69be7d3 in type_call (type=0x7f73f6cdad00 <PyType_Type>, args=0x7f73f61e1550, kwds=0x0) at Objects/typeobject.c:725
#7  0x00007f73f6954833 in PyObject_Call (func=func@entry=0x7f73f6cdad00 <PyType_Type>, arg=arg@entry=0x7f73f61e1550, kw=kw@entry=0x0) at Objects/abstract.c:2529
#8  0x00007f73f69553c9 in PyObject_CallFunctionObjArgs (callable=callable@entry=0x7f73f6cdad00 <PyType_Type>) at Objects/abstract.c:2760
#9  0x00007f73f6a06bf3 in build_class (name=<optimized out>, bases=0x7f73f61e3910, methods=0x7f73e0045590) at Python/ceval.c:4632
#10 PyEval_EvalFrameEx (f=f@entry=0x7f73e0043a40, throwflag=throwflag@entry=0) at Python/ceval.c:1928
#11 0x00007f73f6a0b46d in PyEval_EvalCodeEx (co=co@entry=0x7f73f61f50b0, globals=globals@entry=0x7f73e003bf00, locals=locals@entry=0x7f73e003bf00, args=args@entry=0x0, argcount=argcount@entry=0, 
    kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, closure=closure@entry=0x0) at Python/ceval.c:3253
#12 0x00007f73f6a0b5a2 in PyEval_EvalCode (co=co@entry=0x7f73f61f50b0, globals=globals@entry=0x7f73e003bf00, locals=locals@entry=0x7f73e003bf00) at Python/ceval.c:667
#13 0x00007f73f6a22cfc in PyImport_ExecCodeModuleEx (name=name@entry=0x7f73e003d760 "warnings", co=co@entry=0x7f73f61f50b0, 
    pathname=pathname@entry=0x7f73e003ac90 "/usr/local/lib/python2.7/warnings.pyc") at Python/import.c:709
#14 0x00007f73f6a2305e in load_source_module (name=0x7f73e003d760 "warnings", pathname=0x7f73e003ac90 "/usr/local/lib/python2.7/warnings.pyc", fp=<optimized out>) at Python/import.c:1099
#15 0x00007f73f6a23f59 in import_submodule (mod=mod@entry=0x7f73f6cd2ec0 <_Py_NoneStruct>, subname=subname@entry=0x7f73e003d760 "warnings", fullname=fullname@entry=0x7f73e003d760 "warnings")
    at Python/import.c:2700
#16 0x00007f73f6a24b93 in load_next (p_buflen=<synthetic pointer>, buf=0x7f73e003d760 "warnings", p_name=<synthetic pointer>, altmod=0x7f73f6cd2ec0 <_Py_NoneStruct>, 
    mod=0x7f73f6cd2ec0 <_Py_NoneStruct>) at Python/import.c:2515
#17 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7f73f6cd2ec0 <_Py_NoneStruct>, globals=<optimized out>, name=0x0) at Python/import.c:2224
#18 PyImport_ImportModuleLevel (name=0x7f73ff54fbf4 "warnings", globals=<optimized out>, locals=<optimized out>, fromlist=0x7f73f6cd2ec0 <_Py_NoneStruct>, level=<optimized out>)
    at Python/import.c:2288
#19 0x00007f73f6a033af in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
...
#61 0x00007f73f6a37dbc in initsite () at Python/pythonrun.c:721
#62 Py_InitializeEx (install_sigs=1) at Python/pythonrun.c:265

(The full backtrace is attached; it is quite long.)

(gdb) py-bt
#10 Frame 0x7f4bf8043df0, for file /usr/lib64/python2.7/warnings.py, line 281, in <module> ()
    class WarningMessage(object):
#23 Frame 0x7f4bf803d300, for file /usr/lib64/python2.7/posixpath.py, line 17, in <module> ()
    import warnings
#36 Frame 0x7f4bf8024fc0, for file /usr/lib64/python2.7/os.py, line 49, in <module> ()
    import posixpath as path
#49 Frame 0x7f4bf801c520, for file /usr/lib64/python2.7/site.py, line 62, in <module> ()
    import os

I get the same crash with vanilla Python 2.7.4 without Fedora patches. Python 2.7.3 works well and doesn't crash.
msg188630 - Author: Jan Safranek (jsafrane) Date: 2013-05-07 07:38
I can reproduce the crash in a very unusual setup:
1. OpenPegasus (http://www.openpegasus.org/); for this bug we may consider it just a network daemon listening on a TCP port. When a request comes in, it is eventually processed by a provider (= something like a plugin).
2. cmpi-bindings ([1], [2]), which allows these plugins to be written in Python
+ some other Python modules, but IMHO not relevant (e.g. pywbem [3])

1: https://github.com/kkaempf/cmpi-bindings
2: http://sourceforge.net/apps/mediawiki/pywbem/index.php?title=Provider_Home
3: http://sourceforge.net/apps/mediawiki/pywbem/index.php?title=Main_Page

Now, if the Pegasus daemon gets a request, it calls cmpi-bindings, which creates an embedded Python [4], loads the Python "plugin", and processes the request. If the "plugin" is idle for 15 minutes, it is unloaded by Pegasus (= the embedded Python is destroyed). So far everything works like a charm. But when a new request arrives *after* the unload, Pegasus calls cmpi-bindings again, which tries to create the embedded Python for the second time, and here I get the crash.

[4]: Python initialization/shutdown: https://github.com/kkaempf/cmpi-bindings/blob/master/src/target_python.c, TargetInitialize() and TargetCleanup(); some macros are generated by SWIG from https://github.com/kkaempf/cmpi-bindings/blob/master/swig/cmpi.i

I haven't been able to reproduce the crash with a simpler setup (and I have tried, believe me). It is also possible that the Python initialization/shutdown in cmpi-bindings is wrong, but I am not able to find any bug there.
msg188631 - Author: Jan Safranek (jsafrane) Date: 2013-05-07 07:38
Bisecting the Python Mercurial repository, I found the changeset that causes the crash:

changeset:   80762:7e771f0363e2
branch:      2.7
parent:      80758:29627bd5b333
user:        Antoine Pitrou <solipsis@pitrou.net>
date:        Sat Dec 08 21:15:26 2012 +0100
summary:     Issue #16602: When a weakref's target was part of a long deallocation chain, the object could remain reachable through its weakref even though its refcount had dropped to zero.

If I revert the patch in Python 2.7.4, my setup works fine, without any crash.
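
For readers without the linked report at hand: as I understand it, the fix for issue #16602 changed PyWeakref_GET_OBJECT so that a referent whose refcount has already dropped to zero is reported as Py_None, even if the weakref itself has not been cleared yet. Below is a minimal model of that check in plain C. The toy_* names and structs are invented for illustration and are not CPython's definitions.

```c
#include <stddef.h>

/* Toy stand-ins for CPython's object structs; NOT the real definitions. */
typedef struct toy_object {
    long refcnt;
} toy_object;

typedef struct toy_weakref {
    toy_object *wr_object;   /* borrowed pointer to the referent */
} toy_weakref;

/* Plays the role of Py_None: "the referent is gone". */
static toy_object toy_none = { 1 };

/* Behaviour before the fix: hand back the referent unconditionally. */
toy_object *toy_get_object_unchecked(const toy_weakref *ref) {
    return ref->wr_object;
}

/* Behaviour after the fix: a referent whose refcount has reached zero is
   reported as gone, even though the weakref still points at it. */
toy_object *toy_get_object_checked(const toy_weakref *ref) {
    return ref->wr_object->refcnt > 0 ? ref->wr_object : &toy_none;
}
```

Note that the checked variant still reads the referent's refcount, so it only helps while the referent's memory is valid; it is not a guard against a referent that has been overwritten or freed.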
msg188636 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-07 08:54
Hi Jan,

First, have you seen the following message on that bug report:
http://bugs.python.org/issue16602#msg177180

Second, I would suggest you build a debug build of Python (./configure --with-pydebug); it should give you more information in the stack trace and allow you to debug using gdb.
msg188639 - Author: Jan Safranek (jsafrane) Date: 2013-05-07 10:50
> First, have you seen the following message on that bug report:
> http://bugs.python.org/issue16602#msg177180

I'm reading it now... I searched for PyWeakref_GET_OBJECT in cmpi-bindings; both occurrences are generated by SWIG and both look safe.
Is it hidden/wrapped by any other macro?

Sorry, I don't know much about Python internals and extension development, and I'm not the author of cmpi-bindings.

And I'm attaching a stack trace built with --with-pydebug. Debugging with gdb is quite a problem: my gdb is linked with the distribution's Python 2.7.4 and it doesn't cooperate with my custom-built Python, which I have in LD_LIBRARY_PATH (so Pegasus gets the right one when loading providers).
msg188643 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-07 12:11
> Debugging with gdb is quite a problem, I have gdb linked with distribution Python
> 2.7.4 and it doesn't cooperate with my custom built python, which I have in
> LD_LIBRARY_PATH

Ok. Still, you should be able to inspect the variables at the crash point. Could you try to inspect the `self` variable inside weakref_dealloc, especially `self->wr_object` and its Py_TYPE() value? Also, what is the value of Py_REFCNT(self->wr_object)?

AFAICT, the only reason GET_WEAKREFS_LISTPTR() may crash is because of an invalid Py_TYPE(). Which should never happen.
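
To make the failure mode concrete: GET_WEAKREFS_LISTPTR() locates the weakref list by adding an offset stored in the object's type (tp_weaklistoffset) to the object's address, so it has to dereference Py_TYPE() first. Here is a rough model in plain C. The toy_* names are invented, and the NULL guard is an addition for illustration that the real macro does not have.

```c
#include <stddef.h>

/* Toy stand-in for the part of a type object that matters here. */
typedef struct toy_type {
    size_t weaklistoffset;  /* byte offset of the weakref list slot in an instance */
} toy_type;

/* Toy stand-in for an instance that supports weak references. */
typedef struct toy_object {
    long refcnt;
    const toy_type *type;   /* plays the role of ob_type */
    void *weaklist;         /* head of the weakref list */
} toy_object;

/* Same shape as the real lookup: instance address + offset read from the type.
   Unlike the real macro, this returns NULL instead of dereferencing a NULL type. */
void **toy_weakrefs_listptr(toy_object *o) {
    if (o == NULL || o->type == NULL)
        return NULL;
    return (void **)((char *)o + o->type->weaklistoffset);
}
```

Without the guard, an object whose type pointer is NULL makes the lookup read through a null pointer, which is consistent with a segmentation fault at that line.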
msg188659 - Author: Jan Safranek (jsafrane) Date: 2013-05-07 14:09
> Could you try to inspect the `self` variable inside weakref_dealloc,
> especially `self->wr_object` and its Py_TYPE() value? Also, what is the
> value of Py_REFCNT(self->wr_object)?

in weakref_dealloc at Objects/weakrefobject.c:106:

(gdb) p *self
$1 = {_ob_next = 0x0, _ob_prev = 0x0, ob_refcnt = 0, ob_type = 0x7fdb8ffc91a0 <_PyWeakref_RefType>}

(gdb) p *((PyWeakReference*)self)
$7 = {_ob_next = 0x0, _ob_prev = 0x0, ob_refcnt = 0, ob_type = 0x7fdb8ffc91a0 <_PyWeakref_RefType>, wr_object = 0x7fdb9c30bc00 <swigpyobject_type.9541>, wr_callback = 0x0, hash = -1, 
  wr_prev = 0x0, wr_next = 0x0}

(gdb) p *((PyWeakReference*)self)->wr_object
$9 = {_ob_next = 0x0, _ob_prev = 0x0, ob_refcnt = 0, ob_type = 0x0}

If I am reading Py_TYPE right, Py_TYPE(self->wr_object) must be 0 (=NULL).

<swigpyobject_type.9541> seems to be a PyTypeObject generated by SWIG in cmpi-bindings; I'll dig into it. Please let me know if there is anything suspicious or worth checking.
msg188660 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-07 14:51
> (gdb) p *((PyWeakReference*)self)->wr_object
> $9 = {_ob_next = 0x0, _ob_prev = 0x0, ob_refcnt = 0, ob_type = 0x0}
> 
> If I am reading Py_TYPE right, Py_TYPE(self->wr_object) must be 0
> (=NULL).

Well, no, it should *always* be non-NULL (and it's a strong reference,
so it should be a pointer to a valid PyTypeObject).
There's nothing in the CPython source code which sets Py_TYPE(something)
(i.e. something->ob_type) to NULL.
msg188661 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-05-07 14:52
swigpyobject_type is a static PyTypeObject variable (similar to all static PyTypeObject structures we write in extension modules, but inside a function).

It should never be deallocated... There may be a refcount issue with this object.
msg188663 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-07 15:06
> swigpyobject_type is a static PyTypeObject variable (similar to all
> static PyTypeObject structures we write in extension modules, but
> inside a function)
> 
> It should never be deallocated... There may be a refcount issue with
> this object.

Even if it's deallocated, the Py_TYPE of an instance should never
become NULL. At worst it may point to invalid memory.
msg188665 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-05-07 15:28
Right. But this is an embedded interpreter, and SWIG does not call PyType_Ready() again; the old type is returned instead.
msg188667 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-07 15:32
> But this is an embedded interpreter, and SWIG does not call
> PyType_Ready() again; the old type is returned instead.

Yuk. Perhaps Dave Beazley can give us some insights here?

Jan, one possibility would be for Pegasus to stop "unloading" Python, it seems.
msg188668 - Author: Jan Safranek (jsafrane) Date: 2013-05-07 15:38
> Right. But this is an embedded interpreter, and SWIG does not call
> PyType_Ready() again; the old type is returned instead.

Python crashes in Py_Initialize(). SWIG_init() is called right after it.
So even if SWIG called PyType_Ready() again, it would be too late.

Why does Python remember SWIG types after Py_Finalize() in the first place?
I want to destroy it and start with a fresh instance.

Jan
msg188670 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-05-07 15:54
Python remembers SWIG types because SWIG generates code like this:

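PyTypeObject * SwigPyObject_TypeOnce(void) {
  static PyTypeObject swigpyobject_type;
  static int type_init = 0;
  if (!type_init) {
    // ... initialization code ...
    swigpyobject_type = tmp;
    type_init = 1;
    if (PyType_Ready(&swigpyobject_type) < 0)
      return NULL;
  }
  // type_init is static, so every later call skips the block above
  // and hands back the type set up by the first call.
  return &swigpyobject_type;
}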

SWIG should reset "type_init" on a fresh interpreter.
The initXxx() function should do this.
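
The suggestion above can be sketched with a small self-contained model. All toy_* names are hypothetical, and the idea that finalization leaves the type "not ready" is an assumption of the model, not something taken from CPython's source. The flag is kept at file scope here so that an init hook can reach it; in the SWIG snippet it is local to the function.

```c
#include <stddef.h>

/* Toy stand-in for a PyTypeObject; NOT the real struct. */
typedef struct toy_type {
    int ready;   /* set by toy_type_ready(), cleared by toy_finalize() */
} toy_type;

static toy_type swig_like_type;
static int type_init = 0;

/* Stand-in for PyType_Ready(). */
static int toy_type_ready(toy_type *t) {
    t->ready = 1;
    return 0;
}

/* Stand-in for interpreter shutdown invalidating the type (model assumption). */
void toy_finalize(void) {
    swig_like_type.ready = 0;
}

/* Same shape as SwigPyObject_TypeOnce(): sets the type up on the first call only. */
toy_type *toy_type_once(void) {
    if (!type_init) {
        type_init = 1;
        if (toy_type_ready(&swig_like_type) < 0)
            return NULL;
    }
    return &swig_like_type;
}

/* Hypothetical module-init hook: forget the flag so the next call re-readies the type. */
void toy_module_init(void) {
    type_init = 0;
}
```

In this model, calling toy_type_once() after toy_finalize() returns the stale type until toy_module_init() has run. It does not address Jan's point that the crash happens inside Py_Initialize(), before any module init function gets a chance to run.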
msg188673 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-07 16:06
> Why python remembers SWIG types after Py_Finalize() in the first
> place?
> I want to destroy it and start with fresh instance.

Because a significant amount of static data inside CPython actually
survives Py_Finalize :-/
msg188755 - Author: Jan Safranek (jsafrane) Date: 2013-05-09 06:58
On 05/07/2013 06:06 PM, Antoine Pitrou wrote:
> a significant amount of static data inside CPython actually survives
> Py_Finalize :-/

As a solution, would it be possible to wipe all registered types in
Py_Finalize?

Jan
msg188756 - Author: Jan Safranek (jsafrane) Date: 2013-05-09 07:07
On 05/07/2013 05:32 PM, Antoine Pitrou wrote:
> Jan, one possibility would be for Pegasus to stop "unloading" Python,
> it seems.

It is always a possibility. Actually, the Pegasus "plugin" is just a shared
object (.so) and the .so is linked with Python. Pegasus calls dlopen()
and dlclose() on it. After dlclose(), the "plugin" is removed from
memory. Unfortunately, libpython2.7.so stays loaded, at least
/proc/XXX/maps says so. If there was a way to unload libpython2.7.so
from memory too...

Jan
msg189227 - Author: Jan Safranek (jsafrane) Date: 2013-05-14 15:28
On 05/09/2013 09:07 AM, Jan Safranek wrote:
> 
> Jan Safranek added the comment:
> 
> On 05/07/2013 05:32 PM, Antoine Pitrou wrote:
>> Jan, one possibility would be for Pegasus to stop "unloading" Python,
>> it seems.
> 
> It is always possibility. Actually, Pegasus "plugin" is just a shared
> object (.so) and the .so is linked with Python. Pegasus calls dlopen()
> and dlclose() on it. After dlclose(), the "plugin" is removed from
> memory. Unfortunately, libpython2.7.so stays loaded, at least
> /proc/XXX/mems says so. If there was a way to unload libpython2.7.so
> from memory too...

libpython2.7.so is not unloaded because Python extensions, e.g.
/usr/lib64/python2.7/lib-dynload/_heapq.so, depend on it. _heapq.so
was dlopen()ed by Python and never dlclose()d, so glibc does not
unload it.

It seems that Py_Finalize() does not even close the shared objects it opened.
Isn't that a bug?

Jan
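
The reference counting behind this can be observed with a short sketch. It is a simplification: the same library is opened twice, instead of being kept alive by a dependent extension as in the report. The function name is made up, RTLD_NOLOAD is a glibc extension, and "libm.so.6" in the usage note is only an example soname for a glibc system.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Opens `soname` twice, closes one handle, then probes whether the library
   is still resident. Returns 1 if resident, 0 if not, -1 if it cannot be opened. */
int resident_after_one_close(const char *soname) {
    void *plugin_handle = dlopen(soname, RTLD_NOW);
    if (plugin_handle == NULL)
        return -1;

    /* Second reference, standing in for whatever else keeps the library in use. */
    void *other_handle = dlopen(soname, RTLD_NOW);
    if (other_handle == NULL) {
        dlclose(plugin_handle);
        return -1;
    }

    /* Dropping one reference does not unload the library while another remains. */
    dlclose(plugin_handle);

    /* With RTLD_NOLOAD, dlopen() succeeds only if the library is already loaded. */
    void *probe = dlopen(soname, RTLD_NOW | RTLD_NOLOAD);
    int resident = (probe != NULL);
    if (probe != NULL)
        dlclose(probe);

    dlclose(other_handle);
    return resident;
}
```

On a glibc system, resident_after_one_close("libm.so.6") should report the library as still resident. Older glibc versions need -ldl at link time.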
msg189245 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-14 18:45
On Tuesday, 14 May 2013 at 15:28 +0000, Jan Safranek wrote:
> libpython2.7.so is not unloaded because python extensions, e.g.
> /usr/lib64/python2.7/lib-dynload/_heapq.so depend on it. And _heapq.so
> was dlopenened by Python and it was not dlclosed -> glibc does not
> unload it.
> 
> It seems that Py_Finalize() does not even close opened shared objects.
> Isn't it a bug?

What do you call shared objects in this context? .so files?
Indeed they are not closed, because usually extension modules are not
reload-safe: therefore, their basic structures are kept eternally once
initialized.
msg355174 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-22 23:09
There has been no activity since 2013. I guess that the user found a way to work around this crash, or managed to get it fixed. I am closing the issue.
History
Date                 User                Action  Args
2019-10-22 23:09:31  vstinner            set     status: open -> closed; nosy: + vstinner; messages: + msg355174; resolution: out of date; stage: resolved
2013-05-14 18:45:14  pitrou              set     messages: + msg189245
2013-05-14 15:28:11  jsafrane            set     messages: + msg189227
2013-05-09 07:07:57  jsafrane            set     messages: + msg188756
2013-05-09 06:58:22  jsafrane            set     messages: + msg188755
2013-05-07 16:06:10  pitrou              set     messages: + msg188673
2013-05-07 15:54:16  amaury.forgeotdarc  set     messages: + msg188670
2013-05-07 15:38:15  jsafrane            set     messages: + msg188668
2013-05-07 15:32:01  pitrou              set     nosy: + dabeaz; messages: + msg188667
2013-05-07 15:28:50  amaury.forgeotdarc  set     messages: + msg188665
2013-05-07 15:06:49  pitrou              set     messages: + msg188663
2013-05-07 14:52:51  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg188661
2013-05-07 14:51:10  pitrou              set     messages: + msg188660
2013-05-07 14:09:04  jsafrane            set     messages: + msg188659
2013-05-07 12:11:22  pitrou              set     messages: + msg188643
2013-05-07 10:50:35  jsafrane            set     files: + full-bt.txt; messages: + msg188639
2013-05-07 08:54:58  pitrou              set     messages: + msg188636
2013-05-07 07:39:34  ezio.melotti        set     nosy: + pitrou, benjamin.peterson
2013-05-07 07:38:41  jsafrane            set     messages: + msg188631
2013-05-07 07:38:28  jsafrane            set     messages: + msg188630
2013-05-07 07:38:00  jsafrane            create