classification
Title: sock_dealloc() may call __repr__ when socket class is already collected by GC
Type: crash Stage: resolved
Components: Interpreter Core, Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: asterite, iritkatriel, vstinner, xiang.zhang
Priority: normal Keywords:

Created on 2017-05-03 16:44 by asterite, last changed 2021-06-15 12:57 by asterite. This issue is now closed.

Files
File name Uploaded Description Edit
trace-verbose.txt asterite, 2017-05-03 16:44 Verbose stack trace with python object printed by python-gdb.py
test.py asterite, 2017-05-03 16:46 A reduced program to reproduce an issue. Depends on the second file (a module) test2.py
test2.py asterite, 2017-05-03 16:47 Second part of a reduced PoC (a module)
Messages (6)
msg292924 - (view) Author: asterite (asterite) Date: 2017-05-03 16:44
I faced a crash during iterpreter shutdown (Py_Finalize), in final garbage collection phase.

The reason, as far as I could understand, was the following:

In Python 3, when an unclosed socket is destroyed, it issues a ResourceWarning, which contains a string representation of socket object (obtained by __repr__).

If socket class object and unclosed socket object (instance) are in a reference cycle, GC may collect socket class object before it collects an instance when dealing with that cycle. Then, when instance is collected, sock_dealloc() will be called, which will call PyErr_WarnFormat with %R format specifier, which triggers a call to __repr__:

static void
sock_dealloc(PySocketSockObject *s)
{
    if (s->sock_fd != INVALID_SOCKET) {
        // ...
        if (PyErr_WarnFormat(PyExc_ResourceWarning, 1,
                             "unclosed %R", s))
        //...
}

At this moment, socket class object was already deallocated, so was __repr__, resides in it. Particularly, in my tests, when __repr__ attempted to access a global variable (getattr), it tried to take it's name from co_names tuple in it's code object. This tuple was deallocated, and co_names pointed to tuple free list. Trying to use a tuple from free list as a dict key resulted in a crash (more detailed stack trace will be provided in attachment):

# gdb python3.5 test.py
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
...
Starting program: /usr/local/bin/python3.5 test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
test

Program received signal SIGSEGV, Segmentation fault.
0x000000000048998c in PyObject_Hash (v=0x0) at Objects/object.c:761
761         PyTypeObject *tp = Py_TYPE(v);
(gdb) bt
#0  0x000000000048998c in PyObject_Hash (v=0x0) at Objects/object.c:761
#1  0x0000000000498d96 in tuplehash (v=0x7ffff7f39948) at Objects/tupleobject.c:351
#2  0x00000000004899b2 in PyObject_Hash (v=0x7ffff7f39948) at Objects/object.c:763
... tuplehash and PyObject_Hash here also ...
#34 0x00000000004899b2 in PyObject_Hash (v=0x7ffff6801708) at Objects/object.c:763
#35 0x0000000000498d96 in tuplehash (v=0x7ffff6812ca8) at Objects/tupleobject.c:351
#36 0x00000000004899b2 in PyObject_Hash (v=0x7ffff6812ca8) at Objects/object.c:763
#37 0x000000000047a056 in PyDict_GetItemWithError (op=0x7ffff7e83608, key=0x7ffff6812ca8) at Objects/dictobject.c:1143
#38 0x000000000047a1e8 in _PyDict_LoadGlobal (globals=0x7ffff7e83608, builtins=0x7ffff6ea8a88, key=0x7ffff6812ca8) at Objects/dictobject.c:1192
#39 0x0000000000509117 in PyEval_EvalFrameEx (f=0xa196d8, throwflag=0) at Python/ceval.c:2408
#40 0x0000000000511d0b in _PyEval_EvalCodeWithName (_co=0x7ffff7e85270, globals=0x7ffff7e83608, locals=0x0, args=0x7ffff7f07c68, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4071
#41 0x0000000000511e00 in PyEval_EvalCodeEx (_co=0x7ffff7e85270, globals=0x7ffff7e83608, locals=0x0, args=0x7ffff7f07c68, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4092
#42 0x00000000005db41c in function_call (func=0x7ffff683ec80, arg=0x7ffff7f07c50, kw=0x0) at Objects/funcobject.c:627
#43 0x000000000043ae8f in PyObject_Call (func=0x7ffff683ec80, arg=0x7ffff7f07c50, kw=0x0) at Objects/abstract.c:2166
#44 0x00000000005cb13c in method_call (func=0x7ffff683ec80, arg=0x7ffff7f07c50, kw=0x0) at Objects/classobject.c:330
#45 0x000000000043ae8f in PyObject_Call (func=0x7ffff7eaca48, arg=0x7ffff7fa8048, kw=0x0) at Objects/abstract.c:2166
#46 0x00000000005138a9 in PyEval_CallObjectWithKeywords (func=0x7ffff7eaca48, arg=0x7ffff7fa8048, kw=0x0) at Python/ceval.c:4633
#47 0x00000000004abf3e in slot_tp_repr (self=0x7ffff684e8d0) at Objects/typeobject.c:5992
#48 0x0000000000488ef9 in PyObject_Repr (v=0x7ffff684e8d0) at Objects/object.c:482
#49 0x00000000004c03fb in unicode_fromformat_arg (writer=0x7fffffffdeb0, f=0x7ffff6ed9850 "R", vargs=0x7fffffffdef0) at Objects/unicodeobject.c:2645
#50 0x00000000004c0754 in PyUnicode_FromFormatV (format=0x7ffff6ed9846 "unclosed %R", vargs=0x7fffffffdf60) at Objects/unicodeobject.c:2710
#51 0x00000000004f90e7 in PyErr_WarnFormat (category=0x8a1a20 <_PyExc_ResourceWarning>, stack_level=1, format=0x7ffff6ed9846 "unclosed %R") at Python/_warnings.c:895
#52 0x00007ffff6ed438a in sock_dealloc (s=0x7ffff684e8d0) at /root/test/test35/cpython/Modules/socketmodule.c:4177
#53 0x000000000049d02c in subtype_dealloc (self=0x7ffff684e8d0) at Objects/typeobject.c:1209
#54 0x00000000004781c5 in free_keys_object (keys=0xa3fd90) at Objects/dictobject.c:351
#55 0x000000000047a908 in PyDict_Clear (op=0x7ffff7f55448) at Objects/dictobject.c:1388
#56 0x000000000047df02 in dict_tp_clear (op=0x7ffff7f55448) at Objects/dictobject.c:2595
#57 0x00000000005679de in delete_garbage (collectable=0x7fffffffe2f0, old=0x8ce440 <generations+64>) at Modules/gcmodule.c:866
#58 0x0000000000567e30 in collect (generation=2, n_collected=0x0, n_uncollectable=0x0, nofail=1) at Modules/gcmodule.c:1014
#59 0x0000000000568e84 in _PyGC_CollectNoFail () at Modules/gcmodule.c:1605
#60 0x0000000000532023 in PyImport_Cleanup () at Python/import.c:481
#61 0x0000000000542083 in Py_Finalize () at Python/pylifecycle.c:576
#62 0x0000000000420fdf in Py_Main (argc=2, argv=0x92d010) at Modules/main.c:798
#63 0x000000000041bc4f in main (argc=2, argv=0x7fffffffe748) at ./Programs/python.c:65
(gdb)

So this is something like a use-after-free.

I think the root cause is that finalizer that can call some python code is called at deallocation phase, interleaving with object's deletion. As well as with objects with __del__ method, GC can't know the order in which objects should be deleted if they are in a reference cycle.

This problem does not exist in 3.6 because there is a separate finalizing method sock_finalize there where a ResourceWarning is issued, and such finalizers are called for the whole reference cycle before any object in it is deallocated (at least this is how I understood, correct me if I'm wrong). So, this change with finalizers probably should be backported to previous versions of Python 3.

I also could not reproduce this in 3.3, but I didn't have time to research that closely yet.
msg292928 - (view) Author: asterite (asterite) Date: 2017-05-03 17:00
I made up a small program to reproduce an issue, it consists of two files, test.py and test2.py, the second one is imported as a module by the first one, so they should be but in the same dir.

# ls
test.py  test2.py
# python3.5 test.py 
test
Segmentation fault

The files contain a rather weird combination of references to objects put in other objects or to modules of standard lib. This is what I got after reducing the code to reproduce the problem. All those references are probably needed to make a cycle and to postpone it's deallocation until the interpreter shutdown.

Regarding where in the real world such a code exists: it was spread by a combination of modules, namely tornado (https://github.com/tornadoweb/tornado), pyzmq (https://github.com/zeromq/pyzmq) and circus (https://github.com/circus-tent/circus), I found the problem when running one of the tests of circus:

~/circus# python3.5 -m nose -v circus.tests.test_reloadconfig:TestConfig.test_reload_ignorearbiterwatchers
test_reload_ignorearbiterwatchers (circus.tests.test_reloadconfig.TestConfig) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.037s

OK
Segmentation fault

Also, sometimes that test fails like this:
# python3.5 -m nose -v circus.tests.test_reloadconfig:TestConfig.test_reload_ignorearbiterwatchers
test_reload_ignorearbiterwatchers (circus.tests.test_reloadconfig.TestConfig) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.021s

OK
Fatal Python error: GC object already tracked

Current thread 0x00007f20aa7b6700 (most recent call first):
Aborted
msg293230 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-05-08 10:50
Checked with asterite's scripts and it crashes in 3.5 but not in 3.6 and master. Victor, your patch in #26590 seems the resolution of this failure.
msg293497 - (view) Author: asterite (asterite) Date: 2017-05-11 14:46
Sorry if I misunderstood how the bug tracker works, but why did you delete version 3.4 ? Python 3.4 is also affected.
msg293500 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-11 15:23
> Sorry if I misunderstood how the bug tracker works, but why did you delete version 3.4 ? Python 3.4 is also affected.

Right, but Python 3.4 doesn't accept bug fixes anymore:
https://docs.python.org/devguide/#status-of-python-branches
msg395864 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-06-15 09:12
Since this is fixed in 3.6 and it’s too late for 3.5, I think this issue can be closed.
History
Date User Action Args
2021-06-15 12:57:36asteritesetstatus: pending -> closed
stage: resolved
2021-06-15 09:12:04iritkatrielsetstatus: open -> pending

nosy: + iritkatriel
messages: + msg395864

resolution: out of date
2017-05-11 15:23:59vstinnersetmessages: + msg293500
2017-05-11 14:46:20asteritesetmessages: + msg293497
2017-05-08 10:50:59xiang.zhangsetnosy: + xiang.zhang

messages: + msg293230
versions: - Python 3.4
2017-05-03 17:00:39asteritesetmessages: + msg292928
2017-05-03 16:47:34vstinnersetnosy: + vstinner
2017-05-03 16:47:13asteritesetfiles: + test2.py
2017-05-03 16:46:29asteritesetfiles: + test.py
2017-05-03 16:44:02asteritecreate