This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: segmentation fault in pure-python multi-threaded server
Type: crash Stage: resolved
Components: Interpreter Core, IO Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, neologix, pitrou, python-dev, vsemionov, vstinner
Priority: normal Keywords: patch

Created on 2011-09-29 22:42 by vsemionov, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
backtrace vsemionov, 2011-09-29 22:42 interpreter backtrace
buffered_closed_gc-2.diff neologix, 2011-10-01 17:47 review
buffered_closed_gc-3.diff neologix, 2011-10-05 16:27 review
Messages (21)
msg144656 - (view) Author: Victor Semionov (vsemionov) Date: 2011-09-29 22:42
Hello,

I'm developing a multi-threaded TCP server and have been seeing segmentation faults on 3.2 on Linux and 3.2.2 on Windows. This happens when using only pure-Python libraries, so I believe the problem is in the interpreter. The issue is very easy to reproduce with my code, but I think it is obscure, because I have not been able to reproduce it with a smaller program.

Here's what happens. The server accepts TCP connections, and creates a thread for every new connection. When the client sends a request, the server initiates its own TCP connection to a database. If any socket IO operation fails by raising a socket error (e.g. the database is down), those errors are caught by the calling code, and it gracefully terminates the thread. However, when the next client connects and sends a request, even if the server-initiated connections are successfully established, the interpreter crashes a bit later during the processing of the client's request (I think during IO operations).

Strangely, this does not occur if the thread recovers and does not terminate after catching an exception (as the case with failed redis connections). Also, I was able to port my program to python 2.7, and it did not crash.

To reproduce, you will need pg8000, which is a pure-python dbapi driver. You will need to get my program, wordbase, from the mercurial repository at https://bitbucket.org/vsemionov/wordbase (changeset 31c6554e67ee) and edit src/wordbase/db/pgsql.py. Change "import psycopg2 as dbapi" to "import pg8000.dbapi as dbapi". This is just to ensure that no C-based library is used. Steps to reproduce:
0. Ensure postgres is not running
1. Start wordbase with src/wordbase/wordbase.py -f <conf_file>. Use the path to the provided sample conf file at src/wordbase/wordbase.conf. By default you'll need to be root, in order to be able to create a log file.
2. Connect a client with "telnet localhost 2628" and enter "d hello". This should fail with status 420. Reconnect and repeat the same step a couple of times. The interpreter usually crashes after repeating this step.

I'm providing the interpreter's backtrace, which is obtained from Python 3.2 on Linux. It is attached in a separate file.

If you need any other information, please let me know.

Best regards,
Victor Semionov
msg144704 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-30 20:35
Confirmed with default.
The problem is that the TextIOWrapper gets collected after the underlying BufferedRWPair has been cleared (tp_clear) by the garbage collector: when _PyIOBase_finalize() is called for the TextIOWrapper, it checks if the textio is closed, which indirectly checks if the underlying rwpair is closed:
"""
static PyObject *
bufferedrwpair_closed_get(rwpair *self, void *context)
{
    return PyObject_GetAttr((PyObject *) self->writer, _PyIO_str_closed);
}
"""

Since self->writer has already been set to NULL by bufferedrwpair_clear(), PyObject_GetAttr() segfaults.

@Victor
Could you try the patch attached?
msg144706 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-30 21:03
With test.
msg144710 - (view) Author: Victor Semionov (vsemionov) Date: 2011-09-30 23:54
Thanks Charles-François,

I tested your patch with "make test" and with my program. Both work fine.
msg144728 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-01 12:02
Shouldn't the test use "self.BufferedRWPair" instead of "io.BufferedRWPair"?
Also, is it ok to just return NULL or should the error state also be set?
msg144744 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-01 17:47
> Shouldn't the test use "self.BufferedRWPair" instead of
> "io.BufferedRWPair"?

Yes.

> Also, is it ok to just return NULL or should the error state also be
> set?

Well, I'm not sure, that why I made you and Amaury noisy :-)
AFAICT, this is the only case where _check_closed can encounter a NULL self->writer. And this specific situation is not an error (nothing prevents the rwpair from being garbaged collected before the textio) ,and _PyIOBase_finalize() explicitely clears any error returned:
"""
    /* If `closed` doesn't exist or can't be evaluated as bool, then the
       object is probably in an unusable state, so ignore. */
    res = PyObject_GetAttr(self, _PyIO_str_closed);
    if (res == NULL)
        PyErr_Clear();
    else {
        closed = PyObject_IsTrue(res);
        Py_DECREF(res);
        if (closed == -1)
            PyErr_Clear();
    }
"""

Furthermore, I'm not sure about what kind of error would make sense here.
msg144838 - (view) Author: Victor Semionov (vsemionov) Date: 2011-10-03 19:12
Any plans to fix this in the next release?
msg144872 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-04 10:42
> > Also, is it ok to just return NULL or should the error state also be
> > set?
> 
> Well, I'm not sure, that why I made you and Amaury noisy :-)
> AFAICT, this is the only case where _check_closed can encounter a NULL
> self->writer.

Probably. OTOH, not setting the error state when returning NULL is
usually an error (and can result in difficult-to-debug problems), so
let's stay on the safe side.

> Furthermore, I'm not sure about what kind of error would make sense here.

RuntimeError perhaps.
msg144876 - (view) Author: Victor Semionov (vsemionov) Date: 2011-10-04 11:07
> Probably. OTOH, not setting the error state when returning NULL is
> usually an error (and can result in difficult-to-debug problems), so
> let's stay on the safe side.
> 
> > Furthermore, I'm not sure about what kind of error would make sense here.
> 
> RuntimeError perhaps.

Does that mean that an application will see a Python exception?
msg144878 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-10-04 11:35
An "unraisable exception" warning will be displayed.
msg144879 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-04 11:35
> Probably. OTOH, not setting the error state when returning NULL is
> usually an error (and can result in difficult-to-debug problems), so
> let's stay on the safe side.

> RuntimeError perhaps.

OK, I'll update the patch accordingly.

> Does that mean that an application will see a Python exception?

No, the finalization code explicitly clears any exception set.
msg144955 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-05 16:27
Sorry, forgot about this issue...
Updated patch (I'm not really satisfied with the error message, don't
hesitate if you can think of a better wording).
msg144963 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-05 17:22
The latest patch looks good to me.
As for the error message, how about "the BufferedRWPair object is being garbage-collected".
msg144965 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-05 17:53
New changeset d60c00015f01 by Charles-François Natali in branch '3.2':
Issue #13070: Fix a crash when a TextIOWrapper caught in a reference cycle
http://hg.python.org/cpython/rev/d60c00015f01

New changeset 7defc1e5d13a by Charles-François Natali in branch 'default':
Issue #13070: Fix a crash when a TextIOWrapper caught in a reference cycle
http://hg.python.org/cpython/rev/7defc1e5d13a
msg144967 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-05 18:44
Committed to 3.2 and default.

Victor, thanks for the report!
msg144968 - (view) Author: Victor Semionov (vsemionov) Date: 2011-10-05 19:25
Great, thanks to you too, for fixing it!
msg144980 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-06 00:50
The issue doesn't affect Python 2.7?
msg144984 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-06 06:53
> The issue doesn't affect Python 2.7?
>

Duh!
I was sure the _io module had been introduced in Python 3 (I/O layer
rewrite, etc).
Yes, it does apply to 2.7. I'll commit the patch later today.
msg144987 - (view) Author: Victor Semionov (vsemionov) Date: 2011-10-06 10:15
I did not see any segfaults when I ran my app on 2.7. Please verify that 2.7 is really affected before making changes.
msg145001 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-10-06 11:45
Your application does not segfault with 2.7 because buffered files and sockets use a very different implementation.
The io module is present in all versions, but only Python3 uses it for all file-like objects.
If the unit test (test_rwpair_cleared_before_textio) crashes 2.7, the fix should be applied.
msg145025 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-06 17:07
New changeset 89b9e4bf6f1f by Charles-François Natali in branch '2.7':
Issue #13070: Fix a crash when a TextIOWrapper caught in a reference cycle
http://hg.python.org/cpython/rev/89b9e4bf6f1f
History
Date User Action Args
2022-04-11 14:57:22adminsetgithub: 57279
2011-10-06 17:07:41python-devsetmessages: + msg145025
2011-10-06 11:45:56amaury.forgeotdarcsetmessages: + msg145001
2011-10-06 10:15:28vsemionovsetmessages: + msg144987
2011-10-06 06:53:25neologixsetmessages: + msg144984
2011-10-06 00:50:47vstinnersetnosy: + vstinner
messages: + msg144980
2011-10-05 19:25:04vsemionovsetmessages: + msg144968
2011-10-05 18:44:09neologixsetstatus: open -> closed
resolution: fixed
messages: + msg144967

stage: resolved
2011-10-05 17:53:31python-devsetnosy: + python-dev
messages: + msg144965
2011-10-05 17:22:48pitrousetmessages: + msg144963
2011-10-05 16:27:18neologixsetfiles: + buffered_closed_gc-3.diff

messages: + msg144955
2011-10-04 11:35:50neologixsetmessages: + msg144879
2011-10-04 11:35:11amaury.forgeotdarcsetmessages: + msg144878
2011-10-04 11:07:02vsemionovsetmessages: + msg144876
2011-10-04 10:42:13pitrousetmessages: + msg144872
2011-10-03 19:12:36vsemionovsetmessages: + msg144838
2011-10-01 17:47:18neologixsetfiles: - buffered_closed_gc-1.diff
2011-10-01 17:47:03neologixsetfiles: + buffered_closed_gc-2.diff

messages: + msg144744
2011-10-01 12:02:48pitrousetmessages: + msg144728
2011-09-30 23:54:17vsemionovsetmessages: + msg144710
2011-09-30 21:11:42neologixsetfiles: + buffered_closed_gc-1.diff
2011-09-30 21:11:22neologixsetfiles: - buffered_closed_gc-1.diff
2011-09-30 21:11:12neologixsetfiles: - buffered_closed_gc.diff
2011-09-30 21:03:28neologixsetfiles: + buffered_closed_gc-1.diff

messages: + msg144706
2011-09-30 20:35:43neologixsetfiles: + buffered_closed_gc.diff
versions: + Python 3.3
nosy: + amaury.forgeotdarc, neologix, pitrou

messages: + msg144704

keywords: + patch
2011-09-29 22:42:44vsemionovcreate