msg144656 - (view) |
Author: Victor Semionov (vsemionov) |
Date: 2011-09-29 22:42 |
Hello,
I'm developing a multi-threaded TCP server and have been seeing segmentation faults on 3.2 on Linux and 3.2.2 on Windows. This happens when using only pure-Python libraries, so I believe the problem is in the interpreter. The issue is very easy to reproduce with my code, but I think it is obscure, because I have not been able to reproduce it with a smaller program.
Here's what happens. The server accepts TCP connections, and creates a thread for every new connection. When the client sends a request, the server initiates its own TCP connection to a database. If any socket IO operation fails by raising a socket error (e.g. the database is down), those errors are caught by the calling code, and it gracefully terminates the thread. However, when the next client connects and sends a request, even if the server-initiated connections are successfully established, the interpreter crashes a bit later during the processing of the client's request (I think during IO operations).
Strangely, this does not occur if the thread recovers and does not terminate after catching an exception (as the case with failed redis connections). Also, I was able to port my program to python 2.7, and it did not crash.
To reproduce, you will need pg8000, which is a pure-python dbapi driver. You will need to get my program, wordbase, from the mercurial repository at https://bitbucket.org/vsemionov/wordbase (changeset 31c6554e67ee) and edit src/wordbase/db/pgsql.py. Change "import psycopg2 as dbapi" to "import pg8000.dbapi as dbapi". This is just to ensure that no C-based library is used. Steps to reproduce:
0. Ensure postgres is not running
1. Start wordbase with src/wordbase/wordbase.py -f <conf_file>. Use the path to the provided sample conf file at src/wordbase/wordbase.conf. By default you'll need to be root, in order to be able to create a log file.
2. Connect a client with "telnet localhost 2628" and enter "d hello". This should fail with status 420. Reconnect and repeat the same step a couple of times. The interpreter usually crashes after repeating this step.
I'm providing the interpreter's backtrace, which is obtained from Python 3.2 on Linux. It is attached in a separate file.
If you need any other information, please let me know.
Best regards,
Victor Semionov
|
msg144704 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-09-30 20:35 |
Confirmed with default.
The problem is that the TextIOWrapper gets collected after the underlying BufferedRWPair has been cleared (tp_clear) by the garbage collector: when _PyIOBase_finalize() is called for the TextIOWrapper, it checks if the textio is closed, which indirectly checks if the underlying rwpair is closed:
"""
static PyObject *
bufferedrwpair_closed_get(rwpair *self, void *context)
{
return PyObject_GetAttr((PyObject *) self->writer, _PyIO_str_closed);
}
"""
Since self->writer has already been set to NULL by bufferedrwpair_clear(), PyObject_GetAttr() segfaults.
@Victor
Could you try the patch attached?
|
msg144706 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-09-30 21:03 |
With test.
|
msg144710 - (view) |
Author: Victor Semionov (vsemionov) |
Date: 2011-09-30 23:54 |
Thanks Charles-François,
I tested your patch with "make test" and with my program. Both work fine.
|
msg144728 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-10-01 12:02 |
Shouldn't the test use "self.BufferedRWPair" instead of "io.BufferedRWPair"?
Also, is it ok to just return NULL or should the error state also be set?
|
msg144744 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-10-01 17:47 |
> Shouldn't the test use "self.BufferedRWPair" instead of
> "io.BufferedRWPair"?
Yes.
> Also, is it ok to just return NULL or should the error state also be
> set?
Well, I'm not sure, that why I made you and Amaury noisy :-)
AFAICT, this is the only case where _check_closed can encounter a NULL self->writer. And this specific situation is not an error (nothing prevents the rwpair from being garbaged collected before the textio) ,and _PyIOBase_finalize() explicitely clears any error returned:
"""
/* If `closed` doesn't exist or can't be evaluated as bool, then the
object is probably in an unusable state, so ignore. */
res = PyObject_GetAttr(self, _PyIO_str_closed);
if (res == NULL)
PyErr_Clear();
else {
closed = PyObject_IsTrue(res);
Py_DECREF(res);
if (closed == -1)
PyErr_Clear();
}
"""
Furthermore, I'm not sure about what kind of error would make sense here.
|
msg144838 - (view) |
Author: Victor Semionov (vsemionov) |
Date: 2011-10-03 19:12 |
Any plans to fix this in the next release?
|
msg144872 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-10-04 10:42 |
> > Also, is it ok to just return NULL or should the error state also be
> > set?
>
> Well, I'm not sure, that why I made you and Amaury noisy :-)
> AFAICT, this is the only case where _check_closed can encounter a NULL
> self->writer.
Probably. OTOH, not setting the error state when returning NULL is
usually an error (and can result in difficult-to-debug problems), so
let's stay on the safe side.
> Furthermore, I'm not sure about what kind of error would make sense here.
RuntimeError perhaps.
|
msg144876 - (view) |
Author: Victor Semionov (vsemionov) |
Date: 2011-10-04 11:07 |
> Probably. OTOH, not setting the error state when returning NULL is
> usually an error (and can result in difficult-to-debug problems), so
> let's stay on the safe side.
>
> > Furthermore, I'm not sure about what kind of error would make sense here.
>
> RuntimeError perhaps.
Does that mean that an application will see a Python exception?
|
msg144878 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2011-10-04 11:35 |
An "unraisable exception" warning will be displayed.
|
msg144879 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-10-04 11:35 |
> Probably. OTOH, not setting the error state when returning NULL is
> usually an error (and can result in difficult-to-debug problems), so
> let's stay on the safe side.
> RuntimeError perhaps.
OK, I'll update the patch accordingly.
> Does that mean that an application will see a Python exception?
No, the finalization code explicitly clears any exception set.
|
msg144955 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-10-05 16:27 |
Sorry, forgot about this issue...
Updated patch (I'm not really satisfied with the error message, don't
hesitate if you can think of a better wording).
|
msg144963 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-10-05 17:22 |
The latest patch looks good to me.
As for the error message, how about "the BufferedRWPair object is being garbage-collected".
|
msg144965 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-10-05 17:53 |
New changeset d60c00015f01 by Charles-François Natali in branch '3.2':
Issue #13070: Fix a crash when a TextIOWrapper caught in a reference cycle
http://hg.python.org/cpython/rev/d60c00015f01
New changeset 7defc1e5d13a by Charles-François Natali in branch 'default':
Issue #13070: Fix a crash when a TextIOWrapper caught in a reference cycle
http://hg.python.org/cpython/rev/7defc1e5d13a
|
msg144967 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-10-05 18:44 |
Committed to 3.2 and default.
Victor, thanks for the report!
|
msg144968 - (view) |
Author: Victor Semionov (vsemionov) |
Date: 2011-10-05 19:25 |
Great, thanks to you too, for fixing it!
|
msg144980 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-10-06 00:50 |
The issue doesn't affect Python 2.7?
|
msg144984 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-10-06 06:53 |
> The issue doesn't affect Python 2.7?
>
Duh!
I was sure the _io module had been introduced in Python 3 (I/O layer
rewrite, etc).
Yes, it does apply to 2.7. I'll commit the patch later today.
|
msg144987 - (view) |
Author: Victor Semionov (vsemionov) |
Date: 2011-10-06 10:15 |
I did not see any segfaults when I ran my app on 2.7. Please verify that 2.7 is really affected before making changes.
|
msg145001 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2011-10-06 11:45 |
Your application does not segfault with 2.7 because buffered files and sockets use a very different implementation.
The io module is present in all versions, but only Python3 uses it for all file-like objects.
If the unit test (test_rwpair_cleared_before_textio) crashes 2.7, the fix should be applied.
|
msg145025 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-10-06 17:07 |
New changeset 89b9e4bf6f1f by Charles-François Natali in branch '2.7':
Issue #13070: Fix a crash when a TextIOWrapper caught in a reference cycle
http://hg.python.org/cpython/rev/89b9e4bf6f1f
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:22 | admin | set | github: 57279 |
2011-10-06 17:07:41 | python-dev | set | messages:
+ msg145025 |
2011-10-06 11:45:56 | amaury.forgeotdarc | set | messages:
+ msg145001 |
2011-10-06 10:15:28 | vsemionov | set | messages:
+ msg144987 |
2011-10-06 06:53:25 | neologix | set | messages:
+ msg144984 |
2011-10-06 00:50:47 | vstinner | set | nosy:
+ vstinner messages:
+ msg144980
|
2011-10-05 19:25:04 | vsemionov | set | messages:
+ msg144968 |
2011-10-05 18:44:09 | neologix | set | status: open -> closed resolution: fixed messages:
+ msg144967
stage: resolved |
2011-10-05 17:53:31 | python-dev | set | nosy:
+ python-dev messages:
+ msg144965
|
2011-10-05 17:22:48 | pitrou | set | messages:
+ msg144963 |
2011-10-05 16:27:18 | neologix | set | files:
+ buffered_closed_gc-3.diff
messages:
+ msg144955 |
2011-10-04 11:35:50 | neologix | set | messages:
+ msg144879 |
2011-10-04 11:35:11 | amaury.forgeotdarc | set | messages:
+ msg144878 |
2011-10-04 11:07:02 | vsemionov | set | messages:
+ msg144876 |
2011-10-04 10:42:13 | pitrou | set | messages:
+ msg144872 |
2011-10-03 19:12:36 | vsemionov | set | messages:
+ msg144838 |
2011-10-01 17:47:18 | neologix | set | files:
- buffered_closed_gc-1.diff |
2011-10-01 17:47:03 | neologix | set | files:
+ buffered_closed_gc-2.diff
messages:
+ msg144744 |
2011-10-01 12:02:48 | pitrou | set | messages:
+ msg144728 |
2011-09-30 23:54:17 | vsemionov | set | messages:
+ msg144710 |
2011-09-30 21:11:42 | neologix | set | files:
+ buffered_closed_gc-1.diff |
2011-09-30 21:11:22 | neologix | set | files:
- buffered_closed_gc-1.diff |
2011-09-30 21:11:12 | neologix | set | files:
- buffered_closed_gc.diff |
2011-09-30 21:03:28 | neologix | set | files:
+ buffered_closed_gc-1.diff
messages:
+ msg144706 |
2011-09-30 20:35:43 | neologix | set | files:
+ buffered_closed_gc.diff versions:
+ Python 3.3 nosy:
+ amaury.forgeotdarc, neologix, pitrou
messages:
+ msg144704
keywords:
+ patch |
2011-09-29 22:42:44 | vsemionov | create | |