Title: segmentation fault in ast_for_atom
Type: crash Stage:
Components: Versions: Python 2.6
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, benjamin.peterson, ot
Priority: high Keywords: patch

Created on 2008-11-20 17:36 by ot, last changed 2008-11-21 22:27 by benjamin.peterson. This issue is now closed.

File name Uploaded Description Edit
ast.patch ot, 2008-11-20 17:36
bad_unicodeerror.patch amaury.forgeotdarc, 2008-11-20 18:39
use_PyObject_Str_and_test.patch benjamin.peterson, 2008-11-21 03:42
use_PyObject_Str_and_test2.patch benjamin.peterson, 2008-11-21 03:48 the same thing without the reference leak :)
use_PyObject_Str_and_test3.patch benjamin.peterson, 2008-11-21 22:04
Messages (9)
msg76119 - (view) Author: Giuseppe Ottaviano (ot) Date: 2008-11-20 17:36
Hi all,
trying to compile Python 2.6 I got a segmentation fault while
byte-compiling the modules. 

The segmentation fault happened in ast_for_atom, parsing an Unicode
entity. I found out that another problem prevented unicodedata to be
imported, so unicode_decode raised an exception. I think the problem is
in the following lines:

            if (PyErr_ExceptionMatches(PyExc_UnicodeError)){
                PyObject *type, *value, *tback, *errstr;
                PyErr_Fetch(&type, &value, &tback);
                errstr = ((PyUnicodeErrorObject *)value)->reason;

I'm not an expert of CPython internals, but the exception is raised with
PyErr_SetString, so value is a PyStringObject, and that cast is invalid.
Changing the last line to 

                errstr = value;

everything works. 

The patch is attached.
msg76124 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-11-20 18:26
Can you provide the code that caused the seg fault?
msg76125 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-11-20 18:39
I think I got the point: to decode strings like "\N{CHARACTER NAME}"
PyUnicode_DecodeUnicodeEscape imports the unicodedata module.
If this fails, PyErr_SetString(PyExc_UnicodeError, "some message")
is called.

The exception will eventually be caught by ast_for_atom (in
Python/ast.c), but the exception there is not normalized: type is
PyExc_UnicodeError when the value is a string. Hence the invalid cast.
The exception cannot be normalized there: UnicodeError.__init__ needs 5

I think the error was to call PyErr_SetString in the first place.
The attached patch replaces it with PyErr_SetObject and a full
UnicodeDecodeError object.

This deserves a unit test, but I wonder how to reliably make the import
msg76130 - (view) Author: Giuseppe Ottaviano (ot) Date: 2008-11-20 19:08
@amaury: Yes, this is exactly what I noticed. I didn't know how to create 
an instance of a PyUnicodeErrorObject. BTW, isn't PyErr_SetString used 
throughout the code? Should all those calls replaced with PyErr_SetObject?

@benjamin: The bug can be easily reproduced:
brian:tmp ot$ echo 'raise Exception()' >
brian:tmp ot$ python2.6 -c "print u'\N{SOFT HYPHEN}'"
Segmentation fault

I don't know if this can be used as an unit test.
msg76163 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-11-21 03:42
Here's an alternative patch which simply calls PyObject_Str on the value
of the exception. It also has a test.
msg76166 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-11-21 08:39
Yes, it's better to simply call str() on the exception. I was confused
by the fact that UnicodeError is a simple exception, with a single
message, unlike UnicodeDecodeError which has more attributes.

About the patch:
- The test does not work if is run before: the unicodedata
module is imported only once on the first \N{} decoding. You could try
to run the test in a subprocess.

- In ast.c, errstr should be DECREF'd.
msg76206 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-11-21 22:04
This new patch address review issues.
msg76210 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-11-21 22:21
Everything looks good.

(20 lines of a complex test for two simple lines of code that nobody 
will ever run... unit tests is a religion)
msg76211 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-11-21 22:27
On Fri, Nov 21, 2008 at 4:21 PM, Amaury Forgeot d'Arc
<> wrote:
> Amaury Forgeot d'Arc <> added the comment:
> Everything looks good.

Thanks for the review. Fixed in r67320.

> (20 lines of a complex test for two simple lines of code that nobody
> will ever run... unit tests is a religion)

One can't say we didn't try. :)
Date User Action Args
2008-11-21 22:27:58benjamin.petersonsetstatus: open -> closed
resolution: accepted -> fixed
2008-11-21 22:27:44benjamin.petersonsetmessages: + msg76211
2008-11-21 22:21:24amaury.forgeotdarcsetkeywords: - needs review
resolution: accepted
messages: + msg76210
2008-11-21 22:04:07benjamin.petersonsetfiles: + use_PyObject_Str_and_test3.patch
messages: + msg76206
2008-11-21 08:39:15amaury.forgeotdarcsetmessages: + msg76166
2008-11-21 03:48:07benjamin.petersonsetfiles: + use_PyObject_Str_and_test2.patch
title: Patch for segmentation fault in ast_for_atom -> segmentation fault in ast_for_atom
2008-11-21 03:42:49benjamin.petersonsetfiles: + use_PyObject_Str_and_test.patch
messages: + msg76163
2008-11-20 19:08:32otsetmessages: + msg76130
2008-11-20 18:39:32amaury.forgeotdarcsetkeywords: + needs review
files: + bad_unicodeerror.patch
messages: + msg76125
nosy: + amaury.forgeotdarc
2008-11-20 18:26:01benjamin.petersonsetpriority: high
nosy: + benjamin.peterson
messages: + msg76124
2008-11-20 17:36:05otcreate