classification
Title: sys.excepthook (PyErr_Display) does crash with SyntaxError which has a bytes filename
Type: crash Stage: resolved
Components: Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: miss-islington, vstinner
Priority: normal Keywords: patch

Created on 2019-07-01 09:49 by vstinner, last changed 2019-07-01 15:42 by vstinner. This issue is now closed.

Files
File name                  Uploaded                    Description
sf.py                      vstinner, 2019-07-01 09:49
sf.xml                     vstinner, 2019-07-01 09:49
excepthook_syntaxerror.py  vstinner, 2019-07-01 10:15
Pull Requests
URL       Status  Linked
PR 14504  merged  vstinner, 2019-07-01 10:35
PR 14514  merged  miss-islington, 2019-07-01 14:51
PR 14515  merged  vstinner, 2019-07-01 14:55
Messages (7)
msg346988 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-01 09:49
Using the attached sf.py and sf.xml, I can crash Python. lxml builds a fake traceback to inject the XML filename and the XML line number where the parsing error occurs. The problem is that the filename is a bytes object, whereas print_exception() expects the filename to be a Unicode string.

The attached PR fixes the crash.
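The root cause can also be observed from pure Python: SyntaxError stores whatever object it receives as its filename without checking the type, so a subclass such as lxml's XMLSyntaxError can carry a bytes path all the way to sys.excepthook. The sketch below is an illustration, not part of the attached PR. The helper make_syntax_error() is hypothetical (it is not lxml's API) and shows one way a library could normalize the filename with os.fsdecode() before raising.

```python
import os


def make_syntax_error(msg, filename, lineno, offset, text):
    """Hypothetical helper (not lxml's API): build a SyntaxError whose
    filename is always a str.

    os.fsdecode() decodes a bytes path using the filesystem encoding
    and returns a str path unchanged.
    """
    return SyntaxError(msg, (os.fsdecode(filename), lineno, offset, text))


# SyntaxError keeps a bytes filename as-is: nothing validates its type.
raw = SyntaxError("parse error", (b"sf.xml", 3, 0, "<root>"))
print(type(raw.filename).__name__)      # bytes

# The hypothetical helper converts it to str before the exception
# can ever reach sys.excepthook.
fixed = make_syntax_error("parse error", b"sf.xml", 3, 0, "<root>")
print(type(fixed.filename).__name__)    # str
print(fixed.filename)                   # sf.xml
```

This only avoids producing a bytes filename in the first place; it does not change how PyErr_Display() handles one, which is what the PR addresses.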

Fedora bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1665490


Example:

$ python3 sf.py
<lxml.etree._ElementTree object at 0x7f7d0f8abd08>
Traceback (most recent call last):
  File "sf.py", line 6, in <module>
    xml2 = etree.parse("sf.xml")
  File "src/lxml/etree.pyx", line 3426, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1840, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1866, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1770, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 651, in lxml.etree._raiseParseError
Segmentation fault (core dumped)


(gdb) frame 6
#6  0x00007ffff7c85898 in print_exception (value=None, f=<_io.TextIOWrapper at remote 0x7fffea910708>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:753
753	/usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c: No such file or directory.
(gdb) l
748	in /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c
(gdb) p filename
$1 = b'sf.xml'
(gdb) p *filename
$2 = {
  ob_refcnt = 2, 
  ob_type = 0x7ffff7db5da0 <PyBytes_Type>
}


Extract of print_exception():

        PyObject *message, *filename, *text;
        int lineno, offset;
        if (!parse_syntax_error(value, &message, &filename,
                                &lineno, &offset, &text))
            PyErr_Clear();
        else {
            PyObject *line;

            Py_DECREF(value);
            value = message;

            line = PyUnicode_FromFormat("  File \"%U\", line %d\n",   // <====== HERE
                                          filename, lineno);
            Py_DECREF(filename);
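For PyUnicode_FromFormat(), the %U conversion requires a str (Unicode) object, while %S formats the result of PyObject_Str() and therefore accepts any object. Passing a bytes object to %U is what ends in _PyUnicode_Ready() being called on b'sf.xml' in the backtrace below (frame #5 shows the "%U" format string). The Python sketch below only mimics the two conversions; the function names are hypothetical, it is not CPython code, and whether the merged patch takes exactly this route should be checked against the linked commits.

```python
def file_line_u(filename, lineno):
    """Mimics "%U": the caller promises that filename is already a str.

    A release build of CPython does not verify this and reads the bytes
    object as if it were a str (the segfault above); a debug build fails
    an assertion instead. Here the assumption is made explicit.
    """
    if not isinstance(filename, str):
        raise TypeError(f"%U expects str, got {type(filename).__name__}")
    return '  File "%s", line %d\n' % (filename, lineno)


def file_line_s(filename, lineno):
    """Mimics "%S": format str(filename), which works for any object."""
    return '  File "%s", line %d\n' % (str(filename), lineno)


print(file_line_s(b"sf.xml", 6), end="")   #   File "b'sf.xml'", line 6
try:
    file_line_u(b"sf.xml", 6)
except TypeError as exc:
    print(exc)                              # %U expects str, got bytes
```

The %S-style variant trades a crash for a slightly odd but harmless rendering of the filename (b'sf.xml').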


More gdb traceback:

Program received signal SIGSEGV, Segmentation fault.
find_maxchar_surrogates (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>, 
    end=0xfffffffffffffffd <error: Cannot access memory at address 0xfffffffffffffffd>, begin=0x1 <error: Cannot access memory at address 0x1>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1660
1660	/usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c: No such file or directory.
Missing separate debuginfos, use: dnf debuginfo-install abrt-libs-2.12.0-2.fc30.x86_64 augeas-libs-1.12.0-1.fc30.x86_64 bzip2-libs-1.0.6-29.fc30.x86_64 dbus-libs-1.12.16-1.fc30.x86_64 elfutils-libelf-0.176-3.fc30.x86_64 elfutils-libs-0.176-3.fc30.x86_64 expat-2.2.6-2.fc30.x86_64 glib2-2.60.4-1.fc30.x86_64 libacl-2.2.53-3.fc30.x86_64 libcap-2.26-5.fc30.x86_64 libdb-5.3.28-37.fc30.x86_64 libffi-3.1-19.fc30.x86_64 libgcc-9.1.1-1.fc30.x86_64 libgcrypt-1.8.4-3.fc30.x86_64 libgpg-error-1.33-2.fc30.x86_64 libmount-2.33.2-1.fc30.x86_64 libreport-2.10.0-3.fc30.x86_64 libselinux-2.9-1.fc30.x86_64 libstdc++-9.1.1-1.fc30.x86_64 libtar-1.2.20-17.fc30.x86_64 libtool-ltdl-2.4.6-29.fc30.x86_64 libuuid-2.33.2-1.fc30.x86_64 libxcrypt-4.4.6-2.fc30.x86_64 libxml2-2.9.9-2.fc30.x86_64 libxslt-1.1.33-1.fc30.x86_64 libzstd-1.4.0-1.fc30.x86_64 lz4-libs-1.8.3-2.fc30.x86_64 pcre-8.43-2.fc30.x86_64 popt-1.16-17.fc30.x86_64 python3-abrt-2.12.0-2.fc30.x86_64 python3-dbus-1.2.8-5.fc30.x86_64 python3-libreport-2.10.0-3.fc30.x86_64 python3-lxml-4.2.5-2.fc30.x86_64 python3-systemd-234-8.fc30.x86_64 python3-xmlsec-1.3.3-5.fc30.x86_64 rpm-libs-4.14.2.1-4.fc30.1.x86_64 systemd-libs-241-8.git9ef65cb.fc30.x86_64 xmlsec1-1.2.27-2.fc30.x86_64 xmlsec1-openssl-1.2.27-2.fc30.x86_64 xz-libs-5.2.4-5.fc30.x86_64 zlib-1.2.11-15.fc30.x86_64
(gdb) where
#0  0x00007ffff7c321ad in find_maxchar_surrogates
    (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>, end=0xfffffffffffffffd <error: Cannot access memory at address 0xfffffffffffffffd>, begin=0x1 <error: Cannot access memory at address 0x1>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1660
#1  0x00007ffff7c321ad in _PyUnicode_Ready (unicode=b'sf.xml') at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1699
#2  0x00007ffff7afad8e in unicode_fromformat_write_str (precision=-1, width=-1, str=<optimized out>, writer=0x7fffffffcb20)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2596
#3  0x00007ffff7afad8e in unicode_fromformat_arg (vargs=0x7fffffffcb80, f=<optimized out>, writer=0x7fffffffcb20)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2807
#4  0x00007ffff7afad8e in PyUnicode_FromFormatV (format=<optimized out>, vargs=<optimized out>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2914
#5  0x00007ffff7b82a99 in PyUnicode_FromFormat (format=format@entry=0x7ffff7c9b045 "  File \"%U\", line %d\n")
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2966
#6  0x00007ffff7c85898 in print_exception (value=None, f=<_io.TextIOWrapper at remote 0x7fffea910708>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:753
#7  0x00007ffff7c85898 in print_exception_recursive
    (f=<_io.TextIOWrapper at remote 0x7fffea910708>, value=<XMLSyntaxError(error_log=<lxml.etree._ListErrorLog at remote 0x7fffe9fe3598>, code=1) at remote 0x7fffea046828>, seen=<optimized out>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:901
#8  0x00007ffff7c8b8bb in PyErr_Display
    (exception=<optimized out>, value=<XMLSyntaxError(error_log=<lxml.etree._ListErrorLog at remote 0x7fffe9fe3598>, code=1) at remote 0x7fffea046828>, tb=<optimized out>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:935
#9  0x00007ffff7c8b93c in sys_excepthook (self=<optimized out>, args=<optimized out>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/sysmodule.c:332
(...)
msg346992 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-01 10:15
The bug can be reproduced on the master branch of Python. I tested with versions:

* lxml 4.3.4
* xmlsec 1.3.3  

The filename comes from an lxml.etree.XMLSyntaxError exception, which inherits from SyntaxError. The bug can be reproduced without lxml or xmlsec, using just the attached excepthook_syntaxerror.py:

$ ./python excepthook_syntaxerror.py
Traceback (most recent call last):
  File "/home/vstinner/prog/python/master/excepthook_syntaxerror.py", line 3, in <module>
    raise SyntaxError("msg", (b"bytes_filename", 123, 0, "text"))
Objects/unicodeobject.c:492: _PyUnicode_CheckConsistency: Assertion "((((((PyObject*)(op))->ob_type))->tp_flags & ((1UL << 28))) != 0)" failed
Enable tracemalloc to get the memory block allocation traceback

object  : b'bytes_filename'
type    : bytes
refcount: 4
address : 0x7fe22fe7ce50
Fatal Python error: _PyObject_AssertFailed

Current thread 0x00007fe23cfbf740 (most recent call first):
  File "/home/vstinner/prog/python/master/excepthook_syntaxerror.py", line 5 in <module>
Aborted (core dumped)


excepthook_syntaxerror.py uses:

raise SyntaxError("msg", (b"bytes_filename", 123, 0, "text"))
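A self-contained variant of this reproducer, written in the spirit of a regression test, is sketched below. It raises the same SyntaxError, passes it to sys.__excepthook__ and captures what is written to stderr. It assumes an interpreter that already includes the fix from the commits below; on an unpatched build it crashes exactly as shown above. BytesWarning is silenced because formatting a bytes object with str() warns when Python runs with -b. The exact layout of the traceback may differ between Python versions, so a check should only look for the filename, line number and message.

```python
import contextlib
import io
import sys
import warnings


def render_syntax_error_with_bytes_filename():
    """Raise the reproducer's SyntaxError and return what
    sys.__excepthook__ writes, instead of sending it to the real stderr."""
    buf = io.StringIO()
    with warnings.catch_warnings():
        # str(b"...") emits BytesWarning when Python runs with -b.
        warnings.simplefilter("ignore", BytesWarning)
        try:
            raise SyntaxError("msg", (b"bytes_filename", 123, 0, "text"))
        except SyntaxError:
            # sys.__excepthook__ writes to sys.stderr, which is
            # temporarily redirected into the buffer.
            with contextlib.redirect_stderr(buf):
                sys.__excepthook__(*sys.exc_info())
    return buf.getvalue()


if __name__ == "__main__":
    print(render_syntax_error_with_bytes_filename())
```

On a fixed interpreter the output is expected to contain a line such as   File "b'bytes_filename'", line 123 instead of a crash.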
msg347028 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-01 14:51
New changeset f9b7457bd7f438263e0d2dd1f70589ad56a2585e by Victor Stinner in branch 'master':
bpo-37467: Fix PyErr_Display() for bytes filename (GH-14504)
https://github.com/python/cpython/commit/f9b7457bd7f438263e0d2dd1f70589ad56a2585e
msg347029 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-01 14:57
Python 2.7 is not affected:

            const char *filename, *text;
            int lineno, offset;
            if (!parse_syntax_error(value, &message, &filename,
                                    &lineno, &offset, &text))
                PyErr_Clear();
            else {
                char buf[10];
                PyFile_WriteString("  File \"", f);
                if (filename == NULL)
                    PyFile_WriteString("<string>", f);
                else
                    PyFile_WriteString(filename, f);
                PyFile_WriteString("\", line ", f);
                PyOS_snprintf(buf, sizeof(buf), "%d", lineno);
                PyFile_WriteString(buf, f);
                PyFile_WriteString("\n", f);
msg347030 - Author: miss-islington (miss-islington) Date: 2019-07-01 15:11
New changeset 2683ded568b24fff1139edd9127a349f432292a6 by Miss Islington (bot) in branch '3.8':
bpo-37467: Fix PyErr_Display() for bytes filename (GH-14504)
https://github.com/python/cpython/commit/2683ded568b24fff1139edd9127a349f432292a6
msg347035 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-01 15:41
New changeset 8cbffc4d96d1da0fbc38da6f34f2da30c5ffd601 by Victor Stinner in branch '3.7':
bpo-37467: Fix PyErr_Display() for bytes filename (GH-14504) (GH-14515)
https://github.com/python/cpython/commit/8cbffc4d96d1da0fbc38da6f34f2da30c5ffd601
msg347036 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-01 15:42
Ok, it's now fixed in all affected maintained branches, so I'm closing the issue.
History
Date                 User            Action  Args
2019-09-06 09:38:28  vstinner        link    issue38042 superseder
2019-07-01 15:42:28  vstinner        set     status: open -> closed
                                             resolution: fixed
                                             messages: + msg347036
                                             stage: patch review -> resolved
2019-07-01 15:41:50  vstinner        set     messages: + msg347035
2019-07-01 15:11:20  miss-islington  set     nosy: + miss-islington
                                             messages: + msg347030
2019-07-01 14:57:00  vstinner        set     messages: + msg347029
2019-07-01 14:55:52  vstinner        set     pull_requests: + pull_request14331
2019-07-01 14:51:31  miss-islington  set     pull_requests: + pull_request14330
2019-07-01 14:51:25  vstinner        set     messages: + msg347028
2019-07-01 10:35:46  vstinner        set     keywords: + patch
                                             stage: patch review
                                             pull_requests: + pull_request14320
2019-07-01 10:33:56  vstinner        set     title: print_exception() crash when lxml/xmlsec raises a parser error -> sys.excepthook (PyErr_Display) does crash with SyntaxError which has a bytes filename
2019-07-01 10:15:06  vstinner        set     files: + excepthook_syntaxerror.py
                                             title: print_exception() crash when lxml 4.2.5 raises a parser error -> print_exception() crash when lxml/xmlsec raises a parser error
                                             messages: + msg346992
                                             versions: + Python 3.8, Python 3.9
2019-07-01 09:49:18  vstinner        set     files: + sf.xml
2019-07-01 09:49:11  vstinner        create