
Author vstinner
Recipients vstinner
Date 2019-07-01.09:49:10
Message-id <1561974551.02.0.598899276375.issue37467@roundup.psfhosted.org>
In-reply-to
Content
Using the attached sf.py and sf.xml, I can crash Python. lxml builds a fake traceback to inject the XML filename and the XML line number where the parsing error occurs. The problem is that the filename is a bytes object, whereas print_exception() expects the filename to be a Unicode string.

The attached PR fixes the crash.
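
To make the type mismatch concrete, here is a minimal Python sketch. It is not lxml's code: the exception is a plain SyntaxError used as a stand-in, the message text and line number are made up, and the helper name filename_as_text is hypothetical. It only shows an exception whose filename attribute is bytes, and one way a caller could decode it to str before display. It deliberately does not print the exception through sys.excepthook, since that is the path that crashes on an affected build.

```python
import os

# Hypothetical stand-in for the exception lxml raises. The message text and
# line number are invented; only the bytes filename matters here.
err = SyntaxError("parse error")
err.filename = b"sf.xml"   # bytes, like the value gdb prints for `filename`
err.lineno = 1


def filename_as_text(exc):
    """Return exc.filename as str, decoding bytes with the filesystem encoding."""
    name = exc.filename
    if isinstance(name, bytes):
        return os.fsdecode(name)
    return name


print(type(err.filename).__name__)
print(filename_as_text(err))
```

os.fsdecode() is used on the assumption that the bytes value is a filesystem path; a different decoding may be appropriate elsewhere.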

Fedora bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1665490


Example:

$ python3 sf.py
<lxml.etree._ElementTree object at 0x7f7d0f8abd08>
Traceback (most recent call last):
  File "sf.py", line 6, in <module>
    xml2 = etree.parse("sf.xml")
  File "src/lxml/etree.pyx", line 3426, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1840, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1866, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1770, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 651, in lxml.etree._raiseParseError
Segmentation fault (core dumped)


(gdb) frame 6
#6  0x00007ffff7c85898 in print_exception (value=None, f=<_io.TextIOWrapper at remote 0x7fffea910708>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:753
753	/usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c: No such file or directory.
(gdb) l
748	in /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c
(gdb) p filename
$1 = b'sf.xml'
(gdb) p *filename
$2 = {
  ob_refcnt = 2, 
  ob_type = 0x7ffff7db5da0 <PyBytes_Type>
}


Extract of print_exception():

        PyObject *message, *filename, *text;
        int lineno, offset;
        if (!parse_syntax_error(value, &message, &filename,
                                &lineno, &offset, &text))
            PyErr_Clear();
        else {
            PyObject *line;

            Py_DECREF(value);
            value = message;

            line = PyUnicode_FromFormat("  File \"%U\", line %d\n",   // <====== HERE
                                          filename, lineno);
            Py_DECREF(filename);


More gdb traceback:

Program received signal SIGSEGV, Segmentation fault.
find_maxchar_surrogates (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>, 
    end=0xfffffffffffffffd <error: Cannot access memory at address 0xfffffffffffffffd>, begin=0x1 <error: Cannot access memory at address 0x1>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1660
1660	/usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c: No such file or directory.
(gdb) where
#0  0x00007ffff7c321ad in find_maxchar_surrogates
    (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>, end=0xfffffffffffffffd <error: Cannot access memory at address 0xfffffffffffffffd>, begin=0x1 <error: Cannot access memory at address 0x1>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1660
#1  0x00007ffff7c321ad in _PyUnicode_Ready (unicode=b'sf.xml') at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1699
#2  0x00007ffff7afad8e in unicode_fromformat_write_str (precision=-1, width=-1, str=<optimized out>, writer=0x7fffffffcb20)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2596
#3  0x00007ffff7afad8e in unicode_fromformat_arg (vargs=0x7fffffffcb80, f=<optimized out>, writer=0x7fffffffcb20)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2807
#4  0x00007ffff7afad8e in PyUnicode_FromFormatV (format=<optimized out>, vargs=<optimized out>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2914
#5  0x00007ffff7b82a99 in PyUnicode_FromFormat (format=format@entry=0x7ffff7c9b045 "  File \"%U\", line %d\n")
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2966
#6  0x00007ffff7c85898 in print_exception (value=None, f=<_io.TextIOWrapper at remote 0x7fffea910708>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:753
#7  0x00007ffff7c85898 in print_exception_recursive
    (f=<_io.TextIOWrapper at remote 0x7fffea910708>, value=<XMLSyntaxError(error_log=<lxml.etree._ListErrorLog at remote 0x7fffe9fe3598>, code=1) at remote 0x7fffea046828>, seen=<optimized out>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:901
#8  0x00007ffff7c8b8bb in PyErr_Display
    (exception=<optimized out>, value=<XMLSyntaxError(error_log=<lxml.etree._ListErrorLog at remote 0x7fffe9fe3598>, code=1) at remote 0x7fffea046828>, tb=<optimized out>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:935
#9  0x00007ffff7c8b93c in sys_excepthook (self=<optimized out>, args=<optimized out>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/sysmodule.c:332
(...)
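
Frame 5 shows PyUnicode_FromFormat() being called with a "%U" specifier, which the C API documents as taking a Unicode object, and frame 1 shows _PyUnicode_Ready() operating on b'sf.xml'. As a rough Python model of the kind of type check that would turn this into an ordinary exception rather than a segfault, consider the sketch below. The function name format_file_line is hypothetical, and this is not the change made by the attached PR.

```python
def format_file_line(filename, lineno):
    """Model of the '  File "...", line N' message with an explicit type check."""
    # The C code hands `filename` to "%U" as-is; this model checks it first.
    if not isinstance(filename, str):
        raise TypeError(
            "filename must be str, not %s" % type(filename).__name__
        )
    return '  File "%s", line %d\n' % (filename, lineno)


print(format_file_line("sf.xml", 1), end="")

try:
    format_file_line(b"sf.xml", 1)
except TypeError as exc:
    print("rejected:", exc)
```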