sys.excepthook (PyErr_Display) does crash with SyntaxError which has a bytes filename #81648

Closed
vstinner opened this issue Jul 1, 2019 · 7 comments
Labels
3.7 (EOL): end of life; 3.8: only security fixes; 3.9: only security fixes; type-crash: a hard crash of the interpreter, possibly with a core dump

Comments

@vstinner
Member

vstinner commented Jul 1, 2019

BPO 37467
Nosy @vstinner, @miss-islington
PRs
  • bpo-37467: Fix PyErr_Display() for bytes filename #14504
  • [3.8] bpo-37467: Fix PyErr_Display() for bytes filename (GH-14504) #14514
  • [3.7] bpo-37467: Fix PyErr_Display() for bytes filename (GH-14504) #14515
Files
  • sf.py
  • sf.xml
  • excepthook_syntaxerror.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-07-01.15:42:28.918>
    created_at = <Date 2019-07-01.09:49:10.994>
    labels = ['3.7', '3.8', '3.9', 'type-crash']
    title = 'sys.excepthook (PyErr_Display) does crash with SyntaxError which has a bytes filename'
    updated_at = <Date 2019-07-01.15:42:28.917>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-07-01.15:42:28.917>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-07-01.15:42:28.918>
    closer = 'vstinner'
    components = []
    creation = <Date 2019-07-01.09:49:10.994>
    creator = 'vstinner'
    dependencies = []
    files = ['48449', '48450', '48451']
    hgrepos = []
    issue_num = 37467
    keywords = ['patch']
    message_count = 7.0
    messages = ['346988', '346992', '347028', '347029', '347030', '347035', '347036']
    nosy_count = 2.0
    nosy_names = ['vstinner', 'miss-islington']
    pr_nums = ['14504', '14514', '14515']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue37467'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @vstinner
    Member Author

    vstinner commented Jul 1, 2019

    Using the attached sf.py and sf.xml, I can crash Python. lxml builds a fake traceback to inject the XML filename and the XML line number where the parsing error occurs. The problem is that the filename is a bytes object, whereas print_exception() expects the filename to be a Unicode string.

    The attached PR fixes the crash.
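
To make the type mismatch concrete, here is a minimal sketch of how an exception can end up carrying a bytes filename. `FakeXMLSyntaxError` and `make_error` are hypothetical stand-ins, not lxml's actual code; the only assumption is that the exception class inherits from SyntaxError, whose constructor stores the items of its second argument as attributes without checking their types.

```python
# Hypothetical stand-in for an XML parser error class; any SyntaxError
# subclass behaves the same way for this purpose.
class FakeXMLSyntaxError(SyntaxError):
    pass


def make_error(filename, lineno):
    # SyntaxError accepts (msg, (filename, lineno, offset, text)) and keeps
    # the tuple items as attributes without validating their types.
    return FakeXMLSyntaxError("parse error", (filename, lineno, 1, "<root>"))


exc = make_error(b"sf.xml", 3)
# The filename attribute is still a bytes object at display time.
print(type(exc.filename).__name__, exc.lineno)
```

Nothing goes wrong at construction time; the bytes object only becomes a problem later, when the display code treats it as a str.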

    Fedora bug report:
    https://bugzilla.redhat.com/show_bug.cgi?id=1665490

    Example:

    $ python3 sf.py
    <lxml.etree._ElementTree object at 0x7f7d0f8abd08>
    Traceback (most recent call last):
      File "sf.py", line 6, in <module>
        xml2 = etree.parse("sf.xml")
      File "src/lxml/etree.pyx", line 3426, in lxml.etree.parse
      File "src/lxml/parser.pxi", line 1840, in lxml.etree._parseDocument
      File "src/lxml/parser.pxi", line 1866, in lxml.etree._parseDocumentFromURL
      File "src/lxml/parser.pxi", line 1770, in lxml.etree._parseDocFromFile
      File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
      File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
      File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
      File "src/lxml/parser.pxi", line 651, in lxml.etree._raiseParseError
    Segmentation fault (core dumped)

    (gdb) frame 6
    #6 0x00007ffff7c85898 in print_exception (value=None, f=<_io.TextIOWrapper at remote 0x7fffea910708>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:753
    753 /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c: No such file or directory.
    (gdb) l
    748 in /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c
    (gdb) p filename
    $1 = b'sf.xml'
    (gdb) p *filename
    $2 = {
    ob_refcnt = 2,
    ob_type = 0x7ffff7db5da0 <PyBytes_Type>
    }

    Extract of print_exception():

            PyObject *message, *filename, *text;
            int lineno, offset;
            if (!parse_syntax_error(value, &message, &filename,
                                    &lineno, &offset, &text))
                PyErr_Clear();
            else {
                PyObject *line;
    
                Py_DECREF(value);
                value = message;
    
                line = PyUnicode_FromFormat("  File \"%U\", line %d\n",   // <====== HERE
                                              filename, lineno);
                Py_DECREF(filename);
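
The gdb backtrace in this report shows PyUnicode_FromFormat() being called with a `%U` format, which the C API documents as taking a Unicode object; the bytes object is then handed to _PyUnicode_Ready() as if it were a str. As a rough Python-level analogy only (this is not the CPython implementation, and both function names are hypothetical), the sketch below contrasts a formatter that insists on str with one that converts its argument with str() first, which is what the `%S` conversion does through PyObject_Str().

```python
def format_location_strict(filename, lineno):
    # Models "%U": the argument is assumed to be str. Python can raise a
    # TypeError here; the C code had no such check and crashed instead.
    if not isinstance(filename, str):
        raise TypeError("filename must be str, got " + type(filename).__name__)
    return '  File "%s", line %d\n' % (filename, lineno)


def format_location_lenient(filename, lineno):
    # Models "%S": convert the argument with str() first, so any object,
    # including bytes, yields a printable line.
    return '  File "%s", line %d\n' % (str(filename), lineno)
```

With default interpreter flags, `format_location_lenient(b"sf.xml", 3)` yields a line containing the repr-style text `b'sf.xml'`; under `-bb`, str() on a bytes object raises BytesWarning instead.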

    More gdb traceback:

    Program received signal SIGSEGV, Segmentation fault.
    find_maxchar_surrogates (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>,
    end=0xfffffffffffffffd <error: Cannot access memory at address 0xfffffffffffffffd>, begin=0x1 <error: Cannot access memory at address 0x1>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1660
    1660 /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c: No such file or directory.
    Missing separate debuginfos, use: dnf debuginfo-install abrt-libs-2.12.0-2.fc30.x86_64 augeas-libs-1.12.0-1.fc30.x86_64 bzip2-libs-1.0.6-29.fc30.x86_64 dbus-libs-1.12.16-1.fc30.x86_64 elfutils-libelf-0.176-3.fc30.x86_64 elfutils-libs-0.176-3.fc30.x86_64 expat-2.2.6-2.fc30.x86_64 glib2-2.60.4-1.fc30.x86_64 libacl-2.2.53-3.fc30.x86_64 libcap-2.26-5.fc30.x86_64 libdb-5.3.28-37.fc30.x86_64 libffi-3.1-19.fc30.x86_64 libgcc-9.1.1-1.fc30.x86_64 libgcrypt-1.8.4-3.fc30.x86_64 libgpg-error-1.33-2.fc30.x86_64 libmount-2.33.2-1.fc30.x86_64 libreport-2.10.0-3.fc30.x86_64 libselinux-2.9-1.fc30.x86_64 libstdc++-9.1.1-1.fc30.x86_64 libtar-1.2.20-17.fc30.x86_64 libtool-ltdl-2.4.6-29.fc30.x86_64 libuuid-2.33.2-1.fc30.x86_64 libxcrypt-4.4.6-2.fc30.x86_64 libxml2-2.9.9-2.fc30.x86_64 libxslt-1.1.33-1.fc30.x86_64 libzstd-1.4.0-1.fc30.x86_64 lz4-libs-1.8.3-2.fc30.x86_64 pcre-8.43-2.fc30.x86_64 popt-1.16-17.fc30.x86_64 python3-abrt-2.12.0-2.fc30.x86_64 python3-dbus-1.2.8-5.fc30.x86_64 python3-libreport-2.10.0-3.fc30.x86_64 python3-lxml-4.2.5-2.fc30.x86_64 python3-systemd-234-8.fc30.x86_64 python3-xmlsec-1.3.3-5.fc30.x86_64 rpm-libs-4.14.2.1-4.fc30.1.x86_64 systemd-libs-241-8.git9ef65cb.fc30.x86_64 xmlsec1-1.2.27-2.fc30.x86_64 xmlsec1-openssl-1.2.27-2.fc30.x86_64 xz-libs-5.2.4-5.fc30.x86_64 zlib-1.2.11-15.fc30.x86_64
    (gdb) where
    #0 0x00007ffff7c321ad in find_maxchar_surrogates
    (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>, end=0xfffffffffffffffd <error: Cannot access memory at address 0xfffffffffffffffd>, begin=0x1 <error: Cannot access memory at address 0x1>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1660
    #1 0x00007ffff7c321ad in _PyUnicode_Ready (unicode=b'sf.xml') at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1699
    #2 0x00007ffff7afad8e in unicode_fromformat_write_str (precision=-1, width=-1, str=<optimized out>, writer=0x7fffffffcb20)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2596
    #3 0x00007ffff7afad8e in unicode_fromformat_arg (vargs=0x7fffffffcb80, f=<optimized out>, writer=0x7fffffffcb20)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2807
    #4 0x00007ffff7afad8e in PyUnicode_FromFormatV (format=<optimized out>, vargs=<optimized out>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2914
    #5 0x00007ffff7b82a99 in PyUnicode_FromFormat (format=format@entry=0x7ffff7c9b045 " File \"%U\", line %d\n")
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2966
    #6 0x00007ffff7c85898 in print_exception (value=None, f=<_io.TextIOWrapper at remote 0x7fffea910708>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:753
    #7 0x00007ffff7c85898 in print_exception_recursive
    (f=<_io.TextIOWrapper at remote 0x7fffea910708>, value=<XMLSyntaxError(error_log=<lxml.etree._ListErrorLog at remote 0x7fffe9fe3598>, code=1) at remote 0x7fffea046828>, seen=<optimized out>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:901
    #8 0x00007ffff7c8b8bb in PyErr_Display
    (exception=<optimized out>, value=<XMLSyntaxError(error_log=<lxml.etree._ListErrorLog at remote 0x7fffe9fe3598>, code=1) at remote 0x7fffea046828>, tb=<optimized out>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:935
    #9 0x00007ffff7c8b93c in sys_excepthook (self=<optimized out>, args=<optimized out>) at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/sysmodule.c:332
    (...)
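
Since the backtrace shows the crash is reached through sys.excepthook, an interpreter without the fix could, as a stopgap, decode the filename before the default hook sees it. This is only a sketch under assumptions: `normalize_filename` and `safe_excepthook` are hypothetical names, it assumes the exception's `filename` attribute is writable (true for a plain SyntaxError, not verified for lxml's classes), and it does not walk chained exceptions.

```python
import os
import sys


def normalize_filename(exc):
    # Decode a bytes filename on a SyntaxError (or subclass) in place,
    # using the filesystem encoding.
    if isinstance(exc, SyntaxError) and isinstance(exc.filename, bytes):
        exc.filename = os.fsdecode(exc.filename)
    return exc


def safe_excepthook(exc_type, exc, tb):
    # Hypothetical wrapper: fix up the exception, then delegate to the
    # interpreter's original hook. Exceptions reachable through
    # __cause__ / __context__ are not handled in this sketch.
    normalize_filename(exc)
    sys.__excepthook__(exc_type, exc, tb)


# To enable it in a program: sys.excepthook = safe_excepthook
```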

    @vstinner vstinner added 3.7 (EOL) end of life type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 1, 2019
    @vstinner
    Member Author

    vstinner commented Jul 1, 2019

    The bug can be reproduced on the master branch of Python. I tested with versions:

    • lxml 4.3.4
    • xmlsec 1.3.3

    The filename comes from a lxml.etree.XMLSyntaxError exception, which inherits from SyntaxError. The bug can be reproduced without lxml or xmlsec, using just the attached excepthook_syntaxerror.py:

    $ ./python excepthook_syntaxerror.py
    Traceback (most recent call last):
      File "/home/vstinner/prog/python/master/excepthook_syntaxerror.py", line 3, in <module>
        raise SyntaxError("msg", (b"bytes_filename", 123, 0, "text"))
    Objects/unicodeobject.c:492: _PyUnicode_CheckConsistency: Assertion "((((((PyObject*)(op))->ob_type))->tp_flags & ((1UL << 28))) != 0)" failed
    Enable tracemalloc to get the memory block allocation traceback

    object : b'bytes_filename'
    type : bytes
    refcount: 4
    address : 0x7fe22fe7ce50
    Fatal Python error: _PyObject_AssertFailed

    Current thread 0x00007fe23cfbf740 (most recent call first):
    File "/home/vstinner/prog/python/master/excepthook_syntaxerror.py", line 5 in <module>
    Aborted (core dumped)

    excepthook_syntaxerror.py uses:

    raise SyntaxError("msg", (b"bytes_filename", 123, 0, "text"))
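
The attached script itself is not shown here, so the following is a guess at its shape, based on the line numbers in the output above (the raise on line 3, the current frame on line 5): raise the exception, catch it, and call the hook by hand. The function name and the stderr capture are additions for illustration. On an interpreter without the fix this is expected to crash or abort as shown above; on a fixed one it returns the displayed text.

```python
import contextlib
import io
import sys


def display_bytes_filename_error():
    # Assumed shape of the reproducer: raise, catch, then call the hook.
    try:
        raise SyntaxError("msg", (b"bytes_filename", 123, 0, "text"))
    except SyntaxError as exc:
        buf = io.StringIO()
        # The default hook writes to sys.stderr, so redirect it to a buffer.
        with contextlib.redirect_stderr(buf):
            # sys.__excepthook__ is the original value of sys.excepthook.
            sys.__excepthook__(type(exc), exc, exc.__traceback__)
        return buf.getvalue()


if __name__ == "__main__":
    print(display_bytes_filename_error(), end="")
```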

    @vstinner vstinner added 3.8 only security fixes 3.9 only security fixes labels Jul 1, 2019
    @vstinner vstinner changed the title print_exception() crash when lxml 4.2.5 raises a parser error print_exception() crash when lxml/xmlsec raises a parser error Jul 1, 2019
    @vstinner vstinner changed the title print_exception() crash when lxml/xmlsec raises a parser error sys.excepthook (PyErr_Display) does crash with SyntaxError which has a bytes filename Jul 1, 2019
    @vstinner
    Member Author

    vstinner commented Jul 1, 2019

    New changeset f9b7457 by Victor Stinner in branch 'master':
    bpo-37467: Fix PyErr_Display() for bytes filename (GH-14504)
    f9b7457

    @vstinner
    Member Author

    vstinner commented Jul 1, 2019

    Python 2.7 is not affected:

                const char *filename, *text;
                int lineno, offset;
                if (!parse_syntax_error(value, &message, &filename,
                                        &lineno, &offset, &text))
                    PyErr_Clear();
                else {
                    char buf[10];
                    PyFile_WriteString("  File \"", f);
                    if (filename == NULL)
                        PyFile_WriteString("<string>", f);
                    else
                        PyFile_WriteString(filename, f);
                    PyFile_WriteString("\", line ", f);
                    PyOS_snprintf(buf, sizeof(buf), "%d", lineno);
                    PyFile_WriteString(buf, f);
                    PyFile_WriteString("\n", f);

    @miss-islington
    Contributor

    New changeset 2683ded by Miss Islington (bot) in branch '3.8':
    bpo-37467: Fix PyErr_Display() for bytes filename (GH-14504)
    2683ded

    @vstinner
    Member Author

    vstinner commented Jul 1, 2019

    New changeset 8cbffc4 by Victor Stinner in branch '3.7':
    bpo-37467: Fix PyErr_Display() for bytes filename (GH-14504) (GH-14515)
    8cbffc4

    @vstinner
    Member Author

    vstinner commented Jul 1, 2019

    Ok, it's now fixed in all affected maintained branches, so I'm closing the issue.

    @vstinner vstinner closed this as completed Jul 1, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022