Crashes with lines of the form "async \" #76033

Closed
AlexandreHamelin mannequin opened this issue Oct 23, 2017 · 7 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments


AlexandreHamelin mannequin commented Oct 23, 2017

BPO 31852
Nosy @vstinner, @1st1, @pablogsal
PRs
  • [3.6] bpo-31852: Fix segfault caused by using the async soft keyword #4122
Files
  • async_parser_crash.py
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2017-10-31.00:49:59.198>
    created_at = <Date 2017-10-23.19:22:17.763>
    labels = ['interpreter-core', 'type-crash']
    title = 'Crashes with lines of the form "async \\"'
    updated_at = <Date 2017-10-31.11:01:46.857>
    user = 'https://bugs.python.org/AlexandreHamelin'

    bugs.python.org fields:

    activity = <Date 2017-10-31.11:01:46.857>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-10-31.00:49:59.198>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2017-10-23.19:22:17.763>
    creator = 'Alexandre Hamelin'
    dependencies = []
    files = ['47237']
    hgrepos = []
    issue_num = 31852
    keywords = ['patch']
    message_count = 7.0
    messages = ['304835', '304975', '304995', '305265', '305266', '305267', '305286']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'yselivanov', 'pablogsal', 'Alexandre Hamelin']
    pr_nums = ['4122']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue31852'
    versions = ['Python 3.6']


    AlexandreHamelin mannequin commented Oct 23, 2017

    Hi.

    Python 3.6.2 crashes when interpreting lines containing the text "async \" (the future keyword 'async' followed by a trailing backslash).

    Tested in a Docker environment (Debian jessie); see github.com/0xquad/docker-python36 if needed.

    Examples:

    $ docker run -ti --rm python36
    root@4c09392f83c8:/# python3.6
    Python 3.6.2 (default, Aug  4 2017, 14:35:04)
    [GCC 6.4.0 20170724] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> async \
    ...
      File "<stdin>", line 1
        \ufffd\ufffdF\ufffd\ufffd
             ^
    SyntaxError: invalid syntax
    >>> async \
    Segmentation fault
    root@4c09392f83c8:/#

    Also,

    ----- file: test.py
    #!/usr/bin/python3.6
    async \
    <repeated 30000 times>
    -----

    $ ./test.py
    Segmentation fault
    $
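The file-based reproduction above can be scripted as a regression check. The sketch below is a hypothetical helper (the names `run_async_backslash_repro` and `crashed` are illustrative, not part of CPython): it writes a `test.py` made of `repetitions` lines of `async \` and runs it with a chosen interpreter in a child process. It assumes a POSIX system, where a negative return code means the child was killed by a signal (for example -11 for SIGSEGV); an interpreter without the bug should instead exit with status 1 and report a `SyntaxError`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_async_backslash_repro(repetitions=30000, python=sys.executable):
    """Write a script of `repetitions` lines of 'async \\' and run it.

    Hypothetical helper; returns the subprocess.CompletedProcess.
    """
    # Each line is the word 'async', a space and a trailing backslash.
    source = "async \\\n" * repetitions
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "test.py"
        script.write_text(source)
        # The child finishes before the temporary directory is removed.
        return subprocess.run(
            [python, str(script)], capture_output=True, text=True
        )


def crashed(result):
    """On POSIX, a negative return code means the child died from a signal."""
    return result.returncode < 0


if __name__ == "__main__":
    result = run_async_backslash_repro()
    if crashed(result):
        print("interpreter crashed with signal", -result.returncode)
    else:
        print("exit status", result.returncode)
        print(result.stderr)
```

Pointing `python` at an affected 3.6.2 binary would be expected to report a crash, while a fixed interpreter prints a `SyntaxError` traceback.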

    I haven't taken the time to produce a backtrace or to investigate any further with the latest dev versions.

    Let me know if I can assist in any way.

    @AlexandreHamelin AlexandreHamelin mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 23, 2017
    @pablogsal
    Member

    This issue is already fixed in the master branch (version 3.7.0 alpha 2), by this PR:

    #1669

    The cause is that async was not a proper keyword and the parser segfaults when checking for the new token and parsing the newline. In particular, this happens here:

    translate_newlines at Parser/tokenizer.c:713
    713 buf = PyMem_MALLOC(needed_length);

    This is the stack trace:

    #0 _PyObject_Alloc (ctx=<optimized out>, elsize=10, nelem=1, use_calloc=0) at Objects/obmalloc.c:806
    #1 _PyObject_Malloc (ctx=<optimized out>, nbytes=10) at Objects/obmalloc.c:985
    #2 0x0000000000453020 in translate_newlines (tok=0x9187b0, exec_input=0, s=0x7ffff7fa40e0 "async \\\n") at Parser/tokenizer.c:713
    #3 tok_nextc (tok=tok@entry=0x9187b0) at Parser/tokenizer.c:943
    #4 0x0000000000454948 in tok_get (tok=tok@entry=0x9187b0, p_start=p_start@entry=0x7fffffffdc40, p_end=p_end@entry=0x7fffffffdc50)
    at Parser/tokenizer.c:1382
    #5 0x0000000000455749 in PyTokenizer_Get (tok=tok@entry=0x9187b0, p_start=p_start@entry=0x7fffffffdc40, p_end=p_end@entry=0x7fffffffdc50)
    at Parser/tokenizer.c:1902
    #6 0x000000000045158d in parsetok (tok=0x9187b0, g=<optimized out>, start=256, err_ret=err_ret@entry=0x7fffffffdce0,
    flags=flags@entry=0x7fffffffdcd0) at Parser/parsetok.c:208
    #7 0x0000000000452280 in PyParser_ParseFileObject (fp=<optimized out>, filename=filename@entry=0x7ffff7f1b848, enc=<optimized out>,
    g=<optimized out>, start=<optimized out>, ps1=<optimized out>, ps2=0x7ffff7e63648 "... ", err_ret=err_ret@entry=0x7fffffffdce0,
    flags=flags@entry=0x7fffffffdcd0) at Parser/parsetok.c:134
    #8 0x0000000000433949 in PyParser_ASTFromFileObject (fp=<optimized out>, filename=0x7ffff7f1b848, enc=<optimized out>,
    start=<optimized out>, ps1=<optimized out>, ps2=<optimized out>, flags=0x7fffffffde90, errcode=0x7fffffffdd80, arena=0x7ffff7fe2168)
    at Python/pythonrun.c:1166
    #9 0x0000000000433b5b in PyRun_InteractiveOneObject (fp=fp@entry=0x7ffff74b2640 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff7f1b848,
    flags=flags@entry=0x7fffffffde90) at Python/pythonrun.c:218
    #10 0x0000000000433eae in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff74b2640 <_IO_2_1_stdin_>,
    filename_str=filename_str@entry=0x5dd7a4 "<stdin>", flags=flags@entry=0x7fffffffde90) at Python/pythonrun.c:115
    #11 0x0000000000433fbc in PyRun_AnyFileExFlags (fp=0x7ffff74b2640 <_IO_2_1_stdin_>, filename=0x5dd7a4 "<stdin>", closeit=0,
    flags=0x7fffffffde90) at Python/pythonrun.c:77
    #12 0x00000000004476fa in run_file (p_cf=0x7fffffffde90, filename=<optimized out>, fp=0x7ffff74b2640 <_IO_2_1_stdin_>) at Modules/main.c:341
    #13 Py_Main (argc=argc@entry=1, argv=argv@entry=0x910010) at Modules/main.c:895
    #14 0x000000000041e17a in main (argc=1, argv=<optimized out>) at ./Programs/python.c:102

    After applying commit ac31770, the issue is fixed.

    Does it make sense to backport ac31770 to 3.6?

    @vstinner
    Member

    > Does it make sense to backport ac31770 to 3.6?

    No, async was not a keyword in Python 3.6 on purpose. Making it a keyword can break a lot of code.
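The compatibility concern can be checked from Python itself. The sketch below (the helper names are illustrative) uses the standard `keyword` module and the built-in `compile()` to test whether the running interpreter reserves `async`, and whether code that uses `async` as an ordinary variable name still compiles. Such code was accepted by Python 3.6, where `async` was only treated specially around `async def`; from 3.7 on, `async` is a reserved keyword and the same code is rejected with a `SyntaxError`.

```python
import keyword


def async_is_reserved():
    """True if this interpreter treats 'async' as a hard keyword (3.7+)."""
    return keyword.iskeyword("async")


def compiles(source):
    """Return True if `source` compiles as a module in this interpreter."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# 'async' used as an ordinary variable name: accepted by Python 3.6,
# rejected by 3.7 and later.
legacy_code = "async = True\nprint(async)\n"

# Genuine coroutine syntax compiles on every version since 3.5.
coroutine_code = "async def f():\n    pass\n"

if __name__ == "__main__":
    print("async reserved:", async_is_reserved())
    print("legacy code compiles:", compiles(legacy_code))
    print("coroutine code compiles:", compiles(coroutine_code))
```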

    I confirm that Python 3.6 still crashes with a very high number of "async " prefixes: try the attached async_parser_crash.py.

    Extract of the gdb traceback on a crash:

    (...)
    #665 0x0000000000454867 in tok_get (tok=0x7fffff8b98c0,
    p_start=0x7fffff8b9cb8, p_end=0x7fffff8b9cb0) at Parser/tokenizer.c:1571
    #666 0x0000000000454867 in tok_get (tok=0x7fffff8b9d40,
    p_start=0x7fffff8ba138, p_end=0x7fffff8ba130) at Parser/tokenizer.c:1571
    #667 0x0000000000454867 in tok_get (tok=0x7fffff8ba1c0,
    p_start=0x7fffff8ba5b8, p_end=0x7fffff8ba5b0) at Parser/tokenizer.c:1571
    #668 0x0000000000454867 in tok_get (tok=0x7fffff8ba640,
    p_start=0x7fffff8baa38, p_end=0x7fffff8baa30) at Parser/tokenizer.c:1571
    #669 0x0000000000454867 in tok_get (tok=0x7fffff8baac0,
    p_start=0x7fffff8baeb8, p_end=0x7fffff8baeb0) at Parser/tokenizer.c:1571
    #670 0x0000000000454867 in tok_get (tok=0x7fffff8baf40,
    p_start=0x7fffff8bb338, p_end=0x7fffff8bb330) at Parser/tokenizer.c:1571
    #671 0x0000000000454867 in tok_get (tok=0x7fffff8bb3c0,
    p_start=0x7fffff8bb7b8, p_end=0x7fffff8bb7b0) at Parser/tokenizer.c:1571
    (...)

    It looks like a stack overflow.

    The tokenizer could fail earlier, as soon as it sees "async async ".
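The repeated `tok_get` frames, each with a different `tok` pointer, suggest a lookahead that re-enters the tokenizer once per consecutive `async`. The following is a hypothetical toy model in Python, not CPython's C code: `next_token_recursive` decides whether `async` starts an `async def` by fully classifying the next token through a recursive call, so its stack depth grows with the number of consecutive `async` tokens and eventually raises `RecursionError` (the Python analogue of the C stack overflow). `next_token_iterative` makes the same decision by comparing only the raw text of the next token, using constant stack depth. Both assume a pre-split list of token strings and classify `async` as `ASYNC` only when it is directly followed by `def`.

```python
import sys


def next_token_recursive(tokens, i, depth=1):
    """Classify tokens[i]; return (kind, text, max_stack_depth).

    Toy model: to decide whether 'async' starts an 'async def', it fully
    classifies the next token by calling itself, so a run of N 'async'
    tokens nests N calls deep.
    """
    text = tokens[i]
    if text == "async" and i + 1 < len(tokens):
        ahead_kind, ahead_text, max_depth = next_token_recursive(
            tokens, i + 1, depth + 1
        )
        if ahead_kind == "NAME" and ahead_text == "def":
            return "ASYNC", text, max_depth
        return "NAME", text, max_depth
    return "NAME", text, depth


def next_token_iterative(tokens, i):
    """Same classification, but only peeks at the raw text of the next token."""
    text = tokens[i]
    if text == "async" and i + 1 < len(tokens) and tokens[i + 1] == "def":
        return "ASYNC", text
    return "NAME", text


if __name__ == "__main__":
    print(next_token_recursive(["async", "def", "f"], 0))
    many = ["async"] * (sys.getrecursionlimit() + 10) + ["x"]
    try:
        next_token_recursive(many, 0)
    except RecursionError:
        print("recursive lookahead overflowed the stack")
    print(next_token_iterative(many, 0))
```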

    @vstinner
    Member

    New changeset 690c36f by Victor Stinner (Pablo Galindo) in branch '3.6':
    [3.6] bpo-31852: Fix segfault caused by using the async soft keyword (GH-4122)

    @vstinner
    Member

    Thank you Alexandre Hamelin for the bug report and Pablo Galindo for the fix ;-)


    AlexandreHamelin mannequin commented Oct 31, 2017

    Awesome work, thanks to you!

    Would it also be the case for 'await'?

    @vstinner
    Member

    > Would it also be the case for 'await'?

    "async" requires maintaining an "async_def" state. It seems that await doesn't need a state of its own but relies on the "async_def" state, which has been fixed.

    Extract of Parser/tokenizer.c:

                /* Current token length is 5. */
                if (tok->async_def) {
                    /* We're inside an 'async def' function. */
                    if (memcmp(tok->start, "async", 5) == 0) {
                        return ASYNC;
                    }
                    if (memcmp(tok->start, "await", 5) == 0) {
                        return AWAIT;
                    }
                }
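To illustrate how `await` piggybacks on the `async_def` state, here is a hypothetical, heavily simplified Python model of that state (not a port of `Parser/tokenizer.c`; `tokenize_toy` is an illustrative name). It splits lines on whitespace, sets a flag when a line starts with `async def`, clears the flag when a later non-blank line is indented no deeper than that `async def` line, and reports `async`/`await` as `ASYNC`/`AWAIT` only while the flag is set. All other words are reported as `NAME`, much as the C tokenizer emits keywords such as `def` as `NAME` tokens.

```python
def tokenize_toy(lines):
    """Return (word, kind) pairs using a toy 'async_def' flag.

    Hypothetical simplification: words are whitespace-separated, and a
    function body is any following line indented deeper than its
    'async def' line.
    """
    tokens = []
    async_def = False        # inside an 'async def' body?
    async_def_indent = 0     # indentation of the 'async def' line
    for line in lines:
        body = line.lstrip()
        if not body:
            continue  # blank lines do not change the state
        indent = len(line) - len(body)
        if async_def and indent <= async_def_indent:
            async_def = False  # dedented out of the coroutine body
        words = body.split()
        starts_async_def = words[:2] == ["async", "def"]
        for word in words:
            if word in ("async", "await") and (async_def or starts_async_def):
                kind = word.upper()  # ASYNC or AWAIT
            else:
                kind = "NAME"
            tokens.append((word, kind))
        if starts_async_def:
            async_def = True
            async_def_indent = indent
    return tokens


if __name__ == "__main__":
    source = ["await = 1", "async def f():", "    await g()", "await = 2"]
    for word, kind in tokenize_toy(source):
        print(word, kind)
```

With this input, the first `await` is a `NAME`, the `await` inside the body is `AWAIT`, and after the dedent `await` is a `NAME` again. In the model, `await` has no state of its own, so its classification is only as correct as the `async_def` flag.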

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022