classification
Title: Crashes with lines of the form "async \"
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Alexandre Hamelin, pablogsal, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2017-10-23 19:22 by Alexandre Hamelin, last changed 2017-10-31 11:01 by vstinner. This issue is now closed.

Files
File name Uploaded Description
async_parser_crash.py vstinner, 2017-10-25 15:56
Pull Requests
URL Status Linked
PR 4122 merged pablogsal, 2017-10-25 22:21
Messages (7)
msg304835 - (view) Author: Alexandre Hamelin (Alexandre Hamelin) Date: 2017-10-23 19:22
Hi.

Python 3.6.2 crashes when interpreting lines with the text "async \" (future keyword 'async' and ending with a backslash).

Tested in a docker environment (debian jessie). (see github.com/0xquad/docker-python36 if needed)

Examples:

$ docker run -ti --rm python36
root@4c09392f83c8:/# python3.6
Python 3.6.2 (default, Aug  4 2017, 14:35:04)
[GCC 6.4.0 20170724] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> async \
...
  File "<stdin>", line 1
    \ufffd\ufffdF\ufffd\ufffd
         ^
SyntaxError: invalid syntax
>>> async \
Segmentation fault
root@4c09392f83c8:/#



Also,

----- file: test.py
#/usr/bin/python3.6
async \
<repeated 30000 times>
-----

$ ./test.py
Segmentation fault
$
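
For anyone who wants to recreate a file like the one above, here is a minimal sketch. It assumes that "<repeated 30000 times>" means the line "async \" written 30000 times in a row; the function names and the default file name are only illustrative, and the shebang line is left out, so the result would be run as "python3.6 test.py".

```python
from pathlib import Path


def build_reproducer(line_count=30000):
    """Return source text made of `line_count` copies of the line 'async \\'.

    Assumption: this mirrors the "<repeated 30000 times>" note in the report.
    """
    # Each line is the word async, a space, a backslash, then a newline.
    return "async \\\n" * line_count


def write_reproducer(path="test.py", line_count=30000):
    """Write the generated text to `path` (hypothetical default name)."""
    Path(path).write_text(build_reproducer(line_count))
    return path
```

Calling write_reproducer() and then running the file with an affected 3.6 build should show whether that build still crashes; on a fixed interpreter a SyntaxError is expected instead.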


I haven't taken the time to produce a backtrace or to investigate further with the latest development versions.

Let me know if I can assist in any way.
msg304975 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2017-10-25 10:16
This issue is already fixed in the master branch (version 3.7.0 alpha 2) by this PR:

https://github.com/python/cpython/pull/1669

The cause is that async was not a proper keyword and the parser segfaults when checking for the new token and parsing the newline. In particular, this happens here:

translate_newlines at Parser/tokenizer.c:713
713         buf = PyMem_MALLOC(needed_length);

This is the stack trace:

#0  _PyObject_Alloc (ctx=<optimized out>, elsize=10, nelem=1, use_calloc=0) at Objects/obmalloc.c:806
#1  _PyObject_Malloc (ctx=<optimized out>, nbytes=10) at Objects/obmalloc.c:985
#2  0x0000000000453020 in translate_newlines (tok=0x9187b0, exec_input=0, s=0x7ffff7fa40e0 "async \\\n") at Parser/tokenizer.c:713
#3  tok_nextc (tok=tok@entry=0x9187b0) at Parser/tokenizer.c:943
#4  0x0000000000454948 in tok_get (tok=tok@entry=0x9187b0, p_start=p_start@entry=0x7fffffffdc40, p_end=p_end@entry=0x7fffffffdc50)
    at Parser/tokenizer.c:1382
#5  0x0000000000455749 in PyTokenizer_Get (tok=tok@entry=0x9187b0, p_start=p_start@entry=0x7fffffffdc40, p_end=p_end@entry=0x7fffffffdc50)
    at Parser/tokenizer.c:1902
#6  0x000000000045158d in parsetok (tok=0x9187b0, g=<optimized out>, start=256, err_ret=err_ret@entry=0x7fffffffdce0,
    flags=flags@entry=0x7fffffffdcd0) at Parser/parsetok.c:208
#7  0x0000000000452280 in PyParser_ParseFileObject (fp=<optimized out>, filename=filename@entry=0x7ffff7f1b848, enc=<optimized out>,
    g=<optimized out>, start=<optimized out>, ps1=<optimized out>, ps2=0x7ffff7e63648 "... ", err_ret=err_ret@entry=0x7fffffffdce0,
    flags=flags@entry=0x7fffffffdcd0) at Parser/parsetok.c:134
#8  0x0000000000433949 in PyParser_ASTFromFileObject (fp=<optimized out>, filename=0x7ffff7f1b848, enc=<optimized out>,
    start=<optimized out>, ps1=<optimized out>, ps2=<optimized out>, flags=0x7fffffffde90, errcode=0x7fffffffdd80, arena=0x7ffff7fe2168)
    at Python/pythonrun.c:1166
#9  0x0000000000433b5b in PyRun_InteractiveOneObject (fp=fp@entry=0x7ffff74b2640 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff7f1b848,
    flags=flags@entry=0x7fffffffde90) at Python/pythonrun.c:218
#10 0x0000000000433eae in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff74b2640 <_IO_2_1_stdin_>,
    filename_str=filename_str@entry=0x5dd7a4 "<stdin>", flags=flags@entry=0x7fffffffde90) at Python/pythonrun.c:115
#11 0x0000000000433fbc in PyRun_AnyFileExFlags (fp=0x7ffff74b2640 <_IO_2_1_stdin_>, filename=0x5dd7a4 "<stdin>", closeit=0,
    flags=0x7fffffffde90) at Python/pythonrun.c:77
#12 0x00000000004476fa in run_file (p_cf=0x7fffffffde90, filename=<optimized out>, fp=0x7ffff74b2640 <_IO_2_1_stdin_>) at Modules/main.c:341
#13 Py_Main (argc=argc@entry=1, argv=argv@entry=0x910010) at Modules/main.c:895
#14 0x000000000041e17a in main (argc=1, argv=<optimized out>) at ./Programs/python.c:102

After applying commit ac317700ce7439e38a8b420218d9a5035bba92ed the issue is fixed.

Does it make sense to backport ac317700ce7439e38a8b420218d9a5035bba92ed to 3.6?
msg304995 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-25 15:56
> Does it make sense to backport ac317700ce7439e38a8b420218d9a5035bba92ed to 3.6?

No, async was not a keyword in Python 3.6 on purpose. Making it a keyword can break a lot of code.

I confirm that Python 3.6 still crashes with a very high number of "async " prefixes: try the attached async_parser_crash.py.

Extract of the gdb traceback on a crash:

(...)
#665 0x0000000000454867 in tok_get (tok=0x7fffff8b98c0, 
    p_start=0x7fffff8b9cb8, p_end=0x7fffff8b9cb0) at Parser/tokenizer.c:1571
#666 0x0000000000454867 in tok_get (tok=0x7fffff8b9d40, 
    p_start=0x7fffff8ba138, p_end=0x7fffff8ba130) at Parser/tokenizer.c:1571
#667 0x0000000000454867 in tok_get (tok=0x7fffff8ba1c0, 
    p_start=0x7fffff8ba5b8, p_end=0x7fffff8ba5b0) at Parser/tokenizer.c:1571
#668 0x0000000000454867 in tok_get (tok=0x7fffff8ba640, 
    p_start=0x7fffff8baa38, p_end=0x7fffff8baa30) at Parser/tokenizer.c:1571
#669 0x0000000000454867 in tok_get (tok=0x7fffff8baac0, 
    p_start=0x7fffff8baeb8, p_end=0x7fffff8baeb0) at Parser/tokenizer.c:1571
#670 0x0000000000454867 in tok_get (tok=0x7fffff8baf40, 
    p_start=0x7fffff8bb338, p_end=0x7fffff8bb330) at Parser/tokenizer.c:1571
#671 0x0000000000454867 in tok_get (tok=0x7fffff8bb3c0, 
    p_start=0x7fffff8bb7b8, p_end=0x7fffff8bb7b0) at Parser/tokenizer.c:1571
(...)

It looks like a stack overflow.

The tokenizer may fail earlier on "async async ".
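
A rough way to probe this without taking down your own session is to compile the input in a child process and look at its exit status. The sketch below is not the attached async_parser_crash.py (its contents are not shown here); the helper names and the prefix count are assumptions chosen for illustration.

```python
import subprocess
import sys


def build_source(prefix_count):
    """Return one line made of `prefix_count` repetitions of 'async '."""
    return "async " * prefix_count + "\n"


# Code run by the child interpreter: compile stdin, tolerate SyntaxError.
CHILD_CODE = (
    "import sys\n"
    "src = sys.stdin.read()\n"
    "try:\n"
    "    compile(src, '<generated>', 'exec')\n"
    "except SyntaxError:\n"
    "    pass\n"
)


def survives(source, python=sys.executable):
    """Return True if the child interpreter exits normally.

    A crash such as a segmentation fault shows up as a non-zero
    return code, so the parent process stays alive either way.
    """
    proc = subprocess.run(
        [python, "-c", CHILD_CODE],
        input=source.encode(),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0
```

For example, survives(build_source(100000), python="python3.6") points the check at a specific interpreter, assuming python3.6 is on the PATH. A fixed build should report True, because the input is rejected with a SyntaxError rather than crashing.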
msg305265 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-31 00:46
New changeset 690c36f2f1085145d364a89bfed5944dd2470308 by Victor Stinner (Pablo Galindo) in branch '3.6':
[3.6] bpo-31852: Fix segfault caused by using the async soft keyword (GH-4122)
https://github.com/python/cpython/commit/690c36f2f1085145d364a89bfed5944dd2470308
msg305266 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-31 00:50
Thank you Alexandre Hamelin for the bug report and Pablo Galindo for the fix ;-)
msg305267 - (view) Author: Alexandre Hamelin (Alexandre Hamelin) Date: 2017-10-31 02:44
Awesome work, thanks to you!

Would it also be the case for 'await' ?
msg305286 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-31 11:01
> Would it also be the case for 'await' ?

"async" requires maintaining an "async_def" state. It seems that "await" doesn't need a state of its own, but relies on the "async_def" state, which has been fixed.

Extract of Parser/tokenizer.c:

            /* Current token length is 5. */
            if (tok->async_def) {
                /* We're inside an 'async def' function. */
                if (memcmp(tok->start, "async", 5) == 0) {
                    return ASYNC;
                }
                if (memcmp(tok->start, "await", 5) == 0) {
                    return AWAIT;
                }
            }
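
To make the point about shared state concrete, here is a toy Python model of the check quoted above. It is only a sketch of the decision logic, not CPython's tokenizer; the function name and the string token labels are made up for illustration.

```python
def classify_name(text, inside_async_def):
    """Toy model of the quoted check: 'async' and 'await' are only
    treated as keywords while the async_def state is set."""
    if len(text) == 5 and inside_async_def:
        if text == "async":
            return "ASYNC"
        if text == "await":
            return "AWAIT"
    # Outside an 'async def' function, both words are ordinary names.
    return "NAME"
```

In this model "await" has no flag of its own: classify_name("await", False) gives "NAME" and classify_name("await", True) gives "AWAIT", so its behaviour depends entirely on the same async_def state that "async" uses.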
History
Date User Action Args
2018-09-24 15:32:03 xtreak link issue26000 superseder
2017-10-31 11:01:46 vstinner set messages: + msg305286
2017-10-31 02:44:00 Alexandre Hamelin set messages: + msg305267
2017-10-31 00:50:32 vstinner set messages: + msg305266
2017-10-31 00:49:59 vstinner set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2017-10-31 00:46:38 vstinner set messages: + msg305265
2017-10-25 22:21:52 pablogsal set keywords: + patch
stage: patch review
pull_requests: + pull_request4091
2017-10-25 15:56:06 vstinner set files: + async_parser_crash.py
nosy: + vstinner
messages: + msg304995
2017-10-25 15:49:50 vstinner set nosy: + yselivanov
2017-10-25 10:16:39 pablogsal set nosy: + pablogsal
messages: + msg304975
2017-10-23 19:22:17 Alexandre Hamelin create