classification
Title: Stack overflow when parse long expression to AST
Type: crash Stage: patch review
Components: Documentation Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, brett.cannon, docs@python, ncoghlan, serhiy.storchaka, terry.reedy
Priority: normal Keywords: easy

Created on 2018-02-03 18:22 by serhiy.storchaka, last changed 2018-03-10 00:14 by brett.cannon.

Pull Requests
URL Status Linked Edit
PR 5960 merged brett.cannon, 2018-03-02 21:30
PR 6041 merged miss-islington, 2018-03-09 20:04
PR 6042 merged miss-islington, 2018-03-09 20:06
PR 6043 merged brett.cannon, 2018-03-09 20:53
PR 6045 merged miss-islington, 2018-03-09 21:19
PR 6046 merged miss-islington, 2018-03-09 21:19
PR 6047 merged brett.cannon, 2018-03-09 23:53
PR 6048 open miss-islington, 2018-03-09 23:59
Messages (23)
msg311568 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-03 18:22
Python 2 can crash when compile long expression. 

>>> x = eval('""' + '+chr(33)'*100000)
Segmentation fault (core dumped)

This was fixed in Python 3. RecursionError is raised now.

>>> x = eval('""' + '+chr(33)'*100000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded during compilation
>>> x = eval('+chr(33)'*1000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded during compilation

But compiling to AST still can crash.

>>> import ast
>>> x = ast.parse('+chr(33)'*1000000)
Segmentation fault (core dumped)
msg311570 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-03 18:29
There is also a regression in 3.7 when compile a long expression.

In 3.6:

>>> compile('+a'*1000000, '?', 'eval')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded during compilation

In 3.7:

>>> compile('+a'*1000000, '?', 'eval')
Segmentation fault (core dumped)
msg311920 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-02-10 00:22
Experimenting on Windows with IDLE's current 3.7 Shell, where a user process crash restarts Shell, compile('+a'*31365, '?', 'eval') consistently gives RecursionError, values a bit larger sometimes crash, and values much larger (32000, at max) consistently crash.  Perhaps it is not just a coincidence that the regressed lower limit for crashing is near 2**15.
msg311922 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-02-10 00:30
Ditto as to the limit for ast.parse.

>>> import ast; ast.parse('+chr(33)'*32000)
	    
============================ RESTART: Shell =========================
>>> import ast; ast.parse('+chr(33)'*31000)
	    
<_ast.Module object at 0x000001E7920F34A8>

I see the same in 3.6 and 3.5, so this is not a 3.7 regression.  Rather, was the ast.parse behavior somehow copied into compile for 3.7, perhaps by 'simplifying' code?
msg311945 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-10 09:00
On Linux the limit is much larger that 2**15. This depends on the stack size, it is smaller on Windows.

The stack is overflowed by recursive call of ast2obj_expr in Python/Python-ast.c. The same problem exists in other recursive AST processing code: optimizing, unparsing. This is why 3.7 can crash in cases when 3.6 was not crashed.

I don't see an easy way of fixing this. The common way is surrounding recursive calls with Py_EnterRecursiveCall()/Py_LeaveRecursiveCall(). But this make the limit too small (1000 by default). In other cases it is enough. The data structures with 1000 levels of nesting are rare. In any case the Python compiler has lower limit on nesting indented blocks. But in this case the linear sequence of binary operators is compiled to deeply nested structure.

I see two hard ways of properly fixing this.

1. Rewrite all AST expression processing code with using an explicit stack instead of function stack. This will complicate a lot of C code too much. Python code should be rewritten too if you don't want to get a RecursionError in corner cases, but for common cases it can be left unmodified.

2. Change binary operations representation, so that a+b+c+d will be represented as ('+', (a, b, c, d)) instead of ('+', ('+', ('+', a, b), c), d). This is backward incompatible change and needs to rewrite any AST processing code (in C and Python). This solution can't be backported. But the resulting code will be simpler than when rewrite them for the first approach.
msg311978 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-02-11 03:26
The other option is to simply not worry about it and acknowledge you can
crash the compiler with a crazy-sized expression. Option 1 is too much work
and option 2 takes us from an AST to more of an s-expression format which
is a significant shift in design (now if people want to make that kind of
change that's fine, but it should be consistent across the board).

On Sat, Feb 10, 2018, 14:30 Serhiy Storchaka, <report@bugs.python.org>
wrote:

>
> Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:
>
> On Linux the limit is much larger that 2**15. This depends on the stack
> size, it is smaller on Windows.
>
> The stack is overflowed by recursive call of ast2obj_expr in
> Python/Python-ast.c. The same problem exists in other recursive AST
> processing code: optimizing, unparsing. This is why 3.7 can crash in cases
> when 3.6 was not crashed.
>
> I don't see an easy way of fixing this. The common way is surrounding
> recursive calls with Py_EnterRecursiveCall()/Py_LeaveRecursiveCall(). But
> this make the limit too small (1000 by default). In other cases it is
> enough. The data structures with 1000 levels of nesting are rare. In any
> case the Python compiler has lower limit on nesting indented blocks. But in
> this case the linear sequence of binary operators is compiled to deeply
> nested structure.
>
> I see two hard ways of properly fixing this.
>
> 1. Rewrite all AST expression processing code with using an explicit stack
> instead of function stack. This will complicate a lot of C code too much.
> Python code should be rewritten too if you don't want to get a
> RecursionError in corner cases, but for common cases it can be left
> unmodified.
>
> 2. Change binary operations representation, so that a+b+c+d will be
> represented as ('+', (a, b, c, d)) instead of ('+', ('+', ('+', a, b), c),
> d). This is backward incompatible change and needs to rewrite any AST
> processing code (in C and Python). This solution can't be backported. But
> the resulting code will be simpler than when rewrite them for the first
> approach.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue32758>
> _______________________________________
>
msg311993 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-11 10:56
> The other option is to simply not worry about it and acknowledge you can
> crash the compiler with a crazy-sized expression.

Agreed, this is the most practical option. Do you want to write such acknowledgement?
msg312009 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-02-11 19:25
If ast_parse returns, a correct tree (A) rather than a buggy tree is a hard requirement.  If ast_parse does not return,  an exception (B) rather than a crash is strongly desired.  We should not risk A to get B.  I presume that Serhiy is suggesting that option 1 either has such a risk or would consume developer resources that might be better spent on other improvements.

For Python, option 2, seems pretty useless for real code because there are much better ways to sum a long sequence: sum(iterable_of_numbers),  seq.append, ''.join(iterable_of_string).

Possible addition to the ast.parse entry.
"Syntactically correct but extremely long or complex source strings may result in a RecursionError or program crash."
msg312030 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-02-12 05:30
On Sun, Feb 11, 2018, 16:26 Serhiy Storchaka, <report@bugs.python.org>
wrote:

>
> Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:
>
> > The other option is to simply not worry about it and acknowledge you can
> > crash the compiler with a crazy-sized expression.
>
> Agreed, this is the most practical option. Do you want to write such
> acknowledgement?
>

I think adding a test to test_crashers and an appropriate comment pointing
to this issue is sufficient.

> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue32758>
> _______________________________________
>
msg312701 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-24 07:04
A consequence of this is that ast.literal_eval() can crash.

>>> import ast
>>> ast.literal_eval("+0"*200000)
Segmentation fault (core dumped)

It should be documented that ast.literal_eval() is not safe.
msg313165 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-02 22:09
The PR adds the documentation warnings. Serhiy, can you double-check that I have the appropriate functions and the comment is acceptable?
msg313185 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-03 11:55
Thank you Brett! The comment LGTM.

Is it worth to add warnings to other functions?

* compile(), exec() and eval(). They are crashed due to recursion in the AST optimizer. This is a regression of 3.7. compile(..., PyCF_ONLY_AST) is the same as ast.parse() and crashed in older versions.

* dbm.dumb.open(). It calls ast.literal_eval(). The dbm.dumb databases are considered slow but portable. Before issue22885 this function was even more vulnerable due to using eval(). Since changing it to ast.literal_eval() some developers could consider it safe, but this is not true.

* A number of functions in the inspect module which directly or indirectly call ast.parse() on the __text_signature__ attribute. The risk of this vulnerability is very low.
msg313192 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-03 18:59
You're probably right and it's worth propagating the warning a bit wider.
msg313495 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-09 20:03
New changeset 7a7f100eb352d08938ee0f5ba59c18f56dc4a7b5 by Brett Cannon in branch 'master':
bpo-32758: Warn that ast.parse() and ast.literal_eval() can segfault the interpreter (GH-5960)
https://github.com/python/cpython/commit/7a7f100eb352d08938ee0f5ba59c18f56dc4a7b5
msg313500 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-09 20:35
New changeset b316c44b0105d11a80ff971636143735f3655bbf by Brett Cannon (Miss Islington (bot)) in branch '3.6':
bpo-32758: Warn that ast.parse() and ast.literal_eval() can segfault the interpreter (GH-5960) (GH-6042)
https://github.com/python/cpython/commit/b316c44b0105d11a80ff971636143735f3655bbf
msg313501 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-09 20:35
New changeset f2fffd41b42d88fe36b483852ae33d5a415b7082 by Brett Cannon (Miss Islington (bot)) in branch '3.7':
bpo-32758: Warn that ast.parse() and ast.literal_eval() can segfault the interpreter (GH-5960) (GH-6041)
https://github.com/python/cpython/commit/f2fffd41b42d88fe36b483852ae33d5a415b7082
msg313502 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-09 20:42
I have the changes in for Python 3 for the ast module.

Updated TODO list:

- [x] ast module
- [ ] compile()
- [ ] eval() for >= 3.7
- [ ] exec() for >= 3.7
- [ ] dbm.dumb.open()
- [ ] inspect
msg313503 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-09 20:43
Actually, the TODO list is:

- [x] ast module for Python 3
- [ ] compile()
- [ ] eval() for >= 3.7
- [ ] exec() for >= 3.7
- [ ] dbm.dumb.open()
- [ ] inspect
- [ ] ast module for Python 2 (see https://github.com/python/cpython/pull/5960)
msg313506 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-09 21:19
I think we can ignore the inspect module. It is unlikely that it will cause a crash unintentionally, and it is hard to use this for attacks. The attacker needs to create an extension function with malicious __text_signature__, but if he is able to execute arbitrary binary code, there is a much larger problem.

And perhaps there is no need to repeat the warning for exec() and eval(). They are considered more dangerous than compile().
msg313508 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-09 21:42
Thanks for the feedback, Serhiy! Based on that, the new TODO list is:

- [x] ast module for Python 3
- [x] compile() for Python 3
- [ ] dbm.dumb.open()
- [ ] ast module for Python 2 (see https://github.com/python/cpython/pull/5960)
- [ ] compile() for Python 2 (see https://github.com/python/cpython/pull/6043)
msg313509 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-09 21:55
In 3.7 compile() can crash not only when compiling to an AST object (due to recursive AST optimization).
msg313510 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-09 23:17
@Serhiy: Correct, which is what the warning says: https://github.com/python/cpython/pull/6043/files .
msg313511 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-10 00:14
- [x] ast module for Python 3
- [x] compile() for Python 3
- [x] dbm.dumb.open()
- [ ] ast module for Python 2 (see https://github.com/python/cpython/pull/5960)
- [ ] compile() for Python 2 (see https://github.com/python/cpython/pull/6043)

At this point there is only Python 2 stuff and I'm not bothered enough to see those changed, so I will leave this open for anyone who wants to put in the effort to backport the warnings.
History
Date User Action Args
2018-03-10 00:14:06brett.cannonsetkeywords: + easy, - patch
assignee: brett.cannon ->
messages: + msg313511

versions: - Python 3.6, Python 3.7, Python 3.8
2018-03-09 23:59:26miss-islingtonsetpull_requests: + pull_request5809
2018-03-09 23:53:07brett.cannonsetpull_requests: + pull_request5808
2018-03-09 23:17:10brett.cannonsetmessages: + msg313510
2018-03-09 21:55:26yselivanovsetnosy: - yselivanov
2018-03-09 21:55:01serhiy.storchakasetmessages: + msg313509
2018-03-09 21:42:35brett.cannonsetmessages: + msg313508
2018-03-09 21:19:56serhiy.storchakasetmessages: + msg313506
2018-03-09 21:19:47miss-islingtonsetpull_requests: + pull_request5807
2018-03-09 21:19:40miss-islingtonsetpull_requests: + pull_request5806
2018-03-09 20:53:30brett.cannonsetpull_requests: + pull_request5804
2018-03-09 20:43:53brett.cannonsetmessages: + msg313503
2018-03-09 20:42:59brett.cannonsetmessages: + msg313502
2018-03-09 20:35:44brett.cannonsetmessages: + msg313501
2018-03-09 20:35:23brett.cannonsetmessages: + msg313500
2018-03-09 20:06:30miss-islingtonsetpull_requests: + pull_request5803
2018-03-09 20:04:31miss-islingtonsetpull_requests: + pull_request5802
2018-03-09 20:03:25brett.cannonsetmessages: + msg313495
2018-03-03 18:59:41brett.cannonsetassignee: docs@python -> brett.cannon
messages: + msg313192
2018-03-03 11:55:24serhiy.storchakasetmessages: + msg313185
versions: + Python 2.7
2018-03-02 22:09:50brett.cannonsetmessages: + msg313165
2018-03-02 21:30:38brett.cannonsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request5728
2018-02-24 07:04:15serhiy.storchakasetmessages: + msg312701
2018-02-12 08:02:43serhiy.storchakasetassignee: docs@python

nosy: + docs@python
components: + Documentation, - Interpreter Core
stage: needs patch
2018-02-12 05:30:45brett.cannonsetmessages: + msg312030
2018-02-11 19:25:22terry.reedysetmessages: + msg312009
2018-02-11 10:56:51serhiy.storchakasetmessages: + msg311993
2018-02-11 03:26:42brett.cannonsetmessages: + msg311978
2018-02-10 09:00:06serhiy.storchakasetmessages: + msg311945
2018-02-10 00:30:44terry.reedysetmessages: + msg311922
2018-02-10 00:22:41terry.reedysetnosy: + terry.reedy
messages: + msg311920
2018-02-03 18:29:47serhiy.storchakasetmessages: + msg311570
2018-02-03 18:22:51serhiy.storchakacreate