Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow when parse long expression to AST #76939

Closed
serhiy-storchaka opened this issue Feb 3, 2018 · 26 comments
Closed

Stack overflow when parse long expression to AST #76939

serhiy-storchaka opened this issue Feb 3, 2018 · 26 comments
Labels
docs Documentation in the Doc dir easy type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@serhiy-storchaka
Copy link
Member

BPO 32758
Nosy @brettcannon, @terryjreedy, @ncoghlan, @benjaminp, @serhiy-storchaka, @Mariatta, @csabella, @tirkarthi
PRs
  • bpo-32758: warn that a couple ast functions can crash the interpreter #5960
  • [3.7] bpo-32758: Warn that ast.parse() and ast.literal_eval() can segfault the interpreter (GH-5960) #6041
  • [3.6] bpo-32758: Warn that ast.parse() and ast.literal_eval() can segfault the interpreter (GH-5960) #6042
  • bpo-32758: Warn that compile() can crash when compiling to an AST object #6043
  • [3.7] bpo-32758: Warn that compile() can crash when compiling to an AST object (GH-6043) #6045
  • [3.6] bpo-32758: Warn that compile() can crash when compiling to an AST object (GH-6043) #6046
  • bpo-32758: Warn that dbm.dumb.open() can crash Python #6047
  • [3.7] bpo-32758: Warn that dbm.dumb.open() can crash Python (GH-6047) #6048
  • [2.7] bpo-32758: warn that a couple ast functions can crash the interpreter (GH-5960) #16565
  • [2.7] bpo-32758: Warn that compile() can crash when compiling to an AST object (GH-6043) #16566
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-10-18.08:02:07.842>
    created_at = <Date 2018-02-03.18:22:51.069>
    labels = ['easy', 'type-crash', 'docs']
    title = 'Stack overflow when parse long expression to AST'
    updated_at = <Date 2019-10-18.08:02:07.841>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2019-10-18.08:02:07.841>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-10-18.08:02:07.842>
    closer = 'serhiy.storchaka'
    components = ['Documentation']
    creation = <Date 2018-02-03.18:22:51.069>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32758
    keywords = ['patch', 'easy']
    message_count = 26.0
    messages = ['311568', '311570', '311920', '311922', '311945', '311978', '311993', '312009', '312030', '312701', '313165', '313185', '313192', '313495', '313500', '313501', '313502', '313503', '313506', '313508', '313509', '313510', '313511', '336476', '354869', '354870']
    nosy_count = 9.0
    nosy_names = ['brett.cannon', 'terry.reedy', 'ncoghlan', 'benjamin.peterson', 'docs@python', 'serhiy.storchaka', 'Mariatta', 'cheryl.sabella', 'xtreak']
    pr_nums = ['5960', '6041', '6042', '6043', '6045', '6046', '6047', '6048', '16565', '16566']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue32758'
    versions = ['Python 2.7']

    @serhiy-storchaka
    Copy link
    Member Author

    Python 2 can crash when compile long expression.

    >>> x = eval('""' + '+chr(33)'*100000)
    Segmentation fault (core dumped)

    This was fixed in Python 3. RecursionError is raised now.

    >>> x = eval('""' + '+chr(33)'*100000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RecursionError: maximum recursion depth exceeded during compilation
    >>> x = eval('+chr(33)'*1000000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RecursionError: maximum recursion depth exceeded during compilation

    But compiling to AST still can crash.

    >>> import ast
    >>> x = ast.parse('+chr(33)'*1000000)
    Segmentation fault (core dumped)

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.7 (EOL) end of life 3.8 only security fixes type-crash A hard crash of the interpreter, possibly with a core dump labels Feb 3, 2018
    @serhiy-storchaka
    Copy link
    Member Author

    There is also a regression in 3.7 when compile a long expression.

    In 3.6:

    >>> compile('+a'*1000000, '?', 'eval')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RecursionError: maximum recursion depth exceeded during compilation

    In 3.7:

    >>> compile('+a'*1000000, '?', 'eval')
    Segmentation fault (core dumped)

    @terryjreedy
    Copy link
    Member

    Experimenting on Windows with IDLE's current 3.7 Shell, where a user process crash restarts Shell, compile('+a'*31365, '?', 'eval') consistently gives RecursionError, values a bit larger sometimes crash, and values much larger (32000, at max) consistently crash. Perhaps it is not just a coincidence that the regressed lower limit for crashing is near 2**15.

    @terryjreedy
    Copy link
    Member

    Ditto as to the limit for ast.parse.

    >>> import ast; ast.parse('+chr(33)'*32000)
    	    
    ============================ RESTART: Shell =========================
    >>> import ast; ast.parse('+chr(33)'*31000)
    	    
    <_ast.Module object at 0x000001E7920F34A8>

    I see the same in 3.6 and 3.5, so this is not a 3.7 regression. Rather, was the ast.parse behavior somehow copied into compile for 3.7, perhaps by 'simplifying' code?

    @serhiy-storchaka
    Copy link
    Member Author

    On Linux the limit is much larger that 2**15. This depends on the stack size, it is smaller on Windows.

    The stack is overflowed by recursive call of ast2obj_expr in Python/Python-ast.c. The same problem exists in other recursive AST processing code: optimizing, unparsing. This is why 3.7 can crash in cases when 3.6 was not crashed.

    I don't see an easy way of fixing this. The common way is surrounding recursive calls with Py_EnterRecursiveCall()/Py_LeaveRecursiveCall(). But this make the limit too small (1000 by default). In other cases it is enough. The data structures with 1000 levels of nesting are rare. In any case the Python compiler has lower limit on nesting indented blocks. But in this case the linear sequence of binary operators is compiled to deeply nested structure.

    I see two hard ways of properly fixing this.

    1. Rewrite all AST expression processing code with using an explicit stack instead of function stack. This will complicate a lot of C code too much. Python code should be rewritten too if you don't want to get a RecursionError in corner cases, but for common cases it can be left unmodified.

    2. Change binary operations representation, so that a+b+c+d will be represented as ('+', (a, b, c, d)) instead of ('+', ('+', ('+', a, b), c), d). This is backward incompatible change and needs to rewrite any AST processing code (in C and Python). This solution can't be backported. But the resulting code will be simpler than when rewrite them for the first approach.

    @brettcannon
    Copy link
    Member

    The other option is to simply not worry about it and acknowledge you can
    crash the compiler with a crazy-sized expression. Option 1 is too much work
    and option 2 takes us from an AST to more of an s-expression format which
    is a significant shift in design (now if people want to make that kind of
    change that's fine, but it should be consistent across the board).

    On Sat, Feb 10, 2018, 14:30 Serhiy Storchaka, <report@bugs.python.org>
    wrote:

    Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

    On Linux the limit is much larger that 2**15. This depends on the stack
    size, it is smaller on Windows.

    The stack is overflowed by recursive call of ast2obj_expr in
    Python/Python-ast.c. The same problem exists in other recursive AST
    processing code: optimizing, unparsing. This is why 3.7 can crash in cases
    when 3.6 was not crashed.

    I don't see an easy way of fixing this. The common way is surrounding
    recursive calls with Py_EnterRecursiveCall()/Py_LeaveRecursiveCall(). But
    this make the limit too small (1000 by default). In other cases it is
    enough. The data structures with 1000 levels of nesting are rare. In any
    case the Python compiler has lower limit on nesting indented blocks. But in
    this case the linear sequence of binary operators is compiled to deeply
    nested structure.

    I see two hard ways of properly fixing this.

    1. Rewrite all AST expression processing code with using an explicit stack
      instead of function stack. This will complicate a lot of C code too much.
      Python code should be rewritten too if you don't want to get a
      RecursionError in corner cases, but for common cases it can be left
      unmodified.

    2. Change binary operations representation, so that a+b+c+d will be
      represented as ('+', (a, b, c, d)) instead of ('+', ('+', ('+', a, b), c),
      d). This is backward incompatible change and needs to rewrite any AST
      processing code (in C and Python). This solution can't be backported. But
      the resulting code will be simpler than when rewrite them for the first
      approach.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue32758\>


    @serhiy-storchaka
    Copy link
    Member Author

    The other option is to simply not worry about it and acknowledge you can
    crash the compiler with a crazy-sized expression.

    Agreed, this is the most practical option. Do you want to write such acknowledgement?

    @terryjreedy
    Copy link
    Member

    If ast_parse returns, a correct tree (A) rather than a buggy tree is a hard requirement. If ast_parse does not return, an exception (B) rather than a crash is strongly desired. We should not risk A to get B. I presume that Serhiy is suggesting that option 1 either has such a risk or would consume developer resources that might be better spent on other improvements.

    For Python, option 2, seems pretty useless for real code because there are much better ways to sum a long sequence: sum(iterable_of_numbers), seq.append, ''.join(iterable_of_string).

    Possible addition to the ast.parse entry.
    "Syntactically correct but extremely long or complex source strings may result in a RecursionError or program crash."

    @brettcannon
    Copy link
    Member

    On Sun, Feb 11, 2018, 16:26 Serhiy Storchaka, <report@bugs.python.org>
    wrote:

    Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

    > The other option is to simply not worry about it and acknowledge you can
    > crash the compiler with a crazy-sized expression.

    Agreed, this is the most practical option. Do you want to write such
    acknowledgement?

    I think adding a test to test_crashers and an appropriate comment pointing
    to this issue is sufficient.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue32758\>


    @serhiy-storchaka serhiy-storchaka added docs Documentation in the Doc dir and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Feb 12, 2018
    @serhiy-storchaka
    Copy link
    Member Author

    A consequence of this is that ast.literal_eval() can crash.

    >>> import ast
    >>> ast.literal_eval("+0"*200000)
    Segmentation fault (core dumped)

    It should be documented that ast.literal_eval() is not safe.

    @brettcannon
    Copy link
    Member

    The PR adds the documentation warnings. Serhiy, can you double-check that I have the appropriate functions and the comment is acceptable?

    @serhiy-storchaka
    Copy link
    Member Author

    Thank you Brett! The comment LGTM.

    Is it worth to add warnings to other functions?

    • compile(), exec() and eval(). They are crashed due to recursion in the AST optimizer. This is a regression of 3.7. compile(..., PyCF_ONLY_AST) is the same as ast.parse() and crashed in older versions.

    • dbm.dumb.open(). It calls ast.literal_eval(). The dbm.dumb databases are considered slow but portable. Before bpo-22885 this function was even more vulnerable due to using eval(). Since changing it to ast.literal_eval() some developers could consider it safe, but this is not true.

    • A number of functions in the inspect module which directly or indirectly call ast.parse() on the __text_signature__ attribute. The risk of this vulnerability is very low.

    @brettcannon
    Copy link
    Member

    You're probably right and it's worth propagating the warning a bit wider.

    @brettcannon brettcannon assigned brettcannon and unassigned docspython Mar 3, 2018
    @brettcannon
    Copy link
    Member

    New changeset 7a7f100 by Brett Cannon in branch 'master':
    bpo-32758: Warn that ast.parse() and ast.literal_eval() can segfault the interpreter (GH-5960)
    7a7f100

    @brettcannon
    Copy link
    Member

    New changeset b316c44 by Brett Cannon (Miss Islington (bot)) in branch '3.6':
    bpo-32758: Warn that ast.parse() and ast.literal_eval() can segfault the interpreter (GH-5960) (GH-6042)
    b316c44

    @brettcannon
    Copy link
    Member

    New changeset f2fffd4 by Brett Cannon (Miss Islington (bot)) in branch '3.7':
    bpo-32758: Warn that ast.parse() and ast.literal_eval() can segfault the interpreter (GH-5960) (GH-6041)
    f2fffd4

    @brettcannon
    Copy link
    Member

    I have the changes in for Python 3 for the ast module.

    Updated TODO list:

    • ast module
    • compile()
    • eval() for >= 3.7
    • exec() for >= 3.7
    • dbm.dumb.open()
    • inspect

    @brettcannon
    Copy link
    Member

    Actually, the TODO list is:

    @serhiy-storchaka
    Copy link
    Member Author

    I think we can ignore the inspect module. It is unlikely that it will cause a crash unintentionally, and it is hard to use this for attacks. The attacker needs to create an extension function with malicious __text_signature__, but if he is able to execute arbitrary binary code, there is a much larger problem.

    And perhaps there is no need to repeat the warning for exec() and eval(). They are considered more dangerous than compile().

    @brettcannon
    Copy link
    Member

    Thanks for the feedback, Serhiy! Based on that, the new TODO list is:

    @serhiy-storchaka
    Copy link
    Member Author

    In 3.7 compile() can crash not only when compiling to an AST object (due to recursive AST optimization).

    @brettcannon
    Copy link
    Member

    @serhiy: Correct, which is what the warning says: https://github.com/python/cpython/pull/6043/files .

    @brettcannon
    Copy link
    Member

    At this point there is only Python 2 stuff and I'm not bothered enough to see those changed, so I will leave this open for anyone who wants to put in the effort to backport the warnings.

    @brettcannon brettcannon added easy and removed 3.7 (EOL) end of life 3.8 only security fixes labels Mar 10, 2018
    @brettcannon brettcannon removed their assignment Mar 10, 2018
    @tirkarthi
    Copy link
    Member

    This seems like an easy issue that the warnings in docs for ast need to be manually backported to 2.7 since miss Islington cannot cherry pick two PRs for 2.7 in https://bugs.python.org/issue32758#msg313511 .

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 8eb27cc by Serhiy Storchaka (Ashley Whetter) in branch '2.7':
    bpo-32758: Warn that compile() can crash when compiling to an AST object (GH-6043) (GH-16566)
    8eb27cc

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset dedb99a by Serhiy Storchaka (Ashley Whetter) in branch '2.7':
    bpo-32758: Warn that ast.parse() and ast.literal_eval() can segfault the interpreter (GH-5960) (GH-16565)
    dedb99a

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    docs Documentation in the Doc dir easy type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants