This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Ideas for making ast.literal_eval() usable
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, eamanu, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2019-12-29 22:22 by rhettinger, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked
PR 19899 merged BTaskaya, 2020-05-04 11:05
Messages (6)
msg359011 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-12-29 22:22
A primary goal for ast.literal_eval() is to "Safely evaluate an expression node or a string".

In the context of a real application, we need to do several things to make it possible to fulfill its design goal:

1) We should document possible exceptions that need to be caught.  So far, I've found TypeError, MemoryError, SyntaxError, ValueError.

2) Define a size limit guaranteed not to give a MemoryError.  The smallest unsafe size I've found so far is 301 characters:

     from ast import literal_eval
     s = '(' * 100 + '0' + ',)' * 100   # 301 characters
     literal_eval(s)                    # Raises a MemoryError

3) Consider writing a standalone expression compiler that doesn't have the same small limits as our usual compile() function.  This would make literal_eval() usable for evaluating tainted inputs with bigger datasets. (Imagine if the json module could only be safely used with inputs under 301 characters.)

4) Perhaps document an example of how we suggest that someone process tainted input:

     from ast import literal_eval

     def error(msg):    # hypothetical helper: report the problem and stop
         raise SystemExit(msg)

     expr = input('Enter a dataset in Python format: ')
     if len(expr) > 300:
         error(f'Maximum supported size is 300, not {len(expr)}')
     try:
         data = literal_eval(expr)
     except (TypeError, MemoryError, SyntaxError, ValueError):
         error('Input cannot be evaluated')
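A quick way to see where this limit falls on a particular interpreter is to probe increasing nesting depths of the input shape from item 2. The helper below is hypothetical (not part of the ast module), and the depth at which it fails, as well as the exception raised, depends on the Python version and its parser, so treat the result as a measurement for one interpreter rather than a guarantee.

```python
from ast import literal_eval


def first_failing_depth(max_depth=1000):
    """Return (depth, length) of the first nested-tuple input that
    literal_eval() rejects, or None if every depth up to max_depth works.
    Hypothetical helper for measuring one interpreter's limit."""
    for depth in range(1, max_depth + 1):
        s = '(' * depth + '0' + ',)' * depth    # length is 3 * depth + 1
        try:
            literal_eval(s)
        except (MemoryError, SyntaxError, RecursionError):
            return depth, len(s)
    return None
```

Each probe string is 3 * depth + 1 characters long, so a failing depth of 100 corresponds to the 301-character example. Other input shapes can fail at different sizes, so a length cap derived this way only covers this one pattern.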
msg361995 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-02-14 19:37
> 1) We should document possible exceptions that need to be caught.  So far, I've found TypeError, MemoryError, SyntaxError, ValueError.

Maybe we should wrap all of these in something like a LiteralEvalError so they are easy to catch together. LiteralEvalError could subclass all four, but I guess in some cases this change might break code.
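A minimal sketch of such a wrapper, assuming a hypothetical LiteralEvalError that subclasses only ValueError (rather than all four exception types as suggested) and chains the original exception, so existing `except ValueError` handlers keep working and the underlying cause stays available. RecursionError is included because it is reported later in this thread; checked_literal_eval is likewise a hypothetical name.

```python
import ast


class LiteralEvalError(ValueError):
    """Hypothetical single exception type for any literal_eval() failure."""


# Exception types reported in this thread for literal_eval().
_FAILURES = (TypeError, MemoryError, SyntaxError, ValueError, RecursionError)


def checked_literal_eval(node_or_string):
    """Call ast.literal_eval(), converting every known failure into
    LiteralEvalError while keeping the original as __cause__."""
    try:
        return ast.literal_eval(node_or_string)
    except _FAILURES as exc:
        raise LiteralEvalError(f"cannot evaluate input: {exc}") from exc
```

Subclassing only ValueError sidesteps the question of combining several built-in exception bases; it does mean that code catching, say, MemoryError specifically would no longer see it.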

> 2) Define a size limit guaranteed not to give a MemoryError.  The smallest unsafe size I've found so far is 301 characters:

>>> s = "(" * 101 + ")" * 101
>>> len(s)
202
>>> ast.literal_eval(s)
s_push: parser stack overflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/ast.py", line 61, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/local/lib/python3.9/ast.py", line 49, in parse
    return compile(source, filename, mode, flags,
MemoryError
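Since a length cap alone is a blunt tool, one hedged mitigation is to measure bracket nesting before calling literal_eval(). The sketch below uses the tokenize module so that brackets inside string literals are not counted. The helper names and both limits (10,000 characters, 50 levels) are assumptions for illustration, not tested-safe values.

```python
import ast
import io
import tokenize

_OPENERS = {"(", "[", "{"}
_CLOSERS = {")", "]", "}"}


def max_bracket_depth(source):
    """Deepest bracket nesting in source; brackets inside strings don't count."""
    depth = deepest = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string in _OPENERS:
            depth += 1
            deepest = max(deepest, depth)
        elif tok.type == tokenize.OP and tok.string in _CLOSERS:
            depth -= 1
    return deepest


def guarded_literal_eval(source, max_len=10_000, max_depth=50):
    """Reject over-long or deeply bracketed input, then evaluate it.
    Both limits are hypothetical defaults."""
    if len(source) > max_len:
        raise ValueError(f"input longer than {max_len} characters")
    try:
        depth = max_bracket_depth(source)
    except (tokenize.TokenError, SyntaxError) as exc:
        raise ValueError("input cannot be tokenized") from exc
    if depth > max_depth:
        raise ValueError(f"brackets nested {depth} deep (limit {max_depth})")
    return ast.literal_eval(source)
```

The length cap is still needed: deep nesting does not always involve brackets (a long chain of binary operators, for example), and such inputs are bounded only by max_len.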

> 3) Consider writing a standalone expression compiler that doesn't have the same small limits as our usual compile() function.  This would make literal_eval() usable for evaluating tainted inputs with bigger datasets. (Imagine if the json module could only be safely used with inputs under 301 characters).

Can you explain in a bit more detail how this standalone expression compiler should work?
msg361998 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-02-14 21:13
> 1) We should document possible exceptions that need to be caught.  So far, I've found TypeError, MemoryError, SyntaxError, ValueError.

Another error to add to the list is RecursionError:
>>> t = ast.Tuple(elts=[], ctx=ast.Load())
>>> t.elts.append(t)
>>> ast.literal_eval(t)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/ast.py", line 101, in literal_eval
    return _convert(node_or_string)
  File "/usr/local/lib/python3.9/ast.py", line 81, in _convert
    return tuple(map(_convert, node.elts))
  File "/usr/local/lib/python3.9/ast.py", line 81, in _convert
    return tuple(map(_convert, node.elts))
  File "/usr/local/lib/python3.9/ast.py", line 81, in _convert
    return tuple(map(_convert, node.elts))
  [Previous line repeated 496 more times]
RecursionError: maximum recursion depth exceeded
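When literal_eval() is given an AST node rather than a string, the node may even be cyclic, as above. A hypothetical pre-check can walk the tree iteratively (with an explicit stack, so the check itself cannot hit the recursion limit) and reject cycles, excessive depth, or too many nodes before literal_eval() recurses into it. The function name and limits are illustrative assumptions.

```python
import ast


def check_ast_shape(node, max_depth=100, max_nodes=100_000):
    """Raise ValueError if node is cyclic, too deep, or too large.
    Uses an explicit stack instead of recursion."""
    on_path = set()            # ids of nodes on the current root-to-node path
    stack = [(node, False)]    # (node, leaving?) pairs
    depth = visited = 0
    while stack:
        current, leaving = stack.pop()
        if leaving:            # all children done: step back up the path
            on_path.discard(id(current))
            depth -= 1
            continue
        if id(current) in on_path:
            raise ValueError("AST contains a cycle")
        depth += 1
        visited += 1
        if depth > max_depth or visited > max_nodes:
            raise ValueError("AST is too deep or too large")
        on_path.add(id(current))
        stack.append((current, True))
        stack.extend((child, False) for child in ast.iter_child_nodes(current))
```

The node budget matters because a tree that shares subtrees can have far more root-to-leaf paths than distinct nodes, and this walk visits shared subtrees once per path.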
msg368042 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-04 12:24
It can also crash the interpreter:

    ast.literal_eval('+0'*10**6)

The cause is that all AST-handling C code (in particular, the code converting the AST from C to Python) is recursive and can therefore overflow the C stack. Some of the recursive code has arbitrary limits that raise exceptions such as the MemoryError in the initial example, but not all of it has such checks.
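Because such a crash takes down the whole interpreter, no exception handler can catch it. One hedged workaround, not something the ast module provides, is to run literal_eval() in a child interpreter and treat any abnormal exit as invalid input. The sketch assumes the result consists only of built-in literal types, which marshal can send back to the parent; the function name and the 10-second timeout are arbitrary choices.

```python
import marshal
import subprocess
import sys

# Child program: read UTF-8 source from stdin, evaluate it, and write the
# result to stdout in marshal format.
_CHILD = (
    "import ast, marshal, sys\n"
    "source = sys.stdin.buffer.read().decode('utf-8')\n"
    "sys.stdout.buffer.write(marshal.dumps(ast.literal_eval(source)))\n"
)


def isolated_literal_eval(source, timeout=10.0):
    """Run ast.literal_eval() in a separate interpreter so that a C stack
    overflow kills only the child process, not the caller."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", _CHILD],
            input=source.encode("utf-8"),
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        raise ValueError("evaluation timed out") from exc
    if proc.returncode != 0:
        # 1 means the child raised an exception; on POSIX a negative value
        # means it was killed by a signal (for example, a segfault).
        raise ValueError(
            f"input cannot be evaluated (exit status {proc.returncode})")
    return marshal.loads(proc.stdout)
```

Spawning a process per call is slow, so this suits occasional untrusted inputs rather than bulk data.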
msg383563 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-12-22 00:15
New changeset fbc7723778be01b8f3bb72d2dcac15ab9fbb9923 by Batuhan Taskaya in branch 'master':
bpo-39159: Declare error that might be raised from literal_eval (GH-19899)
https://github.com/python/cpython/commit/fbc7723778be01b8f3bb72d2dcac15ab9fbb9923
msg383568 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-12-22 02:41
> Can you explain in a bit more detail
> how this standalone expression compiler should work?

Aim for something like the JSON parser, but for the supported Python constant expressions, using the existing tokenize module to feed a hand-rolled non-recursive parser.
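A minimal sketch of such a non-recursive parser, under stated assumptions: it keeps open containers on an explicit stack, so nesting depth is limited only by memory. It swaps in a small regex tokenizer in place of the tokenize module because, on Python 3.12 and later, tokenize is built on the C tokenizer, which I believe applies its own 200-level limit on nested brackets. The token grammar and every name here (parse_literal, _TOKEN and the helpers) are hypothetical, and the sketch accepts only a subset of what literal_eval() does (for example, no implicit string concatenation and no complex sums like 1+2j). Each scalar token is still decoded with ast.literal_eval(), which is safe because a single token has no nesting.

```python
import ast
import re

# Assumed token grammar: a simplified subset of Python's lexical rules.
_TOKEN = re.compile(r"""
    (?P<ws>\s+|\#[^\n]*)
  | (?P<str>(?:[rRbBuU]|[rR][bB]|[bB][rR])?
      (?:'''(?:\\.|[^\\])*?'''|\"\"\"(?:\\.|[^\\])*?\"\"\"
        |'(?:\\.|[^\\'\n])*'|"(?:\\.|[^\\"\n])*"))
  | (?P<num>(?:\d|\.\d)(?:[eE][+-]|[\w.])*)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<op>[-+()\[\]{},:])
""", re.VERBOSE | re.DOTALL)
_PAIRS = {"(": ")", "[": "]", "{": "}", None: None}
_NAMES = {"True": True, "False": False, "None": None}


def _tokens(text):
    """Yield (kind, token) pairs, skipping whitespace and comments."""
    pos = 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.lastgroup != "ws":
            yield m.lastgroup, m.group()


def _atom(kind, tok, sign):
    """Convert one scalar token (no nesting, so literal_eval() is safe here)."""
    if kind == "name" and tok in _NAMES and sign is None:
        return _NAMES[tok]
    if kind not in ("num", "str") or (sign and kind != "num"):
        raise ValueError(f"unexpected token {tok!r}")
    try:
        value = ast.literal_eval(tok)
    except (SyntaxError, ValueError) as exc:
        raise ValueError(f"bad literal {tok!r}") from exc
    return -value if sign == "-" else value


def _close(frame, closer):
    """Turn a finished stack frame into a list, tuple, set or dict."""
    opener, items, comma, mode = frame
    if _PAIRS[opener] != closer:
        raise ValueError("unbalanced brackets")
    if opener == "[":
        return items
    if opener != "{":                      # "(" or the implicit top level
        if opener is None and not items:
            raise ValueError("empty input")
        return tuple(items) if comma or not items else items[0]
    if mode == "dict" and len(items) % 2:
        raise ValueError("dict key without a value")
    try:
        if mode != "dict" and items:
            return set(items)
        return dict(zip(items[0::2], items[1::2]))
    except TypeError as exc:               # unhashable element or key
        raise ValueError("unhashable set element or dict key") from exc


def parse_literal(text):
    """Parse a literal display with an explicit stack instead of recursion."""
    stack = [[None, [], False, None]]      # [opener, items, comma_seen, brace_mode]
    want_value, after_colon, sign = True, False, None
    for kind, tok in _tokens(text):
        frame, is_op = stack[-1], kind == "op"
        if want_value:
            if is_op and tok in "([{" and sign is None:
                stack.append([tok, [], False, None])
                continue
            if is_op and tok in "+-" and sign is None:
                sign = tok
                continue
            if is_op and tok in ")]}" and sign is None and not after_colon:
                value = _close(stack.pop(), tok)   # "[]" or a trailing comma
            else:
                value = _atom(kind, tok, sign)
            stack[-1][1].append(value)
            want_value, after_colon, sign = False, False, None
        elif is_op and tok == ",":
            if frame[3] == "dict" and len(frame[1]) % 2:
                raise ValueError("expected ':' after a dict key")
            if frame[0] == "{" and frame[3] is None:
                frame[3] = "set"
            frame[2] = want_value = True
        elif is_op and tok == ":":
            if frame[0] != "{" or frame[3] == "set" or len(frame[1]) % 2 == 0:
                raise ValueError("unexpected ':'")
            frame[3] = "dict"
            want_value = after_colon = True
        elif is_op and tok in ")]}":
            value = _close(stack.pop(), tok)
            stack[-1][1].append(value)
        else:
            raise ValueError(f"unexpected token {tok!r}")
    if sign is not None or after_colon or len(stack) != 1:
        raise ValueError("unexpected end of input")
    return _close(stack.pop(), None)
```

Every failure surfaces as ValueError, and an input such as 50,000 nested brackets parses without touching the C parser or the recursion limit. Like the json module, it still needs a length cap if memory use matters.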
History
Date User Action Args
2022-04-11 14:59:24 admin set github: 83340
2020-12-22 02:41:55 rhettinger set messages: + msg383568
2020-12-22 00:15:54 rhettinger set messages: + msg383563
2020-05-04 19:10:08 eamanu set nosy: + eamanu
2020-05-04 12:24:41 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg368042
2020-05-04 11:05:31 BTaskaya set keywords: + patch
    stage: patch review
    pull_requests: + pull_request19210
2020-02-14 21:13:46 BTaskaya set messages: + msg361998
2020-02-14 19:37:52 BTaskaya set versions: + Python 3.9
    nosy: + BTaskaya
    messages: + msg361995
    components: + Library (Lib)
    type: enhancement
2019-12-29 22:22:49 rhettinger create