ast.literal_eval supports non-literals in Python 3 #75959

DavidBieber · 2017-10-12T22:31:18Z

BPO	31778
Nosy	@brettcannon, @nascheme, @rhettinger, @terryjreedy, @ncoghlan, @benjaminp, @bitdancer, @serhiy-storchaka
PRs	bpo-31778: Make ast.literal_eval() more strict. #4035

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2018-01-04.09:27:28.780>
created_at = <Date 2017-10-12.22:31:18.436>
labels = ['type-feature', '3.7', 'docs']
title = 'ast.literal_eval supports non-literals in Python 3'
updated_at = <Date 2018-01-04.09:27:28.779>
user = 'https://bugs.python.org/DavidBieber'

bugs.python.org fields:

activity = <Date 2018-01-04.09:27:28.779>
actor = 'serhiy.storchaka'
assignee = 'docs@python'
closed = True
closed_date = <Date 2018-01-04.09:27:28.780>
closer = 'serhiy.storchaka'
components = ['Documentation']
creation = <Date 2017-10-12.22:31:18.436>
creator = 'David Bieber'
dependencies = []
files = []
hgrepos = []
issue_num = 31778
keywords = ['patch']
message_count = 11.0
messages = ['304294', '304367', '304374', '304379', '304400', '304573', '304574', '304580', '305973', '305987', '309460']
nosy_count = 10.0
nosy_names = ['brett.cannon', 'nascheme', 'rhettinger', 'terry.reedy', 'ncoghlan', 'benjamin.peterson', 'r.david.murray', 'docs@python', 'serhiy.storchaka', 'David Bieber']
pr_nums = ['4035']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue31778'
versions = ['Python 3.7']

DavidBieber · 2017-10-12T22:31:18Z

# Overview
ast.literal_eval supports some non-literals in Python 3.
The behavior of this function runs contrary to the documented behavior.

# The Issue
The documentation says of the function "It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing."

However, literal_eval is capable of evaluating expressions with certain operators, particular the operators "+" and "-".

As has been explained previously, the reason for this is to support "complex" literals such as 2+3j. However, this has unintended consequences which I believe to be indicative of a bug. Examples of the unintended consequences include ast.literal_eval('1+1') == 2 ast.literal_eval('2017-10-10') == 1997. I would expect each of these calls to literal_eval to throw a ValueError, as the input string is not a literal. Instead, literal_eval successfully evaluates the input string, in the latter case giving an unexpected result (since the intent of the string is to represent a 21st century date.)

Since issue arose as a Python Fire issue, where the behavior of Python Fire was unexpected for inputs such as those described above (1+1 and 2017-10-10) only in Python 3. For context, Python Fire is a CLI library which uses literal_eval as part of its command line argument parsing procedure.

I think the resolution to this issue is having literal_eval raise a ValueError if the ast of the input represents anything other than a Python literal, as described in the documentation. That is, "The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None." Additional operations, such as the binary operations "+" and "-", unless they explicitly create a complex number, should produce a ValueError.

If that resolution is not the direction we take, I also would appreciate knowing if there is another built in approach for determining if a string or ast node represents a literal.

# Reproducing
The following code snippets produce different behaviors in Python 2 and Python 3.

import ast
ast.literal_eval('1+1')

import ast
ast.literal_eval('2017-10-10')

# References

The Python Fire issue is here: Only plain literals should be parsed google/python-fire#97
Prior discussion of a similar issue: https://bugs.python.org/issue22525
I think is where complex number support was originally added: https://bugs.python.org/issue4907
In this thread, https://bugs.python.org/issue24146, one commenter explains that literal_eval's support for "2+3" is an unintentional side effect of supporting complex literals.

terryjreedy · 2017-10-13T22:03:44Z

It has been some time since literal_eval literally only evaluated literals. 'constant_eval' might be a better name now, with the proviso of 'safely, in reasonable time'.

>>> from ast import literal_eval as le
>>> le('(1,2,3)')
(1, 2, 3)
>>> le('(1,2, (3,4))')
(1, 2, (3, 4))

I believe there was once a time when a simple tuple would be evaluated, while a nested one would not be.

"It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing." I do not read this as prohibiting all operators, but rather that now all will be accepted.

>>> le(2**2)
...
ValueError: malformed node or string: 4

Exponentiation of ints can take exponential time and can be used for denial of service attacks.

>>> le('2017-10-10')
1997

This is correct. For '2017-10-10' to be a string representing a date, it must be quoted as a string in the code.

>>> le("'2017-10-10'")
'2017-10-10'

Rolling back previous enhancements would break existing code, so a deprecation period would be required. But I would be inclined to instead update the doc to match the updated code better. Lets see what others think.

ncoghlan · 2017-10-14T05:05:40Z

I'm marking this as documentation issue for now, as the operators that literal_eval allows are solely those where constant folding support is needed to correctly handle complex and negative numbers (as noted in the original post):

>>> dis.dis("+1")
  1           0 LOAD_CONST               1 (1)
              2 RETURN_VALUE
>>> dis.dis("-1")
  1           0 LOAD_CONST               1 (-1)
              2 RETURN_VALUE
>>> dis.dis("1+1")
  1           0 LOAD_CONST               1 (2)
              2 RETURN_VALUE
>>> dis.dis("1+1j")
  1           0 LOAD_CONST               2 ((1+1j))
              2 RETURN_VALUE
>>> dis.dis("2017-10-10")
  1           0 LOAD_CONST               3 (1997)
              2 RETURN_VALUE

So the key promise that literal_eval makes is that it will not permit arbitrary code execution, but the docs should make it clearer that it *does* permit constant folding for addition and subtraction in order to handle the full range of numeric literals.

If folks want to ensure that the input string *doesn't* include a binary operator, then that currently needs to be checked separately with ast.parse:

>>> type(ast.parse("2+3").body[0].value) is ast.BinOp
True
>>> type(ast.parse("2017-10-10").body[0].value) is ast.BinOp
True

For 3.7+, that check could potentially be encapsulated as an "allow_folding=True" keyword-only parameter (where the default gives the current behaviour, while "allow_folding=False" disables processing of UnaryOp and BinOp), but the docs update is the more immediate need.

DavidBieber · 2017-10-14T06:08:20Z

# Replies

Rolling back previous enhancements would break existing code.

I sympathize completely with the need to maintain backward compatibility. And if this is the reason that this issue gets treated only as a documentation issue, rather than as a behavior issue, I can appreciate that.

If that is the case and literal_eval is not going to only evaluate literals, then for my use case I'll need a way to determine from a string whether it represents a literal. I can implement this myself using ast.parse and walking the resulting tree, looking for non-literal AST nodes. Would such an "is_literal" function would be more appropriate in the ast module than as a one-off function in Python Fire?

The key promise that literal_eval makes is that it will not permit arbitrary code execution.

I disagree that this is the only key promise, and here's my reasoning. The docstring has two sentences, and each one makes a promise:

"Safely evaluate an expression node or a string containing a Python expression."
"The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None."
(1) says that evaluation is safe -- this is the key promise that you reference. (2) is also a promise though, that only certain types are allowed. While one could argue that the behavior of the function is not specified for inputs violating that criteria, I think the clear correct thing to do is to raise a ValueError if the value doesn't meet the criteria. This is what was done in Python 2, where the docstring for literal_eval is these same two sentences (modulo the inclusion of bytes and sets). It's my opinion that Python 2's behavior better matches the docstring as well as the behavior implied by the function's name.

# Additional observations

Python 2 _does_ support parsing complex literals, but does not support treating e.g. '1+1' as a literal.
ast.literal_eval('1+1j') # works in both Python 2 and Python 3
ast.literal_eval('1j+1') # raises a ValueError in Python 2, returns 1+1j in Python 3
ast.literal_eval('1+1') # raises a ValueError in Python 2, returns 2 in Python 3
Python 3 supports parsing addition and subtraction at any level of nesting.
ast.literal_eval('(1, (0, 1+1+1))') # raises a ValueError in Python 2, returns (1, (0, 3)) in Python 3

In my opinion, Python 2's behavior is correct in these situations since it matches the documentation and only evals literals as defined in the documentation.

# Source
The relevant code in Python 2.7.3 is here. It explicitly allows NUM +/- COMPLEX, but not even COMPLEX +/- NUM. The corresponding code for Python 3 is here. You'll notice it supports adding and subtracting arbitrary numeric types (int, float, complex).

---

Thanks for your replies and for hearing me out on this issue.

serhiy-storchaka · 2017-10-14T19:43:09Z

The support of parsing addition and subtraction at any level of nesting was added by bc95973. The commit message and NEWS entry don't contain an issue number, thus the rationale of this change is not known. Raymond, could you please explain?

bitdancer · 2017-10-18T11:54:55Z

"Safely evaluate an expression node or a string containing a Python expression."

The behavior you are citing matches that documentation, as far as I can see. 1+1 is an expression involving supported literals.

serhiy-storchaka · 2017-10-18T12:01:30Z

"""
The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
"""

1+1 is not a literal number.

"""
It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing.
"""

serhiy-storchaka · 2017-10-18T14:35:31Z

PR 4035 makes ast.literal_eval() more strict.

serhiy-storchaka · 2017-11-09T17:42:58Z

Ping.

nascheme · 2017-11-09T21:45:10Z

Just a comment on what I guess is the intended use of literal_eval(), i.e. taking a potentially untrusted string and turning it into a Python object. Exposing the whole of the Python parser to potential attackers would make me very worried. Parsing code for all of Python syntax is just going to be very complicated and there can easily be bugs there. Generating an AST and then walking over it to see if it is safe is also scary. The "attack surface" is too large. This is similar to the Shellshock bug. If you can trust the supplier of the string then okay but I would guess that literal_eval() is going to get used for untrusted inputs.

It would be really nice to have something like ast.literal_eval() that could be used for untrusted strings. I would implement it by writing a retricted parser. Keep it extremely simple. Validate it by heavy code reviews and extensive testing (e.g. fuzzing).

serhiy-storchaka · 2018-01-04T09:15:46Z

New changeset d8ac4d1 by Serhiy Storchaka in branch 'master':
bpo-31778: Make ast.literal_eval() more strict. (bpo-4035)
d8ac4d1

DavidBieber mannequin added 3.8 only security fixes 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Oct 12, 2017

terryjreedy added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Oct 13, 2017

ncoghlan added the docs Documentation in the Doc dir label Oct 14, 2017

ncoghlan assigned docspython Oct 14, 2017

serhiy-storchaka removed the 3.8 only security fixes label Oct 18, 2017

serhiy-storchaka closed this as completed Jan 4, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ast.literal_eval supports non-literals in Python 3 #75959

ast.literal_eval supports non-literals in Python 3 #75959

DavidBieber mannequin commented Oct 12, 2017

DavidBieber mannequin commented Oct 12, 2017

terryjreedy commented Oct 13, 2017

ncoghlan commented Oct 14, 2017

DavidBieber mannequin commented Oct 14, 2017

serhiy-storchaka commented Oct 14, 2017

bitdancer commented Oct 18, 2017

serhiy-storchaka commented Oct 18, 2017

serhiy-storchaka commented Oct 18, 2017

serhiy-storchaka commented Nov 9, 2017

nascheme commented Nov 9, 2017

serhiy-storchaka commented Jan 4, 2018

ast.literal_eval supports non-literals in Python 3 #75959

ast.literal_eval supports non-literals in Python 3 #75959

Comments

DavidBieber mannequin commented Oct 12, 2017

DavidBieber mannequin commented Oct 12, 2017

terryjreedy commented Oct 13, 2017

ncoghlan commented Oct 14, 2017

DavidBieber mannequin commented Oct 14, 2017

serhiy-storchaka commented Oct 14, 2017

bitdancer commented Oct 18, 2017

serhiy-storchaka commented Oct 18, 2017

serhiy-storchaka commented Oct 18, 2017

serhiy-storchaka commented Nov 9, 2017

nascheme commented Nov 9, 2017

serhiy-storchaka commented Jan 4, 2018