classification
Title: Improve error reporting involving f-strings (PEP 498)
Type: enhancement
Stage:
Components: Interpreter Core
Versions: Python 3.7, Python 3.6
process
Status: open
Resolution:
Dependencies: 12458
Superseder:
Assigned To:
Nosy List: Chi Hsuan Yen, Claudiu.Popa, Jim Fasarakis-Hilliard, Mark.Shannon, eric.smith, levkivskyi, martin.panter, mbdevpl, r.david.murray
Priority: normal
Keywords:

Created on 2016-12-23 12:01 by Chi Hsuan Yen, last changed 2017-03-15 06:33 by mbdevpl.

Messages (5)
msg283874 - (view) Author: Chi Hsuan Yen (Chi Hsuan Yen) * Date: 2016-12-23 12:01
Here are two examples I found confusing when playing with f-strings. The first one involves a NameError:

$ cat test2
f'''
{
FOO
}
'''

$ python3.7m test2
Traceback (most recent call last):
  File "test2", line 5, in <module>
    '''
NameError: name 'FOO' is not defined


It would be better if the traceback pointed to the line where the error actually occurs:

$ python3.7m test2
Traceback (most recent call last):
  File "test2", line 3, in <module>
    FOO
NameError: name 'FOO' is not defined

The second one involves a SyntaxError:

$ cat test2 
f'''
{
a b c
}
'''

$ python3.7m test2
  File "<fstring>", line 2
    a b c
      ^
SyntaxError: invalid syntax

It would be better if the file name and line number were relative to the source file instead of the expression inside the f-string:

$ python3.7m test2
  File "test2", line 3
    a b c
      ^
SyntaxError: invalid syntax

By the way, external checkers like pyflakes also suffer, because they rely on ASTs. Nodes in f-strings have their lineno relative to the {...} expression instead of the whole code string. For example:

import ast

code = '''
f'{LOL}'
'''

for node in ast.walk(ast.parse(code, "<stdin>", "exec")):
    if isinstance(node, ast.Name):
        print(node.lineno)


Prints 1 instead of 2.

As another aside, Ruby reports correct line numbers:

$ cat test3 
"
#{
FOO
}
"

$ ruby test3 
test3:3:in `<main>': uninitialized constant FOO (NameError)


$ cat test3 
"
#{
@@
}
"

$ ruby test3
test3:3: `@@' without identifiers is not allowed as a class variable name
test3:3: syntax error, unexpected end-of-input


Added the author and the primary reviewer of issue24965.
msg283878 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-12-23 13:55
These are not problems with f-strings in particular; they are problems in general with the way Python parsing and error reporting happen. The second is presumably (I haven't gotten around to understanding how f-strings work under the hood) an example of error reporting from a separately evaled string.

Improvements in this area are certainly welcome.  There is an open issue relevant to your first example, issue 12458.  I'm sure that f-strings complicate the solution at least slightly, but I think there are more fundamental pre-requisites to be addressed first in solving it.
msg285896 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2017-01-20 10:20
The problem is in the parsing of f-strings.

The expressions in an f-string are not "eval"ed in the sense of the eval() function. They are evaluated exactly the same as any other Python expression. However, the parsing of f-strings does not provide correct line numbers.

This problem also manifests itself in the ast and tokenize modules.

>>> m = ast.parse("""f'''
... {
... FOO
... }
... '''
... """)

>>> m.body[0].value.values[1].value.id
'FOO'
>>> m.body[0].value.values[1].value.lineno
2

That 2 should be a 3, and yet
eval(compile(m, "test2", "exec"))
  File "<stdin>", line 1, in <module>
  File "test2", line 5, in <module>
NameError: name 'FOO' is not defined

gives line 5 for the error, so not only are the line numbers wrong, they are also inconsistent.
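
For anyone who wants to check both numbers on their own interpreter, here is a minimal sketch that reads the line number from the AST and from the traceback for the same source. The variable names are only illustrative, and the printed values depend on the Python version being run, so no particular output is claimed here.

```python
import ast
import traceback

# Same content as the test2 file above: the name FOO sits on line 3.
SOURCE = "f'''\n{\nFOO\n}\n'''\n"

tree = ast.parse(SOURCE, "test2", "exec")

# Line number recorded in the AST for the Name node inside the f-string.
ast_lineno = next(
    node.lineno for node in ast.walk(tree) if isinstance(node, ast.Name)
)

# Line number reported by the traceback when the compiled code runs.
tb_lineno = None
try:
    exec(compile(tree, "test2", "exec"), {})
except NameError as exc:
    # The last traceback entry is the frame that belongs to "test2".
    tb_lineno = traceback.extract_tb(exc.__traceback__)[-1].lineno

print("AST lineno:      ", ast_lineno)
print("traceback lineno:", tb_lineno)
```

If the two printed numbers differ from each other, or from 3, the interpreter shows the inconsistency described in this message.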

The problem is that the internals of the f-string are not tokenized and parsed using the normal mechanism, but in an ad-hoc fashion in Python/ast.c, as demonstrated by tokenizing the source:

$ python3.6 -m tokenize test2
0,0-0,0:            ENCODING       'utf-8'        
1,0-5,3:            STRING         "f'''\n{\nFOO\n}\n'''"
5,3-5,4:            NEWLINE        '\n'           
6,0-6,0:            ENDMARKER      ''

The f-string could instead be tokenized as something like:
FSTRING_START f'''
STRING_PART \n 
LEFT_BRACE {
NEWLINE
IDENTIFIER FOO
NEWLINE
RIGHT_BRACE }
STRING_PART \n
FSTRING_END '''

Although this would complicate the tokenizer, it would mean that the internals of f-strings could be made explicit in the grammar, and that the compiler could generate correct offsets.
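
The tokenizer output above can also be produced from Python, which makes it easy to compare interpreters. This is only a sketch: token_summary is a made-up helper name, and whether the f-string comes back as one STRING token or as several pieces depends on the interpreter running it.

```python
import io
import tokenize

# Same content as the test2 file above.
SOURCE = "f'''\n{\nFOO\n}\n'''\n"


def token_summary(source):
    """Return (token name, start, end, text) for every token in source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.start, tok.end, tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


for name, start, end, text in token_summary(SOURCE):
    print(f"{start}-{end}: {name:15} {text!r}")
```

A tokenizer that treats the f-string as opaque reports FOO only as part of one long string token; one that splits the f-string reports FOO as its own token with a start position on line 3.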
msg285898 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2017-01-20 10:23
It is also worth mentioning that incorrect line numbers mean that tools like pyflakes, pylint, mypy, lgtm, etc., need to reimplement parsing of f-strings.
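
To give an idea of what such a reimplementation involves, here is a rough sketch of the kind of workaround a tool might use: parse the embedded expression separately, then shift its line numbers with ast.increment_lineno. The helper parse_embedded_expression is hypothetical and deliberately naive; it is not how any of the tools listed actually do it.

```python
import ast

# Same content as the test2 file above: the name FOO sits on line 3.
SOURCE = "f'''\n{\nFOO\n}\n'''\n"


def parse_embedded_expression(source):
    """Hypothetical helper: parse the first {...} expression in source and
    make its line numbers relative to the whole source.

    Naive on purpose: it ignores '{{' escapes, nested braces, conversions
    and format specs, all of which a real tool would have to handle.
    """
    open_idx = source.index("{")
    close_idx = source.index("}", open_idx)
    expr_text = source[open_idx + 1:close_idx]

    # 1-based line of the opening brace within the whole source.
    brace_line = source.count("\n", 0, open_idx) + 1

    # Wrap in parentheses so an expression spanning several lines parses.
    # The "(" is on line 1 of the wrapped text, which is brace_line in source.
    tree = ast.parse("(" + expr_text + ")", "<fstring>", "eval")
    ast.increment_lineno(tree, brace_line - 1)
    return tree


tree = parse_embedded_expression(SOURCE)
name = tree.body  # the Name node for FOO
print(name.id, name.lineno)
```

For this source the sketch prints "FOO 3". Column offsets are not corrected here, which is one more thing each tool would have to get right on its own.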
msg289501 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2017-03-12 14:03
I'm adding another example here, where the lineno reporting is wrong:

  from ast import parse 
  n = parse('''

  def test():
     return f"{a}"
  ''')
  f = n.body[0].body[0].value.values[0]
  n = f.value
  print("name lineno", n.lineno)

In this example, the line number of the f-string inner variable is 1, while it should be 3.
As Mark Shannon said, this bug is affecting tools such as pyflakes and pylint.
History
Date                 User                    Action  Args
2017-03-15 06:33:45  mbdevpl                 set     nosy: + mbdevpl
2017-03-14 17:26:10  Jim Fasarakis-Hilliard  set     nosy: + Jim Fasarakis-Hilliard
2017-03-12 14:03:38  Claudiu.Popa            set     nosy: + Claudiu.Popa; messages: + msg289501
2017-01-20 10:23:25  Mark.Shannon            set     messages: + msg285898
2017-01-20 10:20:19  Mark.Shannon            set     nosy: + Mark.Shannon; messages: + msg285896
2016-12-23 22:47:57  levkivskyi              set     nosy: + levkivskyi
2016-12-23 13:59:33  serhiy.storchaka        set     dependencies: + Tracebacks should contain the first line of continuation lines
2016-12-23 13:55:37  r.david.murray          set     nosy: + r.david.murray; messages: + msg283878
2016-12-23 12:01:01  Chi Hsuan Yen           create