Message 285896 - Python tracker



Author	Mark.Shannon
Recipients	Mark.Shannon, eric.smith, levkivskyi, martin.panter, r.david.murray, yan12125
Date	2017-01-20.10:20:18
Message-id	<1484907619.13.0.277166701964.issue29051@psf.upfronthosting.co.za>
In-reply-to

Content

The problem is in the parsing of f-strings.

The expressions in an f-string are not "eval"ed in the sense of the eval() function. They are evaluated exactly the same as any other Python expression. However, the parsing of f-strings does not provide correct line numbers.

This problem also manifests itself in the ast and tokenize modules.

>>> import ast
>>> m = ast.parse("""f'''
... {
... FOO
... }
... '''
... """)

>>> m.body[0].value.values[1].value.id
'FOO'
>>> m.body[0].value.values[1].value.lineno
2

That 2 should be a 3, and yet
>>> eval(compile(m, "test2", "exec"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test2", line 5, in <module>
NameError: name 'FOO' is not defined

gives line 5 for the error, so not only are the line numbers wrong, they are also inconsistent with each other.
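
To check the AST line number without counting lines by hand, something like the following could be used. It is only a sketch: the helper names find_name_lineno and actual_lineno are made up for illustration, and the reported value depends on the interpreter version, so no particular output is claimed for it.

```python
import ast

# Same source as in the session above: FOO sits on line 3.
SOURCE = "f'''\n{\nFOO\n}\n'''\n"


def find_name_lineno(source, name):
    """Return the lineno the AST reports for the first ast.Name with this id."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == name:
            return node.lineno
    return None


def actual_lineno(source, name):
    """Return the 1-based number of the first source line containing name."""
    for number, line in enumerate(source.splitlines(), start=1):
        if name in line:
            return number
    return None


reported = find_name_lineno(SOURCE, "FOO")
expected = actual_lineno(SOURCE, "FOO")
print(f"reported={reported} expected={expected}")
```

If the two numbers differ, the parser is reporting a wrong position for the expression inside the f-string.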

The problem is that the internals of the f-string are not tokenized and parsed using the normal mechanism, but in an ad-hoc fashion in Python/ast.c, as demonstrated when tokenizing the source:

$ python3.6 -m tokenize test2
0,0-0,0:            ENCODING       'utf-8'        
1,0-5,3:            STRING         "f'''\n{\nFOO\n}\n'''"
5,3-5,4:            NEWLINE        '\n'           
6,0-6,0:            ENDMARKER      ''
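
The same listing can be produced from within Python using the tokenize module. This is a minimal sketch: token_rows is a hypothetical helper name, and generate_tokens() works on text rather than bytes, so it does not emit the ENCODING token seen above. The tokens printed depend on the interpreter version; the listing above is from Python 3.6.

```python
import io
import tokenize

# Same contents as the test2 file tokenized above.
SOURCE = "f'''\n{\nFOO\n}\n'''\n"


def token_rows(source):
    """Return (token name, start, end, text) for each token in source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.start, tok.end, tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


for name, start, end, text in token_rows(SOURCE):
    # Mimic the "row,col-row,col: NAME 'text'" layout of python -m tokenize.
    print(f"{start[0]},{start[1]}-{end[0]},{end[1]}:\t{name}\t{text!r}")
```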

The f-string should instead be tokenized as something like:
FSTRING_START f'''
STRING_PART \n 
LEFT_BRACE {
NEWLINE
IDENTIFIER FOO
NEWLINE
RIGHT_BRACE }
STRING_PART \n
FSTRING_END '''

Although this would complicate the tokenizer, it would mean that the internals of f-strings could be made explicit in the grammar, and that the compiler could generate correct offsets.
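
As a rough illustration of the token stream proposed above, here is a toy splitter written in Python. It is not how the CPython tokenizer works and not a proposal for the real implementation: the function name sketch_fstring_tokens is invented, and it assumes a plain f'''...''' literal whose fields contain only a bare name (no "{{" escapes, nesting, format specs or conversions). The point is only that each piece carries its own line number.

```python
import re


def sketch_fstring_tokens(source):
    """Split a simple f'''...''' literal into (kind, text, line) tuples.

    Hypothetical sketch: only bare-name fields are supported.
    """
    quote = "'''"
    if not (source.startswith("f" + quote) and source.endswith(quote)):
        raise ValueError("expected a simple f'''...''' literal")
    tokens = [("FSTRING_START", "f" + quote, 1)]
    body = source[len(quote) + 1 : -len(quote)]
    line = 1
    in_expr = False
    # Each piece is a brace, a newline, or a run of other characters.
    for piece in re.findall(r"[{}]|\n|[^{}\n]+", body):
        if piece == "{":
            tokens.append(("LEFT_BRACE", piece, line))
            in_expr = True
        elif piece == "}":
            tokens.append(("RIGHT_BRACE", piece, line))
            in_expr = False
        elif in_expr:
            if piece == "\n":
                tokens.append(("NEWLINE", piece, line))
            elif piece.strip():
                name = piece.strip()
                if not name.isidentifier():
                    raise ValueError("only bare names are supported")
                tokens.append(("IDENTIFIER", name, line))
        else:
            tokens.append(("STRING_PART", piece, line))
        # Advance the line counter after recording the piece's start line.
        line += piece.count("\n")
    tokens.append(("FSTRING_END", quote, line))
    return tokens


for kind, text, line in sketch_fstring_tokens("f'''\n{\nFOO\n}\n'''"):
    print(line, kind, repr(text))
```

With per-token line numbers like these, FOO is attributed to line 3 directly, instead of inheriting a position from the enclosing STRING token.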

History
Date	User	Action	Args
2017-01-20 10:20:19	Mark.Shannon	set	recipients: + Mark.Shannon, eric.smith, r.david.murray, martin.panter, levkivskyi, yan12125
2017-01-20 10:20:19	Mark.Shannon	set	messageid: <1484907619.13.0.277166701964.issue29051@psf.upfronthosting.co.za>
2017-01-20 10:20:19	Mark.Shannon	link	issue29051 messages
2017-01-20 10:20:18	Mark.Shannon	create