classification
Title: f-strings should be part of the Grammar
Type: enhancement
Stage:
Components: Interpreter Core
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: BTaskaya, davidhalter, eric.smith
Priority: normal
Keywords:

Created on 2018-06-03 12:59 by davidhalter, last changed 2019-12-01 19:35 by BTaskaya.

Messages (4)
msg318547 - Author: David Halter (davidhalter) Date: 2018-06-03 12:59
Currently f-strings are a bit of a hack. They certainly work very well for users, but they are implemented in ast.c and are therefore not part of the Python grammar or the tokenizer.

I want to change this. I wrote an alternative implementation of f-strings in parso (http://parso.readthedocs.io/en/latest/). The idea I have is to modify the Python grammar slightly (https://github.com/davidhalter/parso/blob/master/parso/python/grammar37.txt#L149):

fstring: FSTRING_START fstring_content* FSTRING_END
fstring_content: FSTRING_STRING | fstring_expr
fstring_conversion: '!' NAME
fstring_expr: '{' testlist [ fstring_conversion ] [ fstring_format_spec ] '}'
fstring_format_spec: ':' fstring_content*

We would push most of the hard work to the tokenizer. This obviously means that we have to add a lot of code there. I wrote a tokenizer in Python for parso: https://github.com/davidhalter/parso/blob/master/parso/python/tokenize.py. It is definitely working well. The biggest difference from the current tokenizer.c is that it has to work with stacks and be way more context-sensitive.
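
A minimal sketch of the stack-based idea, for illustration only. This is not parso's tokenizer and not a proposal for tokenizer.c. The function name `tokenize_fstring`, the single `EXPR` token (a real tokenizer would emit ordinary Python tokens for the expression), and the restriction to one-line `f"..."` literals without backslashes or nested string literals are all assumptions made to keep the example short.

```python
def tokenize_fstring(source):
    """Split one f"..." literal into (token_type, text) pairs.

    Hypothetical, simplified sketch: no backslashes, no nested string
    literals, and no lambda or walrus handling inside expressions.
    """
    if len(source) < 3 or not source.startswith('f"') or not source.endswith('"'):
        raise ValueError('expected a literal of the form f"..."')
    body = source[2:-1]
    tokens = [("FSTRING_START", 'f"')]
    stack = ["string"]            # context stack: "string", "expr" or "spec"
    buf = []                      # characters of the token being collected
    buf_type = "FSTRING_STRING"   # type the collected characters will get
    depth = 0                     # bracket depth inside the current expression

    def flush():
        # Emit the pending token, if any characters were collected.
        if buf:
            tokens.append((buf_type, "".join(buf)))
            buf.clear()

    i = 0
    while i < len(body):
        ch = body[i]
        mode = stack[-1]
        if mode == "string" and body[i:i + 2] in ("{{", "}}"):
            # Doubled braces are literal text at the top level.
            buf.append(body[i:i + 2])
            i += 2
            continue
        if mode in ("string", "spec"):
            if ch == "{":
                flush()
                tokens.append(("OP", "{"))
                stack.append("expr")
                buf_type, depth = "EXPR", 0
            elif ch == "}":
                if mode == "string":
                    raise ValueError("single '}' is not allowed")
                # Closes the format spec and the expression that owns it.
                flush()
                tokens.append(("OP", "}"))
                stack.pop()
                stack.pop()
                buf_type = "FSTRING_STRING"
            else:
                buf.append(ch)
        else:  # mode == "expr"
            if ch in "([{":
                depth += 1
                buf.append(ch)
            elif ch in ")]" or (ch == "}" and depth > 0):
                depth -= 1
                buf.append(ch)
            elif ch == "}":
                flush()
                tokens.append(("OP", "}"))
                stack.pop()
                buf_type = "FSTRING_STRING"
            elif ch == "!" and depth == 0 and body[i + 1:i + 2] != "=":
                flush()
                tokens.append(("OP", "!"))
                buf_type = "NAME"   # the conversion character follows
            elif ch == ":" and depth == 0:
                flush()
                tokens.append(("OP", ":"))
                stack.append("spec")
                buf_type = "FSTRING_STRING"
            else:
                buf.append(ch)
        i += 1

    flush()
    if stack != ["string"]:
        raise ValueError("unterminated replacement field")
    tokens.append(("FSTRING_END", '"'))
    return tokens
```

The stack is what makes the tokenizer context-sensitive: the same `}` character ends an expression, ends a format spec, or is literal text depending on what is on top of the stack.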

There have been attempts to change the grammar of f-strings, such as PEP 536 (https://www.python.org/dev/peps/pep-0536/). It hasn't caught on, because it tried to change the semantics of f-strings. The implementation in parso does not change the semantics of f-strings.

As a first step I would like to get this working for CPython and not tokenize.py. Modifying tokenize.py will not be part of my initial work here.

I have discussed this with Łukasz Langa, so if you guys have no objections I will start working on it. Please let me know if you support this or not.
msg318550 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-06-03 13:54
What is the goal here? Are you just trying to simplify ast.c?

My concern is that there are many, many edge cases, and that you'll be unknowingly changing the behavior of f-strings.

One of the goals of the f-string specification is for a simple third-party parser to be able to lexically recognize f-strings just like normal, raw, or byte strings. It should require no change to such a lexer except for adding "f" where "b", "r", or "u" is currently allowed. I do not want to break that design principle. There are plenty of examples in the wild where this design was leveraged.
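
To make the design principle concrete, here is a rough sketch of the kind of simple third-party lexer meant above. The regular expression and the helper name `make_string_re` are illustrative assumptions, not code from any particular tool. The pattern is deliberately loose: it accepts any zero to two prefix letters, including invalid combinations, and treats the string body as opaque.

```python
import re

# Body of a string literal: triple-quoted forms first, then single-line forms.
_BODY = (
    r"'''(?:[^\\]|\\.)*?'''"
    r'|"""(?:[^\\]|\\.)*?"""'
    r"|'(?:[^'\\\n]|\\.)*'"
    r'|"(?:[^"\\\n]|\\.)*"'
)


def make_string_re(prefix_letters):
    """Build a regex for string literals with the given prefix letters."""
    letters = prefix_letters.lower() + prefix_letters.upper()
    return re.compile("[%s]{0,2}(?:%s)" % (letters, _BODY), re.DOTALL)


# A lexer written before f-strings existed ...
OLD_STRING_RE = make_string_re("bru")
# ... and the same lexer with "f" added as the only change.
NEW_STRING_RE = make_string_re("bruf")

source = "print(f'{x!r}', rb'\\n', 'plain')"
literals = NEW_STRING_RE.findall(source)
```

With only the prefix set changed, `NEW_STRING_RE` picks out the f-string as a single literal, while `OLD_STRING_RE` does not match at the `f` at all. A lexer of this kind never looks inside the braces, which is the property that would be at risk.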
msg318553 - Author: David Halter (davidhalter) Date: 2018-06-03 14:22
As I wrote before, I'm not trying to change anything about the f-string behavior. It is a refactoring. If anyone wants to change the behavior, I feel like they should probably write a PEP anyway.

I personally don't like that f-strings get parsed multiple times. It just smells bad. Also f-strings are IMO not just strings. They should maybe look like strings for other tools to parse them. But they are more or less statements that get executed.

The code in ast.c is not bad. Don't get me wrong. I just think that it's the wrong approach.

Regarding the edge cases: I don't think there are that many. In the end the ast output will look similar anyway. All the backslashes, string literals and comments can be checked and rejected in the tokenizer already.
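
For reference, the AST that any grammar-based implementation would have to keep producing can be inspected with the standard `ast` module. The helper below is only a sketch: the name `describe_fstring` is made up for this example, and it assumes Python 3.8 or later, where literal parts are `ast.Constant` nodes.

```python
import ast


def describe_fstring(source):
    """Summarise the top-level parts of an f-string expression."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.JoinedStr):
        raise ValueError("not an f-string expression")
    parts = []
    for value in node.values:
        if isinstance(value, ast.FormattedValue):
            # conversion is -1 when absent, otherwise ord() of 's', 'r' or 'a'.
            conv = chr(value.conversion) if value.conversion != -1 else None
            # A format spec, when present, is itself a JoinedStr node.
            has_spec = isinstance(value.format_spec, ast.JoinedStr)
            parts.append(("field", conv, has_spec))
        else:
            # Assumes Python 3.8+, where literal text is an ast.Constant.
            parts.append(("literal", value.value))
    return parts


summary = describe_fstring('f"a{x!r:>{w}}b"')
```

The nested `JoinedStr` used for the format spec mirrors the recursive `fstring_format_spec: ':' fstring_content*` rule in the proposed grammar.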
msg318556 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-06-03 14:53
I'm not completely opposed to it, but I need to understand the benefits and side effects.

And I wouldn't exactly describe the multiple passes over the string as "parsing", but I see your point.
History
Date User Action Args
2019-12-01 19:35:30 BTaskaya set nosy: + BTaskaya
2018-06-03 14:53:09 eric.smith set messages: + msg318556
2018-06-03 14:22:54 davidhalter set messages: + msg318553
2018-06-03 13:54:14 eric.smith set messages: + msg318550
2018-06-03 13:44:22 berker.peksag set nosy: + eric.smith
2018-06-03 12:59:38 davidhalter create