This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: f-strings should be part of the Grammar
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, davidhalter, emilyemorehouse, eric.smith, lys.nikolaou, pablogsal, rhettinger
Priority: normal Keywords:

Created on 2018-06-03 12:59 by davidhalter, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg318547 - (view) Author: David Halter (davidhalter) Date: 2018-06-03 12:59
Currently f-strings are a bit of a hack. They certainly work very well for users, but they are implemented in ast.c and are therefore not part of the Python grammar or the tokenizer.

I want to change this. I wrote an alternative implementation of f-strings in parso (http://parso.readthedocs.io/en/latest/). The idea I have is to modify the Python grammar slightly (https://github.com/davidhalter/parso/blob/master/parso/python/grammar37.txt#L149):

fstring: FSTRING_START fstring_content* FSTRING_END
fstring_content: FSTRING_STRING | fstring_expr
fstring_conversion: '!' NAME
fstring_expr: '{' testlist [ fstring_conversion ] [ fstring_format_spec ] '}'
fstring_format_spec: ':' fstring_content*

We would push most of the hard work to the tokenizer. This obviously means that we have to add a lot of code there. I wrote a tokenizer in Python for parso here: https://github.com/davidhalter/parso/blob/master/parso/python/tokenize.py. It is definitely working well. The biggest difference from the current tokenizer.c is that you have to work with stacks and be way more context-sensitive.
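To make the stack-based, context-sensitive approach concrete, here is a deliberately minimal sketch (hypothetical code, not parso's actual implementation) of how such a tokenizer might switch contexts between f-string text and embedded expressions. It handles only a single-level double-quoted f-string containing simple names, with no nesting, escapes, conversions, or format specs:

```python
# Hypothetical sketch of a context-sensitive f-string tokenizer.
# A stack tracks whether we are in f-string text or inside a {...} expression.
def tokenize_fstring(src):
    assert src.startswith('f"') and src.endswith('"')
    tokens = [("FSTRING_START", 'f"')]
    stack = ["fstring"]  # context stack: "fstring" text or "expr"
    buf = ""
    i = 2
    while i < len(src) - 1:
        ch = src[i]
        if stack[-1] == "fstring":
            if ch == "{":
                if buf:
                    tokens.append(("FSTRING_STRING", buf))
                    buf = ""
                tokens.append(("OP", "{"))
                stack.append("expr")
            else:
                buf += ch
        else:  # inside an embedded expression
            if ch == "}":
                if buf:
                    tokens.append(("NAME", buf))
                    buf = ""
                tokens.append(("OP", "}"))
                stack.pop()
            else:
                buf += ch
        i += 1
    if buf:
        tokens.append(("FSTRING_STRING", buf))
    tokens.append(("FSTRING_END", '"'))
    return tokens

print(tokenize_fstring('f"x={x}!"'))
# [('FSTRING_START', 'f"'), ('FSTRING_STRING', 'x='), ('OP', '{'),
#  ('NAME', 'x'), ('OP', '}'), ('FSTRING_STRING', '!'), ('FSTRING_END', '"')]
```

A real implementation would also need to handle nested braces, quotes inside expressions, conversions, and format specs, which is where the extra stack entries come in.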

There have been attempts to change the grammar of f-strings, such as https://www.python.org/dev/peps/pep-0536/. That PEP hasn't caught on, because it tried to change the semantics of f-strings. The implementation in parso does not change the semantics of f-strings.

As a first step I would like to get this working for CPython's tokenizer, not tokenize.py. Modifying tokenize.py will not be part of my initial work here.

I have discussed this with Łukasz Langa, so if you guys have no objections I will start working on it. Please let me know if you support this or not.
msg318550 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-06-03 13:54
What is the goal here? Are you just trying to simplify ast.c?

My concern is that there are many, many edge cases, and that you'll be unknowingly changing the behavior of f-strings.

One of the goals of the f-string specification is for a simple third-party parser to be able to lexically recognize f-strings just like normal, raw, or byte strings. It should require no change to such a lexer except for adding "f" where "b", "r", or "u" is currently allowed. I do not want to break that design principle. There are plenty of examples in the wild where this design was leveraged.
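The lexical rule described here can be illustrated with a hedged sketch of the kind of third-party lexer being referred to (the regex below is an illustration, not taken from any real tool): recognizing f-strings costs only one extra letter in the prefix character class.

```python
import re

# Hypothetical simple string-lexing rule: one regex matches ordinary, raw,
# byte, and f-strings alike, so f-string support is just "f" in the prefix.
STRING = re.compile(
    r"""(?ix)
    [rbuf]{0,2}              # optional prefix: r, b, u, f, rb, fr, ...
    (?:
        '(?:[^'\\\n]|\\.)*'  # single-quoted string body
      | "(?:[^"\\\n]|\\.)*"  # double-quoted string body
    )
    """
)

for literal in ["'plain'", "r'\\d+'", 'b"bytes"', 'f"hi {name}"']:
    assert STRING.fullmatch(literal)
print("all literals matched by the same rule")
```

Under this design an f-string is lexically opaque to the tool: the `{name}` part is just characters inside the string body, which is exactly the property a grammar-level change could break.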
msg318553 - (view) Author: David Halter (davidhalter) Date: 2018-06-03 14:22
As I wrote before, I'm not trying to change anything about the f-string behavior. It is a refactoring. If anyone wants to change the behavior, I feel like they should probably write a PEP anyway.

I personally don't like that f-strings get parsed multiple times. It just smells bad. Also, f-strings are IMO not just strings. They may need to look like strings so that other tools can parse them, but they are more or less statements that get executed.
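The "statements that get executed" point can be seen in the AST that CPython already produces: an f-string is parsed into expression nodes, not a plain string constant. For example:

```python
import ast

# An f-string becomes a JoinedStr node whose values mix literal text
# (Constant) with embedded expressions (FormattedValue).
node = ast.parse('f"total: {x + 1}"', mode="eval").body
print(type(node).__name__)                      # JoinedStr
print([type(v).__name__ for v in node.values])  # ['Constant', 'FormattedValue']
```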

The code in ast.c is not bad. Don't get me wrong. I just think that it's the wrong approach.

Regarding the edge cases: I don't think there are that many. In the end the ast output will look similar anyway. All the backslashes, string literals and comments can be checked and rejected in the tokenizer already.
msg318556 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-06-03 14:53
I'm not completely opposed to it, but I need to understand the benefits and side effects.

And I wouldn't exactly describe the multiple passes over the string as "parsing", but I see your point.
msg373414 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-07-09 18:07
I share Eric's concern about "unknowingly changing the behavior of f-strings."
msg379242 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-10-21 20:12
Just some notes to consider before work starts on this in earnest:

We need to decide what sort of changes we'll accept, if any. For at least the first round of this, I'm okay with "absolutely no change will be acceptable".

For example, here's a good change (IMO): allowing f'{"\n" if cond else ""}'. I'd like to be able to use backslashes inside strings that are in an expression.
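For context, the backslash restriction referred to here means that, at the time of this discussion, an escape sequence could not appear inside the braces; the common workaround hoists the escape into a name first:

```python
# f'{"\n" if cond else ""}' was a SyntaxError because backslashes were not
# allowed inside the expression part of an f-string.
# Workaround: bind the escape to a variable outside the f-string.
cond = True
nl = "\n"
print(repr(f'{nl if cond else ""}'))  # '\n'
```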

A questionable change: f'{'foo'}'. Nesting the same type of quotes.
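A sketch of the distinction (whether same-quote nesting compiles depends on the interpreter's grammar, so it is probed dynamically rather than written literally):

```python
# Nesting a *different* quote type inside an f-string has always been legal:
print(f"{'foo'}")  # foo

# Same-quote nesting, f'{'foo'}', is the questionable case; probe for it
# without hard-coding syntax that may not compile on this interpreter:
try:
    compile("f'{'foo'}'", "<probe>", "eval")
    supported = True
except SyntaxError:
    supported = False
print("same-quote nesting supported:", supported)
```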

I think we should be explicit about what we will accept, because editors, etc. will need to adapt. In msg318550 I mention that some external tools use the same lexer for f-strings that they use for strings. Are we okay with breaking that?

And the f-string '=' feature may be hard to support. Although if we are able to support it, then I think the same solution will be applicable to string annotations without unparsing them.
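The '=' feature mentioned above (added in Python 3.8) echoes the expression's source text verbatim, which is why it entangles tokenization with the grammar; a quick illustration:

```python
# f'{expr=}' reproduces the source text of expr, whitespace included,
# followed by the evaluated value.
x = 41 + 1
print(f"{x=}")        # x=42
print(f"{x * 2 = }")  # x * 2 = 84
```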
History
Date                 User           Action  Args
2022-04-11 14:59:01  admin          set     github: 77935
2020-10-21 20:12:18  eric.smith     set     nosy: + emilyemorehouse, pablogsal; messages: + msg379242
2020-10-20 13:31:56  lys.nikolaou   set     nosy: + lys.nikolaou
2020-07-09 18:07:44  rhettinger     set     nosy: + rhettinger; messages: + msg373414
2020-07-08 09:13:35  eric.smith     set     versions: + Python 3.10, - Python 3.8
2019-12-01 19:35:30  BTaskaya       set     nosy: + BTaskaya
2018-06-03 14:53:09  eric.smith     set     messages: + msg318556
2018-06-03 14:22:54  davidhalter    set     messages: + msg318553
2018-06-03 13:54:14  eric.smith     set     messages: + msg318550
2018-06-03 13:44:22  berker.peksag  set     nosy: + eric.smith
2018-06-03 12:59:38  davidhalter    create