Issue 33754: f-strings should be part of the Grammar

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/77935

classification

Title:	f-strings should be part of the Grammar
Type:	enhancement	Stage:
Components:	Interpreter Core	Versions:	Python 3.10

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	BTaskaya, davidhalter, emilyemorehouse, eric.smith, lys.nikolaou, pablogsal, rhettinger
Priority:	normal	Keywords:

Created on 2018-06-03 12:59 by davidhalter, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg318547 - (view)	Author: David Halter (davidhalter)	Date: 2018-06-03 12:59
Currently f-strings are a bit of a hack. They certainly work very well for users, but they are implemented in ast.c and are therefore not part of the Python grammar or the tokenizer. I want to change this.

I wrote an alternative implementation of f-strings in parso (http://parso.readthedocs.io/en/latest/). The idea is to modify the Python grammar slightly (https://github.com/davidhalter/parso/blob/master/parso/python/grammar37.txt#L149):

    fstring: FSTRING_START fstring_content* FSTRING_END
    fstring_content: FSTRING_STRING | fstring_expr
    fstring_conversion: '!' NAME
    fstring_expr: '{' testlist [ fstring_conversion ] [ fstring_format_spec ] '}'
    fstring_format_spec: ':' fstring_content*

We would push most of the hard work to the tokenizer. This obviously means that we have to add a lot of code there. I wrote a tokenizer in Python for parso here: https://github.com/davidhalter/parso/blob/master/parso/python/tokenize.py. It is definitely working well. The biggest difference from the current tokenizer.c is that you have to work with stacks and be much more context-sensitive.

There were earlier attempts to change the grammar of f-strings, such as https://www.python.org/dev/peps/pep-0536/. It hasn't caught on, because it tried to change the semantics of f-strings. The implementation in parso has not changed the semantics of f-strings.

As a first step I would like to get this working for CPython and not tokenize.py. Modifying tokenize.py will not be part of my initial work here.

I have discussed this with Łukasz Langa, so if you guys have no objections I will start working on it. Please let me know if you support this or not.
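To make the "work with stacks" point concrete, here is a minimal sketch of a mode-stack tokenizer for a single f-string literal. It is not the parso or CPython implementation. The function name `tokenize_fstring` and the `OP` and `EXPR` token names are hypothetical; only the `FSTRING_*` names come from the grammar proposed above. As a simplifying assumption, each expression is emitted as one opaque `EXPR` token instead of being tokenized as ordinary Python, and backslash escapes, triple quotes and the `=` specifier are not handled.

```python
def tokenize_fstring(source):
    """Return (type, text) tokens for one single-line f-string literal."""
    if source[:1] not in ("f", "F") or source[1:2] not in ("'", '"'):
        raise ValueError("expected a literal such as f'...'")
    quote = source[1]
    tokens = [("FSTRING_START", source[:2])]
    stack = ["string"]  # modes: "string", "expr", "expr_end"
    pos, text = 2, ""

    def flush():
        # Emit any pending literal text as one FSTRING_STRING token.
        nonlocal text
        if text:
            tokens.append(("FSTRING_STRING", text))
            text = ""

    while stack:
        if pos >= len(source):
            raise ValueError("unterminated f-string")
        mode, ch = stack[-1], source[pos]
        top_level = len(stack) == 1
        if mode == "string":
            if top_level and source[pos:pos + 2] in ("{{", "}}"):
                text += source[pos:pos + 2]  # escaped brace, kept verbatim
                pos += 2
            elif ch == "{":
                flush()
                tokens.append(("OP", "{"))
                stack.append("expr")
                pos += 1
            elif ch == "}" and not top_level:
                flush()
                stack.pop()  # format spec ends; "expr_end" consumes the "}"
            elif ch == quote and top_level:
                flush()
                tokens.append(("FSTRING_END", quote))
                stack.pop()
                pos += 1
            elif ch in ("}", quote):
                raise ValueError(f"unexpected {ch!r} at offset {pos}")
            else:
                text += ch
                pos += 1
        elif mode == "expr":
            start, depth = pos, 0
            while pos < len(source):
                ch = source[pos]
                if ch == quote:
                    raise ValueError("outer quote reused inside an expression")
                if ch in "'\"":
                    pos = source.find(ch, pos + 1)  # skip a nested string literal
                    if pos < 0:
                        raise ValueError("unterminated nested string")
                elif ch in "([{":
                    depth += 1
                elif ch in ")]}" and depth:
                    depth -= 1
                elif depth == 0 and (
                    ch in ":}" or (ch == "!" and source[pos + 1:pos + 2] != "=")
                ):
                    break
                pos += 1
            else:
                raise ValueError("unterminated expression")
            expr = source[start:pos].strip()
            if not expr:
                raise ValueError("empty expression")
            tokens.append(("EXPR", expr))
            stack[-1] = "expr_end"
            if source[pos] == "!":
                name_start = pos = pos + 1
                while pos < len(source) and source[pos].isalnum():
                    pos += 1
                if pos == name_start:
                    raise ValueError("missing conversion name")
                tokens.extend([("OP", "!"), ("NAME", source[name_start:pos])])
        else:  # "expr_end": only ":" (format spec) or "}" may follow
            if ch == ":":
                tokens.append(("OP", ":"))
                stack.append("string")  # the format spec is lexed like literal text
            elif ch == "}":
                tokens.append(("OP", "}"))
                stack.pop()
            else:
                raise ValueError(f"unexpected {ch!r} at offset {pos}")
            pos += 1
    if pos != len(source):
        raise ValueError("trailing characters after the f-string")
    return tokens
```

Calling `tokenize_fstring("f'a{x!r:>{w}}b'")` pushes a second "string" mode for the format spec and a second "expr" mode for the nested `{w}`. A flat regular-expression tokenizer cannot express that kind of context sensitivity.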
msg318550 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2018-06-03 13:54
What is the goal here? Are you just trying to simplify ast.c? My concern is that there are many, many edge cases, and that you'll be unknowingly changing the behavior of f-strings. One of the goals of the f-string specification is for a simple third-party parser to be able to lexically recognize f-strings just like normal, raw, or byte strings. It should require no change to such a lexer except for adding "f" where "b", "r", or "u" is currently allowed. I do not want to break that design principle. There are plenty of examples in the wild where this design was leveraged.
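The design principle described here can be illustrated with a deliberately simplified third-party lexer. The sketch below is an assumption for illustration, not code from any real tool: the pattern names and the `string_pattern` helper are hypothetical, and only single-line, single- or double-quoted literals are covered. The only difference between the two lexers is the set of prefix letters allowed before the quote.

```python
import re

# Bodies of single-line string literals: escaped characters or anything
# except the closing quote, a backslash, or a newline.
SINGLE = r"'(?:\\.|[^'\\\n])*'"
DOUBLE = r'"(?:\\.|[^"\\\n])*"'

# Prefixes a simplified lexer might accept before f-strings existed...
LEGACY_PREFIX = r"(?:[bB][rR]?|[rR][bB]?|[uU])?"
# ...and the same alternation with "f" (and "rf"/"fr") added.
FSTRING_AWARE_PREFIX = r"(?:[bB][rR]?|[rR][bB]?|[uU]|[fF][rR]?|[rR][fF])?"


def string_pattern(prefix):
    """Compile a pattern matching one prefixed string literal."""
    return re.compile(prefix + "(?:" + SINGLE + "|" + DOUBLE + ")")


legacy = string_pattern(LEGACY_PREFIX)
aware = string_pattern(FSTRING_AWARE_PREFIX)

sample = "f'{x!r:>10}'"
print(legacy.fullmatch(sample) is not None)  # the legacy lexer rejects the "f" prefix
print(aware.fullmatch(sample) is not None)   # the extended lexer sees one string token
```

This only works because the f-string specification discussed in this thread keeps the outer quote character from appearing inside the replacement fields, so the literal's end can be found without understanding the expressions.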
msg318553 - (view)	Author: David Halter (davidhalter)	Date: 2018-06-03 14:22
As I wrote before, I'm not trying to change anything about the f-string behavior. It is a refactoring. If anyone wants to change the behavior, I feel like they should probably write a PEP anyway. I personally don't like that f-strings get parsed multiple times. It just smells bad. Also f-strings are IMO not just strings. They should maybe look like strings for other tools to parse them. But they are more or less statements that get executed. The code in ast.c is not bad. Don't get me wrong. I just think that it's the wrong approach. Regarding the edge cases: I don't think there are that many. In the end the ast output will look similar anyway. All the backslashes, string literals and comments can be checked and rejected in the tokenizer already.
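One way to check the claim that the AST output stays the same is to compare the nodes produced for a set of f-strings before and after such a refactoring. The helper below is a sketch of that check using the standard `ast` module. The name `describe_fstring` and the shape of its return value are assumptions made for illustration, and it assumes an interpreter where literal parts are `ast.Constant` nodes (Python 3.8 or later).

```python
import ast


def describe_fstring(source):
    """Summarize the AST parts of a single f-string expression."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.JoinedStr):
        raise ValueError("not an f-string expression")
    described = []
    for part in node.values:
        if isinstance(part, ast.FormattedValue):
            # conversion is -1 when absent, otherwise the code point of s, r or a.
            conversion = chr(part.conversion) if part.conversion != -1 else None
            has_spec = part.format_spec is not None
            described.append(("FormattedValue", conversion, has_spec))
        else:
            described.append(("Constant", part.value))
    return described


print(describe_fstring("f'a{x!r:>10}b'"))
```

A refactoring that keeps semantics intact should leave this summary unchanged for every f-string in a regression corpus, whichever component ends up splitting the literal.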
msg318556 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2018-06-03 14:53
I'm not completely opposed to it, but I need to understand the benefits and side effects. And I wouldn't exactly describe the multiple passes over the string as "parsing", but I see your point.
msg373414 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-07-09 18:07
I share Eric's concern about "unknowingly changing the behavior of f-strings."
msg379242 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2020-10-21 20:12
Just some notes to consider before work starts on this in earnest:

We need to decide what sort of changes we'll accept, if any. For at least the first round of this, I'm okay with "absolutely no change will be acceptable".

For example, here's a good change (IMO): allowing f'{"\n" if cond else ""}'. I'd like to be able to use backslashes inside strings that are in an expression.

A questionable change: f'{'foo'}'. Nesting the same type of quotes.

I think we should be explicit about what we will accept, because editors, etc. will need to adapt.

In msg318550 I mention that some external tools use the same lexer they use for strings to lex f-strings. Are we okay with breaking that?

And the f-string '=' feature may be hard to support. Although if we are able to support it, then I think the same solution will be applicable to string annotations without unparsing them.
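To be explicit about which of these forms a given interpreter accepts, a small probe can try to compile each one. This is a sketch: the `compiles` helper and the `CANDIDATES` table are hypothetical names. The result for each candidate depends on the Python version running the probe, so no expected output is shown.

```python
def compiles(source):
    """Return True if `source` compiles as an expression on this interpreter."""
    try:
        compile(source, "<probe>", "eval")  # compile only; nothing is evaluated
    except SyntaxError:
        return False
    return True


# The three cases raised in the message above, as source text.
CANDIDATES = {
    "backslash inside an expression": r"""f'{"\n" if cond else ""}'""",
    "same quote type nested": "f'{'foo'}'",
    "'=' specifier": "f'{x=}'",
}


def probe():
    """Map each candidate description to whether it compiles here."""
    return {label: compiles(src) for label, src in CANDIDATES.items()}


for label, accepted in probe().items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

Running the same probe against a patched build would show exactly which behaviors a grammar-based implementation changes, which is the kind of explicit list editors and other tools would need.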

History
Date	User	Action	Args
2022-04-11 14:59:01	admin	set	github: 77935
2020-10-21 20:12:18	eric.smith	set	nosy: + emilyemorehouse, pablogsal messages: + msg379242
2020-10-20 13:31:56	lys.nikolaou	set	nosy: + lys.nikolaou
2020-07-09 18:07:44	rhettinger	set	nosy: + rhettinger messages: + msg373414
2020-07-08 09:13:35	eric.smith	set	versions: + Python 3.10, - Python 3.8
2019-12-01 19:35:30	BTaskaya	set	nosy: + BTaskaya
2018-06-03 14:53:09	eric.smith	set	messages: + msg318556
2018-06-03 14:22:54	davidhalter	set	messages: + msg318553
2018-06-03 13:54:14	eric.smith	set	messages: + msg318550
2018-06-03 13:44:22	berker.peksag	set	nosy: + eric.smith
2018-06-03 12:59:38	davidhalter	create