Issue 24965: Implement PEP 498: Literal String Formatting

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/69153

classification

Title:	Implement PEP 498: Literal String Formatting
Type:	enhancement	Stage:
Components:	Interpreter Core	Versions:	Python 3.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	eric.smith	Nosy List:	JelleZijlstra, Rosuav, barry, elvis, eric.smith, martin.panter, python-dev, yselivanov
Priority:	normal	Keywords:	patch

Created on 2015-08-30 17:47 by eric.smith, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description
pep-498-3.diff	eric.smith, 2015-09-09 23:57
pep-498-4.diff	eric.smith, 2015-09-10 21:13
pep-498-5.diff	eric.smith, 2015-09-12 21:58
pep-498-6.diff	eric.smith, 2015-09-16 07:12
pep-498-7.diff	eric.smith, 2015-09-16 14:16
pep-498-8.diff	eric.smith, 2015-09-18 01:22
pep-498-9.diff	eric.smith, 2015-09-18 12:13
pep-498-10.diff	eric.smith, 2015-09-19 10:49

Messages (39)
msg249362 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-08-30 17:46
See PEP 498.

>>> f'New for Python {sys.version.split()[0]}'
'New for Python 3.6.0a0'
msg249364 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-08-30 17:51
One thing I've done in this implementation is to build up a string to pass to str.format(), instead of using the original string. This new string uses positional parameters instead of named parameters. I had originally proposed to add a string.interpolate() to do the heavy lifting here, which would have meant I could use the original string (as seen in the source code), and not build up a new string and pass it to str.format(). I still might do that, but for now, the approach using str.format() is good enough.
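As a rough illustration of the approach described above, the sketch below rewrites an f-string by hand into a template with positional fields and passes the evaluated expressions to str.format(). The thread does not show the exact template the patch generated, so `rewritten_template` and the variable names are assumptions made for the example.

```python
# Hypothetical illustration: the same formatting expressed two ways.
width = 8
name = "spam"

# What the programmer writes (needs Python 3.6 or later).
as_written = f"{name!r:>{width}}|{len(name)}"

# Assumed shape of the rewritten string: the literal text is kept, and each
# expression is replaced by a positional index into the argument list.
rewritten_template = "{0!r:>{1}}|{2}"
as_rewritten = rewritten_template.format(name, width, len(name))

print(as_written)
print(as_rewritten)
```

Both calls go through the same format-spec machinery, which is why the two results agree.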
msg249365 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-08-30 17:59
Oops, I didn't really mean to include importlib.h. Oh, well.
msg249475 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-01 11:29
Fixed validate_exprs bug.
msg249481 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-01 13:04
Make sure f-strings are identified as literals in error messages.
msg249810 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-09-04 19:01
New changeset a0194ec4195c by Eric V. Smith in branch 'default':
Removed Implementation Limitations section. While the version of the code on http://bugs.python.org/issue24965 has the 255 expression limitation, I'm going to remove this limit. The i18n section was purely speculative. We can worry about it if/when we add i18n and i-strings.
https://hg.python.org/peps/rev/a0194ec4195c
msg250343 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-09 23:57
This implements the accepted PEP 498. The only other real change I plan on making is to do dynamic memory allocation when building the expressions that make up a JoinedStr AST node. The code has all of the places to do that already laid out, it's just a matter of hooking it up. There's one nit where I accept 'f' and 'F', but the PEP just says 'f'. I'm not sure if we should accept the upper case version. I'd think not, but all of the other ones (b, r, and u) do. I need to do one more scan for memory leaks. I've rearranged some code since the last time I checked for leaks, and that's always a recipe for some sneaking in. And I need to write some more tests, mostly for syntax errors, but also for a few edge conditions. Comments welcome.
msg250344 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2015-09-10 01:07
On Sep 09, 2015, at 11:57 PM, Eric V. Smith wrote:

>There's one nit where I accept 'f' and 'F', but the PEP just says 'f'. I'm
>not sure if we should accept the upper case version. I'd think not, but all
>of the other ones (b, r, and u) do.

I think it should be consistent with the other prefixes. Shouldn't be a big deal to amend the PEP to describe this.
msg250354 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-10 08:04
I discussed it with Guido and added 'F' to the PEP.
msg250420 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-10 21:13
This version does dynamic allocation for the expression list, and fixes some memory leaks and early decrefs. I think it's complete, but I'll take some more passes through it checking for leaks.
msg250422 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-10 21:28
The good news is that the performance is pretty good, and finally I have a case where I can beat %-formatting:

$ ./python.bat -mtimeit -s 'a=2' "'%s' % a"
1000000 loops, best of 3: 0.883 usec per loop
$ ./python.bat -mtimeit -s 'a=2' '"{}".format(a)'
1000000 loops, best of 3: 1.16 usec per loop
$ ./python.bat -mtimeit -s 'a=2' 'f"{a}"'
1000000 loops, best of 3: 0.792 usec per loop

This example is mildly contrived, and the performance of f-strings is slightly worse than %-formatting once the f-string contains both expressions and literals.

I could speed it up significantly (I think) by adding opcodes for 2 things: calling __format__ and joining the strings together. Calling __format__ in an opcode could be a win because I could optimize for known types (str, int, float). Having a join opcode would be a win because I could use _PyUnicodeWriter instead of ''.join.

I'm inclined to check this code in as-is, then optimize it later, if we think it's needed and if I get motivated.

For reference, here's the ast and opcodes for f'a={a}':

>>> ast.dump(ast.parse("f'a={a}'"))
"Module(body=[Expr(value=JoinedStr(values=[Str(s='a='), FormattedValue(value=Name(id='a', ctx=Load()), conversion=0, format_spec=None)]))])"

>>> dis.dis("f'a={a}'")
  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('a=')
              9 LOAD_NAME                1 (a)
             12 LOAD_ATTR                2 (__format__)
             15 LOAD_CONST               0 ('')
             18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             21 BUILD_LIST               2
             24 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             27 RETURN_VALUE
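The same three-way comparison can be reproduced from a script with the standard timeit module. This is only a sketch: the loop count and the labels in `statements` are choices made for the example, and the numbers will differ by machine and Python version, so no expected timings are given.

```python
import timeit

setup = "a = 2"
statements = {
    "percent": "'%s' % a",
    "str.format": "'{}'.format(a)",
    "f-string": "f'{a}'",
}

loops = 100_000
timings = {}
for label, stmt in statements.items():
    # Take the best of 3 runs, as the timeit command line does.
    best = min(timeit.repeat(stmt, setup=setup, number=loops, repeat=3))
    timings[label] = best / loops * 1e6  # microseconds per loop
    print(f"{label:>10}: {timings[label]:.3f} usec per loop")
```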
msg250465 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-09-11 12:09
Another version of that AST that is better for my digestion:

f'a={a}'

Module(body=[Expr(
    value=JoinedStr(values=[
        Str(s='a='),
        FormattedValue(
            value=Name(id='a', ctx=Load()),
            conversion=0,
            format_spec=None,
        ),
    ]),
)])

I have been reading over the test cases, and left a bunch of suggestions for more edge cases etc. Some of them might reflect that I haven’t completely learnt how the inner Python expression syntax, outer string escaping syntax, {{curly bracket}} escaping, automatic concatenation, etc, are all meant to fit together.
msg250467 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-11 12:40
Thanks, Martin. I've posted my replies. I'll add some more tests, and work on the triple quoted string bug.
msg250485 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-11 18:17
Thanks again, Martin. I've found 4 bugs so far, based on your suggested tests. The ones I haven't fixed are: 'fur' strings don't work (something in the lexer), and triple quoted strings don't work correctly. I'm working on both of those, and should have an updated patch in the next day or so.
msg250493 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-11 19:48
It turns out 'fur' strings aren't a thing, because 'ur' strings aren't. From tokenizer.c:

/* ur"" and ru"" are not supported */

And the PEP: https://www.python.org/dev/peps/pep-0414/#exclusion-of-raw-unicode-literals

I'll add a test to make sure this fails. So I just need to work on the triple-quoted string problem.
msg250525 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-12 16:40
After discussing it with Guido, I've removed the ability to combine 'f' with 'u'.
msg250527 - (view)	Author: Jelle Zijlstra (JelleZijlstra) *	Date: 2015-09-12 17:29
I've started working on implementing this feature in Cython and I'd like to confirm a few edge cases:
- f'{ {1: 2\N{RIGHT CURLY BRACKET}[1]}' == '2' (string escape rules work even within the expressions)
- f'{ '''foo''' }' is a syntax error
- f'{ """foo 'bar'""" }' is a syntax error
msg250528 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-12 17:36
Yes, Jelle, you are correct in all 3 cases. Remember that the steps are to extract the string from the source code, decode backslash escapes, and only then treat it as an f-string.

For the first case, without the 'f' prefix:
'{ {1: 2\N{RIGHT CURLY BRACKET}[1]}' == '{ {1: 2}[1]}'

Then, applying the 'f':
f'{ {1: 2}[1]}' == '2'.

For the last 2, since they're syntax errors without the 'f', they're also syntax errors with the 'f'.

I'll have a new version, with tests for all of these cases, posted in the next few hours. You can leverage the tests.
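The two steps can be checked separately in any Python that has f-strings. The sketch below first decodes the escape in an ordinary string literal, then evaluates the already-decoded text as an f-string. Later Python releases changed the rules for backslashes inside replacement fields, so the sketch deliberately avoids the one-literal form discussed in this message; the variable names are only for illustration.

```python
# Step 1: an ordinary (non-f) literal. The \N{...} escape is decoded by the
# normal string-literal rules and becomes a plain '}' character.
decoded = '{ {1: 2\N{RIGHT CURLY BRACKET}[1]}'
print(decoded)

# Step 2: the decoded text, written out directly and treated as an f-string.
# The space after the first '{' keeps it from being read as the '{{' escape,
# so the expression is the dict display {1: 2} indexed with [1].
result = f'{ {1: 2}[1]}'
print(result)
```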
msg250529 - (view)	Author: Jelle Zijlstra (JelleZijlstra) *	Date: 2015-09-12 20:34
Thanks! Here are a few more cases I came across with the existing implementation:

>>> f"{'a\\'b'}"
  File "<stdin>", line 1
SyntaxError: missing '}' in format string expression

I believe this is valid and should produce "a'b".

>>> f"{x!s!s}"
  File "<stdin>", line 1
SyntaxError: single '}' encountered in format string

Could use a better error message.

>>> x = 3
>>> f"{x!s{y}}"
'3y}'

Not sure how this happened.
msg250530 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-12 21:24
This one has been fixed:

>>> f"{'a\\'b'}"
"a'b"

This one was a bug that I previously fixed, that Martin pointed out:

>>> f"{x!s!s}"
  File "<stdin>", line 1
SyntaxError: invalid character following conversion character

And this is the same bug:

>>> f"{x!s{y}}"
  File "<stdin>", line 1
SyntaxError: invalid character following conversion character

I'm wrapping up my new code plus tests. I'll post it Real Soon Now.

Thanks for your help.
msg250531 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-09-12 21:24
Regarding wrong error messages, I’ve learnt the hard way that it is often best to use assertRaisesRegex() instead of assertRaises(), to ensure that the actual exception you have in mind is being triggered, rather than a typo or something. Though that might upset your assertSyntaxErrors() helper.
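A small sketch of the point about assertRaisesRegex(): `parse_conversion` below is a hypothetical stand-in with two distinct failure modes (it is not part of the patch), and the test inputs are invented. The loose test passes even though a typo in its input triggers a different error than the one intended; the strict test pins down the message.

```python
import unittest


def parse_conversion(field):
    """Hypothetical parser: return the conversion character after '!'."""
    if "!" not in field:
        raise ValueError("missing conversion character")
    conversion = field.split("!", 1)[1]
    if conversion not in ("s", "r", "a"):
        raise ValueError("invalid conversion character")
    return conversion


class ConversionTests(unittest.TestCase):
    def test_loose(self):
        # Intended to exercise the "invalid" error, but the input has a typo
        # ('?' instead of '!'), so the "missing" error fires. The test still
        # passes because only the exception type is checked.
        with self.assertRaises(ValueError):
            parse_conversion("x?z")

    def test_strict(self):
        # Checking the message ensures the intended error is the one raised.
        with self.assertRaisesRegex(ValueError, "invalid conversion"):
            parse_conversion("x!z")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConversionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

If the typo'd input from test_loose were used in test_strict, the regex would not match and the test would fail, which is the safety net being recommended here.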
msg250532 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-12 21:25
Agreed on checking the error messages better. Especially since even the simplest of errors (like leaving out a quote) results in a syntax error in parsing the string, not parsing inside the f-string. I'll look at it eventually.
msg250538 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-12 21:58
This patch fixes triple-quoted strings, plus a few bugs. I'm going to commit it tomorrow, barring any unforeseen issues.
msg250541 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-12 22:58
I'll probably ensure that all of the parsing errors contain "format string" or "f-string" or similar. That way the regex check is easier, and the user can search for it more easily. It remains to be seen how these are referenced in the documentation. "f-string" seems much easier to say and search for, but seems too slangy for the docs. But "format string" seems ambiguous and hard to search for. I guess time will tell.
msg250544 - (view)	Author: Jelle Zijlstra (JelleZijlstra) *	Date: 2015-09-13 00:15
Is this behavior intentional?

>>> str = len
>>> x = 'foo'
>>> f'{x!s}'
'3'
>>> '{!s}'.format(x)
'foo'

Or similarly:

>>> import builtins
>>> del builtins.repr
>>> f'{x!r}'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'repr' is not defined
msg250546 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-13 00:23
Both of those are known (to me!) byproducts of the current implementation. If my crazy idea of adding opcodes to speed up f-strings flies, then this issue will go away. I consider this a corner case that doesn't need to be addressed before committing this code. I wouldn't emulate it one way or the other just yet.
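The byproduct can be emulated without the patch. The sketch below is not the generated code; `render_via_name_lookup` is a hypothetical helper that resolves the name 'str' in a namespace at run time, which is enough to reproduce the shadowing effect reported above, while str.format() applies the conversion internally and is unaffected.

```python
def render_via_name_lookup(value, namespace):
    """Emulate '!s' by looking up the name 'str' in a namespace at run time."""
    convert = namespace["str"]
    return format(convert(value), "")


x = "foo"

# With the real str, the conversion behaves as expected.
normal = render_via_name_lookup(x, {"str": str})

# If 'str' has been rebound to len, the "conversion" returns the length.
shadowed = render_via_name_lookup(x, {"str": len})

# str.format() does not look the name up, so it cannot be shadowed this way.
via_str_format = "{!s}".format(x)

print(normal, shadowed, via_str_format)
```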
msg250548 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-09-13 04:21
I’m actually trying out your patch now. A couple strange errors and observations:

>>> f"{'{'}"  # Why is this allowed in an outer format expression--
'{'
>>> f"{3:{'{'}>10}"  # --but not inside a format specifier?
SyntaxError: nesting of '{' in format specifier is not allowed
>>> opening = "{"; f"{3:{opening}>10}"  # Workaround
'{{{{{{{{{3'
>>> f"{3:{'}'}<10}"  # Error message is very strange!
SyntaxError: missing '}' in format string expression
>>> f"{\x00}"  # It seems this is treated as a null terminator
  File "<fstring>", line 1
    (
    ^
SyntaxError: unexpected EOF while parsing
>>> f"{'s'!\x00:.<10}"  # Default conversion is the null character?
's.........'
msg250557 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-13 11:39
On 9/13/2015 12:21 AM, Martin Panter wrote:
>>>> f"{'{'}"  # Why is this allowed in an outer format expression--
> '{'
>>>> f"{3:{'{'}>10}"  # --but not inside a format specifier?

This is me being lazy about detecting recursion. I'll fix it.

>>>> f"{\x00}"  # It seems this is treated as a null terminator
>   File "<fstring>", line 1
>     (
>     ^
> SyntaxError: unexpected EOF while parsing

This is a byproduct of using PyParser_ASTFromString. I'm not particularly inclined to do anything about it. Is there any practical use case?

>>>> f"{'s'!\x00:.<10}"  # Default conversion is the null character?
> 's.........'

Yes, that's the default. I'll switch to -1, which I think won't have this issue.

Thanks for the review.
msg250598 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-09-14 02:42
Regarding the null terminator, I was mainly smoke testing your code. :) Maybe it would be too hard to support properly. Although I could imagine someone doing things like this:

>>> d = {b"key\x00": "value"}
>>> f"key={d[b'key\x00']}"  # Oops, escape code at wrong level
  File "<fstring>", line 1
    (d[b'key
           ^
SyntaxError: EOL while scanning string literal
>>> rf"key={d[b'key\x00']}"  # Corrected
'key=value'

I also finished quickly reading over the C code, with a couple more review comments. But I am not familiar with the files involved to thoroughly review.
msg250821 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-16 07:12
I rewrote the format_spec parser to recursively call the f-string parser, so any oddness in what's allowed in a format_spec is gone. It took way longer than I thought, but the code is better for it.
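A short example of what nested replacement fields inside a format_spec look like from the user's side, using the workaround from earlier in the thread. The variable names are chosen for the example; the code needs Python 3.6 or later.

```python
value = 3
fill = "{"
width = 10

# Both the fill character and the width come from nested replacement fields.
# The inner fields are evaluated first, producing the spec "{>10", which is
# then applied to value.
padded = f"{value:{fill}>{width}}"
print(padded)

# The same mechanism handles a computed precision.
precision = 2
rounded = f"{3.14159:.{precision}f}"
print(rounded)
```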
msg250849 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-16 14:16
Simplified error handling, fixed 2 memory leaks. All tests now pass with no leaks. This should be the final version.
msg250926 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-09-18 00:36
Another strange error message (though maybe the new test changes you mentioned caught this):

>>> f'{3:{10}'  # Actually missing a closing bracket '}'
  File "<stdin>", line 1
SyntaxError: f-string: unexpected '}'
msg250927 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-18 00:40
> Martin Panter added the comment:
>
> Another strange error message (though maybe the new test changes you mentioned caught this):
>
>>>> f'{3:{10}'  # Actually missing a closing bracket '}'
>   File "<stdin>", line 1
> SyntaxError: f-string: unexpected '}'

Yes, I found that one, too. Sorry to waste your time on this, but I literally just finished the test changes 15 minutes ago.
msg250929 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-18 01:22
Hopefully the last version.
msg250931 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-09-18 04:11
I left a few more comments on Rietveld. Checking the error messages does make me feel a lot more comfortable though.
msg250970 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-18 12:13
Cleaned up the error handling in fstring_expression_compile so it's easier to verify and more robust in the face of future changes. Added a test for an un-doubled '}', which is an error in a top-level literal (and ends a nested expression). Modified existing tests to match.
msg251073 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-19 10:49
I changed the generated code to call:
format(x [, spec])
instead of:
x.__format__(spec)

The reason is that the correct way to call __format__ is actually:
type(x).__format__(x, spec)

That is, the __format__ lookup is done on the type, not the instance.

From the earlier example, the disassembled code is now:

>>> dis.dis("f'a={a}'")
  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('a=')
              9 LOAD_GLOBAL              1 (format)
             12 LOAD_NAME                2 (a)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 BUILD_LIST               2
             21 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             24 RETURN_VALUE

The simplest way to make the lookup correctly is just to call format() itself, which does the right thing.

I still have a concept of adding opcodes to handle FormattedValue and JoinedStr nodes, but that's an optimization for later, if ever.
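The difference between the instance lookup and the type lookup is easy to observe from Python. In the sketch below, `Temperature` is a hypothetical class invented for the example; it gets a per-instance `__format__` attribute that the format() builtin ignores, while an explicit `t.__format__(...)` call finds it.

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Delegate to the float's formatting, then add a unit suffix.
        return format(self.degrees, spec) + " C"


t = Temperature(21.5)

# Attach a __format__ to this one instance only.
t.__format__ = lambda spec: "instance override"

via_builtin = format(t, ".1f")            # looks __format__ up on type(t)
via_type = type(t).__format__(t, ".1f")   # the explicit spelling of the same
via_instance = t.__format__(".1f")        # finds the instance attribute

print(via_builtin, via_type, via_instance, sep=" | ")
```

Calling format() keeps generated code consistent with how special methods are looked up elsewhere in the language.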
msg251100 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-09-19 18:52
New changeset a10d37f04569 by Eric V. Smith in branch 'default':
Issue #24965: Implement PEP 498 "Literal String Interpolation". Documentation is still needed, I'll open an issue for that.
https://hg.python.org/cpython/rev/a10d37f04569
msg251102 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-09-19 18:57
Documentation task added as issue #25179. Thanks to Martin for the great code reviews.

History
Date	User	Action	Args
2022-04-11 14:58:20	admin	set	github: 69153
2015-09-19 18:57:44	eric.smith	set	status: open -> closed resolution: fixed messages: + msg251102
2015-09-19 18:52:03	python-dev	set	messages: + msg251100
2015-09-19 10:49:42	eric.smith	set	files: + pep-498-10.diff messages: + msg251073
2015-09-18 13:37:15	Rosuav	set	nosy: + Rosuav
2015-09-18 12:13:55	eric.smith	set	files: + pep-498-9.diff messages: + msg250970
2015-09-18 04:11:09	martin.panter	set	messages: + msg250931
2015-09-18 01:22:32	eric.smith	set	files: + pep-498-8.diff messages: + msg250929
2015-09-18 00:40:16	eric.smith	set	messages: + msg250927
2015-09-18 00:36:49	martin.panter	set	messages: + msg250926
2015-09-16 14:16:29	eric.smith	set	files: + pep-498-7.diff messages: + msg250849
2015-09-16 07:12:37	eric.smith	set	files: + pep-498-6.diff messages: + msg250821
2015-09-14 02:43:00	martin.panter	set	messages: + msg250598
2015-09-13 15:45:05	yselivanov	set	nosy: + elvis
2015-09-13 11:39:43	eric.smith	set	messages: + msg250557
2015-09-13 04:21:21	martin.panter	set	messages: + msg250548
2015-09-13 00:23:08	eric.smith	set	messages: + msg250546
2015-09-13 00:15:36	JelleZijlstra	set	messages: + msg250544
2015-09-12 22:58:11	eric.smith	set	messages: + msg250541
2015-09-12 21:58:18	eric.smith	set	files: + pep-498-5.diff messages: + msg250538
2015-09-12 21:25:55	eric.smith	set	messages: + msg250532
2015-09-12 21:24:24	martin.panter	set	messages: + msg250531
2015-09-12 21:24:18	eric.smith	set	messages: + msg250530
2015-09-12 20:34:25	JelleZijlstra	set	messages: + msg250529
2015-09-12 17:36:01	eric.smith	set	messages: + msg250528
2015-09-12 17:29:59	JelleZijlstra	set	nosy: + JelleZijlstra messages: + msg250527
2015-09-12 16:40:47	eric.smith	set	messages: + msg250525
2015-09-11 19:48:28	eric.smith	set	messages: + msg250493
2015-09-11 18:17:14	eric.smith	set	messages: + msg250485
2015-09-11 12:40:40	eric.smith	set	messages: + msg250467
2015-09-11 12:09:02	martin.panter	set	nosy: + martin.panter messages: + msg250465
2015-09-10 21:28:46	eric.smith	set	messages: + msg250422
2015-09-10 21:13:34	eric.smith	set	files: + pep-498-4.diff
2015-09-10 21:13:05	eric.smith	set	messages: + msg250420
2015-09-10 08:04:18	eric.smith	set	messages: + msg250354
2015-09-10 01:07:09	barry	set	messages: + msg250344
2015-09-09 23:57:42	eric.smith	set	files: + pep-498-3.diff messages: + msg250343
2015-09-09 23:17:31	eric.smith	set	files: - pep-498-2.diff
2015-09-09 23:17:23	eric.smith	set	files: - pep-498-1.diff
2015-09-09 23:17:17	eric.smith	set	files: - pep-498.diff
2015-09-04 19:01:19	python-dev	set	nosy: + python-dev messages: + msg249810
2015-09-01 16:37:36	yselivanov	set	nosy: + yselivanov
2015-09-01 13:04:41	eric.smith	set	files: + pep-498-2.diff messages: + msg249481
2015-09-01 11:30:07	eric.smith	set	files: + pep-498-1.diff messages: + msg249475
2015-08-30 19:26:27	barry	set	nosy: + barry
2015-08-30 17:59:59	eric.smith	set	messages: + msg249365
2015-08-30 17:51:10	eric.smith	set	messages: + msg249364
2015-08-30 17:47:00	eric.smith	create