Implement PEP 498: Literal String Formatting #69153

ericvsmith · 2015-08-30T17:47:01Z

BPO	24965
Nosy	@warsaw, @ericvsmith, @Rosuav, @vadmium, @1st1, @JelleZijlstra
Files	pep-498-3.diff pep-498-4.diff pep-498-5.diff pep-498-6.diff pep-498-7.diff pep-498-8.diff pep-498-9.diff pep-498-10.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/ericvsmith'
closed_at = <Date 2015-09-19.18:57:44.348>
created_at = <Date 2015-08-30.17:47:00.728>
labels = ['interpreter-core', 'type-feature']
title = 'Implement PEP 498: Literal String Formatting'
updated_at = <Date 2015-09-19.18:57:44.346>
user = 'https://github.com/ericvsmith'

bugs.python.org fields:

activity = <Date 2015-09-19.18:57:44.346>
actor = 'eric.smith'
assignee = 'eric.smith'
closed = True
closed_date = <Date 2015-09-19.18:57:44.348>
closer = 'eric.smith'
components = ['Interpreter Core']
creation = <Date 2015-08-30.17:47:00.728>
creator = 'eric.smith'
dependencies = []
files = ['40421', '40430', '40447', '40480', '40484', '40495', '40502', '40515']
hgrepos = []
issue_num = 24965
keywords = ['patch']
message_count = 39.0
messages = ['249362', '249364', '249365', '249475', '249481', '249810', '250343', '250344', '250354', '250420', '250422', '250465', '250467', '250485', '250493', '250525', '250527', '250528', '250529', '250530', '250531', '250532', '250538', '250541', '250544', '250546', '250548', '250557', '250598', '250821', '250849', '250926', '250927', '250929', '250931', '250970', '251073', '251100', '251102']
nosy_count = 8.0
nosy_names = ['barry', 'eric.smith', 'python-dev', 'Rosuav', 'martin.panter', 'yselivanov', 'JelleZijlstra', 'elvis']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue24965'
versions = ['Python 3.6']

ericvsmith · 2015-08-30T17:46:52Z

See PEP-498.

>>> f'New for Python {sys.version.split()[0]}'
'New for Python 3.6.0a0'

ericvsmith · 2015-08-30T17:51:10Z

One thing I've done in this implementation is to build up a string to pass to str.format(), instead of using the original string. This new string uses positional parameters instead of named parameters.

I had originally proposed to add a string.interpolate() to do the heavy lifting here, which would have meant I could use the original string (as seen in the source code), and not build up a new string and pass it to str.format(). I still might do that, but for now, the approach using str.format() is good enough.

ericvsmith · 2015-08-30T17:59:59Z

Oops, I didn't really mean to include imporlib.h. Oh, well.

ericvsmith · 2015-09-01T11:29:55Z

Fixed validate_exprs bug.

ericvsmith · 2015-09-01T13:04:41Z

Make sure f-strings are identified as literals in error messages.

python-dev · 2015-09-04T19:01:20Z

New changeset a0194ec4195c by Eric V. Smith in branch 'default':
Removed Implementation Limitations section. While the version of the code on http://bugs.python.org/issue24965 has the 255 expression limitation, I'm going to remove this limit. The i18n section was purely speculative. We can worry about it if/when we add i18n and i-strings.
https://hg.python.org/peps/rev/a0194ec4195c

ericvsmith · 2015-09-09T23:57:42Z

This implements the accepted PEP-498. The only other real change I plan on making is to do dynamic memory allocation when building the expressions that make up a JoinedStr AST node. The code has all of the places to do that already laid out, it's just a matter of hooking it up.

There's one nit where I accept 'f' and 'F', but the PEP just says 'f'. I'm not sure if we should accept the upper case version. I'd think not, but all of the other ones (b, r, and u) do.

I need to do one more scan for memory leaks. I've rearranged some code since the last time I checked for leaks, and that's always a recipe for some sneaking in.

And I need to write some more tests, mostly for syntax errors, but also for a few edge conditions.

Comments welcome.

warsaw · 2015-09-10T01:07:09Z

On Sep 09, 2015, at 11:57 PM, Eric V. Smith wrote:

There's one nit where I accept 'f' and 'F', but the PEP just says 'f'. I'm
not sure if we should accept the upper case version. I'd think not, but all
of the other ones (b, r, and u) do.

I think it should be consistent with the other prefixes. Shouldn't be a big
deal to amend the PEP to describe this.

ericvsmith · 2015-09-10T08:04:19Z

I discussed it with Guido and added 'F' to the PEP.

ericvsmith · 2015-09-10T21:13:06Z

This version does dynamic allocation for the expression list, and fixes some memory leaks and early decrefs.

I think it's complete, but I'll take some more passes through it checking for leaks.

ericvsmith · 2015-09-10T21:28:46Z

The good news is that the performance is pretty good, and finally I have a case where I can beat %-formatting:

$ ./python.bat -mtimeit -s 'a=2' "'%s' % a"
1000000 loops, best of 3: 0.883 usec per loop

$ ./python.bat -mtimeit -s 'a=2' '"{}".format(a)'
1000000 loops, best of 3: 1.16 usec per loop

$ ./python.bat -mtimeit -s 'a=2' 'f"{a}"'
1000000 loops, best of 3: 0.792 usec per loop

This example is mildly contrived, and the performance of f-strings is slightly worse than %-formatting once the f-strings contains both expressions and literals.

I could speed it up significantly (I think) by adding opcodes for 2 things: calling __format__ and joining the strings together. Calling __format__ in an opcode could be a win because I could optimize for known types (str, int, float). Having a join opcode would be a win because I could use _PyUnicodeWriter instead of ''.join.

I'm inclined to check this code in as-is, then optimize it later, if we think it's needed and if I get motivated.

For reference, here's the ast and opcodes for f'a={a}':

>>> ast.dump(ast.parse("f'a={a}'"))
"Module(body=[Expr(value=JoinedStr(values=[Str(s='a='), FormattedValue(value=Name(id='a', ctx=Load()), conversion=0, format_spec=None)]))])"

>>> dis.dis("f'a={a}'")
  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('a=')
              9 LOAD_NAME                1 (a)
             12 LOAD_ATTR                2 (__format__)
             15 LOAD_CONST               0 ('')
             18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             21 BUILD_LIST               2
             24 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             27 RETURN_VALUE

vadmium · 2015-09-11T12:09:02Z

Another version of that AST that is better for my digestion:

f'a={a}'

Module(body=[Expr(
value=JoinedStr(values=[
Str(s='a='),
FormattedValue(
value=Name(id='a', ctx=Load()),
conversion=0,
format_spec=None,
),
]),
)])

I have been reading over the test cases, and left a bunch of suggestions for more edge cases etc. Some of them might reflect that I haven’t completely learnt how the inner Python expression syntax, outer string escaping syntax, {{curly bracket}} escaping, automatic concatenation, etc, are all meant to fit together.

ericvsmith · 2015-09-11T12:40:41Z

Thanks, Martin. I've posted my replies. I'll add some more tests, and work on the triple quoted string bug.

ericvsmith · 2015-09-11T18:17:14Z

Thanks again, Martin. I've found 4 bugs so far, based on your suggested tests. The ones I haven't fixed are: 'fur' strings don't work (something in the lexer), and triple quoted strings don't work correctly. I'm working on both of those, and should have an updated patch in the next day or so.

ericvsmith · 2015-09-11T19:48:28Z

It turns out 'fur' strings aren't a thing, because 'ur' strings aren't.

From tokenizer.c:
/* ur"" and ru"" are not supported */

And the PEP:
https://www.python.org/dev/peps/pep-0414/#exclusion-of-raw-unicode-literals

I'll add a test to make sure this fails.

So I just need to work on the triple-quoted string problem.

ericvsmith · 2015-09-12T16:40:47Z

After discussing it with Guido, I've removed the ability to combine 'f' with 'u'.

JelleZijlstra · 2015-09-12T17:29:59Z

I've started working on implementing this feature in Cython and I'd like to confirm a few edge cases:

f'{ {1: 2\N{RIGHT CURLY BRACKET}[1]}' == '2' (string escape rules work even within the expressions)
f'{ '''foo''' }' is a syntax error
f'{ """foo 'bar'""" }' is a syntax error

ericvsmith · 2015-09-12T17:36:01Z

Yes, Jelle, you are correct in all 3 cases. Remember that the steps are to extract the string from the source code, decode backslash escapes, and only then treat it as an f-string.

For the first case, without the 'f' prefix:
'{ {1: 2\N{RIGHT CURLY BRACKET}[1]}' == '{ {1: 2}[1]}'

Then, applying the 'f':
f'{ {1: 2}[1]}' == '2'.

For the last 2, since they're syntax errors without the 'f', they're also syntax errors with the 'f'.

I'll have a new version, with tests for all of these cases, posted in the next few hours. You can leverage the tests.

JelleZijlstra · 2015-09-12T20:34:25Z

Thanks! Here are a few more cases I came across with the existing implementation:

>>> f"{'a\\'b'}"
  File "<stdin>", line 1
SyntaxError: missing '}' in format string expression

I believe this is valid and should produce "a'b".

>>> f"{x!s!s}"
  File "<stdin>", line 1
SyntaxError: single '}' encountered in format string

Could use a better error message.

>>> x = 3
>>> f"{x!s{y}}"
'3y}'

Not sure how this happened.

ericvsmith · 2015-09-12T21:24:18Z

This one has been fixed:
>>> f"{'a\\'b'}"
"a'b"

This one was a bug that I previously fixed, that Martin pointed out:
>>> f"{x!s!s}"
  File "<stdin>", line 1
SyntaxError: invalid character following conversion character

And this is the same bug:
>>> f"{x!s{y}}"
  File "<stdin>", line 1
SyntaxError: invalid character following conversion character

I'm wrapping up my new code plus tests. I'll post it Real Soon Now.

Thanks for your help.

vadmium · 2015-09-12T21:24:25Z

Regarding wrong error messages, I’ve learnt the hard way that it is often best to use assertRaisesRegex() instead of assertRaises(), to ensure that the actual exception you have in mind is being triggered, rather than a typo or something. Though that might upset your assertSyntaxErrors() helper.

ericvsmith · 2015-09-12T21:25:56Z

Agreed on checking the error messages better. Especially since even the simplest of errors (like leaving out a quote) results in a syntax error in parsing the string, not parsing inside the f-string.

I'll look at it eventually.

ericvsmith · 2015-09-12T21:58:11Z

This patch fixes triple-quoted strings, plus a few bugs. I'm going to commit it tomorrow, barring any unforeseen issues.

ericvsmith · 2015-09-12T22:58:12Z

I'll probably ensure that all of the parsing errors contain "format string" or "f-string" or similar. That way the regex check is easier, and the user can search for it more easily.

It remains to be seen how these are referenced in the documentation. "f-string" seems much easier to say and search for, but seems too slangy for the docs. But "format string" seems ambiguous and hard to search for. I guess time will tell.

JelleZijlstra · 2015-09-13T00:15:36Z

Is this behavior intentional?

>>> str = len
>>> x = 'foo'
>>> f'{x!s}'
'3'
>>> '{!s}'.format(x)
'foo'

Or similarly:

>>> import builtins
>>> del builtins.repr
>>> f'{x!r}'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'repr' is not defined

ericvsmith · 2015-09-13T00:23:08Z

Both of those are known (to me!) byproducts of the current implementation. If my crazy idea of adding opcodes to speed up f-strings flies, then this issue will go away. I consider this a corner case that doesn't need to be addressed before committing this code. I wouldn't emulate it one way or the other just yet.

vadmium · 2015-09-13T04:21:21Z

I’m actually trying out your patch now. A couple strange errors and observations:

>>> f"{'{'}"  # Why is this allowed in an outer format expression--
'{'
>>> f"{3:{'{'}>10}"  # --but not inside a format specifier?
SyntaxError: nesting of '{' in format specifier is not allowed
>>> opening = "{"; f"{3:{opening}>10}"  # Workaround
'{{{{{{{{{3'
>>> f"{3:{'}'}<10}"  # Error message is very strange!
SyntaxError: missing '}' in format string expression
>>> f"{\x00}"  # It seems this is treated as a null terminator
  File "<fstring>", line 1
    (
    ^
SyntaxError: unexpected EOF while parsing
>>> f"{'s'!\x00:.<10}"  # Default conversion is the null character?
's.........'

ericvsmith · 2015-09-13T11:39:43Z

On 9/13/2015 12:21 AM, Martin Panter wrote:
>>>> f"{'{'}"  # Why is this allowed in an outer format expression--
> '{'
>>>> f"{3:{'{'}>10}"  # --but not inside a format specifier?

This is me being lazy about detecting recursion. I'll fix it.

>>> f"{\x00}" # It seems this is treated as a null terminator
File "<fstring>", line 1
(
^
SyntaxError: unexpected EOF while parsing

This is a byproduct of using PyParser_ASTFromString. I'm not particularly included to do anything about it. Is there any practical use case?

>>> f"{'s'!\x00:.<10}" # Default conversion is the null character?
's.........'

Yes, that's the default. I'll switch to -1, which I think won't have this issue.

Thanks for the review.

vadmium · 2015-09-14T02:42:59Z

Regarding the null terminator, I was mainly smoke testing your code. :) Maybe it would be too hard to support properly. Although I could imagine someone doing things like this:

>>> d = {b"key\x00": "value"}
>>> f"key={d[b'key\x00']}"  # Oops, escape code at wrong level
  File "<fstring>", line 1
    (d[b'key
           ^
SyntaxError: EOL while scanning string literal
>>> rf"key={d[b'key\x00']}"  # Corrected
'key=value'

I also finished quickly reading over the C code, with a couple more review comments. But I am not familiar with the files involved to thoroughly review.

ericvsmith · 2015-09-16T07:12:29Z

I rewrote the format_spec parser to recursively call the f-string parser, so any oddness in what's allowed in a format_spec is gone.

It took way longer than I thought, but the code is better for it.

ericvsmith · 2015-09-16T14:16:29Z

Simplified error handling, fixed 2 memory leaks.

All tests now pass with no leaks.

This should be the final version.

vadmium · 2015-09-18T00:36:49Z

Another strange error message (though maybe the new test changes you mentioned caught this):

>>> f'{3:{10}'  # Actually missing a closing bracket '}'
  File "<stdin>", line 1
SyntaxError: f-string: unexpected '}'

ericvsmith · 2015-09-18T00:40:16Z

Martin Panter added the comment:

Another strange error message (though maybe the new test changes you mentioned caught this):

>>> f'{3:{10}' # Actually missing a closing bracket '}'
File "<stdin>", line 1
SyntaxError: f-string: unexpected '}'

Yes, I found that one, too. Sorry to waste your time on this, but I literally just finished the test changes 15 minutes ago.

ericvsmith · 2015-09-18T01:22:23Z

Hopefully the last version.

vadmium · 2015-09-18T04:11:10Z

I left a few more comments on Reitveld. Checking the error messages does make me feel a lot more comfortable though.

ericvsmith · 2015-09-18T12:13:55Z

Cleaned up the error handling in fstring_expression_compile so it's easier to verify and more robust in the face of future changes.

Added a test for an un-doubled '}', which is an error in a top-level literal (and ends a nested expression). Modified existing tests to match.

ericvsmith · 2015-09-19T10:49:30Z

I changed the generated code to call:
format(x [, spec])

instead of:
x.__format__(spec)

The reason is that the correct way to call __format__ is actually:
type(x).__format__(x, spec)

That is, the __format__ lookup is done on the type, not the instance. From the earlier example, the disassembled code is now:

>>> dis.dis("f'a={a}'")
  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('a=')
              9 LOAD_GLOBAL              1 (format)
             12 LOAD_NAME                2 (a)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 BUILD_LIST               2
             21 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             24 RETURN_VALUE

The simplest way to make the lookup correctly is just to call format() itself, which does the right thing.

I still have a concept of adding opcodes to handle FormattedValue and JoinedStr nodes, but that's an optimization for later, if ever.

python-dev · 2015-09-19T18:52:04Z

New changeset a10d37f04569 by Eric V. Smith in branch 'default':
Issue bpo-24965: Implement PEP-498 "Literal String Interpolation". Documentation is still needed, I'll open an issue for that.
https://hg.python.org/cpython/rev/a10d37f04569

ericvsmith · 2015-09-19T18:57:44Z

Documentation task added as issue bpo-25179.

Thanks to Martin for the great code reviews.

ericvsmith self-assigned this Aug 30, 2015

ericvsmith added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Aug 30, 2015

ericvsmith closed this as completed Sep 19, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

Implement PEP 498: Literal String Formatting #69153

Implement PEP 498: Literal String Formatting #69153

Comments

ericvsmith commented Aug 30, 2015

ericvsmith commented Aug 30, 2015

ericvsmith commented Aug 30, 2015

ericvsmith commented Aug 30, 2015

ericvsmith commented Sep 1, 2015

ericvsmith commented Sep 1, 2015

python-dev mannequin commented Sep 4, 2015

ericvsmith commented Sep 9, 2015

warsaw commented Sep 10, 2015

ericvsmith commented Sep 10, 2015

ericvsmith commented Sep 10, 2015

ericvsmith commented Sep 10, 2015

vadmium commented Sep 11, 2015

ericvsmith commented Sep 11, 2015

ericvsmith commented Sep 11, 2015

ericvsmith commented Sep 11, 2015

ericvsmith commented Sep 12, 2015

JelleZijlstra commented Sep 12, 2015

ericvsmith commented Sep 12, 2015

JelleZijlstra commented Sep 12, 2015

ericvsmith commented Sep 12, 2015

vadmium commented Sep 12, 2015

ericvsmith commented Sep 12, 2015

ericvsmith commented Sep 12, 2015

ericvsmith commented Sep 12, 2015

JelleZijlstra commented Sep 13, 2015

ericvsmith commented Sep 13, 2015

vadmium commented Sep 13, 2015

ericvsmith commented Sep 13, 2015

vadmium commented Sep 14, 2015

ericvsmith commented Sep 16, 2015

ericvsmith commented Sep 16, 2015

vadmium commented Sep 18, 2015

ericvsmith commented Sep 18, 2015

ericvsmith commented Sep 18, 2015

vadmium commented Sep 18, 2015

ericvsmith commented Sep 18, 2015

ericvsmith commented Sep 19, 2015

python-dev mannequin commented Sep 19, 2015

ericvsmith commented Sep 19, 2015