classification
Title: Implement PEP 498: Literal String Formatting
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: eric.smith Nosy List: Jelle Zijlstra, Rosuav, barry, elvis, eric.smith, martin.panter, python-dev, yselivanov
Priority: normal Keywords: patch

Created on 2015-08-30 17:47 by eric.smith, last changed 2015-09-19 18:57 by eric.smith. This issue is now closed.

Files
File name Uploaded Description
pep-498-3.diff eric.smith, 2015-09-09 23:57
pep-498-4.diff eric.smith, 2015-09-10 21:13
pep-498-5.diff eric.smith, 2015-09-12 21:58
pep-498-6.diff eric.smith, 2015-09-16 07:12
pep-498-7.diff eric.smith, 2015-09-16 14:16
pep-498-8.diff eric.smith, 2015-09-18 01:22
pep-498-9.diff eric.smith, 2015-09-18 12:13
pep-498-10.diff eric.smith, 2015-09-19 10:49
Messages (39)
msg249362 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-08-30 17:46
See PEP 498.

>>> f'New for Python {sys.version.split()[0]}'
'New for Python 3.6.0a0'
msg249364 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-08-30 17:51
One thing I've done in this implementation is to build up a string to pass to str.format(), instead of using the original string. This new string uses positional parameters instead of named parameters.

I had originally proposed to add a string.interpolate() to do the heavy lifting here, which would have meant I could use the original string (as seen in the source code), and not build up a new string and pass it to str.format(). I still might do that, but for now, the approach using str.format() is good enough.
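
The rewriting step described in this message can be sketched in pure Python. This is only an illustration of the idea, not the patch's C code: the helper names `to_positional` and `interpolate` are hypothetical, and the sketch leans on `string.Formatter.parse`, so it does not handle expressions that themselves contain `:` or `!`.

```python
from string import Formatter


def to_positional(template):
    """Rewrite named fields as positional ones and collect their expressions.

    Hypothetical helper: 'a={a}' becomes ('a={0}', ['a']).
    """
    parts, expressions = [], []
    for literal, field, spec, conversion in Formatter().parse(template):
        # parse() returns literal text with '{{' and '}}' already collapsed,
        # so re-escape braces before handing the text back to str.format().
        parts.append(literal.replace('{', '{{').replace('}', '}}'))
        if field is not None:
            conv_text = '!' + conversion if conversion else ''
            spec_text = ':' + spec if spec else ''
            parts.append('{%d%s%s}' % (len(expressions), conv_text, spec_text))
            expressions.append(field)
    return ''.join(parts), expressions


def interpolate(template, namespace):
    """Evaluate each expression in namespace, then call str.format()."""
    positional, expressions = to_positional(template)
    values = [eval(expr, {}, namespace) for expr in expressions]
    return positional.format(*values)


print(to_positional('a={a} b={b!r:>5}'))
print(interpolate('a={a} b={b!r:>5}', {'a': 2, 'b': 'x'}))
```

The positional form avoids having to build a keyword mapping whose keys are arbitrary expression text.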
msg249365 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-08-30 17:59
Oops, I didn't really mean to include importlib.h. Oh, well.
msg249475 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-01 11:29
Fixed validate_exprs bug.
msg249481 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-01 13:04
Make sure f-strings are identified as literals in error messages.
msg249810 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-04 19:01
New changeset a0194ec4195c by Eric V. Smith in branch 'default':
Removed Implementation Limitations section. While the version of the code on http://bugs.python.org/issue24965 has the 255 expression limitation, I'm going to remove this limit. The i18n section was purely speculative. We can worry about it if/when we add i18n and i-strings.
https://hg.python.org/peps/rev/a0194ec4195c
msg250343 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-09 23:57
This implements the accepted PEP 498. The only other real change I plan on making is to do dynamic memory allocation when building the expressions that make up a JoinedStr AST node. The code has all of the places to do that already laid out, it's just a matter of hooking it up.

There's one nit where I accept 'f' and 'F', but the PEP just says 'f'. I'm not sure if we should accept the upper case version. I'd think not, but all of the other ones (b, r, and u) do.

I need to do one more scan for memory leaks. I've rearranged some code since the last time I checked for leaks, and that's always a recipe for some sneaking in.

And I need to write some more tests, mostly for syntax errors, but also for a few edge conditions.

Comments welcome.
msg250344 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2015-09-10 01:07
On Sep 09, 2015, at 11:57 PM, Eric V. Smith wrote:

>There's one nit where I accept 'f' and 'F', but the PEP just says 'f'. I'm
>not sure if we should accept the upper case version. I'd think not, but all
>of the other ones (b, r, and u) do.

I think it should be consistent with the other prefixes.  Shouldn't be a big
deal to amend the PEP to describe this.
msg250354 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-10 08:04
I discussed it with Guido and added 'F' to the PEP.
msg250420 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-10 21:13
This version does dynamic allocation for the expression list, and fixes some memory leaks and early decrefs.

I think it's complete, but I'll take some more passes through it checking for leaks.
msg250422 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-10 21:28
The good news is that the performance is pretty good, and finally I have a case where I can beat %-formatting:

$ ./python.bat -mtimeit -s 'a=2' "'%s' % a"
1000000 loops, best of 3: 0.883 usec per loop

$ ./python.bat -mtimeit -s 'a=2' '"{}".format(a)'
1000000 loops, best of 3: 1.16 usec per loop

$ ./python.bat -mtimeit -s 'a=2' 'f"{a}"'
1000000 loops, best of 3: 0.792 usec per loop

This example is mildly contrived, and the performance of f-strings is slightly worse than %-formatting once an f-string contains both expressions and literals.

I could speed it up significantly (I think) by adding opcodes for 2 things: calling __format__ and joining the strings together. Calling __format__ in an opcode could be a win because I could optimize for known types (str, int, float). Having a join opcode would be a win because I could use _PyUnicodeWriter instead of ''.join.

I'm inclined to check this code in as-is, then optimize it later, if we think it's needed and if I get motivated.

For reference, here's the ast and opcodes for f'a={a}':

>>> ast.dump(ast.parse("f'a={a}'"))
"Module(body=[Expr(value=JoinedStr(values=[Str(s='a='), FormattedValue(value=Name(id='a', ctx=Load()), conversion=0, format_spec=None)]))])"

>>> dis.dis("f'a={a}'")
  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('a=')
              9 LOAD_NAME                1 (a)
             12 LOAD_ATTR                2 (__format__)
             15 LOAD_CONST               0 ('')
             18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             21 BUILD_LIST               2
             24 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             27 RETURN_VALUE
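
In Python terms, the bytecode above corresponds roughly to the expression below. This is a sketch of what the patch generates at this point in the thread; it is not a claim about how other CPython versions compile f-strings.

```python
a = 2

# What the bytecode does, written by hand: format each expression with an
# empty format spec, collect the pieces in a list, and join them with ''.
pieces = ['a=', a.__format__('')]
manual = ''.join(pieces)

# The f-string itself, for comparison.
literal = f'a={a}'

print(manual, literal)
```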
msg250465 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-11 12:09
Another version of that AST that is better for my digestion:

f'a={a}'

Module(body=[Expr(
    value=JoinedStr(values=[
        Str(s='a='),
        FormattedValue(
            value=Name(id='a', ctx=Load()),
            conversion=0,
            format_spec=None,
        ),
    ]),
)])
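
The same structure can be inspected programmatically rather than read from a dump. A small sketch, assuming only the `JoinedStr` and `FormattedValue` node names shown above; the class used for the literal piece is printed rather than assumed, since it may differ between Python versions.

```python
import ast

tree = ast.parse("f'a={a}'", mode="eval")
joined = tree.body

# Class name of each piece of the f-string, in source order.
kinds = [type(node).__name__ for node in joined.values]

# The last piece is the {a} replacement field; its value is a Name node.
formatted = joined.values[-1]

print(type(joined).__name__, kinds, formatted.value.id)
```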

I have been reading over the test cases, and left a bunch of suggestions for more edge cases etc. Some of them might reflect that I haven’t completely learnt how the inner Python expression syntax, outer string escaping syntax, {{curly bracket}} escaping, automatic concatenation, etc, are all meant to fit together.
msg250467 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-11 12:40
Thanks, Martin. I've posted my replies. I'll add some more tests, and work on the triple quoted string bug.
msg250485 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-11 18:17
Thanks again, Martin. I've found 4 bugs so far, based on your suggested tests. The ones I haven't fixed are: 'fur' strings don't work (something in the lexer), and triple quoted strings don't work correctly. I'm working on both of those, and should have an updated patch in the next day or so.
msg250493 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-11 19:48
It turns out 'fur' strings aren't a thing, because 'ur' strings aren't.

From tokenizer.c:
/* ur"" and ru"" are not supported */

And the PEP:
https://www.python.org/dev/peps/pep-0414/#exclusion-of-raw-unicode-literals

I'll add a test to make sure this fails.

So I just need to work on the triple-quoted string problem.
msg250525 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-12 16:40
After discussing it with Guido, I've removed the ability to combine 'f' with 'u'.
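
A quick way to check which prefix combinations an interpreter accepts is to try compiling them. This is a sketch with a hypothetical `compiles` helper; on an interpreter that follows the decisions in this thread, the first four candidates should compile and the last three should not.

```python
def compiles(source):
    """Return True if source compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<prefix-check>", "eval")
    except SyntaxError:
        return False
    return True


candidates = ["f'x'", "F'x'", "rf'x'", "fr'x'", "ur'x'", "fu'x'", "fur'x'"]
results = {source: compiles(source) for source in candidates}

for source, accepted in results.items():
    print(source, "accepted" if accepted else "SyntaxError")
```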
msg250527 - (view) Author: Jelle Zijlstra (Jelle Zijlstra) * (Python triager) Date: 2015-09-12 17:29
I've started working on implementing this feature in Cython and I'd like to confirm a few edge cases:

- f'{ {1: 2\N{RIGHT CURLY BRACKET}[1]}' == '2' (string escape rules work even within the expressions)
- f'{ '''foo''' }' is a syntax error
- f'{ """foo 'bar'""" }' is a syntax error
msg250528 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-12 17:36
Yes, Jelle, you are correct in all 3 cases. Remember that the steps are to extract the string from the source code, decode backslash escapes, and only then treat it as an f-string.

For the first case, without the 'f' prefix:
'{ {1: 2\N{RIGHT CURLY BRACKET}[1]}' == '{ {1: 2}[1]}'

Then, applying the 'f':
f'{ {1: 2}[1]}' == '2'.

For the last 2, since they're syntax errors without the 'f', they're also syntax errors with the 'f'.

I'll have a new version, with tests for all of these cases, posted in the next few hours. You can leverage the tests.
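
The three steps can be replayed by hand for the first case. This sketch deliberately uses an ordinary string literal and `eval`, so it illustrates the order of operations described here rather than any particular interpreter's f-string parser.

```python
# Steps 1 and 2: an ordinary (non-f) literal with the same body. Decoding
# the literal turns \N{RIGHT CURLY BRACKET} into a plain '}'.
decoded = '{ {1: 2\N{RIGHT CURLY BRACKET}[1]}'
assert decoded == '{ {1: 2}[1]}'

# Step 3, done by hand: take the text between the outer braces, evaluate
# it as an expression, and format the result.
expression = decoded[1:-1].strip()
result = format(eval(expression))

print(repr(expression), '->', repr(result))
```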
msg250529 - (view) Author: Jelle Zijlstra (Jelle Zijlstra) * (Python triager) Date: 2015-09-12 20:34
Thanks! Here are a few more cases I came across with the existing implementation:

>>> f"{'a\\'b'}"
  File "<stdin>", line 1
SyntaxError: missing '}' in format string expression

I believe this is valid and should produce "a'b".

>>> f"{x!s!s}"
  File "<stdin>", line 1
SyntaxError: single '}' encountered in format string

Could use a better error message.

>>> x = 3
>>> f"{x!s{y}}"
'3y}'

Not sure how this happened.
msg250530 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-12 21:24
This one has been fixed:
>>> f"{'a\\'b'}"
"a'b"

This one was a bug that I previously fixed, that Martin pointed out:
>>> f"{x!s!s}"
  File "<stdin>", line 1
SyntaxError: invalid character following conversion character

And this is the same bug:
>>> f"{x!s{y}}"
  File "<stdin>", line 1
SyntaxError: invalid character following conversion character

I'm wrapping up my new code plus tests. I'll post it Real Soon Now.

Thanks for your help.
msg250531 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-12 21:24
Regarding wrong error messages, I’ve learnt the hard way that it is often best to use assertRaisesRegex() instead of assertRaises(), to ensure that the actual exception you have in mind is being triggered, rather than a typo or something. Though that might upset your assertSyntaxErrors() helper.
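
A minimal sketch of the difference, using a hypothetical `parse_conversion` helper rather than the real f-string parser. The first test contains a typo and still passes because a different `ValueError` is raised; the second pins the message, so it can only pass for the intended reason.

```python
import unittest


def parse_conversion(text):
    """Hypothetical helper: validate a '!s', '!r' or '!a' conversion."""
    if not text.startswith('!'):
        raise ValueError("conversion must start with '!'")
    if text[1:] not in ('s', 'r', 'a'):
        raise ValueError("invalid conversion character")
    return text[1]


class ConversionTests(unittest.TestCase):
    def test_loose(self):
        # Typo: the '!' is missing, so the wrong branch raises. The test
        # passes anyway because any ValueError satisfies assertRaises().
        with self.assertRaises(ValueError):
            parse_conversion('x')

    def test_strict(self):
        # The regex ensures the intended error is the one being raised.
        with self.assertRaisesRegex(ValueError, "invalid conversion character"):
            parse_conversion('!x')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConversionTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```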
msg250532 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-12 21:25
Agreed on checking the error messages better. Especially since even the simplest of errors (like leaving out a quote) results in a syntax error in parsing the string, not parsing inside the f-string.

I'll look at it eventually.
msg250538 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-12 21:58
This patch fixes triple-quoted strings, plus a few bugs. I'm going to commit it tomorrow, barring any unforeseen issues.
msg250541 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-12 22:58
I'll probably ensure that all of the parsing errors contain "format string" or "f-string" or similar. That way the regex check is easier, and the user can search for it more easily.

It remains to be seen how these are referenced in the documentation. "f-string" seems much easier to say and search for, but seems too slangy for the docs. But "format string" seems ambiguous and hard to search for. I guess time will tell.
msg250544 - (view) Author: Jelle Zijlstra (Jelle Zijlstra) * (Python triager) Date: 2015-09-13 00:15
Is this behavior intentional?

>>> str = len
>>> x = 'foo'
>>> f'{x!s}'
'3'
>>> '{!s}'.format(x)
'foo'

Or similarly:

>>> import builtins
>>> del builtins.repr
>>> f'{x!r}'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'repr' is not defined
msg250546 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-13 00:23
Both of those are known (to me!) byproducts of the current implementation. If my crazy idea of adding opcodes to speed up f-strings flies, then this issue will go away. I consider this a corner case that doesn't need to be addressed before committing this code. I wouldn't emulate it one way or the other just yet.
msg250548 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-13 04:21
I’m actually trying out your patch now. A couple strange errors and observations:

>>> f"{'{'}"  # Why is this allowed in an outer format expression--
'{'
>>> f"{3:{'{'}>10}"  # --but not inside a format specifier?
SyntaxError: nesting of '{' in format specifier is not allowed
>>> opening = "{"; f"{3:{opening}>10}"  # Workaround
'{{{{{{{{{3'
>>> f"{3:{'}'}<10}"  # Error message is very strange!
SyntaxError: missing '}' in format string expression
>>> f"{\x00}"  # It seems this is treated as a null terminator
  File "<fstring>", line 1
    (
    ^
SyntaxError: unexpected EOF while parsing
>>> f"{'s'!\x00:.<10}"  # Default conversion is the null character?
's.........'
msg250557 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-13 11:39
On 9/13/2015 12:21 AM, Martin Panter wrote:
>>>> f"{'{'}"  # Why is this allowed in an outer format expression--
> '{'
>>>> f"{3:{'{'}>10}"  # --but not inside a format specifier?

This is me being lazy about detecting recursion. I'll fix it.

>>>> f"{\x00}"  # It seems this is treated as a null terminator
>   File "<fstring>", line 1
>     (
>     ^
> SyntaxError: unexpected EOF while parsing

This is a byproduct of using PyParser_ASTFromString. I'm not particularly inclined to do anything about it. Is there any practical use case?

>>>> f"{'s'!\x00:.<10}"  # Default conversion is the null character?
> 's.........'

Yes, that's the default. I'll switch to -1, which I think won't have this issue.

Thanks for the review.
msg250598 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-14 02:42
Regarding the null terminator, I was mainly smoke testing your code. :) Maybe it would be too hard to support properly. Although I could imagine someone doing things like this:

>>> d = {b"key\x00": "value"}
>>> f"key={d[b'key\x00']}"  # Oops, escape code at wrong level
  File "<fstring>", line 1
    (d[b'key
           ^
SyntaxError: EOL while scanning string literal
>>> rf"key={d[b'key\x00']}"  # Corrected
'key=value'

I also finished quickly reading over the C code, with a couple more review comments. But I am not familiar enough with the files involved to review them thoroughly.
msg250821 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-16 07:12
I rewrote the format_spec parser to recursively call the f-string parser, so any oddness in what's allowed in a format_spec is gone.

It took way longer than I thought, but the code is better for it.
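
The effect of evaluating a format_spec as a nested f-string can be reproduced by hand: the inner field is evaluated first and its text becomes the spec for the outer value. A short sketch using Martin's workaround example from above:

```python
opening = "{"

# The nested field is evaluated first, producing the text of the spec...
spec = f"{opening}>10"

# ...and that text is then applied as the format spec of the outer value.
by_hand = format(3, spec)

# The same thing written as a single f-string with a nested field.
nested = f"{3:{opening}>10}"

print(spec, by_hand, nested)
```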
msg250849 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-16 14:16
Simplified error handling, fixed 2 memory leaks.

All tests now pass with no leaks.

This should be the final version.
msg250926 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-18 00:36
Another strange error message (though maybe the new test changes you mentioned caught this):

>>> f'{3:{10}'  # Actually missing a closing bracket '}'
  File "<stdin>", line 1
SyntaxError: f-string: unexpected '}'
msg250927 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-18 00:40
> Martin Panter added the comment:
> 
> Another strange error message (though maybe the new test changes you mentioned caught this):
> 
>>>> f'{3:{10}'  # Actually missing a closing bracket '}'
>   File "<stdin>", line 1
> SyntaxError: f-string: unexpected '}'

Yes, I found that one, too. Sorry to waste your time on this, but I literally just finished the test changes 15 minutes ago.
msg250929 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-18 01:22
Hopefully the last version.
msg250931 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-18 04:11
I left a few more comments on Rietveld. Checking the error messages does make me feel a lot more comfortable though.
msg250970 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-18 12:13
Cleaned up the error handling in fstring_expression_compile so it's easier to verify and more robust in the face of future changes.

Added a test for an un-doubled '}', which is an error in a top-level literal (and ends a nested expression). Modified existing tests to match.
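
The top-level rule can be checked from Python by compiling the two spellings. This sketch uses a hypothetical `evaluate` helper and does not assert on the wording of the error message, which has changed over the course of this thread.

```python
def evaluate(source):
    """Evaluate source as an expression; return 'SyntaxError' if it fails to compile."""
    try:
        return eval(compile(source, "<brace-check>", "eval"))
    except SyntaxError:
        return "SyntaxError"


doubled = evaluate("f'a}}b'")    # '}}' is an escaped literal brace
undoubled = evaluate("f'a}b'")   # a lone '}' in top-level literal text

print(repr(doubled), undoubled)
```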
msg251073 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-19 10:49
I changed the generated code to call:
format(x [, spec])

instead of:
x.__format__(spec)

The reason is that the correct way to call __format__ is actually:
type(x).__format__(x, spec)

That is, the __format__ lookup is done on the type, not the instance. From the earlier example, the disassembled code is now:

>>> dis.dis("f'a={a}'")
  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('a=')
              9 LOAD_GLOBAL              1 (format)
             12 LOAD_NAME                2 (a)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 BUILD_LIST               2
             21 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             24 RETURN_VALUE

The simplest way to make the lookup correctly is just to call format() itself, which does the right thing.
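
The difference between the three spellings only shows up when an instance carries its own `__format__` attribute. A small sketch with a hypothetical `Temperature` class:

```python
class Temperature:
    """Hypothetical class used only to illustrate the lookup rules."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        return format(self.degrees, spec) + " C"


t = Temperature(21.5)

# Shadow the method on the instance; special-method lookup ignores this.
t.__format__ = lambda spec: "instance-level override"

via_instance = t.__format__(".1f")        # finds the instance attribute
via_type = type(t).__format__(t, ".1f")   # the lookup format() performs
via_builtin = format(t, ".1f")

print(via_instance, via_type, via_builtin, sep=" | ")
```

Calling `format()` gives the same result as the explicit lookup on the type, which is why the generated code uses it.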

I still have a concept of adding opcodes to handle FormattedValue and JoinedStr nodes, but that's an optimization for later, if ever.
msg251100 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-19 18:52
New changeset a10d37f04569 by Eric V. Smith in branch 'default':
Issue #24965: Implement PEP 498 "Literal String Interpolation". Documentation is still needed, I'll open an issue for that.
https://hg.python.org/cpython/rev/a10d37f04569
msg251102 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2015-09-19 18:57
Documentation task added as issue #25179.

Thanks to Martin for the great code reviews.
History
Date User Action Args
2015-09-19 18:57:44 eric.smith set status: open -> closed
    resolution: fixed
    messages: + msg251102
2015-09-19 18:52:03 python-dev set messages: + msg251100
2015-09-19 10:49:42 eric.smith set files: + pep-498-10.diff
    messages: + msg251073
2015-09-18 13:37:15 Rosuav set nosy: + Rosuav
2015-09-18 12:13:55 eric.smith set files: + pep-498-9.diff
    messages: + msg250970
2015-09-18 04:11:09 martin.panter set messages: + msg250931
2015-09-18 01:22:32 eric.smith set files: + pep-498-8.diff
    messages: + msg250929
2015-09-18 00:40:16 eric.smith set messages: + msg250927
2015-09-18 00:36:49 martin.panter set messages: + msg250926
2015-09-16 14:16:29 eric.smith set files: + pep-498-7.diff
    messages: + msg250849
2015-09-16 07:12:37 eric.smith set files: + pep-498-6.diff
    messages: + msg250821
2015-09-14 02:43:00 martin.panter set messages: + msg250598
2015-09-13 15:45:05 yselivanov set nosy: + elvis
2015-09-13 11:39:43 eric.smith set messages: + msg250557
2015-09-13 04:21:21 martin.panter set messages: + msg250548
2015-09-13 00:23:08 eric.smith set messages: + msg250546
2015-09-13 00:15:36 Jelle Zijlstra set messages: + msg250544
2015-09-12 22:58:11 eric.smith set messages: + msg250541
2015-09-12 21:58:18 eric.smith set files: + pep-498-5.diff
    messages: + msg250538
2015-09-12 21:25:55 eric.smith set messages: + msg250532
2015-09-12 21:24:24 martin.panter set messages: + msg250531
2015-09-12 21:24:18 eric.smith set messages: + msg250530
2015-09-12 20:34:25 Jelle Zijlstra set messages: + msg250529
2015-09-12 17:36:01 eric.smith set messages: + msg250528
2015-09-12 17:29:59 Jelle Zijlstra set nosy: + Jelle Zijlstra
    messages: + msg250527
2015-09-12 16:40:47 eric.smith set messages: + msg250525
2015-09-11 19:48:28 eric.smith set messages: + msg250493
2015-09-11 18:17:14 eric.smith set messages: + msg250485
2015-09-11 12:40:40 eric.smith set messages: + msg250467
2015-09-11 12:09:02 martin.panter set nosy: + martin.panter
    messages: + msg250465
2015-09-10 21:28:46 eric.smith set messages: + msg250422
2015-09-10 21:13:34 eric.smith set files: + pep-498-4.diff
2015-09-10 21:13:05 eric.smith set messages: + msg250420
2015-09-10 08:04:18 eric.smith set messages: + msg250354
2015-09-10 01:07:09 barry set messages: + msg250344
2015-09-09 23:57:42 eric.smith set files: + pep-498-3.diff
    messages: + msg250343
2015-09-09 23:17:31 eric.smith set files: - pep-498-2.diff
2015-09-09 23:17:23 eric.smith set files: - pep-498-1.diff
2015-09-09 23:17:17 eric.smith set files: - pep-498.diff
2015-09-04 19:01:19 python-dev set nosy: + python-dev
    messages: + msg249810
2015-09-01 16:37:36 yselivanov set nosy: + yselivanov
2015-09-01 13:04:41 eric.smith set files: + pep-498-2.diff
    messages: + msg249481
2015-09-01 11:30:07 eric.smith set files: + pep-498-1.diff
    messages: + msg249475
2015-08-30 19:26:27 barry set nosy: + barry
2015-08-30 17:59:59 eric.smith set messages: + msg249365
2015-08-30 17:51:10 eric.smith set messages: + msg249364
2015-08-30 17:47:00 eric.smith create