New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement PEP 498: Literal String Formatting #69153
Comments
See PEP-498. >>> f'New for Python {sys.version.split()[0]}'
'New for Python 3.6.0a0' |
One thing I've done in this implementation is to build up a string to pass to str.format(), instead of using the original string. This new string uses positional parameters instead of named parameters. I had originally proposed to add a string.interpolate() to do the heavy lifting here, which would have meant I could use the original string (as seen in the source code), and not build up a new string and pass it to str.format(). I still might do that, but for now, the approach using str.format() is good enough. |
Oops, I didn't really mean to include imporlib.h. Oh, well. |
Fixed validate_exprs bug. |
Make sure f-strings are identified as literals in error messages. |
New changeset a0194ec4195c by Eric V. Smith in branch 'default': |
This implements the accepted PEP-498. The only other real change I plan on making is to do dynamic memory allocation when building the expressions that make up a JoinedStr AST node. The code has all of the places to do that already laid out, it's just a matter of hooking it up. There's one nit where I accept 'f' and 'F', but the PEP just says 'f'. I'm not sure if we should accept the upper case version. I'd think not, but all of the other ones (b, r, and u) do. I need to do one more scan for memory leaks. I've rearranged some code since the last time I checked for leaks, and that's always a recipe for some sneaking in. And I need to write some more tests, mostly for syntax errors, but also for a few edge conditions. Comments welcome. |
On Sep 09, 2015, at 11:57 PM, Eric V. Smith wrote:
I think it should be consistent with the other prefixes. Shouldn't be a big |
I discussed it with Guido and added 'F' to the PEP. |
This version does dynamic allocation for the expression list, and fixes some memory leaks and early decrefs. I think it's complete, but I'll take some more passes through it checking for leaks. |
The good news is that the performance is pretty good, and finally I have a case where I can beat %-formatting: $ ./python.bat -mtimeit -s 'a=2' "'%s' % a"
1000000 loops, best of 3: 0.883 usec per loop
$ ./python.bat -mtimeit -s 'a=2' '"{}".format(a)'
1000000 loops, best of 3: 1.16 usec per loop
$ ./python.bat -mtimeit -s 'a=2' 'f"{a}"'
1000000 loops, best of 3: 0.792 usec per loop This example is mildly contrived, and the performance of f-strings is slightly worse than %-formatting once the f-strings contains both expressions and literals. I could speed it up significantly (I think) by adding opcodes for 2 things: calling __format__ and joining the strings together. Calling __format__ in an opcode could be a win because I could optimize for known types (str, int, float). Having a join opcode would be a win because I could use _PyUnicodeWriter instead of ''.join. I'm inclined to check this code in as-is, then optimize it later, if we think it's needed and if I get motivated. For reference, here's the ast and opcodes for f'a={a}': >>> ast.dump(ast.parse("f'a={a}'"))
"Module(body=[Expr(value=JoinedStr(values=[Str(s='a='), FormattedValue(value=Name(id='a', ctx=Load()), conversion=0, format_spec=None)]))])"
>>> dis.dis("f'a={a}'")
1 0 LOAD_CONST 0 ('')
3 LOAD_ATTR 0 (join)
6 LOAD_CONST 1 ('a=')
9 LOAD_NAME 1 (a)
12 LOAD_ATTR 2 (__format__)
15 LOAD_CONST 0 ('')
18 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
21 BUILD_LIST 2
24 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
27 RETURN_VALUE |
Another version of that AST that is better for my digestion: f'a={a}' Module(body=[Expr( I have been reading over the test cases, and left a bunch of suggestions for more edge cases etc. Some of them might reflect that I haven’t completely learnt how the inner Python expression syntax, outer string escaping syntax, {{curly bracket}} escaping, automatic concatenation, etc, are all meant to fit together. |
Thanks, Martin. I've posted my replies. I'll add some more tests, and work on the triple quoted string bug. |
Thanks again, Martin. I've found 4 bugs so far, based on your suggested tests. The ones I haven't fixed are: 'fur' strings don't work (something in the lexer), and triple quoted strings don't work correctly. I'm working on both of those, and should have an updated patch in the next day or so. |
It turns out 'fur' strings aren't a thing, because 'ur' strings aren't. From tokenizer.c: And the PEP: I'll add a test to make sure this fails. So I just need to work on the triple-quoted string problem. |
After discussing it with Guido, I've removed the ability to combine 'f' with 'u'. |
I've started working on implementing this feature in Cython and I'd like to confirm a few edge cases:
|
Yes, Jelle, you are correct in all 3 cases. Remember that the steps are to extract the string from the source code, decode backslash escapes, and only then treat it as an f-string. For the first case, without the 'f' prefix: Then, applying the 'f': For the last 2, since they're syntax errors without the 'f', they're also syntax errors with the 'f'. I'll have a new version, with tests for all of these cases, posted in the next few hours. You can leverage the tests. |
Thanks! Here are a few more cases I came across with the existing implementation: >>> f"{'a\\'b'}"
File "<stdin>", line 1
SyntaxError: missing '}' in format string expression I believe this is valid and should produce "a'b". >>> f"{x!s!s}"
File "<stdin>", line 1
SyntaxError: single '}' encountered in format string Could use a better error message. >>> x = 3
>>> f"{x!s{y}}"
'3y}' Not sure how this happened. |
This one has been fixed:
>>> f"{'a\\'b'}"
"a'b"
This one was a bug that I previously fixed, that Martin pointed out:
>>> f"{x!s!s}"
File "<stdin>", line 1
SyntaxError: invalid character following conversion character
And this is the same bug:
>>> f"{x!s{y}}"
File "<stdin>", line 1
SyntaxError: invalid character following conversion character I'm wrapping up my new code plus tests. I'll post it Real Soon Now. Thanks for your help. |
Regarding wrong error messages, I’ve learnt the hard way that it is often best to use assertRaisesRegex() instead of assertRaises(), to ensure that the actual exception you have in mind is being triggered, rather than a typo or something. Though that might upset your assertSyntaxErrors() helper. |
Agreed on checking the error messages better. Especially since even the simplest of errors (like leaving out a quote) results in a syntax error in parsing the string, not parsing inside the f-string. I'll look at it eventually. |
This patch fixes triple-quoted strings, plus a few bugs. I'm going to commit it tomorrow, barring any unforeseen issues. |
I'll probably ensure that all of the parsing errors contain "format string" or "f-string" or similar. That way the regex check is easier, and the user can search for it more easily. It remains to be seen how these are referenced in the documentation. "f-string" seems much easier to say and search for, but seems too slangy for the docs. But "format string" seems ambiguous and hard to search for. I guess time will tell. |
Is this behavior intentional? >>> str = len
>>> x = 'foo'
>>> f'{x!s}'
'3'
>>> '{!s}'.format(x)
'foo' Or similarly: >>> import builtins
>>> del builtins.repr
>>> f'{x!r}'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'repr' is not defined |
Both of those are known (to me!) byproducts of the current implementation. If my crazy idea of adding opcodes to speed up f-strings flies, then this issue will go away. I consider this a corner case that doesn't need to be addressed before committing this code. I wouldn't emulate it one way or the other just yet. |
I’m actually trying out your patch now. A couple strange errors and observations: >>> f"{'{'}" # Why is this allowed in an outer format expression--
'{'
>>> f"{3:{'{'}>10}" # --but not inside a format specifier?
SyntaxError: nesting of '{' in format specifier is not allowed
>>> opening = "{"; f"{3:{opening}>10}" # Workaround
'{{{{{{{{{3'
>>> f"{3:{'}'}<10}" # Error message is very strange!
SyntaxError: missing '}' in format string expression
>>> f"{\x00}" # It seems this is treated as a null terminator
File "<fstring>", line 1
(
^
SyntaxError: unexpected EOF while parsing
>>> f"{'s'!\x00:.<10}" # Default conversion is the null character?
's.........' |
On 9/13/2015 12:21 AM, Martin Panter wrote:
>>>> f"{'{'}" # Why is this allowed in an outer format expression--
> '{'
>>>> f"{3:{'{'}>10}" # --but not inside a format specifier? This is me being lazy about detecting recursion. I'll fix it.
This is a byproduct of using PyParser_ASTFromString. I'm not particularly included to do anything about it. Is there any practical use case?
Yes, that's the default. I'll switch to -1, which I think won't have this issue. Thanks for the review. |
Regarding the null terminator, I was mainly smoke testing your code. :) Maybe it would be too hard to support properly. Although I could imagine someone doing things like this: >>> d = {b"key\x00": "value"}
>>> f"key={d[b'key\x00']}" # Oops, escape code at wrong level
File "<fstring>", line 1
(d[b'key
^
SyntaxError: EOL while scanning string literal
>>> rf"key={d[b'key\x00']}" # Corrected
'key=value' I also finished quickly reading over the C code, with a couple more review comments. But I am not familiar with the files involved to thoroughly review. |
I rewrote the format_spec parser to recursively call the f-string parser, so any oddness in what's allowed in a format_spec is gone. It took way longer than I thought, but the code is better for it. |
Simplified error handling, fixed 2 memory leaks. All tests now pass with no leaks. This should be the final version. |
Another strange error message (though maybe the new test changes you mentioned caught this): >>> f'{3:{10}' # Actually missing a closing bracket '}'
File "<stdin>", line 1
SyntaxError: f-string: unexpected '}' |
Yes, I found that one, too. Sorry to waste your time on this, but I literally just finished the test changes 15 minutes ago. |
Hopefully the last version. |
I left a few more comments on Reitveld. Checking the error messages does make me feel a lot more comfortable though. |
Cleaned up the error handling in fstring_expression_compile so it's easier to verify and more robust in the face of future changes. Added a test for an un-doubled '}', which is an error in a top-level literal (and ends a nested expression). Modified existing tests to match. |
I changed the generated code to call: instead of: The reason is that the correct way to call __format__ is actually: That is, the __format__ lookup is done on the type, not the instance. From the earlier example, the disassembled code is now: >>> dis.dis("f'a={a}'")
1 0 LOAD_CONST 0 ('')
3 LOAD_ATTR 0 (join)
6 LOAD_CONST 1 ('a=')
9 LOAD_GLOBAL 1 (format)
12 LOAD_NAME 2 (a)
15 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
18 BUILD_LIST 2
21 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
24 RETURN_VALUE The simplest way to make the lookup correctly is just to call format() itself, which does the right thing. I still have a concept of adding opcodes to handle FormattedValue and JoinedStr nodes, but that's an optimization for later, if ever. |
New changeset a10d37f04569 by Eric V. Smith in branch 'default': |
Documentation task added as issue bpo-25179. Thanks to Martin for the great code reviews. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: