Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PEP 498: Literal String Formatting #69153

Closed
ericvsmith opened this issue Aug 30, 2015 · 39 comments
Closed

Implement PEP 498: Literal String Formatting #69153

ericvsmith opened this issue Aug 30, 2015 · 39 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@ericvsmith
Copy link
Member

BPO 24965
Nosy @warsaw, @ericvsmith, @Rosuav, @vadmium, @1st1, @JelleZijlstra
Files
  • pep-498-3.diff
  • pep-498-4.diff
  • pep-498-5.diff
  • pep-498-6.diff
  • pep-498-7.diff
  • pep-498-8.diff
  • pep-498-9.diff
  • pep-498-10.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericvsmith'
    closed_at = <Date 2015-09-19.18:57:44.348>
    created_at = <Date 2015-08-30.17:47:00.728>
    labels = ['interpreter-core', 'type-feature']
    title = 'Implement PEP 498: Literal String Formatting'
    updated_at = <Date 2015-09-19.18:57:44.346>
    user = 'https://github.com/ericvsmith'

    bugs.python.org fields:

    activity = <Date 2015-09-19.18:57:44.346>
    actor = 'eric.smith'
    assignee = 'eric.smith'
    closed = True
    closed_date = <Date 2015-09-19.18:57:44.348>
    closer = 'eric.smith'
    components = ['Interpreter Core']
    creation = <Date 2015-08-30.17:47:00.728>
    creator = 'eric.smith'
    dependencies = []
    files = ['40421', '40430', '40447', '40480', '40484', '40495', '40502', '40515']
    hgrepos = []
    issue_num = 24965
    keywords = ['patch']
    message_count = 39.0
    messages = ['249362', '249364', '249365', '249475', '249481', '249810', '250343', '250344', '250354', '250420', '250422', '250465', '250467', '250485', '250493', '250525', '250527', '250528', '250529', '250530', '250531', '250532', '250538', '250541', '250544', '250546', '250548', '250557', '250598', '250821', '250849', '250926', '250927', '250929', '250931', '250970', '251073', '251100', '251102']
    nosy_count = 8.0
    nosy_names = ['barry', 'eric.smith', 'python-dev', 'Rosuav', 'martin.panter', 'yselivanov', 'JelleZijlstra', 'elvis']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue24965'
    versions = ['Python 3.6']

    @ericvsmith
    Copy link
    Member Author

    See PEP-498.

    >>> f'New for Python {sys.version.split()[0]}'
    'New for Python 3.6.0a0'

    @ericvsmith ericvsmith self-assigned this Aug 30, 2015
    @ericvsmith ericvsmith added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Aug 30, 2015
    @ericvsmith
    Copy link
    Member Author

    One thing I've done in this implementation is to build up a string to pass to str.format(), instead of using the original string. This new string uses positional parameters instead of named parameters.

    I had originally proposed to add a string.interpolate() to do the heavy lifting here, which would have meant I could use the original string (as seen in the source code), and not build up a new string and pass it to str.format(). I still might do that, but for now, the approach using str.format() is good enough.

    @ericvsmith
    Copy link
    Member Author

    Oops, I didn't really mean to include imporlib.h. Oh, well.

    @ericvsmith
    Copy link
    Member Author

    Fixed validate_exprs bug.

    @ericvsmith
    Copy link
    Member Author

    Make sure f-strings are identified as literals in error messages.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 4, 2015

    New changeset a0194ec4195c by Eric V. Smith in branch 'default':
    Removed Implementation Limitations section. While the version of the code on http://bugs.python.org/issue24965 has the 255 expression limitation, I'm going to remove this limit. The i18n section was purely speculative. We can worry about it if/when we add i18n and i-strings.
    https://hg.python.org/peps/rev/a0194ec4195c

    @ericvsmith
    Copy link
    Member Author

    This implements the accepted PEP-498. The only other real change I plan on making is to do dynamic memory allocation when building the expressions that make up a JoinedStr AST node. The code has all of the places to do that already laid out, it's just a matter of hooking it up.

    There's one nit where I accept 'f' and 'F', but the PEP just says 'f'. I'm not sure if we should accept the upper case version. I'd think not, but all of the other ones (b, r, and u) do.

    I need to do one more scan for memory leaks. I've rearranged some code since the last time I checked for leaks, and that's always a recipe for some sneaking in.

    And I need to write some more tests, mostly for syntax errors, but also for a few edge conditions.

    Comments welcome.

    @warsaw
    Copy link
    Member

    warsaw commented Sep 10, 2015

    On Sep 09, 2015, at 11:57 PM, Eric V. Smith wrote:

    There's one nit where I accept 'f' and 'F', but the PEP just says 'f'. I'm
    not sure if we should accept the upper case version. I'd think not, but all
    of the other ones (b, r, and u) do.

    I think it should be consistent with the other prefixes. Shouldn't be a big
    deal to amend the PEP to describe this.

    @ericvsmith
    Copy link
    Member Author

    I discussed it with Guido and added 'F' to the PEP.

    @ericvsmith
    Copy link
    Member Author

    This version does dynamic allocation for the expression list, and fixes some memory leaks and early decrefs.

    I think it's complete, but I'll take some more passes through it checking for leaks.

    @ericvsmith
    Copy link
    Member Author

    The good news is that the performance is pretty good, and finally I have a case where I can beat %-formatting:

    $ ./python.bat -mtimeit -s 'a=2' "'%s' % a"
    1000000 loops, best of 3: 0.883 usec per loop
    
    $ ./python.bat -mtimeit -s 'a=2' '"{}".format(a)'
    1000000 loops, best of 3: 1.16 usec per loop
    
    $ ./python.bat -mtimeit -s 'a=2' 'f"{a}"'
    1000000 loops, best of 3: 0.792 usec per loop

    This example is mildly contrived, and the performance of f-strings is slightly worse than %-formatting once the f-strings contains both expressions and literals.

    I could speed it up significantly (I think) by adding opcodes for 2 things: calling __format__ and joining the strings together. Calling __format__ in an opcode could be a win because I could optimize for known types (str, int, float). Having a join opcode would be a win because I could use _PyUnicodeWriter instead of ''.join.

    I'm inclined to check this code in as-is, then optimize it later, if we think it's needed and if I get motivated.

    For reference, here's the ast and opcodes for f'a={a}':

    >>> ast.dump(ast.parse("f'a={a}'"))
    "Module(body=[Expr(value=JoinedStr(values=[Str(s='a='), FormattedValue(value=Name(id='a', ctx=Load()), conversion=0, format_spec=None)]))])"
    
    >>> dis.dis("f'a={a}'")
      1           0 LOAD_CONST               0 ('')
                  3 LOAD_ATTR                0 (join)
                  6 LOAD_CONST               1 ('a=')
                  9 LOAD_NAME                1 (a)
                 12 LOAD_ATTR                2 (__format__)
                 15 LOAD_CONST               0 ('')
                 18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 21 BUILD_LIST               2
                 24 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 27 RETURN_VALUE

    @vadmium
    Copy link
    Member

    vadmium commented Sep 11, 2015

    Another version of that AST that is better for my digestion:

    f'a={a}'

    Module(body=[Expr(
    value=JoinedStr(values=[
    Str(s='a='),
    FormattedValue(
    value=Name(id='a', ctx=Load()),
    conversion=0,
    format_spec=None,
    ),
    ]),
    )])

    I have been reading over the test cases, and left a bunch of suggestions for more edge cases etc. Some of them might reflect that I haven’t completely learnt how the inner Python expression syntax, outer string escaping syntax, {{curly bracket}} escaping, automatic concatenation, etc, are all meant to fit together.

    @ericvsmith
    Copy link
    Member Author

    Thanks, Martin. I've posted my replies. I'll add some more tests, and work on the triple quoted string bug.

    @ericvsmith
    Copy link
    Member Author

    Thanks again, Martin. I've found 4 bugs so far, based on your suggested tests. The ones I haven't fixed are: 'fur' strings don't work (something in the lexer), and triple quoted strings don't work correctly. I'm working on both of those, and should have an updated patch in the next day or so.

    @ericvsmith
    Copy link
    Member Author

    It turns out 'fur' strings aren't a thing, because 'ur' strings aren't.

    From tokenizer.c:
    /* ur"" and ru"" are not supported */

    And the PEP:
    https://www.python.org/dev/peps/pep-0414/#exclusion-of-raw-unicode-literals

    I'll add a test to make sure this fails.

    So I just need to work on the triple-quoted string problem.

    @ericvsmith
    Copy link
    Member Author

    After discussing it with Guido, I've removed the ability to combine 'f' with 'u'.

    @JelleZijlstra
    Copy link
    Member

    I've started working on implementing this feature in Cython and I'd like to confirm a few edge cases:

    • f'{ {1: 2\N{RIGHT CURLY BRACKET}[1]}' == '2' (string escape rules work even within the expressions)
    • f'{ '''foo''' }' is a syntax error
    • f'{ """foo 'bar'""" }' is a syntax error

    @ericvsmith
    Copy link
    Member Author

    Yes, Jelle, you are correct in all 3 cases. Remember that the steps are to extract the string from the source code, decode backslash escapes, and only then treat it as an f-string.

    For the first case, without the 'f' prefix:
    '{ {1: 2\N{RIGHT CURLY BRACKET}[1]}' == '{ {1: 2}[1]}'

    Then, applying the 'f':
    f'{ {1: 2}[1]}' == '2'.

    For the last 2, since they're syntax errors without the 'f', they're also syntax errors with the 'f'.

    I'll have a new version, with tests for all of these cases, posted in the next few hours. You can leverage the tests.

    @JelleZijlstra
    Copy link
    Member

    Thanks! Here are a few more cases I came across with the existing implementation:

    >>> f"{'a\\'b'}"
      File "<stdin>", line 1
    SyntaxError: missing '}' in format string expression

    I believe this is valid and should produce "a'b".

    >>> f"{x!s!s}"
      File "<stdin>", line 1
    SyntaxError: single '}' encountered in format string

    Could use a better error message.

    >>> x = 3
    >>> f"{x!s{y}}"
    '3y}'

    Not sure how this happened.

    @ericvsmith
    Copy link
    Member Author

    This one has been fixed:
    >>> f"{'a\\'b'}"
    "a'b"
    
    This one was a bug that I previously fixed, that Martin pointed out:
    >>> f"{x!s!s}"
      File "<stdin>", line 1
    SyntaxError: invalid character following conversion character
    
    And this is the same bug:
    >>> f"{x!s{y}}"
      File "<stdin>", line 1
    SyntaxError: invalid character following conversion character

    I'm wrapping up my new code plus tests. I'll post it Real Soon Now.

    Thanks for your help.

    @vadmium
    Copy link
    Member

    vadmium commented Sep 12, 2015

    Regarding wrong error messages, I’ve learnt the hard way that it is often best to use assertRaisesRegex() instead of assertRaises(), to ensure that the actual exception you have in mind is being triggered, rather than a typo or something. Though that might upset your assertSyntaxErrors() helper.

    @ericvsmith
    Copy link
    Member Author

    Agreed on checking the error messages better. Especially since even the simplest of errors (like leaving out a quote) results in a syntax error in parsing the string, not parsing inside the f-string.

    I'll look at it eventually.

    @ericvsmith
    Copy link
    Member Author

    This patch fixes triple-quoted strings, plus a few bugs. I'm going to commit it tomorrow, barring any unforeseen issues.

    @ericvsmith
    Copy link
    Member Author

    I'll probably ensure that all of the parsing errors contain "format string" or "f-string" or similar. That way the regex check is easier, and the user can search for it more easily.

    It remains to be seen how these are referenced in the documentation. "f-string" seems much easier to say and search for, but seems too slangy for the docs. But "format string" seems ambiguous and hard to search for. I guess time will tell.

    @JelleZijlstra
    Copy link
    Member

    Is this behavior intentional?

    >>> str = len
    >>> x = 'foo'
    >>> f'{x!s}'
    '3'
    >>> '{!s}'.format(x)
    'foo'

    Or similarly:

    >>> import builtins
    >>> del builtins.repr
    >>> f'{x!r}'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'repr' is not defined

    @ericvsmith
    Copy link
    Member Author

    Both of those are known (to me!) byproducts of the current implementation. If my crazy idea of adding opcodes to speed up f-strings flies, then this issue will go away. I consider this a corner case that doesn't need to be addressed before committing this code. I wouldn't emulate it one way or the other just yet.

    @vadmium
    Copy link
    Member

    vadmium commented Sep 13, 2015

    I’m actually trying out your patch now. A couple strange errors and observations:

    >>> f"{'{'}"  # Why is this allowed in an outer format expression--
    '{'
    >>> f"{3:{'{'}>10}"  # --but not inside a format specifier?
    SyntaxError: nesting of '{' in format specifier is not allowed
    >>> opening = "{"; f"{3:{opening}>10}"  # Workaround
    '{{{{{{{{{3'
    >>> f"{3:{'}'}<10}"  # Error message is very strange!
    SyntaxError: missing '}' in format string expression
    >>> f"{\x00}"  # It seems this is treated as a null terminator
      File "<fstring>", line 1
        (
        ^
    SyntaxError: unexpected EOF while parsing
    >>> f"{'s'!\x00:.<10}"  # Default conversion is the null character?
    's.........'

    @ericvsmith
    Copy link
    Member Author

    On 9/13/2015 12:21 AM, Martin Panter wrote:
    >>>> f"{'{'}"  # Why is this allowed in an outer format expression--
    > '{'
    >>>> f"{3:{'{'}>10}"  # --but not inside a format specifier?

    This is me being lazy about detecting recursion. I'll fix it.

    >>> f"{\x00}" # It seems this is treated as a null terminator
    File "<fstring>", line 1
    (
    ^
    SyntaxError: unexpected EOF while parsing

    This is a byproduct of using PyParser_ASTFromString. I'm not particularly included to do anything about it. Is there any practical use case?

    >>> f"{'s'!\x00:.<10}" # Default conversion is the null character?
    's.........'

    Yes, that's the default. I'll switch to -1, which I think won't have this issue.

    Thanks for the review.

    @vadmium
    Copy link
    Member

    vadmium commented Sep 14, 2015

    Regarding the null terminator, I was mainly smoke testing your code. :) Maybe it would be too hard to support properly. Although I could imagine someone doing things like this:

    >>> d = {b"key\x00": "value"}
    >>> f"key={d[b'key\x00']}"  # Oops, escape code at wrong level
      File "<fstring>", line 1
        (d[b'key
               ^
    SyntaxError: EOL while scanning string literal
    >>> rf"key={d[b'key\x00']}"  # Corrected
    'key=value'

    I also finished quickly reading over the C code, with a couple more review comments. But I am not familiar with the files involved to thoroughly review.

    @ericvsmith
    Copy link
    Member Author

    I rewrote the format_spec parser to recursively call the f-string parser, so any oddness in what's allowed in a format_spec is gone.

    It took way longer than I thought, but the code is better for it.

    @ericvsmith
    Copy link
    Member Author

    Simplified error handling, fixed 2 memory leaks.

    All tests now pass with no leaks.

    This should be the final version.

    @vadmium
    Copy link
    Member

    vadmium commented Sep 18, 2015

    Another strange error message (though maybe the new test changes you mentioned caught this):

    >>> f'{3:{10}'  # Actually missing a closing bracket '}'
      File "<stdin>", line 1
    SyntaxError: f-string: unexpected '}'

    @ericvsmith
    Copy link
    Member Author

    Martin Panter added the comment:

    Another strange error message (though maybe the new test changes you mentioned caught this):

    >>> f'{3:{10}' # Actually missing a closing bracket '}'
    File "<stdin>", line 1
    SyntaxError: f-string: unexpected '}'

    Yes, I found that one, too. Sorry to waste your time on this, but I literally just finished the test changes 15 minutes ago.

    @ericvsmith
    Copy link
    Member Author

    Hopefully the last version.

    @vadmium
    Copy link
    Member

    vadmium commented Sep 18, 2015

    I left a few more comments on Reitveld. Checking the error messages does make me feel a lot more comfortable though.

    @ericvsmith
    Copy link
    Member Author

    Cleaned up the error handling in fstring_expression_compile so it's easier to verify and more robust in the face of future changes.

    Added a test for an un-doubled '}', which is an error in a top-level literal (and ends a nested expression). Modified existing tests to match.

    @ericvsmith
    Copy link
    Member Author

    I changed the generated code to call:
    format(x [, spec])

    instead of:
    x.__format__(spec)

    The reason is that the correct way to call __format__ is actually:
    type(x).__format__(x, spec)

    That is, the __format__ lookup is done on the type, not the instance. From the earlier example, the disassembled code is now:

    >>> dis.dis("f'a={a}'")
      1           0 LOAD_CONST               0 ('')
                  3 LOAD_ATTR                0 (join)
                  6 LOAD_CONST               1 ('a=')
                  9 LOAD_GLOBAL              1 (format)
                 12 LOAD_NAME                2 (a)
                 15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 18 BUILD_LIST               2
                 21 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 24 RETURN_VALUE

    The simplest way to make the lookup correctly is just to call format() itself, which does the right thing.

    I still have a concept of adding opcodes to handle FormattedValue and JoinedStr nodes, but that's an optimization for later, if ever.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 19, 2015

    New changeset a10d37f04569 by Eric V. Smith in branch 'default':
    Issue bpo-24965: Implement PEP-498 "Literal String Interpolation". Documentation is still needed, I'll open an issue for that.
    https://hg.python.org/cpython/rev/a10d37f04569

    @ericvsmith
    Copy link
    Member Author

    Documentation task added as issue bpo-25179.

    Thanks to Martin for the great code reviews.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants