classification
Title: Compile time textwrap.dedent() equivalent for str or bytes literals
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Marco Sulla, gregory.p.smith, inada.naoki, josh.r, mbussonn, pablogsal, remi.lapeyre, rhettinger, serhiy.storchaka, steven.daprano
Priority: normal Keywords: patch

Created on 2019-05-13 18:40 by gregory.p.smith, last changed 2019-11-07 17:48 by Marco Sulla.

Pull Requests
URL | Status | Linked
PR 13445 | open | remi.lapeyre, 2019-05-20 14:53
Messages (35)
msg342373 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-05-13 18:40
A common pattern in Python code is to keep everything indented so that it looks pretty. When a triple-quoted multiline string must not have leading whitespace, the usual answer is to call textwrap.dedent("""long multiline constant""").

Rather than doing this computation at runtime, it would make sense to do it at compilation time.  A natural suggestion for this would be a new letter prefix for multiline string literals that triggers this.

Probably not worth "wasting" a letter on this, so I'll understand if we reject the idea, but it'd be nice to have rather than importing textwrap and calling it all over the place just for this purpose.

There are many workarounds but an actual syntax would enable writing code that looked like this:

```python
class Castle:
    def __init__(self, name, lyrics=None):
        if not lyrics:
            lyrics = df"""\
            We're knights of the round table
            We dance whene'er we're able
            We do routines and scenes
            With footwork impeccable.
            We dine well here in {name}
            We eat ham and jam and spam a lot.
            """
        self._name = name
        self._lyrics = lyrics
```

This would avoid generating a larger temporary string literal, always held in memory in the code object, that gets converted at runtime to the desired dedented form via a textwrap.dedent() call.  I chose "d" as the letter to mean dedent.  I don't have a strong preference if we ever do make this a feature.
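
For comparison, here is a minimal sketch of the runtime workaround described above, using only the existing textwrap.dedent(). The shortened lyrics and the "Camelot" argument are illustrative assumptions, not part of the proposal.

```python
import textwrap


class Castle:
    def __init__(self, name, lyrics=None):
        if not lyrics:
            # The f-string is formatted first; textwrap.dedent() then strips
            # the common leading whitespace at runtime, on every call.
            lyrics = textwrap.dedent(f"""\
                We're knights of the round table
                We dine well here in {name}
                """)
        self._name = name
        self._lyrics = lyrics


castle = Castle("Camelot")
print(castle._lyrics)
```

The source code stays indented, but the import and the call are repeated wherever such a string appears.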
msg342407 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-05-14 00:40
I agree that this is a recurring need and would be nice to have.
msg342420 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-05-14 02:44
+1

There's a long thread on something similar here:

https://mail.python.org/pipermail/python-ideas/2018-March/049564.html

Carrying over into the following month:

https://mail.python.org/pipermail/python-ideas/2018-April/049582.html

Here's an even older thread:

https://mail.python.org/pipermail/python-ideas/2010-November/008589.html


In the more recent thread, I suggested that we give strings a dedent method. When called on a literal, the peephole optimizer may do the dedent at compile time. Whether it does or not is a "quality of implementation" factor.

The idea is to avoid the combinatorial explosion of yet another string prefix:

    urd'...'  # unicode raw string dedent

while still making string dedents easily discoverable, and with a sufficiently good interpreter, string literals will be dedented at compile time avoiding any runtime cost:

https://mail.python.org/pipermail/python-ideas/2018-March/049578.html
msg342429 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-05-14 04:24
Oh good, I thought this had come up before.  Your method idea that could be optimized on literals makes a lot of sense, and is notably more readable than yet another letter prefix.
msg342477 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2019-05-14 15:33
Hi, I have been looking to get more acquainted with the peephole optimizer. Is it okay if I work on this?
msg342488 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-05-14 17:00
I'd say go for it.  We can't guarantee we'll accept the feature yet, but I think the .dedent() method with an optimization pass approach is worthwhile making a proof of concept of regardless.
msg342569 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-05-15 12:08
For the record, I just came across this proposed feature for Java:

https://openjdk.java.net/jeps/8222530

    Add text blocks to the Java language. A text block is a multi-line 
    string literal that avoids the need for most escape sequences, 
    automatically formats the string in predictable ways, and gives the 
    developer control over format when desired.

It seems to be similar to Python triple-quoted strings except that the 
compiler automatically dedents the string using a "re-indentation 
algorithm". (Which sounds to me similar, if not identical, to that 
used by textwrap.)

The JEP proposal says:

    A study of Google's large internal repository of Java source code 
    showed that 98% of string literals, once converted to text blocks 
    and formatted appropriately, would require removal of incidental 
    white space. If Java introduced multi-line string solution without 
    support for automatically removing incidental white space, then many 
    developers would write a method to remove it themselves and/or lobby 
    for the String class to include a removal method.

which matches my own experience: *most* but not all of my indented 
triple-quotes strings start with incidental whitespace that I don't care 
about. But not quite all, so I think backwards compatibility requires 
that *by default* triple-quoted strings are not dedented.

Note that there are a couple of major differences between the JEP 
proposal and this:

- The JEP proposes to automatically dedent triple-quoted strings;
  this proposal requires an explicit call to .dedent().

- The JEP proposal allows the user to control the dedent by 
  indenting, or not, the trailing end-quote;

- however that means that in Java you won't be able to control
  the dedent if the string doesn't end with a final blank line;

- Should the dedent method accept an optional int argument
  specifying the number of spaces to dedent by? (Defaulting to
  None, meaning "dedent by the common indent".) If so, that won't
  affect the compile-time optimization so long as the argument is
  a literal.

- the JEP performs the dedent before backslash escapes are 
  interpreted; in this proposal backslash escapes will 
  occur before the dedent.
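
The optional-argument question above can be made concrete with a small sketch. The function name `dedent_by` and its `width` parameter are hypothetical stand-ins for the proposed method; the sketch only assumes that `None` means "use the common indent", as suggested in the list.

```python
import textwrap


def dedent_by(text, width=None):
    """Hypothetical stand-in for the proposed str.dedent([width])."""
    if width is None:
        # Default: remove the common leading whitespace.
        return textwrap.dedent(text)
    result = []
    for line in text.splitlines(keepends=True):
        # Remove at most `width` leading spaces from each line.
        leading = len(line) - len(line.lstrip(" "))
        result.append(line[min(width, leading):])
    return "".join(result)


sample = "    a\n      b\n"
print(repr(dedent_by(sample)))
print(repr(dedent_by(sample, 2)))
```

Because both the string and the width are literals in a call like this, a compiler could in principle evaluate it ahead of time.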

The JEP also mentions considering multi-line string literals as Swift 
and Rust do them:

https://github.com/apple/swift-evolution/blob/master/proposals/0168-multi-line-string-literals.md

https://stackoverflow.com/questions/29483365/what-is-the-syntax-for-a-multiline-string-literal

I mention these for completeness, not to suggest them as alternatives.
msg342600 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-05-15 20:54
Thanks, it's actually good to see this being a feature accepted in other languages.
msg342909 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2019-05-20 14:57
Hi @steven.daprano, @gregory.p.smith. I added the first version of my PR for review.

One issue with it is that in:

def f():
    return "   foo".dedent()

f will have both "   foo" and "foo" in its constants even if the first is not used anymore. Removing it requires looping over the code once more while marking the constants seen in a set and I was not sure if this was ok.
msg342914 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-05-20 15:21
Perform the optimization at the AST level, not in the peepholer.
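
As a rough illustration of the AST-level approach, here is a pure-Python sketch. The real optimization would live in CPython's C code; the class name `DedentFolder` is hypothetical, and textwrap.dedent() stands in for the proposed method.

```python
import ast
import textwrap


class DedentFolder(ast.NodeTransformer):
    """Fold "literal".dedent() calls into an already-dedented constant."""

    def visit_Call(self, node):
        self.generic_visit(node)
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "dedent"
            and isinstance(func.value, ast.Constant)
            and isinstance(func.value.value, str)
            and not node.args
            and not node.keywords
        ):
            # Replace the whole call with the dedented string constant.
            folded = ast.Constant(value=textwrap.dedent(func.value.value))
            return ast.copy_location(folded, node)
        return node


source = 'def f():\n    return "   foo".dedent()\n'
tree = ast.fix_missing_locations(DedentFolder().visit(ast.parse(source)))
namespace = {}
exec(compile(tree, "<sketch>", "exec"), namespace)
result = namespace["f"]()
print(repr(result))
```

Since the call is rewritten before bytecode is generated, the compiled function never needs a str.dedent() method to exist.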
msg342915 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-05-20 15:26
> One issue with it is that in:
> def f():
>     return "   foo".dedent()
> f will have both "   foo" and "foo" in its constants even if the first is not used anymore.

That seems to be what happens with other folded constants:

py> def f():
...     return 99.0 + 0.9
...
py> f.__code__.co_consts
(None, 99.0, 0.9, 99.9)

so I guess that this is okay for a first draft. One difference is that 
strings tend to be much larger than floats, so this will waste more 
memory. We ought to consider removing unused constants at some point.

(But not me, sorry, I don't have enough C.)

> Removing it requires looping over the code once more while marking 
> the constants seen in a set and I was not sure if this was ok.

That should probably be a new issue.
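
The constants left in a code object can be checked directly, as in the session quoted above. This is only a sketch: the exact contents of co_consts vary between CPython versions, so it checks only for the folded value.

```python
def f():
    return 99.0 + 0.9


# co_consts holds the constants the compiler kept for this function.
consts = f.__code__.co_consts
print(consts)
print(99.9 in consts)
```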
msg342916 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-05-20 15:28
Serhiy's message crossed with mine -- you should probably listen to
him over me :-)
msg342917 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2019-05-20 15:30
> Perform the optimization at the AST level, not in the peepholer.

Thanks, this makes more sense.
msg342918 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2019-05-20 15:31
> Serhiy's message crossed with mine.

And mine crossed with yours, sorry. I will update my PR shortly.
msg342927 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2019-05-20 16:27
Thanks @serhiy.storchaka, it's far easier to do here. I pushed the patch to the attached PR. Is there a reason the other optimisations in the Peephole optimizer are not done in the AST?
msg342928 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-05-20 16:30
The optimization that can be done in the AST is done in the AST.
msg342931 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-05-20 16:42
While the string method works pretty well, I do not think this is the best way. If 98% of multiline strings will need dedenting, it is better to do it by default. For the 2% that do not need dedenting, it can be disabled by adding a backslash followed by a newline at the first position (other than the start of the string). For example:

smile = '''\

 XX
 XX      X
          X
    XXX   X
          X
 XX      X
 XX

\
'''

Yes, this is a breaking change. But we have import from __future__ and FutureWarning. The plan may be:

3.9. Implement "from __future__ import deindent".
3.11. Emit a FutureWarning for multiline literals that will be changed by dedenting if "from __future__ import deindent" is not specified.
3.13. Make it the default behavior.
msg342938 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-05-20 18:11
> While the string method works pretty well, I do not think this is the best way.

Regardless of what we do for literals, a dedent() method will help for 
non-literals, so I think that this feature should go in even if we 
intend to change the default behaviour in the future:

> 3.9. Implement "from __future__ import deindent".
> 3.11. Emit a FutureWarning for multiline literals that will be changed by dedenting if "from __future__ import deindent" is not specified.
> 3.13. Make it the default behavior.

And that gives us plenty of time to decide whether or not making it the 
default, rather than an explicit choice, is the right thing to do.
msg342962 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-05-20 22:33
Agreed, I'm in favor of going forward with this .dedent() optimization approach today.

If we were to attempt a default indented multi-line str and bytes literal behavior change in the future (a much harder decision to make as it is a breaking change), that is its own issue and probably PEP worthy.
msg342965 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2019-05-20 23:55
I've tried PR 13445 a bit, and I find this way nicer than textwrap.dedent(...), 
though I wonder if f-string readability (and expected behavior?) might suffer a tiny bit with the order of formatting the f-string vs dedenting. 

In the following it is clear that dedent is after formatting: 

>>> dedent(f"   {stuff}")

It might be unclear for the following, especially if `.dedent()` gets sold as zero-overhead at compile time.

>>> f"   {stuff}".dedent()

Could it be made clearer with the peephole optimiser (and tested, I don't believe it is now) that dedent applies after formatting?

Alternative modifications/suggestions/notes: 

   - I can also see how having dedent applied **before** formatting with f-strings could be useful or less surprising (a d"" prefix could do that... just wondering what your actual goal is). 
   - Is this supposed to deprecate textwrap.dedent? Duck-typing and stuff: could textwrap.dedent work on non-str things where the current implementation does not (it assumes the `.dedent()` method exists), and thus be backward-incompatible?
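
The ordering question can be illustrated with today's textwrap.dedent(). This is a sketch only: `stuff` is a made-up multi-line value, and str.format() is used to emulate "dedent before formatting".

```python
import textwrap

stuff = "first\nsecond"

# Dedent AFTER formatting: the interpolated text contains an unindented
# line, so there is no common margin left to remove.
after = textwrap.dedent(f"""\
    header
    {stuff}
    """)

# Dedent BEFORE formatting: the template is dedented first, then filled in.
before = textwrap.dedent("""\
    header
    {stuff}
    """).format(stuff=stuff)

print(repr(after))
print(repr(before))
```

The two orders give different results as soon as the interpolated value spans several lines.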
msg342968 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-05-21 01:05
> It might be unclear for the following, especially if `.dedent()` gets 
> sold as zero-overhead at compile time.

Oh, please, please, please PLEASE let's not over-sell this! There is no 
promise that dedent will be zero-overhead: it is a method, like any 
other method, which is called at runtime. Some implementations *might* 
*sometimes* be able to optimize that at compile-time, just as some 
implementations *might* *sometimes* be able to optimize away long 
complex arithmetic expressions and do them at compile time.

Such constant-folding optimizations can only occur with literals, since 
arbitrary expressions aren't known at compile-time. F-strings aren't 
string literals, they are executable code and can run things like this:

f"{'abc' if random.random() > 0.5 else 'xyz'}"

So we don't know how many spaces each line begins with until after the 
f-string is evaluated:

f"""{m:5d}
{n:5d}"""

Unless we over-sell the peephole optimization part, there shouldn't be 
anything more confusing about dedent than this:

x, X = 'spam', 'eggs'
f"{x}".upper()
# returns 'SPAM' not 'eggs'
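
The difference between a string literal and an f-string is visible in the AST, which is one way to see why only the former can be folded. A small sketch, assuming Python 3.8 or later where literals parse to ast.Constant:

```python
import ast

# A plain string literal is a single constant node.
literal = ast.parse('"   spam"', mode="eval").body

# An f-string is an expression node made of several parts.
fstring = ast.parse('f"   {x}"', mode="eval").body

print(type(literal).__name__)
print(type(fstring).__name__)
```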

> Could it be made clearer with the peephole optimiser (and tested, I 
> don't believe it is now), that dedent applies after-formatting ?

We should certainly make that clear.

Personally, I think we should soft-sell on the compile-time optimization 
until such time that the Steering Council decides it should be a 
mandatory language feature.

> Alternative modifications/suggestions/notes: 
> 
>    - I can also see how having dedent applied **before** formatting 
>    with f-string could be useful or less surprising ( a d"" prefix 
>    could do that... just wondering what your actual goal is).

I don't see how it will make any difference in the common case. And the 
idea here is to avoid yet another string prefix.

>    - Is this supposed to deprecate textwrap.dedent? 

I don't think so, but eventually it might.
msg342972 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2019-05-21 01:34
> Oh, please, please, please PLEASE let's not over-sell this! 

Sorry, I didn't want to give you a heart attack. The optimisation has been mentioned, and you never know what people get excited about.

> Such constant-folding ...

Well, here we might get that, but I kind of want to see how this is taught or explained. What I want to avoid is tutorials or examples saying that `.dedent()` is "as if you hadn't put spaces in front".

> I don't think so, but eventually it might.

Ok, thanks.

Again, just being cautious, and I see this is targeted at 3.9, so there is plenty of time.
I believe this will be a net improvement on many codebases.
msg343961 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-05-30 09:53
Can we dedent docstrings too?

Is there any string s such that inspect.cleandoc(s) != inspect.cleandoc(s.dedent())?
msg343991 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-05-30 18:00
I think dedenting docstring content by default would be a great thing to do.  But that's a separate issue, it isn't quite the same as .dedent() due to the first line.  I filed https://bugs.python.org/issue37102 to track that.
msg350162 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-22 06:17
We should consider dedicated syntax for compile-time dedenting:

    d"""\
      This would be left aligned
          
          but this would only have four spaces

      And this would be left-justified.
    """   # Am not sure what to do about this last line
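
For reference, this is what the existing textwrap.dedent() does with the same text, including the last line. It is only a sketch of current runtime behaviour, not a statement about what a d"" prefix would do.

```python
import textwrap

text = textwrap.dedent("""\
      This would be left aligned

          but this would only have four spaces

      And this would be left-justified.
    """)

# The whitespace-only last line is ignored when computing the margin
# and is reduced to an empty string.
print(repr(text))
```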
msg356153 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-11-06 20:25
Another option not using a new letter: A triple-backtick token.


def foo():
    value = ```this is a
 
    long multi line string i don't want indented.
    ```

A discuss thread was started so I reconnected it with this issue.  See
 https://discuss.python.org/t/trimmed-multiline-string/2600/8
msg356154 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-11-06 20:36
I think it would be better to use backtick quotes for f-strings instead of the f prefix. This would stress the special nature of f-strings (they are not literals, but expressions). But there was strong opposition to using backticks anywhere in Python syntax.
msg356155 - (view) Author: Marco Sulla (Marco Sulla) Date: 2019-11-06 20:59
If I can add my two cents:

1. I would have preferred that the default behaviour of multi-line strings was to dedent. But breaking old code, even if only a small percentage of it, is IMHO never a good idea. Py2->Py3 should have proved it.

2. ``` reminds me too much of the Markdown syntax for a code block, not a text block.

3. Yes, the new prefix is really useless, because it's significant only for multiline strings. Anyway, if this solution is accepted, I propose `t` for `trim`.
msg356160 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-11-07 00:25
Is there a reason folks are supporting a textwrap.dedent-like behavior over the generally cleaner inspect.cleandoc behavior? The main advantage to the latter being that it handles:

    '''First
    Second
    Third
    '''

just fine (removing the common indentation from Second/Third), and produces identical results with:

    '''
    First
    Second
    Third
    '''

where textwrap.dedent behavior would leave the first string unmodified (because it removes the largest common indentation, and First has no leading indentation), and would dedent the second but leave a leading newline in place (where cleandoc removes it). That leading newline can only be avoided by using the typically discouraged line continuation character to make it:

    '''\
    First
    Second
    Third
    '''

cleandoc behavior means the choice of whether the text begins and ends on the same line at the triple quote doesn't matter, and most use cases seem like they'd benefit from that flexibility.
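
The difference can be checked with the two existing functions. A minimal sketch; the variable names are illustrative only. Note that textwrap.dedent() does still reduce the whitespace-only last line to an empty string, even when it removes no margin.

```python
import inspect
import textwrap

same_line = '''First
    Second
    Third
    '''

next_line = '''
    First
    Second
    Third
    '''

# cleandoc gives the same result for both layouts.
print(repr(inspect.cleandoc(same_line)))
print(repr(inspect.cleandoc(next_line)))

# dedent leaves the indentation of the first form alone and keeps the
# leading newline of the second.
print(repr(textwrap.dedent(same_line)))
print(repr(textwrap.dedent(next_line)))
```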
msg356162 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-11-07 01:54
.cleandoc is _probably_ more of what people want than .dedent?  I hadn't bothered to even try to pick between the two yet.
msg356182 - (view) Author: Marco Sulla (Marco Sulla) Date: 2019-11-07 10:22
Anyway there's something strange in string escaping and `inspect.cleandoc()`:

>>> a = """
... \nciao
...     bello
... \ ciao
... """
>>> print(inspect.cleandoc(a))
ciao
    bello
\ ciao
>>> print("\ ciao")
\ ciao

I expected:

>>> print(inspect.cleandoc(a))

ciao
bello
 ciao
>>> print("\ ciao")
 ciao
msg356193 - (view) Author: Marco Sulla (Marco Sulla) Date: 2019-11-07 15:37
Excuse me for the spam, but I have a simple argument against making it the default behavior: what will a person who reads the code, but doesn't know Python, expect?

IMHO they expect that the string is *exactly* as it's written. The fact that it will be dedented is a bit surprising.

For readability and to avoid breaking old code, I continue to be in favor of a letter before the multi-line string. Maybe `d`, for dedent, is more appropriate than `t`, since it does not only trim the string.

But probably there's a better solution than the letter.
msg356198 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-11-07 16:19
The user expects what they read in the documentation or what they learned in other programming languages. If we update the documentation, their expectations will change.

As for other programming languages, Bash has an option for stripping all leading tab characters from a here document, and in Julia triple-quoted strings are dedented (https://docs.julialang.org/en/v1/manual/strings/#Triple-Quoted-String-Literals-1). Since Julia is a competitor of Python in science applications, I think that a significant fraction of Python users expect Python triple-quoted strings to be dedented too, especially if they are dedented by help() and other tools.
msg356203 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-11-07 16:51
Julia's syntax looks well thought out, so I suggest borrowing it.
msg356204 - (view) Author: Marco Sulla (Marco Sulla) Date: 2019-11-07 17:48
Since when does Python emulate other languages?

Who cares about what other languages do? Python uses `raise` instead of `throw`, even if `throw` is much more popular in the most used languages, only because `raise` makes more sense in English. 

And IMHO a newbie who sees a multi-line string in the code does not read the documentation. It's evident that it is a multi-line string. And (s)he expects it to act as in English or any other written language, that is, the text is exactly what (s)he reads.

On the contrary, if (s)he reads 

d"""
   Marco
   Sulla
"""

maybe (s)he thinks "this must be something different", and reads the docs.
History
Date | User | Action | Args
2019-11-07 17:48:33 | Marco Sulla | set | messages: + msg356204
2019-11-07 16:51:55 | serhiy.storchaka | set | messages: + msg356203
2019-11-07 16:19:28 | serhiy.storchaka | set | messages: + msg356198
2019-11-07 15:37:20 | Marco Sulla | set | messages: + msg356193
2019-11-07 10:22:32 | Marco Sulla | set | messages: + msg356182
2019-11-07 01:54:41 | gregory.p.smith | set | messages: + msg356162
2019-11-07 00:25:47 | josh.r | set | nosy: + josh.r; messages: + msg356160
2019-11-06 20:59:19 | Marco Sulla | set | nosy: + Marco Sulla; messages: + msg356155
2019-11-06 20:36:17 | serhiy.storchaka | set | messages: + msg356154
2019-11-06 20:25:56 | gregory.p.smith | set | messages: + msg356153
2019-08-22 06:17:01 | rhettinger | set | messages: + msg350162
2019-05-30 18:00:38 | gregory.p.smith | set | messages: + msg343991
2019-05-30 09:53:36 | inada.naoki | set | nosy: + inada.naoki; messages: + msg343961
2019-05-21 01:34:58 | mbussonn | set | messages: + msg342972
2019-05-21 01:05:19 | steven.daprano | set | messages: + msg342968
2019-05-20 23:55:50 | mbussonn | set | messages: + msg342965
2019-05-20 22:33:15 | gregory.p.smith | set | messages: + msg342962
2019-05-20 18:11:23 | steven.daprano | set | messages: + msg342938
2019-05-20 16:42:51 | serhiy.storchaka | set | messages: + msg342931
2019-05-20 16:30:04 | serhiy.storchaka | set | messages: + msg342928
2019-05-20 16:27:01 | remi.lapeyre | set | messages: + msg342927
2019-05-20 15:31:39 | remi.lapeyre | set | messages: + msg342918
2019-05-20 15:30:51 | remi.lapeyre | set | messages: + msg342917
2019-05-20 15:28:43 | steven.daprano | set | messages: + msg342916
2019-05-20 15:26:36 | steven.daprano | set | messages: + msg342915
2019-05-20 15:21:12 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg342914
2019-05-20 14:57:55 | remi.lapeyre | set | messages: + msg342909
2019-05-20 14:53:50 | remi.lapeyre | set | keywords: + patch; stage: patch review; pull_requests: + pull_request13353
2019-05-15 20:54:25 | gregory.p.smith | set | priority: low -> normal; messages: + msg342600
2019-05-15 12:08:11 | steven.daprano | set | messages: + msg342569
2019-05-14 23:17:12 | pablogsal | set | nosy: + pablogsal
2019-05-14 17:00:11 | gregory.p.smith | set | messages: + msg342488
2019-05-14 15:33:01 | remi.lapeyre | set | nosy: + remi.lapeyre; messages: + msg342477
2019-05-14 04:24:18 | gregory.p.smith | set | messages: + msg342429
2019-05-14 02:44:39 | steven.daprano | set | nosy: + steven.daprano; messages: + msg342420
2019-05-14 02:31:26 | mbussonn | set | nosy: + mbussonn
2019-05-14 00:40:36 | rhettinger | set | nosy: + rhettinger; messages: + msg342407
2019-05-13 18:40:31 | gregory.p.smith | create |