classification
Title: PEP 616: Add str.removeprefix and str.removesuffix methods
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Dennis Sweeney, doerwalter, elazar, eric.smith, gvanrossum, miss-islington, rhettinger, steven.daprano, vstinner, xtreak
Priority: normal
Keywords: patch

Created on 2020-03-11 19:11 by Dennis Sweeney, last changed 2020-05-28 01:24 by vstinner. This issue is now closed.

Files
File name Uploaded Description
pep-9999.rst Dennis Sweeney, 2020-03-20 04:26 Revised for typos
Pull Requests
URL Status Linked
PR 18939 merged Dennis Sweeney, 2020-03-11 19:15
PR 19455 closed vstinner, 2020-04-10 13:06
PR 20473 merged elazar, 2020-05-28 00:40
PR 20474 merged miss-islington, 2020-05-28 00:41
Messages (29)
msg363958 - Author: Dennis Sweeney (Dennis Sweeney) * Date: 2020-03-11 19:11
Following discussion here ( https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/ ), there is a proposal to add new methods str.cutprefix and str.cutsuffix to alleviate the common misuse of str.lstrip and str.rstrip.

I think sticking with the most basic possible behavior

    def cutprefix(self: str, prefix: str) -> str:
        if self.startswith(prefix):
            return self[len(prefix):]
        # return a copy to work for bytearrays
        return self[:]

    def cutsuffix(self: str, suffix: str) -> str:
        if self.startswith(suffix):
            # handles the "[:-0]" issue
            return self[:len(self)-len(suffix)]
        return self[:]

would be best (refusing to guess in the face of ambiguous multiple arguments). Someone can do, e.g.

    >>> 'foo.tar.gz'.cutsuffix('.gz').cutsuffix('.tar')
    'foo'

to cut off multiple suffixes. More complicated behavior for multiple arguments could be added later, but it would be easy to make a mistake in prematurely generalizing right now.
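
For anyone who wants to try this behavior on an interpreter that lacks the methods, here is a minimal sketch written as free functions. The names `cut_prefix` and `cut_suffix` are hypothetical helpers for illustration only, and the suffix variant tests with `endswith`. It only handles `str`, so it returns the input unchanged rather than a copy.

```python
def cut_prefix(s: str, prefix: str) -> str:
    """Return s without a leading prefix, or s unchanged if absent."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def cut_suffix(s: str, suffix: str) -> str:
    """Return s without a trailing suffix, or s unchanged if absent."""
    # Guard against the empty suffix: s[:-0] would be the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


# Chaining removes several suffixes, one at a time.
name = cut_suffix(cut_suffix("foo.tar.gz", ".gz"), ".tar")
print(name)
```

Each call removes at most one occurrence, so the order of the chained calls matters: the outermost suffix has to be removed first.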

In bikeshedding method names, I think that avoiding the word "strip" would be nice so users can have a consistent feeling that "'strip' means character sets; 'cut' means substrings".
msg364020 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-03-12 13:02
To be clear, are you only making a copy of the unchanged object if it is a mutable bytearray, not str or bytes?
msg364028 - Author: Dennis Sweeney (Dennis Sweeney) * Date: 2020-03-12 16:37
Yes:

    >>> x = "A"*10**6
    >>> x.cutprefix("B") is x
    True
    >>> x.cutprefix("") is x
    True

    >>> y = b"A"*10**6
    >>> y.cutprefix(b"B") is y
    True
    >>> y.cutprefix(b"") is y
    True

    >>> z = bytearray(b"A")*10**6
    >>> z.cutprefix(b"B") is z
    False
    >>> z.cutprefix(b"") is z
    False

I'm not sure whether this should be part of the spec or an implementation detail. The (str/bytes).replace method docs don't clarify this, but they have the same behavior:

    >>> x = "A"*10**6
    >>> x.replace("B", "C") is x
    True
    >>> x.replace("", "") is x
    True

    >>> y = b"A"*10**6
    >>> y.replace(b"B", b"C") is y
    True
    >>> y.replace(b"", b"") is y
    True

    >>> z = bytearray(b"A")*10**6
    >>> z.replace(b"B", b"C") is z
    False
    >>> z.replace(b"", b"") is z
    False
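
The reason a mutable type has to return a copy can be shown with a short sketch that uses only `bytearray.replace`, which is documented to always produce a new object. The variable names are illustrative.

```python
original = bytearray(b"AAAA")

# No b"B" is present, so nothing is replaced, but a new bytearray is returned.
result = original.replace(b"B", b"C")

same_object = result is original
equal_contents = result == original

# Mutating the result leaves the original untouched.
result[0] = ord("Z")
print(same_object, equal_contents, original, result)
```

If the method handed back the same bytearray when nothing matched, the later mutation would silently change the caller's original data. For immutable `str` and `bytes` that hazard does not exist, which is why returning the same object is a safe optimization there.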
msg364277 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-03-16 02:52
Guido, do you support this API expansion?
msg364284 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-16 04:32
I stopped following the discussion at some point, but I think this is worth adding -- I have seen this done over and over again, and apparently lots of other people have felt the need too.

I think these names are fine, and about the best we can do (keeping in line with the "feel" of the rest of the string API).

I like the behavior of returning a copy of the string if there's no match (as opposed to failing, which was also brought up).  If the original object is immutable this should return the original object, though that should be considered a CPython optimization (IIRC all the string methods are pretty careful about that) and not something required by the spec.

FWIW the pseudo code has a copy/paste error: In cutsuffix() it should use endswith() rather than startswith().
msg364313 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-16 13:04
The proposed change will affect many builtin types: bytes, bytearray, str, but also other types like collections.UserString. Would it make sense to summarize what has been said in the python-ideas thread into a PEP? It may be good to specify things like:

    >>> x = "A"*10**6
    >>> x.cutprefix("B") is x
    True

The specification can be just "that's an implementation detail" or "CPython implementation specific" :-)

I don't expect such a PEP to be long or controversial, but it may help to write it down.
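
As a rough illustration of how many types are affected, the sketch below calls the methods on `str`, `bytes`, `bytearray` and `collections.UserString`. It assumes Python 3.9 or later, where the methods exist under their final names `removeprefix` and `removesuffix`. The sample values are made up.

```python
from collections import UserString

# str and bytes return their own type.
text_stem = "test_spam.py".removeprefix("test_").removesuffix(".py")
bytes_stem = b"test_spam.py".removeprefix(b"test_").removesuffix(b".py")

# bytearray returns a new bytearray.
array_stem = bytearray(b"test_spam.py").removesuffix(b".py")

# UserString wraps the result in a UserString again.
wrapped_stem = UserString("test_spam.py").removeprefix("test_").removesuffix(".py")

print(text_stem, bytes_stem, array_stem, type(wrapped_stem).__name__)
```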
msg364581 - Author: Dennis Sweeney (Dennis Sweeney) * Date: 2020-03-19 01:14
If no one has started, I can draft such a PEP.
msg364582 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-19 01:25
Sounds good.
msg364643 - Author: Dennis Sweeney (Dennis Sweeney) * Date: 2020-03-20 00:37
Here is a draft PEP -- I believe it needs a Core Developer sponsor now?
msg364657 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-20 08:27
The PEP is a good start. Can you try to convert it to a PR on https://github.com/python/peps/ ? It seems like the next available PEP number is 616. I would prefer to leave comments on a PR.
msg364664 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2020-03-20 10:42
IMHO the names don't fit Python's current naming scheme, so what about naming them "lchop" and "rchop"?
msg364671 - Author: Dennis Sweeney (Dennis Sweeney) * Date: 2020-03-20 14:23
https://github.com/python/peps/pull/1332
msg364681 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-20 16:53
> https://github.com/python/peps/pull/1332

Thank you. And good luck handling the incoming discussions on the PEP ;-)
msg364701 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-20 18:28
Where should I leave comments on the PEP? Do you plan to post it on python-dev soon?
msg364703 - Author: Dennis Sweeney (Dennis Sweeney) * Date: 2020-03-20 18:52
Just posted it.
msg365036 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-26 00:25
Dennis Sweeney wrote https://www.python.org/dev/peps/pep-0616/
msg366879 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-20 21:02
The documentation should clearly explain the difference between removeprefix()/removesuffix() and lstrip()/strip()/rstrip(), since that difference is the rationale of the PEP ;-)

An example that can be used to explain the difference:

>>> "Monty Python".removesuffix(" Python")
'Monty'
>>> "Monty Python".strip(" Python")
'M'
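
The surprising 'M' follows from strip() treating its argument as a set of characters. The sketch below re-implements that rule in a hypothetical helper, `strip_chars`, and contrasts it with removing the suffix as a substring. It uses plain slicing, so it does not depend on the new methods being available.

```python
def strip_chars(s: str, chars: str) -> str:
    """Re-implement str.strip(chars) to show that chars is a set."""
    charset = set(chars)
    start, end = 0, len(s)
    # Drop leading characters while they belong to the set.
    while start < end and s[start] in charset:
        start += 1
    # Drop trailing characters while they belong to the set.
    while end > start and s[end - 1] in charset:
        end -= 1
    return s[start:end]


text = "Monty Python"

# The order of the characters in the argument is irrelevant to strip().
stripped = text.strip(" Python")
assert strip_chars(text, " Python") == stripped == text.strip("nohtyP ")

# Substring removal only fires when the whole suffix matches.
suffix = " Python"
without_suffix = text[:-len(suffix)] if text.endswith(suffix) else text

print(repr(stripped), repr(without_suffix))
```

Every trailing character of "Monty Python" back to the "o" of "Monty" is in the set, so strip() keeps consuming far past the intended suffix.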
msg366882 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-20 21:11
Well, I even expect that some people use .strip() when their intent was to use .lstrip():

>>> "Python vs Monty Python".strip("Python")
' vs Monty '

Again, strip() is used with a string whereas the real intent was to use removesuffix() which didn't exist ;-)

A note should be added to lstrip(), strip() and rstrip() documentation to point to removeprefix() and/or removesuffix().
msg366886 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-20 21:23
Please add an underscore to the names: remove_prefix() and remove_suffix().

The latter method causes a mental hiccup when first reading as removes-uffix, forcing mental backtracking to get to remove-suffix.

We had a similar problem with addinfourl initially being read as add-in-four-l before mentally backtracking to add-info-url.
msg366890 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-20 21:39
> Please add an underscore to the names: remove_prefix() and remove_suffix().

PEP 616 was approved with the removeprefix() and removesuffix() names. The rationale for the names can even be found in the PEP:
https://www.python.org/dev/peps/pep-0616/#alternative-method-names
msg366897 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-20 21:58
I disagree with the rationale given in the PEP.  The reason that "startswith" and "endswith" don't have underscores is that they aren't needed to disambiguate the text.  Our rules are to add underscores when it improves readability, which in this case it does.  Like casing conventions, these rules became prevalent after the early modules were created (i.e. the older the module, the more likely that it doesn't follow modern conventions).

We only have one chance to get this right.  Take it from someone with experience with this particular problem.  I created imap() but later regretted the naming pattern when it came to ifilter() and islice(), which sometimes cause mental hiccups, initially being read as if-ilter and is-lice.
msg366960 - Author: Dennis Sweeney (Dennis Sweeney) * Date: 2020-04-22 00:11
I'm personally -0 for underscores -- they might slightly improve readability of the function name in isolation but may also add confusion about which methods have underscores.  Only one out of the 45 non-dunder str methods has an underscore right now:

    >>> meths = [x for x in dir(str) if not x.startswith('__')]
    >>> [x for x in meths if '_' in x]
    ['format_map']
    >>> [x for x in meths if '_' not in x]
    ['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Maybe I'm wrong, but it seemed to me that most of the discussions to date had arrived at leaving out underscores.  Is there a process or appropriate channel to continue this discussion now that the PEP is accepted?
msg366964 - Author: Dennis Sweeney (Dennis Sweeney) * Date: 2020-04-22 01:14
Oops -- I now see the message on Python-Dev.
msg367049 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-22 21:05
New changeset a81849b0315277bb3937271174aaaa5059c0b445 by sweeneyde in branch 'master':
bpo-39939: Add str.removeprefix and str.removesuffix (GH-18939)
https://github.com/python/cpython/commit/a81849b0315277bb3937271174aaaa5059c0b445
msg367052 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-22 21:16
Well done Dennis Sweeney! You got a PEP approved and now the implementation is merged!

Maybe the documentation will need more reviews, but that can be done later. 

I prefer to get the implementation merged as soon as possible (it will likely be part of the next 3.9.0a6), so more users can play with it before 3.9.0 final release.
msg367055 - Author: Dennis Sweeney (Dennis Sweeney) * Date: 2020-04-22 22:14
There's a failure here:

    https://buildbot.python.org/all/#/builders/64/builds/656

    Failed subtests:
    test_killed_child - test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest

    Traceback (most recent call last):
    ...
    OSError: [Errno 9] Bad file descriptor

This should be unrelated to the patch, right?
msg367056 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-22 22:21
> This should be unrelated to the patch, right?

It's unrelated. It smells like bpo-39995.
msg370158 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-28 00:41
New changeset 56853d8ec6ed89bf5a9b81c3781a4df46ac391d3 by Elazar Gershuni in branch 'master':
bpo-39939: Fix removeprefix issue number in the What's New in Python 3.9 (GH-20473)
https://github.com/python/cpython/commit/56853d8ec6ed89bf5a9b81c3781a4df46ac391d3
msg370160 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-28 01:24
New changeset de6b6841098e1a5967cb7a50b665ca7473d0ddad by Miss Islington (bot) in branch '3.9':
bpo-39939: Fix removeprefix issue number in the What's New in Python 3.9 (GH-20473) (GH-20474)
https://github.com/python/cpython/commit/de6b6841098e1a5967cb7a50b665ca7473d0ddad
History
Date User Action Args
2020-05-28 01:24:39 vstinner set messages: + msg370160
2020-05-28 00:41:41 miss-islington set nosy: + miss-islington; pull_requests: + pull_request19727
2020-05-28 00:41:37 vstinner set messages: + msg370158
2020-05-28 00:40:35 elazar set nosy: + elazar; pull_requests: + pull_request19726
2020-04-22 22:21:47 vstinner set messages: + msg367056
2020-04-22 22:14:45 Dennis Sweeney set messages: + msg367055
2020-04-22 21:16:57 vstinner set status: open -> closed; title: PEP 616: Add str methods to remove prefix or suffix -> PEP 616: Add str.removeprefix and str.removesuffix methods; messages: + msg367052; resolution: fixed; stage: patch review -> resolved
2020-04-22 21:05:51 vstinner set messages: + msg367049
2020-04-22 01:14:10 Dennis Sweeney set messages: + msg366964
2020-04-22 00:11:11 Dennis Sweeney set messages: + msg366960
2020-04-20 21:58:26 rhettinger set messages: + msg366897
2020-04-20 21:39:52 vstinner set messages: + msg366890
2020-04-20 21:23:18 rhettinger set messages: + msg366886
2020-04-20 21:11:38 vstinner set messages: + msg366882
2020-04-20 21:02:00 vstinner set messages: + msg366879
2020-04-10 13:06:46 vstinner set pull_requests: + pull_request18809
2020-03-26 00:25:59 vstinner set messages: + msg365036; title: Add str methods to remove prefixes or suffixes -> PEP 616: Add str methods to remove prefix or suffix
2020-03-20 18:52:58 Dennis Sweeney set messages: + msg364703
2020-03-20 18:28:54 vstinner set messages: + msg364701
2020-03-20 16:53:23 vstinner set messages: + msg364681
2020-03-20 14:23:10 Dennis Sweeney set messages: + msg364671
2020-03-20 10:42:55 doerwalter set nosy: + doerwalter; messages: + msg364664
2020-03-20 08:27:46 vstinner set messages: + msg364657
2020-03-20 04:26:34 Dennis Sweeney set files: - pep-9999.rst
2020-03-20 04:26:06 Dennis Sweeney set files: + pep-9999.rst
2020-03-20 00:37:09 Dennis Sweeney set files: + pep-9999.rst; messages: + msg364643
2020-03-19 01:25:17 gvanrossum set messages: + msg364582
2020-03-19 01:14:42 Dennis Sweeney set messages: + msg364581
2020-03-16 13:04:33 vstinner set messages: + msg364313
2020-03-16 04:32:55 gvanrossum set messages: + msg364284
2020-03-16 02:52:46 rhettinger set nosy: + rhettinger, gvanrossum; messages: + msg364277
2020-03-12 16:37:24 Dennis Sweeney set messages: + msg364028
2020-03-12 13:02:17 steven.daprano set nosy: + steven.daprano; messages: + msg364020
2020-03-12 01:34:02 xtreak set nosy: + xtreak
2020-03-12 01:14:09 vstinner set nosy: + vstinner
2020-03-11 19:34:39 eric.smith set nosy: + eric.smith
2020-03-11 19:15:51 Dennis Sweeney set keywords: + patch; stage: patch review; pull_requests: + pull_request18292
2020-03-11 19:11:38 Dennis Sweeney create