
PEP 616: Add str.removeprefix and str.removesuffix methods #84120

Closed
sweeneyde opened this issue Mar 11, 2020 · 29 comments
Labels
3.9 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), type-feature (a feature request or enhancement)

Comments

@sweeneyde
Member

BPO 39939
Nosy @gvanrossum, @doerwalter, @rhettinger, @vstinner, @ericvsmith, @stevendaprano, @elazarg, @miss-islington, @tirkarthi, @sweeneyde
PRs
  • bpo-39939: Add str.removeprefix and str.removesuffix #18939
  • WIP: bpo-39939: Use removeprefix() and removesuffix() #19455
  • bpo-39939: Fix typo in docs: removeprefix is issue 39939 #20473
  • [3.9] bpo-39939: Fix removeprefix issue number in the What's New in Python 3.9 (GH-20473) #20474
Files
  • pep-9999.rst: Revised for typos
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-04-22.21:16:57.018>
    created_at = <Date 2020-03-11.19:11:38.177>
    labels = ['interpreter-core', 'type-feature', '3.9']
    title = 'PEP 616: Add str.removeprefix and str.removesuffix methods'
    updated_at = <Date 2020-05-28.01:24:39.490>
    user = 'https://github.com/sweeneyde'

    bugs.python.org fields:

    activity = <Date 2020-05-28.01:24:39.490>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-04-22.21:16:57.018>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2020-03-11.19:11:38.177>
    creator = 'Dennis Sweeney'
    dependencies = []
    files = ['48989']
    hgrepos = []
    issue_num = 39939
    keywords = ['patch']
    message_count = 29.0
    messages = ['363958', '364020', '364028', '364277', '364284', '364313', '364581', '364582', '364643', '364657', '364664', '364671', '364681', '364701', '364703', '365036', '366879', '366882', '366886', '366890', '366897', '366960', '366964', '367049', '367052', '367055', '367056', '370158', '370160']
    nosy_count = 10.0
    nosy_names = ['gvanrossum', 'doerwalter', 'rhettinger', 'vstinner', 'eric.smith', 'steven.daprano', 'elazar', 'miss-islington', 'xtreak', 'Dennis Sweeney']
    pr_nums = ['18939', '19455', '20473', '20474']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue39939'
    versions = ['Python 3.9']

    @sweeneyde
    Member Author

    Following discussion here ( https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/ ), there is a proposal to add new methods str.cutprefix and str.cutsuffix to alleviate the common misuse of str.lstrip and str.rstrip.
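A minimal illustration of the misuse, assuming Python 3.9 or later, where these methods eventually shipped as str.removeprefix() and str.removesuffix(): lstrip() and rstrip() treat their argument as a set of characters, not as a substring.

```python
# Assumes Python 3.9+, where str.removeprefix()/str.removesuffix() exist.
filename = "test_theme.py"

# lstrip() removes *any* leading characters from the set {'t', 'e', 's', '_'},
# so it also eats the "t" at the start of "theme".
stripped = filename.lstrip("test_")
print(stripped)  # heme.py

# removeprefix() removes the exact substring "test_" at most once.
cut = filename.removeprefix("test_")
print(cut)  # theme.py

# The same trap exists at the other end with rstrip().
report = "report.txt"
print(report.rstrip(".txt"))        # repor
print(report.removesuffix(".txt"))  # report
```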

    I think sticking with the most basic possible behavior

        def cutprefix(self: str, prefix: str) -> str:
            if self.startswith(prefix):
                return self[len(prefix):]
            # return a copy to work for bytearrays
            return self[:]
    
        def cutsuffix(self: str, suffix: str) -> str:
            if self.startswith(suffix):
                # handles the "[:-0]" issue
                return self[:len(self)-len(suffix)]
            return self[:]

    would be best (refusing to guess in the face of ambiguous multiple arguments). Someone can do, e.g.

        >>> 'foo.tar.gz'.cutsuffix('.gz').cutsuffix('.tar')
        'foo'

    to cut off multiple suffixes. More complicated behavior for multiple arguments could be added later, but it would be easy to make a mistake in prematurely generalizing right now.
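To see why multiple arguments are ambiguous, here is a sketch of two plausible policies for a tuple of suffixes. Both helpers, cut_first_match and cut_longest_match, are hypothetical and exist only for illustration; they disagree on the same input.

```python
def cut_first_match(s, suffixes):
    """Hypothetical policy: remove the first suffix, in argument order, that matches."""
    for suffix in suffixes:
        if suffix and s.endswith(suffix):
            return s[:len(s) - len(suffix)]
    return s


def cut_longest_match(s, suffixes):
    """Hypothetical policy: remove the longest suffix that matches."""
    matches = [suffix for suffix in suffixes if suffix and s.endswith(suffix)]
    if not matches:
        return s
    longest = max(matches, key=len)
    return s[:len(s) - len(longest)]


name = "foo.tar.gz"
suffixes = (".gz", ".tar.gz")
print(cut_first_match(name, suffixes))    # foo.tar
print(cut_longest_match(name, suffixes))  # foo
```

A single-argument method sidesteps the question: the caller states the order explicitly by chaining calls.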

    In bikeshedding method names, I think that avoiding the word "strip" would be nice so users can have a consistent feeling that "'strip' means character sets; 'cut' means substrings".

    @sweeneyde added the labels 3.9, interpreter-core, and type-feature on Mar 11, 2020
    @stevendaprano
    Member

    To be clear, are you only making a copy of the unchanged object if it is a mutable bytearray, not str or bytes?

    @sweeneyde
    Member Author

    Yes:

        >>> x = "A"*10**6
        >>> x.cutprefix("B") is x
        True
        >>> x.cutprefix("") is x
        True
    
        >>> y = b"A"*10**6
        >>> y.cutprefix(b"B") is y
        True
        >>> y.cutprefix(b"") is y
        True
    
        >>> z = bytearray(b"A")*10**6
        >>> z.cutprefix(b"B") is z
        False
        >>> z.cutprefix(b"") is z
        False

    I'm not sure whether this should be part of the spec or an implementation detail. The docs for str.replace and bytes.replace don't specify this either, but those methods behave the same way:

        >>> x = "A"*10**6
        >>> x.replace("B", "C") is x
        True
        >>> x.replace("", "") is x
        True
    
        >>> y = b"A"*10**6
        >>> y.replace(b"B", b"C") is y
        True
        >>> y.replace(b"", b"") is y
        True
    
        >>> z = bytearray(b"A")*10**6
        >>> z.replace(b"B", b"C") is z
        False
        >>> z.replace(b"", b"") is z
        False

    @rhettinger
    Contributor

    Guido, do you support this API expansion?

    @gvanrossum
    Member

    I stopped following the discussion at some point, but I think this is worth adding -- I have seen this done over and over again, and apparently lots of other people have felt the need too.

    I think these names are fine, and about the best we can do (keeping in line with the "feel" of the rest of the string API).

    I like the behavior of returning a copy of the string if there's no match (as opposed to failing, which was also brought up). If the original object is immutable, this should return the original object, but that should be considered a CPython optimization (IIRC all the string methods are pretty careful about that), not something required by the spec.

    FWIW, the pseudocode has a copy/paste error: cutsuffix() should use endswith() rather than startswith().
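With that fix applied, the pseudocode from the first message becomes runnable. This sketch keeps the proposal's working names cutprefix/cutsuffix but writes them as free functions rather than methods; it also shows why the slice uses `len(s) - len(suffix)` instead of `-len(suffix)`.

```python
def cutprefix(s, prefix):
    """Return s without prefix if it starts with prefix, else an unchanged copy."""
    if s.startswith(prefix):
        return s[len(prefix):]
    # s[:] copies a bytearray; for str and bytes CPython typically returns s itself.
    return s[:]


def cutsuffix(s, suffix):
    """Return s without suffix if it ends with suffix, else an unchanged copy."""
    if s.endswith(suffix):  # endswith(), not startswith()
        # s[:-len(suffix)] would be s[:-0] == s[:0] for an empty suffix, i.e. empty.
        return s[:len(s) - len(suffix)]
    return s[:]


print(cutsuffix(cutsuffix("foo.tar.gz", ".gz"), ".tar"))  # foo
print(cutsuffix("abc", ""))                               # abc
print(repr("abc"[:-len("")]))                             # '' (the pitfall)
```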

    @vstinner
    Member

    The proposed change will affect many builtin types: bytes, bytearray, str, but also other types like collections.UserString. Would it make sense to summarize what has been said in the python-ideas thread into a PEP? It may be good to specify things like:

        >>> x = "A"*10**6
        >>> x.cutprefix("B") is x
        True

    The specification can be just "that's an implementation detail" or "CPython implementation specific" :-)

    I don't expect such a PEP to be long or controversial, but it may help to write it down.
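For wrapper types the change means forwarding the new methods. A hypothetical sketch of how a str-wrapping class in the spirit of collections.UserString could do it; the class name WrappedStr is made up and this is not the stdlib's code. It uses the names the methods eventually shipped with and assumes Python 3.9+ for str.removeprefix()/str.removesuffix().

```python
class WrappedStr:
    """Hypothetical minimal str wrapper (not collections.UserString itself)."""

    def __init__(self, data):
        # Unwrap another WrappedStr; otherwise coerce to str.
        self.data = data.data if isinstance(data, WrappedStr) else str(data)

    def __repr__(self):
        return f"{type(self).__name__}({self.data!r})"

    def __eq__(self, other):
        if isinstance(other, WrappedStr):
            return self.data == other.data
        return self.data == other

    def removeprefix(self, prefix, /):
        # Accept either a plain str or another wrapper as the argument.
        if isinstance(prefix, WrappedStr):
            prefix = prefix.data
        # Return an instance of the same (sub)class, like the other wrapper methods.
        return self.__class__(self.data.removeprefix(prefix))

    def removesuffix(self, suffix, /):
        if isinstance(suffix, WrappedStr):
            suffix = suffix.data
        return self.__class__(self.data.removesuffix(suffix))


w = WrappedStr("Monty Python")
print(w.removesuffix(" Python"))             # WrappedStr('Monty')
print(w.removeprefix(WrappedStr("Monty ")))  # WrappedStr('Python')
```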

    @sweeneyde
    Member Author

    If no one has started, I can draft such a PEP.

    @gvanrossum
    Member

    Sounds good.

    @sweeneyde
    Member Author

    Here is a draft PEP -- I believe it needs a Core Developer sponsor now?

    @vstinner
    Member

    The PEP is a good start. Can you try to convert it to a PR on https://github.com/python/peps/ ? It seems like the next available PEP number is 616. I would prefer to leave comments on a PR.

    @doerwalter
    Contributor

    IMHO the names don't fit Python's current naming scheme, so what about naming them "lchop" and "rchop"?

    @sweeneyde
    Member Author

    python/peps#1332

    @vstinner
    Member

    python/peps#1332

    Thank you. And good luck handling incoming discussions on the PEP ;-)

    @vstinner
    Member

    Where should I leave comments on the PEP? Do you plan to post it on python-dev soon?

    @sweeneyde
    Member Author

    Just posted it.

    @vstinner
    Member

    Dennis Sweeney wrote https://www.python.org/dev/peps/pep-0616/

    @vstinner changed the title from "Add str methods to remove prefixes or suffixes" to "PEP 616: Add str methods to remove prefix or suffix" on Mar 26, 2020
    @vstinner
    Member

    The documentation should clearly explain the difference between removeprefix()/removesuffix() and lstrip()/strip()/rstrip(), since it is the rationale of the PEP ;-)

    An example that can be used to explain the difference:

        >>> "Monty Python".removesuffix(" Python")
        'Monty'
        >>> "Monty Python".strip(" Python")
        'M'

    @vstinner
    Member

    Well, I even expect that some people use .strip() whereas their intent was to use .lstrip():

        >>> "Python vs Monty Python".strip("Python")
        ' vs Monty '

    Again, strip() is called with a string argument when the real intent was to use removesuffix(), which didn't exist yet ;-)

    A note should be added to lstrip(), strip() and rstrip() documentation to point to removeprefix() and/or removesuffix().
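Both examples follow from strip() treating its argument as a set of characters. Below is a rough pure-Python model of that behavior for a non-None argument (the helper strip_chars is hypothetical, for illustration only), checked against the built-in and contrasted with removesuffix(), assuming Python 3.9+:

```python
def strip_chars(s, chars):
    """Rough model of str.strip(chars) for a str argument: drop chars in `chars` from both ends."""
    charset = set(chars)  # order and repetition in `chars` are irrelevant
    start, end = 0, len(s)
    while start < end and s[start] in charset:
        start += 1
    while end > start and s[end - 1] in charset:
        end -= 1
    return s[start:end]


for text, arg in [("Monty Python", " Python"), ("Python vs Monty Python", "Python")]:
    # The model agrees with the built-in strip()...
    assert strip_chars(text, arg) == text.strip(arg)
    # ...while removesuffix() removes the exact substring at most once.
    print(repr(text.strip(arg)), repr(text.removesuffix(arg)))

# Prints:
# 'M' 'Monty'
# ' vs Monty ' 'Python vs Monty '
```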

    @rhettinger
    Contributor

    Please add an underscore to the names: remove_prefix() and remove_suffix().

    The latter method causes a mental hiccup when first reading as removes-uffix, forcing mental backtracking to get to remove-suffix.

    We had a similar problem with addinfourl initially being read as add-in-four-l before mentally backtracking to add-info-url.

    @vstinner
    Member

    Please add an underscore to the names: remove_prefix() and remove_suffix().

    PEP 616 was approved with the removeprefix() and removesuffix() names. The rationale for the names can even be found in the PEP:
    https://www.python.org/dev/peps/pep-0616/#alternative-method-names

    @rhettinger
    Contributor

    I disagree with the rationale given in the PEP. The reason that "startswith" and "endswith" don't have underscores is that they aren't needed to disambiguate the text. Our rules are to add underscores when it improves readability, which in this case it does. Like casing conventions, these rules became prevalent after the early modules were created (i.e. the older the module, the more likely that it doesn't follow modern conventions).

    We only have one chance to get this right. Take it from someone with experience with this particular problem. I created imap() but later regretted the naming pattern when it came to ifilter() and islice(), which sometimes cause mental hiccups, initially being read as if-ilter and is-lice.

    @sweeneyde
    Member Author

    I'm personally -0 for underscores -- they might slightly improve readability of the function name in isolation but may also add confusion about which methods have underscores. Only one out of the 45 non-dunder str methods has an underscore right now:

        >>> meths = [x for x in dir(str) if not x.startswith('__')]
        >>> [x for x in meths if '_' in x]
        ['format_map']
        >>> [x for x in meths if '_' not in x]
        ['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

    Maybe I'm wrong, but it seemed to me that most of the discussions to date had arrived at leaving out underscores. Is there a process or appropriate channel to continue this discussion now that the PEP is accepted?

    @sweeneyde
    Member Author

    Oops -- I now see the message on Python-Dev.

    @vstinner
    Member

    New changeset a81849b by sweeneyde in branch 'master':
    bpo-39939: Add str.removeprefix and str.removesuffix (GH-18939)

    @vstinner
    Member

    Well done Dennis Sweeney! You got a PEP approved and now the implementation is merged!

    Maybe the documentation will need more reviews, but that can be done later.

    I prefer to get the implementation merged as soon as possible (it will likely be part of the next release, 3.9.0a6), so more users can play with it before the 3.9.0 final release.

    @vstinner changed the title from "PEP 616: Add str methods to remove prefix or suffix" to "PEP 616: Add str.removeprefix and str.removesuffix methods" on Apr 22, 2020
    @sweeneyde
    Member Author

    There's a failure here:

    https://buildbot.python.org/all/#/builders/64/builds/656
    
    Failed subtests:
    test_killed_child - test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest
    
        Traceback (most recent call last):
        ...
        OSError: [Errno 9] Bad file descriptor

    This should be unrelated to the patch, right?

    @vstinner
    Member

    This should be unrelated to the patch, right?

    It's unrelated. It smells like bpo-39995.

    @vstinner
    Member

    New changeset 56853d8 by Elazar Gershuni in branch 'master':
    bpo-39939: Fix removeprefix issue number in the What's New in Python 3.9 (GH-20473)

    @vstinner
    Member

    New changeset de6b684 by Miss Islington (bot) in branch '3.9':
    bpo-39939: Fix removeprefix issue number in the What's New in Python 3.9 (GH-20473) (GH-20474)

    @ezio-melotti transferred this issue from another repository on Apr 10, 2022
    6 participants