Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace and empty strings #72216

Closed
StphaneHenriot mannequin opened this issue Sep 8, 2016 · 21 comments
Closed

Replace and empty strings #72216

StphaneHenriot mannequin opened this issue Sep 8, 2016 · 21 comments
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@StphaneHenriot
Copy link
Mannequin

StphaneHenriot mannequin commented Sep 8, 2016

BPO 28029
Nosy @birkenfeld, @rhettinger, @vstinner, @ericvw, @bitdancer, @vadmium, @serhiy-storchaka
PRs
  • bpo-28029: Make "".replace("", s, n) returning s for any n != 0. #16981
  • Files
  • issue28029.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-10-30.10:06:16.291>
    created_at = <Date 2016-09-08.20:30:41.603>
    labels = ['interpreter-core', 'type-bug', '3.9']
    title = 'Replace and empty strings'
    updated_at = <Date 2020-04-02.05:39:20.912>
    user = 'https://bugs.python.org/StphaneHenriot'

    bugs.python.org fields:

    activity = <Date 2020-04-02.05:39:20.912>
    actor = 'v+python'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-10-30.10:06:16.291>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-09-08.20:30:41.603>
    creator = 'St\xc3\xa9phane Henriot'
    dependencies = []
    files = ['44477']
    hgrepos = []
    issue_num = 28029
    keywords = ['patch']
    message_count = 21.0
    messages = ['275140', '275147', '275149', '275152', '275159', '275162', '275168', '275183', '275225', '279414', '355556', '355558', '355562', '355566', '355627', '355628', '355703', '355704', '355707', '365570', '365571']
    nosy_count = 10.0
    nosy_names = ['georg.brandl', 'rhettinger', 'vstinner', 'ericvw', 'v+python', 'r.david.murray', 'SilentGhost', 'martin.panter', 'serhiy.storchaka', 'St\xc3\xa9phane Henriot']
    pr_nums = ['16981']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28029'
    versions = ['Python 3.9']

    @StphaneHenriot
    Copy link
    Mannequin Author

    StphaneHenriot mannequin commented Sep 8, 2016

    A few days ago, the following behavior surprised me.

    >>> "".replace("", "prefix", 1)
    ''
    >>> "".replace("", "prefix")
    'prefix'

    It seems to me this edge case isn't correctly documented/implemented. I tested with python 2.7, 3.4 and 3.5, I guess it applies for other versions as well.

    Here are a few elements, for reference.

    « string.replace(s, old, new[, maxreplace])
    Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced. »

    >>> "" in ""
    True
    >>> "".count("")
    1
    >>> "".find("")
    0

    https://hg.python.org/cpython/file/2.7/Objects/stringobject.c#l2750
    https://hg.python.org/cpython/file/default/Objects/unicodeobject.c#l10479

    Thanks,

    Stéphane.

    @StphaneHenriot StphaneHenriot mannequin added the type-bug An unexpected behavior, bug, or error label Sep 8, 2016
    @SilentGhost SilentGhost mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Sep 8, 2016
    @StphaneHenriot
    Copy link
    Mannequin Author

    StphaneHenriot mannequin commented Sep 8, 2016

    For what it's worth, here is the behavior in PyPy 2.2.1

    >>>> "".replace("", "prefix", 1)
    'prefix'
    >>>> "".replace("", "prefix")
    'prefix'

    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Sep 8, 2016

    Just the least intrusive patch. Also, to me PyPy behaviour doesn't seem correct.

    @StphaneHenriot
    Copy link
    Mannequin Author

    StphaneHenriot mannequin commented Sep 8, 2016

    Thanks for your help.

    However, I'm not sure I would agree, regarding the correct behavior.

    I guess the main question is « What is an occurrence? ».

    Are you not convinced that in, count and find indicate occurrences?
    To my understanding, the empty string occurs basically everywhere in strings (including beginning and end):

    >>> "blabla".replace("", "ε")
    'εbεlεaεbεlεaε'

    To give a bit more (useless?) context, I was implementing a GCD algorithm based on replacements by Knuth (very beginning of TAOCP) and it looks like the current cpython implementation gives 0 for gcd(1, 1) :(

    @vstinner
    Copy link
    Member

    vstinner commented Sep 8, 2016

    str.replace("", whatever) doesn't make sense. Why not raising an exception in this case?

    @serhiy-storchaka
    Copy link
    Member

    Agreed with Stéphane. Since count() returns 1, resulting string should contain 1 replacement.

    Using regular expressions:

    >>> re.sub('', 'x', '')
    'x'
    >>> re.sub('', 'x', '', 1)
    'x'

    @bitdancer
    Copy link
    Member

    Victor: because it would break backward compatibility to raise an error where one wasn't raised before. There's not enough motivation for that. I'm not sure about the 1 case; it would again be a change in behavior. Probably we should just document it as there isn't enough reason to risk breaking working code.

    @StphaneHenriot
    Copy link
    Mannequin Author

    StphaneHenriot mannequin commented Sep 8, 2016

    I understand it might be a rather rare case.

    Nevertheless, don't you think the inconsistency Serhiy pointed out makes it look like a bug needing a fix?

    @vadmium
    Copy link
    Member

    vadmium commented Sep 9, 2016

    There may be related discussion in bpo-24243, also about searching for empty strings. A while ago I meant to add documetation and tests for that, but I lost momentum after cleaning up the existing tests.

    Some of the behaviours are undocumented and surprising, but if you look at the implementation it is clear they are not accidental. E.g. I think there is a function called replace_interleave() or something.

    IMO you can find an unlimited number of instances of an empty string at index zero of any string. So calls like "ABC".strip("") are sensible to raise an exception, and the interleave mode of "ABC".count("") is unexpected. But I don’t see a big need to change this existing behaviour as long as it is documented.

    @serhiy-storchaka
    Copy link
    Member

    Interestingly, initially string.replace() was implemented in terms of split/join. string.replace(s, '', s2, n) returned s for any s, s2 and n.

    After making replace() a method of str, in aca7b5eaf5e8 (1999-10-12), it became raising ValueError for empty pattern string.

    Since 762dd09edb83 (bpo-599128, 2002-08-23) it supports zero-length pattern string. str.replace('', s1, s2, n) returned '' for any s1, s2 and n.

    New implementation added in 41809406a35e (2006-05-25) made str.replace('', '', 'x', -1) returning 'x' while str.replace('', '', 'x', 1) still returns ''.

    As you can see, the behavior of replacing with empty pattern is not stable. It was changed several times. Maybe it is a time for new change.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Oct 25, 2016
    @vstinner
    Copy link
    Member

    The current behavior is really surprising.

    >>> "".replace("", "|")
    '|'
    >>> "".replace("", "|", -1)
    '|'

    vs

    >>> "".replace("", "|", 0)
    ''
    >>> "".replace("", "|", 1)
    ''
    >>> "".replace("", "|", 1000)
    ''

    I always expect "|".

    ---

    This behavior makes sense to me:

    >>> "abc".replace("", "|")
    '|a|b|c|'
    >>> "abc".replace("", "|", -1)
    '|a|b|c|'
    >>> "abc".replace("", "|", 0)
    'abc'
    >>> "abc".replace("", "|", 1)
    '|abc'
    >>> "abc".replace("", "|", 100)
    '|a|b|c|'

    @vstinner
    Copy link
    Member

    Well, in fact, I would expect that "".replace(...) would always return "": for any argument passed to replace().

    @serhiy-storchaka
    Copy link
    Member

    What result would you expect of "" in "" and "".count("")?

    @vstinner
    Copy link
    Member

    What result would you expect of "" in "" and "".count("")?

    I don't suggest to change these operator and method:

    >>> "" in ""
    True
    >>> "".count("")
    1

    @serhiy-storchaka
    Copy link
    Member

    There are expected some relations between str methods. For example,

    s.replace(a, b, n) == s.replace(a, b) if n >= s.count(a)
    len(s.replace(a, b, n)) == len(s) + (len(b)-len(a)) * n if 0 <= n <= s.count(a)
    len(s.replace(a, b, n)) == len(s) + (len(b)-len(a)) * s.count(a) if n >= s.count(a)

    Inconsistency between "".replace("", s, n) and "".replace("", s) is just a bug, and it should be fixed in the most consistent way. There are precedences, the behavior of replace() already was changed 3 times in the past. I think that chances to break some code are tiny, we just fix inconsistency why can puzzle users.

    @serhiy-storchaka
    Copy link
    Member

    Note that no tests were affected by this change.

    @serhiy-storchaka
    Copy link
    Member

    New changeset 865c3b2 by Serhiy Storchaka in branch 'master':
    bpo-28029: Make "".replace("", s, n) returning s for any n != 0. (GH-16981)
    865c3b2

    @serhiy-storchaka
    Copy link
    Member

    Because of possible compatibility issue (very unlikely) this change is done only in master and will not be backported.

    @serhiy-storchaka serhiy-storchaka added 3.9 only security fixes and removed 3.7 (EOL) end of life labels Oct 30, 2019
    @vstinner
    Copy link
    Member

    Because of possible compatibility issue (very unlikely) this change is done only in master and will not be backported.

    Yeah, I agree. It would be a mistake to change the Python behavior in a minor release.

    @vpython
    Copy link
    Mannequin

    vpython mannequin commented Apr 2, 2020

    Thanks Stèphańe and Serhiy, I just discovered this strange behavior in 3.8, and wondered how my logic was wrong, until I pinpointed the inconsistent behaviour of str.replace with an empty first parameter and replace count of 1.

    Glad to see it is fixed in 3.9.

    I guess for x.replace( a, b, c ) the workaround would be

    x.replace( a, b, c ) if a else x.replace( a, b )

    At least for recent versions of Python 3.

    @vpython
    Copy link
    Mannequin

    vpython mannequin commented Apr 2, 2020

    Nope:

    I guess for x.replace( a, b, c ) the workaround would be

    x.replace( a, b, c ) if x else x.replace( a, b )

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants