classification
Title: Replace and empty strings
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, Stéphane Henriot, ericvw, georg.brandl, martin.panter, r.david.murray, rhettinger, serhiy.storchaka, v+python, vstinner
Priority: normal Keywords: patch

Created on 2016-09-08 20:30 by Stéphane Henriot, last changed 2020-04-02 05:39 by v+python. This issue is now closed.

Files
File name Uploaded Description Edit
issue28029.diff SilentGhost, 2016-09-08 20:44 review
Pull Requests
URL Status Linked Edit
PR 16981 merged serhiy.storchaka, 2019-10-29 07:40
Messages (21)
msg275140 - (view) Author: Stéphane Henriot (Stéphane Henriot) Date: 2016-09-08 20:30
A few days ago, the following behavior surprised me.

>>> "".replace("", "prefix", 1)
''
>>> "".replace("", "prefix")
'prefix'

It seems to me this edge case isn't correctly documented/implemented. I tested with python 2.7, 3.4 and 3.5, I guess it applies for other versions as well.

Here are a few elements, for reference.

« string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced. »

>>> "" in ""
True
>>> "".count("")
1
>>> "".find("")
0

https://hg.python.org/cpython/file/2.7/Objects/stringobject.c#l2750
https://hg.python.org/cpython/file/default/Objects/unicodeobject.c#l10479

Thanks,

Stéphane.
msg275147 - (view) Author: Stéphane Henriot (Stéphane Henriot) Date: 2016-09-08 20:43
For what it's worth, here is the behavior in PyPy 2.2.1

>>>> "".replace("", "prefix", 1)
'prefix'
>>>> "".replace("", "prefix")
'prefix'
msg275149 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-09-08 20:46
Just the least intrusive patch. Also, to me PyPy behaviour doesn't seem correct.
msg275152 - (view) Author: Stéphane Henriot (Stéphane Henriot) Date: 2016-09-08 20:54
Thanks for your help.

However, I'm not sure I would agree, regarding the correct behavior.

I guess the main question is « What is an occurrence? ».

Are you not convinced that in, count and find indicate occurrences?
To my understanding, the empty string occurs basically everywhere in strings (including beginning and end):

>>> "blabla".replace("", "ε")
'εbεlεaεbεlεaε'

To give a bit more (useless?) context, I was implementing a GCD algorithm based on replacements by Knuth (very beginning of TAOCP) and it looks like the current cpython implementation gives 0 for gcd(1, 1) :(
msg275159 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-08 21:24
str.replace("", whatever) doesn't make sense. Why not raising an exception in this case?
msg275162 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-08 21:26
Agreed with Stéphane. Since count() returns 1, resulting string should contain 1 replacement.

Using regular expressions:

>>> re.sub('', 'x', '')
'x'
>>> re.sub('', 'x', '', 1)
'x'
msg275168 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-09-08 21:30
Victor: because it would break backward compatibility to raise an error where one wasn't raised before.  There's not enough motivation for that.  I'm not sure about the 1 case; it would again be a change in behavior.  Probably we should just document it as there isn't enough reason to risk breaking working code.
msg275183 - (view) Author: Stéphane Henriot (Stéphane Henriot) Date: 2016-09-08 22:16
I understand it might be a rather rare case.

Nevertheless, don't you think the inconsistency Serhiy pointed out makes it look like a bug needing a fix?
msg275225 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-09-09 00:29
There may be related discussion in Issue 24243, also about searching for empty strings. A while ago I meant to add documetation and tests for that, but I lost momentum after cleaning up the existing tests.

Some of the behaviours are undocumented and surprising, but if you look at the implementation it is clear they are not accidental. E.g. I think there is a function called replace_interleave() or something.

IMO you can find an unlimited number of instances of an empty string at index zero of any string. So calls like "ABC".strip("") are sensible to raise an exception, and the interleave mode of "ABC".count("") is unexpected. But I don’t see a big need to change this existing behaviour as long as it is documented.
msg279414 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-25 15:24
Interestingly, initially string.replace() was implemented in terms of split/join. string.replace(s, '', s2, n) returned s for any s, s2 and n.

After making replace() a method of str, in aca7b5eaf5e8 (1999-10-12), it became raising ValueError for empty pattern string.

Since 762dd09edb83 (issue599128, 2002-08-23) it supports zero-length pattern string. str.replace('', s1, s2, n) returned '' for any s1, s2 and n.

New implementation added in 41809406a35e (2006-05-25) made str.replace('', '', 'x', -1) returning 'x' while str.replace('', '', 'x', 1) still returns ''.

As you can see, the behavior of replacing with empty pattern is not stable. It was changed several times. Maybe it is a time for new change.
msg355556 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-28 15:58
The current behavior is really surprising.

>>> "".replace("", "|")
'|'
>>> "".replace("", "|", -1)
'|'

vs

>>> "".replace("", "|", 0)
''
>>> "".replace("", "|", 1)
''
>>> "".replace("", "|", 1000)
''

I always expect "|".

---

This behavior makes sense to me:

>>> "abc".replace("", "|")
'|a|b|c|'
>>> "abc".replace("", "|", -1)
'|a|b|c|'
>>> "abc".replace("", "|", 0)
'abc'
>>> "abc".replace("", "|", 1)
'|abc'
>>> "abc".replace("", "|", 100)
'|a|b|c|'
msg355558 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-28 16:07
Well, in fact, I would expect that "".replace(...) would always return "": for any argument passed to replace().
msg355562 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-28 16:19
What result would you expect of `"" in ""` and `"".count("")`?
msg355566 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-28 16:50
> What result would you expect of `"" in ""` and `"".count("")`?

I don't suggest to change these operator and method:

>>> "" in ""
True
>>> "".count("")
1
msg355627 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-29 07:56
There are expected some relations between str methods. For example,

s.replace(a, b, n) == s.replace(a, b)   if n >= s.count(a)
len(s.replace(a, b, n)) == len(s) + (len(b)-len(a)) * n   if 0 <= n <= s.count(a)
len(s.replace(a, b, n)) == len(s) + (len(b)-len(a)) * s.count(a)   if n >= s.count(a)

Inconsistency between "".replace("", s, n) and "".replace("", s) is just a bug, and it should be fixed in the most consistent way. There are precedences, the behavior of replace() already was changed 3 times in the past. I think that chances to break some code are tiny, we just fix inconsistency why can puzzle users.
msg355628 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-29 07:57
Note that no tests were affected by this change.
msg355703 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-30 10:03
New changeset 865c3b257fe38154a4320c7ee6afb416f665b9c2 by Serhiy Storchaka in branch 'master':
bpo-28029: Make "".replace("", s, n) returning s for any n != 0. (GH-16981)
https://github.com/python/cpython/commit/865c3b257fe38154a4320c7ee6afb416f665b9c2
msg355704 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-30 10:06
Because of possible compatibility issue (very unlikely) this change is done only in master and will not be backported.
msg355707 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-30 11:32
> Because of possible compatibility issue (very unlikely) this change is done only in master and will not be backported.

Yeah, I agree. It would be a mistake to change the Python behavior in a minor release.
msg365570 - (view) Author: Glenn Linderman (v+python) * Date: 2020-04-02 05:30
Thanks Stèphańe and Serhiy, I just discovered this strange behavior in 3.8, and wondered how my logic was wrong, until I pinpointed the inconsistent behaviour of str.replace with an empty first parameter and replace count of 1.

Glad to see it is fixed in 3.9.

I guess for   x.replace( a, b, c )  the workaround would be

x.replace( a, b, c ) if a else x.replace( a, b )

At least for recent versions of Python 3.
msg365571 - (view) Author: Glenn Linderman (v+python) * Date: 2020-04-02 05:39
Nope:

I guess for   x.replace( a, b, c )  the workaround would be

x.replace( a, b, c ) if x else x.replace( a, b )
History
Date User Action Args
2020-04-02 05:39:20v+pythonsetmessages: + msg365571
2020-04-02 05:30:08v+pythonsetnosy: + v+python
messages: + msg365570
2019-10-30 11:32:55vstinnersetmessages: + msg355707
2019-10-30 10:06:16serhiy.storchakasetstatus: open -> closed
versions: + Python 3.9, - Python 2.7, Python 3.5, Python 3.6, Python 3.7
messages: + msg355704

resolution: fixed
stage: patch review -> resolved
2019-10-30 10:03:56serhiy.storchakasetmessages: + msg355703
2019-10-29 22:14:00rhettingersetmessages: - msg355681
2019-10-29 22:13:50rhettingersetmessages: - msg355679
2019-10-29 21:34:36serhiy.storchakasetmessages: + msg355681
2019-10-29 21:31:18rhettingersetnosy: + rhettinger
messages: + msg355679
2019-10-29 07:57:37serhiy.storchakasetmessages: + msg355628
2019-10-29 07:56:19serhiy.storchakasetmessages: + msg355627
2019-10-29 07:40:26serhiy.storchakasetpull_requests: + pull_request16507
2019-10-28 16:50:02vstinnersetmessages: + msg355566
2019-10-28 16:19:29serhiy.storchakasetmessages: + msg355562
2019-10-28 16:07:24vstinnersetmessages: + msg355558
2019-10-28 15:58:59vstinnersetmessages: + msg355556
2016-10-25 15:24:49serhiy.storchakasetmessages: + msg279414
versions: + Python 3.7
2016-09-09 17:26:42Stéphane Henriotsetnosy: + georg.brandl
2016-09-09 00:29:45martin.pantersetnosy: + martin.panter

messages: + msg275225
versions: + Python 2.7
2016-09-08 22:16:49Stéphane Henriotsetmessages: + msg275183
2016-09-08 21:30:29r.david.murraysetnosy: + r.david.murray
messages: + msg275168
2016-09-08 21:26:07serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg275162
2016-09-08 21:24:18vstinnersetnosy: + vstinner
messages: + msg275159
2016-09-08 20:58:52ericvwsetnosy: + ericvw
2016-09-08 20:54:18Stéphane Henriotsetmessages: + msg275152
2016-09-08 20:46:31SilentGhostsetnosy: + SilentGhost

messages: + msg275149
stage: needs patch -> patch review
2016-09-08 20:44:56SilentGhostsetfiles: + issue28029.diff
keywords: + patch
2016-09-08 20:43:25Stéphane Henriotsetmessages: + msg275147
2016-09-08 20:38:44SilentGhostsetstage: needs patch
components: + Interpreter Core
versions: + Python 3.5, Python 3.6
2016-09-08 20:30:41Stéphane Henriotcreate