msg275140 - (view) |
Author: Stéphane Henriot (Stéphane Henriot) |
Date: 2016-09-08 20:30 |
A few days ago, the following behavior surprised me.
>>> "".replace("", "prefix", 1)
''
>>> "".replace("", "prefix")
'prefix'
It seems to me this edge case isn't correctly documented/implemented. I tested with python 2.7, 3.4 and 3.5, I guess it applies for other versions as well.
Here are a few elements, for reference.
« string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced. »
>>> "" in ""
True
>>> "".count("")
1
>>> "".find("")
0
https://hg.python.org/cpython/file/2.7/Objects/stringobject.c#l2750
https://hg.python.org/cpython/file/default/Objects/unicodeobject.c#l10479
Thanks,
Stéphane.
|
msg275147 - (view) |
Author: Stéphane Henriot (Stéphane Henriot) |
Date: 2016-09-08 20:43 |
For what it's worth, here is the behavior in PyPy 2.2.1
>>>> "".replace("", "prefix", 1)
'prefix'
>>>> "".replace("", "prefix")
'prefix'
|
msg275149 - (view) |
Author: SilentGhost (SilentGhost) * |
Date: 2016-09-08 20:46 |
Just the least intrusive patch. Also, to me PyPy behaviour doesn't seem correct.
|
msg275152 - (view) |
Author: Stéphane Henriot (Stéphane Henriot) |
Date: 2016-09-08 20:54 |
Thanks for your help.
However, I'm not sure I would agree, regarding the correct behavior.
I guess the main question is « What is an occurrence? ».
Are you not convinced that in, count and find indicate occurrences?
To my understanding, the empty string occurs basically everywhere in strings (including beginning and end):
>>> "blabla".replace("", "ε")
'εbεlεaεbεlεaε'
To give a bit more (useless?) context, I was implementing a GCD algorithm based on replacements by Knuth (very beginning of TAOCP) and it looks like the current cpython implementation gives 0 for gcd(1, 1) :(
|
msg275159 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-09-08 21:24 |
str.replace("", whatever) doesn't make sense. Why not raising an exception in this case?
|
msg275162 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-09-08 21:26 |
Agreed with Stéphane. Since count() returns 1, resulting string should contain 1 replacement.
Using regular expressions:
>>> re.sub('', 'x', '')
'x'
>>> re.sub('', 'x', '', 1)
'x'
|
msg275168 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2016-09-08 21:30 |
Victor: because it would break backward compatibility to raise an error where one wasn't raised before. There's not enough motivation for that. I'm not sure about the 1 case; it would again be a change in behavior. Probably we should just document it as there isn't enough reason to risk breaking working code.
|
msg275183 - (view) |
Author: Stéphane Henriot (Stéphane Henriot) |
Date: 2016-09-08 22:16 |
I understand it might be a rather rare case.
Nevertheless, don't you think the inconsistency Serhiy pointed out makes it look like a bug needing a fix?
|
msg275225 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-09-09 00:29 |
There may be related discussion in Issue 24243, also about searching for empty strings. A while ago I meant to add documetation and tests for that, but I lost momentum after cleaning up the existing tests.
Some of the behaviours are undocumented and surprising, but if you look at the implementation it is clear they are not accidental. E.g. I think there is a function called replace_interleave() or something.
IMO you can find an unlimited number of instances of an empty string at index zero of any string. So calls like "ABC".strip("") are sensible to raise an exception, and the interleave mode of "ABC".count("") is unexpected. But I don’t see a big need to change this existing behaviour as long as it is documented.
|
msg279414 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-10-25 15:24 |
Interestingly, initially string.replace() was implemented in terms of split/join. string.replace(s, '', s2, n) returned s for any s, s2 and n.
After making replace() a method of str, in aca7b5eaf5e8 (1999-10-12), it became raising ValueError for empty pattern string.
Since 762dd09edb83 (issue599128, 2002-08-23) it supports zero-length pattern string. str.replace('', s1, s2, n) returned '' for any s1, s2 and n.
New implementation added in 41809406a35e (2006-05-25) made str.replace('', '', 'x', -1) returning 'x' while str.replace('', '', 'x', 1) still returns ''.
As you can see, the behavior of replacing with empty pattern is not stable. It was changed several times. Maybe it is a time for new change.
|
msg355556 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-10-28 15:58 |
The current behavior is really surprising.
>>> "".replace("", "|")
'|'
>>> "".replace("", "|", -1)
'|'
vs
>>> "".replace("", "|", 0)
''
>>> "".replace("", "|", 1)
''
>>> "".replace("", "|", 1000)
''
I always expect "|".
---
This behavior makes sense to me:
>>> "abc".replace("", "|")
'|a|b|c|'
>>> "abc".replace("", "|", -1)
'|a|b|c|'
>>> "abc".replace("", "|", 0)
'abc'
>>> "abc".replace("", "|", 1)
'|abc'
>>> "abc".replace("", "|", 100)
'|a|b|c|'
|
msg355558 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-10-28 16:07 |
Well, in fact, I would expect that "".replace(...) would always return "": for any argument passed to replace().
|
msg355562 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-10-28 16:19 |
What result would you expect of `"" in ""` and `"".count("")`?
|
msg355566 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-10-28 16:50 |
> What result would you expect of `"" in ""` and `"".count("")`?
I don't suggest to change these operator and method:
>>> "" in ""
True
>>> "".count("")
1
|
msg355627 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-10-29 07:56 |
There are expected some relations between str methods. For example,
s.replace(a, b, n) == s.replace(a, b) if n >= s.count(a)
len(s.replace(a, b, n)) == len(s) + (len(b)-len(a)) * n if 0 <= n <= s.count(a)
len(s.replace(a, b, n)) == len(s) + (len(b)-len(a)) * s.count(a) if n >= s.count(a)
Inconsistency between "".replace("", s, n) and "".replace("", s) is just a bug, and it should be fixed in the most consistent way. There are precedences, the behavior of replace() already was changed 3 times in the past. I think that chances to break some code are tiny, we just fix inconsistency why can puzzle users.
|
msg355628 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-10-29 07:57 |
Note that no tests were affected by this change.
|
msg355703 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-10-30 10:03 |
New changeset 865c3b257fe38154a4320c7ee6afb416f665b9c2 by Serhiy Storchaka in branch 'master':
bpo-28029: Make "".replace("", s, n) returning s for any n != 0. (GH-16981)
https://github.com/python/cpython/commit/865c3b257fe38154a4320c7ee6afb416f665b9c2
|
msg355704 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-10-30 10:06 |
Because of possible compatibility issue (very unlikely) this change is done only in master and will not be backported.
|
msg355707 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-10-30 11:32 |
> Because of possible compatibility issue (very unlikely) this change is done only in master and will not be backported.
Yeah, I agree. It would be a mistake to change the Python behavior in a minor release.
|
msg365570 - (view) |
Author: Glenn Linderman (v+python) * |
Date: 2020-04-02 05:30 |
Thanks Stèphańe and Serhiy, I just discovered this strange behavior in 3.8, and wondered how my logic was wrong, until I pinpointed the inconsistent behaviour of str.replace with an empty first parameter and replace count of 1.
Glad to see it is fixed in 3.9.
I guess for x.replace( a, b, c ) the workaround would be
x.replace( a, b, c ) if a else x.replace( a, b )
At least for recent versions of Python 3.
|
msg365571 - (view) |
Author: Glenn Linderman (v+python) * |
Date: 2020-04-02 05:39 |
Nope:
I guess for x.replace( a, b, c ) the workaround would be
x.replace( a, b, c ) if x else x.replace( a, b )
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:36 | admin | set | github: 72216 |
2020-04-02 05:39:20 | v+python | set | messages:
+ msg365571 |
2020-04-02 05:30:08 | v+python | set | nosy:
+ v+python messages:
+ msg365570
|
2019-10-30 11:32:55 | vstinner | set | messages:
+ msg355707 |
2019-10-30 10:06:16 | serhiy.storchaka | set | status: open -> closed versions:
+ Python 3.9, - Python 2.7, Python 3.5, Python 3.6, Python 3.7 messages:
+ msg355704
resolution: fixed stage: patch review -> resolved |
2019-10-30 10:03:56 | serhiy.storchaka | set | messages:
+ msg355703 |
2019-10-29 22:14:00 | rhettinger | set | messages:
- msg355681 |
2019-10-29 22:13:50 | rhettinger | set | messages:
- msg355679 |
2019-10-29 21:34:36 | serhiy.storchaka | set | messages:
+ msg355681 |
2019-10-29 21:31:18 | rhettinger | set | nosy:
+ rhettinger messages:
+ msg355679
|
2019-10-29 07:57:37 | serhiy.storchaka | set | messages:
+ msg355628 |
2019-10-29 07:56:19 | serhiy.storchaka | set | messages:
+ msg355627 |
2019-10-29 07:40:26 | serhiy.storchaka | set | pull_requests:
+ pull_request16507 |
2019-10-28 16:50:02 | vstinner | set | messages:
+ msg355566 |
2019-10-28 16:19:29 | serhiy.storchaka | set | messages:
+ msg355562 |
2019-10-28 16:07:24 | vstinner | set | messages:
+ msg355558 |
2019-10-28 15:58:59 | vstinner | set | messages:
+ msg355556 |
2016-10-25 15:24:49 | serhiy.storchaka | set | messages:
+ msg279414 versions:
+ Python 3.7 |
2016-09-09 17:26:42 | Stéphane Henriot | set | nosy:
+ georg.brandl
|
2016-09-09 00:29:45 | martin.panter | set | nosy:
+ martin.panter
messages:
+ msg275225 versions:
+ Python 2.7 |
2016-09-08 22:16:49 | Stéphane Henriot | set | messages:
+ msg275183 |
2016-09-08 21:30:29 | r.david.murray | set | nosy:
+ r.david.murray messages:
+ msg275168
|
2016-09-08 21:26:07 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg275162
|
2016-09-08 21:24:18 | vstinner | set | nosy:
+ vstinner messages:
+ msg275159
|
2016-09-08 20:58:52 | ericvw | set | nosy:
+ ericvw
|
2016-09-08 20:54:18 | Stéphane Henriot | set | messages:
+ msg275152 |
2016-09-08 20:46:31 | SilentGhost | set | nosy:
+ SilentGhost
messages:
+ msg275149 stage: needs patch -> patch review |
2016-09-08 20:44:56 | SilentGhost | set | files:
+ issue28029.diff keywords:
+ patch |
2016-09-08 20:43:25 | Stéphane Henriot | set | messages:
+ msg275147 |
2016-09-08 20:38:44 | SilentGhost | set | stage: needs patch components:
+ Interpreter Core versions:
+ Python 3.5, Python 3.6 |
2016-09-08 20:30:41 | Stéphane Henriot | create | |