performance regression in string replace for 3.3 #60265
Comments
Quoting Steven D'Aprano on c.l.p.:

"But add a call to replace, and things are very different:

[steve@ando ~]$ python2.7 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"

Three times slower, even for pure-ASCII strings. I get comparable results for Unicode. Notice how slow Python 2.7 is:

[steve@ando ~]$ python2.7 -m timeit -s "s = u'你'*1000" "s.replace(u'你', u'a')"

Even with the performance regression, it is still over twice as fast as Python 2.7. Nevertheless, I think there is something here. The consequences are nowhere near as dramatic as jmf claims, but it does seem that replace() has taken a serious performance hit. Perhaps it is unavoidable, but perhaps not. If anyone else can confirm similar results, I think this should be raised as a performance regression."

Quoting Serhiy Storchaka in response: "Yes, I confirm, it's a performance regression. It should be avoidable."
My results aren't quite as dramatic as yours, but there does appear to be a regression:

$ ./python -V
Python 2.7.3+
$ ./python -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 16.5 usec per loop
$ ./python -V
Python 3.3.0rc3+
$ ./python -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
10000 loops, best of 3: 22.7 usec per loop
Python 3.3 is 2x faster than Python 3.2 at replacing one character with another:

$ python3.2 -m timeit "ch='é'; sp=' '*1000; s = ch+sp+ch+sp+ch; after='à'; s.replace(ch, after)"
100000 loops, best of 3: 3.62 usec per loop
$ python3.3 -m timeit "ch='é'; sp=' '*1000; s = ch+sp+ch+sp+ch; after='à'; s.replace(ch, after)"
1000000 loops, best of 3: 1.36 usec per loop
$ python3.2 -m timeit "ch='€'; sp=' '*1000; s = ch+sp+ch+sp+ch; after='Ł'; s.replace(ch, after)"
100000 loops, best of 3: 3.15 usec per loop
$ python3.3 -m timeit "ch='€'; sp=' '*1000; s = ch+sp+ch+sp+ch; after='Ł'; s.replace(ch, after)"
1000000 loops, best of 3: 1.91 usec per loop

More seriously, I changed the algorithm of str.replace(before, after) in Python 3.3. In Python 3.2, the code looks like:

for (i = 0; i < u->length; i++) {
    if (u->str[i] == u1) {      /* direct compare at every index */
        if (--maxcount < 0)
            break;
        u->str[i] = u2;         /* overwrite in place, no function calls */
    }
}

In Python 3.3, the code looks like:

pos = findchar(sbuf, PyUnicode_KIND(self), slen, u1, 1);
if (pos < 0)
    goto nothing;
...
while (--maxcount)
{
    pos++;
    src += pos * PyUnicode_KIND(self);
    slen -= pos;
    index += pos;
    /* one findchar() call per replaced character */
    pos = findchar(src, PyUnicode_KIND(self), slen, u1, 1);
    if (pos < 0)
        break;
    PyUnicode_WRITE(rkind, PyUnicode_DATA(u), index + pos, u2);
}
I compared the performance of the two methods, dummy loop vs. find, with a string of 100,000 characters.
In practice, I bet that the most common case is to replace only a few characters; replacing all characters is a rare use case. Use the attached "unicode.patch" on Python 3.4 with the following command to reproduce my benchmark:

python -m timeit -s "a='a'; b='b'; text=a*100000" "text.replace(a, b)"

An option is to use the find method, and then switch to the dummy loop method if there are too many characters to replace. I don't know if it's necessary to develop such a complex algorithm. It would be better to have a benchmark extracted from a real-world application like a template engine.
You can hybridize them: first just compare chars, and if they don't match, then use find to skip ahead.
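For illustration, here is a minimal sketch of that hybrid idea, assuming a UCS1 buffer and ignoring the maxcount bookkeeping; the function name and the use of memchr() are mine, not part of any patch in this thread:

#include <stddef.h>
#include <string.h>

/* Hybrid sketch: overwrite matches with a plain loop, and when the
 * current character does not match, let memchr() skip the whole
 * non-matching run in a single call. */
static void
replace_1char_hybrid(unsigned char *s, size_t len,
                     unsigned char ch1, unsigned char ch2)
{
    unsigned char *p = s;
    unsigned char *end = s + len;

    while (p < end) {
        if (*p == ch1) {
            *p++ = ch2;          /* dense case: no per-char call overhead */
        }
        else {
            /* sparse case: jump straight to the next occurrence */
            p = memchr(p, ch1, (size_t)(end - p));
            if (p == NULL)
                return;          /* no more occurrences */
        }
    }
}

When every character matches, this degenerates into the 3.2-style dummy loop; when matches are rare, memchr() does the bulk of the scanning, which is what makes the 3.3 approach fast in the sparse case.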
Oh, your patch is simple and it's amazingly fast! I compared the unicode benchmark across platforms (campaigns 2.7-bytes, 2.7-unicode, 3.2-wide, 3.4, and 3.4-patch), and compared 3.2, 3.4, and 3.4 patched. The patch should be completed to also optimize the other Unicode kinds.
I'm working on it. Here are the benchmark scripts which I use. The first tests regular strings (replace …
The performance numbers are very nice, but the patch needs a comment about the optimization, IMO. |
After much experimentation, I suggest the new patch. Benchmark results (time of replacing 1 of n characters (ch1 to ch2) in a 100000-character string):

Py3.2        Py3.3         patch   n  ch1  ch2  fill
231 (-13%)   3025 (-93%)   200     1  'a'  'b'  'c'

On other platforms other numbers are possible. Especially I'm interested in …
I'm going to speed up other cases of replace(), but for now I have only this patch. Is it good? Should I apply it to 3.3, as there is a 3.3 regression?
As __ap__ says, it would be nice to have a comment. |
64-bit Linux results:

3.2           3.3            patch  n  ch1            ch2            fill
133 (-28%)    1343 (-93%)    96     1  'a'            'b'            'c'
133 (-23%)    1470 (-93%)    103    1  '\u010a'       '\u010b'       '\u010c'
1687 (-91%)   1362 (-89%)    150    1  '\U0001000a'   '\U0001000b'   '\U0001000c'
64-bit Windows results:

3.3            patched  n  ch1            ch2            fill
925 (-90%)     97       1  'a'            'b'            'c'
1332 (-92%)    106      1  '\u010a'       '\u010b'       '\u010c'
1346 (-88%)    160      1  '\U0001000a'   '\U0001000b'   '\U0001000c'
Oh, I thought I had already done this. |
str_replace_1char.patch: why not implement replace_1char_inplace() in stringlib, with one version per character type (UCS1, UCS2, UCS4)? I prefer the unicode_2.patch algorithm because it's simpler: only one loop (vs. two loops for str_replace_1char.patch, with a threshold of 10 different characters). Why did you change your algorithm? Is the str_replace_1char.patch algorithm more efficient than the unicode_2.patch algorithm? Is the speedup really interesting?
Because there are no benefits to doing it. All three versions (UCS1, UCS2, and UCS4) …
Yes, the UCS1 implementation in str_replace_1char.patch is more complicated, but it is faster.
You can run the benchmarks and compare the results. str_replace_1char.patch provides better performance.
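As background for the stringlib discussion above: stringlib produces one copy of a function per character width by expanding a shared template. Below is a simplified, self-contained illustration of that pattern; the macro and function names are hypothetical, not the real stringlib headers:

#include <stdint.h>
#include <stddef.h>

/* One macro expands into an identical in-place replace loop for each
 * of the three character widths used by PEP 393 strings.  Names are
 * illustrative only. */
#define DEFINE_REPLACE_1CHAR(SUFFIX, CHAR_T)                \
    static void                                             \
    replace_1char_##SUFFIX(CHAR_T *s, size_t len,           \
                           CHAR_T ch1, CHAR_T ch2)          \
    {                                                       \
        for (size_t i = 0; i < len; i++) {                  \
            if (s[i] == ch1)                                \
                s[i] = ch2;                                 \
        }                                                   \
    }

DEFINE_REPLACE_1CHAR(ucs1, uint8_t)   /* Latin-1-range strings */
DEFINE_REPLACE_1CHAR(ucs2, uint16_t)  /* BMP strings */
DEFINE_REPLACE_1CHAR(ucs4, uint32_t)  /* full-range strings */

The updated patch later in this thread takes a similar route, moving the implementation into a new file, Objects/stringlib/replace.h.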
How can we move this issue forward? I still prefer unicode_2.patch over str_replace_1char.patch because the code is simpler and so easier to maintain. str_replace_1char.patch has a "bug": replace_1char() does not use "pos" for the latin1 path.
My experiments last September, before this was filed, showed that str.find (index) had most of the relative slowdown of str.replace. I assumed at that time that .replace used .find or .index to find substrings to replace, so that the fix for .replace would include speeding up .find/.index. Looking at the patches, I see that I was wrong; I guess because .replace copies as it goes. I will open a separate issue sometime unless .find gets fixed here. For both .find and .replace, the regression was worse on Windows than on Linux, so the patches should be tested on Windows as well. If I can get VC++ Express 2008 installed and working on my current substitute machine, I will give them a try.
Here is an updated patch. Some comments have been added (I will be grateful for help improving them), and the implementation has been moved to stringlib (a new file, Objects/stringlib/replace.h, is added). unicode_2.patch optimizes only one overly special case, and I consider that not worth the effort. str_replace_1char*.patch covers a wider area and is designed to be faster than 3.2 and 3.3 in most cases and not significantly slower in corner cases.
str_replace_1char_2.patch looks good to me. Just one nit: please add a reference to this issue in the comment (in replace.h). |
New changeset d396e0716bf4 by Serhiy Storchaka in branch 'default': …
Thanks to Ezio Melotti and Daniel Shahaf for their great help in correcting my clumsy wording. |