Issue 13621: Unicode performance regression in python3.3 vs python3.2

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57830

classification

Title:	Unicode performance regression in python3.3 vs python3.2
Type:	performance	Stage:
Components:	Interpreter Core, Unicode	Versions:	Python 3.3

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	Boris.FELD, collinwinter, ezio.melotti, flox, loewis, pitrou, python-dev, serhiy.storchaka, vstinner
Priority:	normal	Keywords:

Created on 2011-12-17 17:34 by Boris.FELD, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
stringbench_log_cpython3.2	Boris.FELD, 2011-12-17 17:34	Stringbenchmark log for cpython3.2
stringbench_log_cpython3.3	Boris.FELD, 2011-12-17 17:35	String benchmark log for cpython3.3
compare.py	Boris.FELD, 2011-12-17 17:45	Script used to compute diff between two runs

Messages (12)
msg149681 - (view)	Author: Boris FELD (Boris.FELD) *	Date: 2011-12-17 17:34
Hello everyone,

I just tried to run stringbench on the Python 3.2 and Python 3.3 development versions, and some Unicode tests run slower in Python 3.3 than in Python 3.2. I have attached the raw output of both runs. I also extracted the most interesting data (all the tests with a performance regression of more than 20%):

- ("A"*1000).find("B") (*1000): -30.379747%
- "Hello\t \t".rstrip() (*1000): -33.333333%
- "this\nis\na\ntest\n".rsplit("\n") (*1000): -23.437500%
- "\nHello!\n".strip() (*1000): -33.333333%
- dna.split("ACTAT") (*10): -21.066667%
- "Andrew".endswith("w") (*1000): -23.529412%
- "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%
- "\t \tHello".rstrip() (*1000): -33.333333%
- ("A"*1000).rpartition("A") (*1000): -21.212121%
- ("Here are some words. "*2).split() (*1000): -22.105263%
- "Hello!\n".rstrip() (*1000): -35.714286%
- "B" in "A"*1000 (*1000): -32.089552%
- "Hello!\n".strip() (*1000): -35.714286%
- "\nHello!".strip() (*1000): -28.571429%
- "this\nis\na\ntest\n".split("\n") (*1000): -23.437500%
- "Andrew".startswith("A") (*1000): -20.588235%
- "\nHello!".rstrip() (*1000): -35.714286%
- "Andrew".endswith("Andrew") (*1000): -22.857143%
- "Andrew".endswith("Anders") (*1000): -23.529412%
- "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000): -49.411765%
- "Andrew".startswith("Anders") (*1000): -23.529412%
- "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000): -22.429907%
- "Andrew"+"Dalke" (*1000): -23.076923%
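The percentages above look consistent with a relative change of the form (t32 - t33) / t32, where a negative value means Python 3.3 took longer. The attached compare.py is not reproduced here, so the following is only a minimal sketch of that calculation under this assumption. The function names and the sample timings are hypothetical.

```python
def percent_change(old, new):
    """Relative change from the old timing to the new one, in percent.

    Negative means `new` is slower (assumed sign convention).
    """
    return (old - new) / old * 100.0


def regressions(old_run, new_run, threshold=20.0):
    """Return {test: percent} for tests more than `threshold` percent slower."""
    result = {}
    for name, old in old_run.items():
        if name not in new_run or old == 0:
            continue  # skip tests missing from one run or with a zero timing
        change = percent_change(old, new_run[name])
        if change < -threshold:
            result[name] = change
    return result


if __name__ == "__main__":
    # Hypothetical timings, not taken from the attached logs.
    py32 = {'"Andrew"+"Dalke"': 0.13, '"Andrew".upper()': 0.40}
    py33 = {'"Andrew"+"Dalke"': 0.16, '"Andrew".upper()': 0.41}
    for name, change in sorted(regressions(py32, py33).items()):
        print("%s: %f%%" % (name, change))
```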
msg149682 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2011-12-17 17:41
Thanks, this is a known issue. I'm not too worried, since these benchmarks are fairly artificial. In the cases I've looked at, I don't think anything can be done about it.
msg149684 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-12-17 17:58
Sorted and grouped results. "replace", "find" and "concat" should be easy to fix, and "format" is a little more complex. "strip" and "split" depend on "find" performance and require scanning the substring to ensure that the result is canonical (except if the inputs are all ASCII, which is the case in these examples).

replace:
- "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%

find:
- ("A"*1000).find("B") (*1000): -30.379747%
- "Andrew".startswith("A") (*1000): -20.588235%
- "Andrew".startswith("Anders") (*1000): -23.529412%
- "Andrew".endswith("w") (*1000): -23.529412%
- "Andrew".endswith("Andrew") (*1000): -22.857143%
- "Andrew".endswith("Anders") (*1000): -23.529412%
- "B" in "A"*1000 (*1000): -32.089552%

concat:
- "Andrew"+"Dalke" (*1000): -23.076923%

format:
- "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000): -49.411765%

strip:
- "\nHello!\n".strip() (*1000): -33.333333%
- "Hello!\n".strip() (*1000): -35.714286%
- "\nHello!".strip() (*1000): -28.571429%
- "Hello\t \t".rstrip() (*1000): -33.333333%
- "\t \tHello".rstrip() (*1000): -33.333333%
- "Hello!\n".rstrip() (*1000): -35.714286%
- "\nHello!".rstrip() (*1000): -35.714286%

split:
- dna.split("ACTAT") (*10): -21.066667%
- ("Here are some words. "*2).split() (*1000): -22.105263%
- "this\nis\na\ntest\n".split("\n") (*1000): -23.437500%
- "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000): -22.429907%
- "this\nis\na\ntest\n".rsplit("\n") (*1000): -23.437500%
- ("A"*1000).rpartition("A") (*1000): -21.212121%
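The "canonical" requirement mentioned above comes from the flexible string representation introduced in Python 3.3 (PEP 393): a string is stored with 1, 2 or 4 bytes per character depending on its largest code point, so an operation that returns part of a string may have to scan the result to pick the narrowest storage. The following pure-Python model illustrates the idea only; `storage_width` is a hypothetical helper, not CPython's C implementation.

```python
def storage_width(s):
    """Bytes per character a PEP 393-style representation would need for s."""
    max_char = max(map(ord, s), default=0)  # this scan is the extra cost
    if max_char < 0x100:
        return 1   # ASCII / Latin-1
    if max_char < 0x10000:
        return 2   # UCS2
    return 4       # UCS4


if __name__ == "__main__":
    text = "price: 10\u20ac"   # the euro sign forces 2 bytes per character
    prefix = text[:6]          # "price:" contains only ASCII characters
    # The slice needs narrower storage than the string it was taken from,
    # which is only known after scanning the slice.
    print(storage_width(text), storage_width(prefix))
```

When every input is ASCII, the result is known to be ASCII as well, which is why the scan can be skipped in that case.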
msg149688 - (view)	Author: Boris FELD (Boris.FELD) *	Date: 2011-12-17 18:08
Forgot to describe my environment:

Mac OS X 10.6.8
GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
CPython3.3 revision ea421c534305
CPython3.2 revision 0b86da9d6964
msg149694 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-12-17 18:43
See also issue #13623 for results on bytes.
msg149696 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-12-17 19:13
Just a note: performance reports shouldn't be assigned to the "benchmarks" category, except if the problem is in the benchmarks themselves.
msg149728 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-12-18 00:59
> "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%

I also noticed a difference between Python 3.2 and 3.3, but Python 3.3 is 13% *faster* (and not slower).

This benchmark is not really representative because stringbench only tests .replace() with ASCII. Replace has to scan the result to determine the new maximum character, except for ASCII. So expect a performance regression... except for ASCII.
msg149730 - (view)	Author: Roundup Robot (python-dev)	Date: 2011-12-18 01:43
New changeset c802bfc8acfc by Victor Stinner in branch 'default':
Issue #13621: Optimize str.replace(char1, char2)
http://hg.python.org/cpython/rev/c802bfc8acfc
msg149731 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-12-18 01:47
> I also noticed a difference between Python 3.2 and 3.3,
> but Python 3.3 is 13% faster (and not slower).

Oops, I misused the timeit module; there is a regression.

> New changeset c802bfc8acfc by Victor Stinner in branch 'default':
> Issue #13621: Optimize str.replace(char1, char2)

./python -m timeit -s 'f=open("/tmp/README"); t=f.read(); f.close(); t.encode("ascii")' 't.replace("\n", " ")'

Python 3.2: 6.44 usec
Python 3.3 before: 11.6 usec
Python 3.3 after: 2.77 usec
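The same kind of measurement can be scripted with the timeit module's Python API. This is a minimal sketch, not the exact benchmark above: it assumes an in-memory text instead of /tmp/README, and it adds a non-ASCII variant, since stringbench only exercises the ASCII case. The helper name `best_time` and the sample texts are hypothetical.

```python
import timeit


def best_time(stmt, namespace, number=1000, repeat=3):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    ascii_text = "some ASCII line of text\n" * 2000
    # Same text with one non-ASCII character appended (hypothetical sample).
    mixed_text = ascii_text + "\u20ac"
    for label, text in (("ASCII", ascii_text), ("non-ASCII", mixed_text)):
        seconds = best_time('t.replace("\\n", " ")', {"t": text})
        print("%s: %.2f usec per call" % (label, seconds * 1e6))
```

Taking the minimum over several repeats follows the "best of 3" convention that the timeit command line reports.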
msg159451 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-04-27 12:07
> "Andrew"+"Dalke" (*1000): -23.076923%

./python -m timeit '"Andrew"+"Dalke"' gives me very close results with Python 3.2 (wide mode) and 3.3: something like 0.15 vs 0.151 microseconds.

But using longer (ASCII) strings, Python 3.3 is 2.6x faster:

$ python3.2 -m timeit -s 'a="A"*1000; b="B"*1000' 'a+b'
1000000 loops, best of 3: 0.39 usec per loop
$ python3.3 -m timeit -s 'a="A"*1000; b="B"*1000' 'a+b'
10000000 loops, best of 3: 0.151 usec per loop
msg159461 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-04-27 14:34
But try ASCII+UCS2 or ASCII+UCS4.
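A minimal sketch of that experiment, assuming the timeit Python API rather than the command line. The operand lengths mirror the 1000-character strings used earlier in the thread, and the specific non-ASCII characters are arbitrary choices. Here "UCS2" and "UCS4" mean strings whose largest code point needs 2 or 4 bytes per character; `time_concat` is a hypothetical helper.

```python
import timeit


def time_concat(a, b, number=100000):
    """Average time in seconds of one `a + b` over `number` iterations."""
    total = timeit.timeit("a + b", globals={"a": a, "b": b}, number=number)
    return total / number


if __name__ == "__main__":
    ascii_str = "A" * 1000
    cases = {
        "ASCII+ASCII": "B" * 1000,
        "ASCII+UCS2": "\u20ac" * 1000,      # euro sign, 2 bytes per character
        "ASCII+UCS4": "\U0001F600" * 1000,  # outside the BMP, 4 bytes per character
    }
    for label, other in cases.items():
        print("%s: %.3f usec" % (label, time_concat(ascii_str, other) * 1e6))
```

In the mixed cases the ASCII operand has to be widened as it is copied into the result, which a pure-ASCII concatenation avoids.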
msg159470 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2012-04-27 15:31
I'm closing this as "won't fix". The only way to get back the exact performance of 3.2 is to revert to the 3.2 implementation, which is clearly not an option. I don't consider performance regressions in micro-benchmarks to be bugs in themselves. If there is a specific regression that people think constitutes a real problem, a separate bug report should be submitted.

History
Date	User	Action	Args
2022-04-11 14:57:24	admin	set	github: 57830
2012-04-27 15:31:39	loewis	set	status: open -> closed resolution: wont fix messages: + msg159470
2012-04-27 14:34:51	serhiy.storchaka	set	messages: + msg159461
2012-04-27 12:07:16	vstinner	set	messages: + msg159451
2012-04-07 16:06:32	serhiy.storchaka	set	nosy: + serhiy.storchaka
2011-12-18 01:47:11	vstinner	set	messages: + msg149731
2011-12-18 01:43:40	python-dev	set	nosy: + python-dev messages: + msg149730
2011-12-18 00:59:59	vstinner	set	messages: + msg149728
2011-12-17 19:13:17	pitrou	set	assignee: collinwinter -> messages: + msg149696 components: + Interpreter Core, - Benchmarks versions: - Python 3.2
2011-12-17 19:02:13	ezio.melotti	set	nosy: + ezio.melotti components: + Unicode assignee: collinwinter
2011-12-17 18:56:37	vstinner	set	nosy: + flox
2011-12-17 18:43:19	vstinner	set	messages: + msg149694
2011-12-17 18:08:25	Boris.FELD	set	messages: + msg149688
2011-12-17 17:58:08	vstinner	set	messages: + msg149684
2011-12-17 17:45:19	Boris.FELD	set	files: + compare.py
2011-12-17 17:44:58	Boris.FELD	set	files: - stringbench.py
2011-12-17 17:43:39	pitrou	set	nosy: + pitrou, vstinner
2011-12-17 17:41:22	loewis	set	nosy: + loewis messages: + msg149682
2011-12-17 17:39:40	pitrou	set	assignee: collinwinter -> (no value)
2011-12-17 17:35:42	Boris.FELD	set	files: + stringbench.py
2011-12-17 17:35:12	Boris.FELD	set	files: + stringbench_log_cpython3.3
2011-12-17 17:34:48	Boris.FELD	create