
Unicode performance regression in python3.3 vs python3.2 #57830

Closed
Lothiraldan mannequin opened this issue Dec 17, 2011 · 12 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-unicode

Comments

@Lothiraldan
Mannequin

Lothiraldan mannequin commented Dec 17, 2011

BPO 13621
Nosy @loewis, @pitrou, @vstinner, @ezio-melotti, @florentx, @Lothiraldan, @serhiy-storchaka
Files
  • stringbench_log_cpython3.2: String benchmark log for cpython3.2
  • stringbench_log_cpython3.3: String benchmark log for cpython3.3
  • compare.py: Script used to compute diff between two runs
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-04-27.15:31:39.330>
    created_at = <Date 2011-12-17.17:34:48.283>
    labels = ['interpreter-core', 'expert-unicode', 'performance']
    title = 'Unicode performance regression in python3.3 vs python3.2'
    updated_at = <Date 2012-04-27.15:31:39.329>
    user = 'https://github.com/Lothiraldan'

    bugs.python.org fields:

    activity = <Date 2012-04-27.15:31:39.329>
    actor = 'loewis'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-04-27.15:31:39.330>
    closer = 'loewis'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2011-12-17.17:34:48.283>
    creator = 'Boris.FELD'
    dependencies = []
    files = ['23991', '23992', '23994']
    hgrepos = []
    issue_num = 13621
    keywords = []
    message_count = 12.0
    messages = ['149681', '149682', '149684', '149688', '149694', '149696', '149728', '149730', '149731', '159451', '159461', '159470']
    nosy_count = 9.0
    nosy_names = ['loewis', 'collinwinter', 'pitrou', 'vstinner', 'ezio.melotti', 'flox', 'Boris.FELD', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13621'
    versions = ['Python 3.3']

    @Lothiraldan
    Mannequin Author

    Lothiraldan mannequin commented Dec 17, 2011

    Hello everyone, I just tried to run stringbench on the Python 3.2 and 3.3 development versions, and some unicode tests run slower in Python 3.3 than in Python 3.2.

    I attach the raw output of both runs. I also extracted the most interesting data (all tests with more than 20% performance regression):

    • ("A"*1000).find("B") (*1000): -30.379747%
    • "Hello\t \t".rstrip() (*1000): -33.333333%
    • "this\nis\na\ntest\n".rsplit("\n") (*1000): -23.437500%
    • "\nHello!\n".strip() (*1000): -33.333333%
    • dna.split("ACTAT") (*10): -21.066667%
    • "Andrew".endswith("w") (*1000): -23.529412%
    • "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%
    • "\t \tHello".rstrip() (*1000): -33.333333%
    • ("A"*1000).rpartition("A") (*1000): -21.212121%
    • ("Here are some words. "*2).split() (*1000): -22.105263%
    • "Hello!\n".rstrip() (*1000): -35.714286%
    • "B" in "A"*1000 (*1000): -32.089552%
    • "Hello!\n".strip() (*1000): -35.714286%
    • "\nHello!".strip() (*1000): -28.571429%
    • "this\nis\na\ntest\n".split("\n") (*1000): -23.437500%
    • "Andrew".startswith("A") (*1000): -20.588235%
    • "\nHello!".rstrip() (*1000): -35.714286%
    • "Andrew".endswith("Andrew") (*1000): -22.857143%
    • "Andrew".endswith("Anders") (*1000): -23.529412%
    • "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000): -49.411765%
    • "Andrew".startswith("Anders") (*1000): -23.529412%
    • "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000): -22.429907%
    • "Andrew"+"Dalke" (*1000): -23.076923%
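
    The deltas above can be derived from the per-test timings in the two attached stringbench logs. A minimal sketch of the computation (the attached compare.py is the actual script; the function name here is hypothetical):

    ```python
    # Hypothetical sketch of how a diff between two stringbench runs could be
    # computed: given seconds-per-run for the same test under 3.2 and 3.3,
    # report the relative change as a percentage (negative = 3.3 is slower).

    def regression_percent(old_secs: float, new_secs: float) -> float:
        """Relative change from old to new, as a percentage of the new time."""
        return (old_secs - new_secs) / new_secs * 100.0

    # Example: a test taking 0.55s on 3.2 and 0.79s on 3.3
    delta = regression_percent(0.55, 0.79)
    print(f"{delta:+.6f}%")  # -30.379747%, like the .find("B") entry above
    ```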

    @Lothiraldan Lothiraldan mannequin added performance Performance or resource usage labels Dec 17, 2011
    @loewis
    Mannequin

    loewis mannequin commented Dec 17, 2011

    Thanks, this is a known issue. I'm not too worried, since these benchmarks are fairly artificial. In the cases I've looked at, I don't think anything can be done about it.

    @vstinner
    Member

    Sorted and grouped results. "replace", "find" and "concat" should be easy to fix; "format" is a little more complex; "strip" and "split" depend on "find" performance and also require scanning the substring to ensure that the result is canonical (except when the inputs are all ASCII, as they are in these examples).

    replace:

    • "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%

    find:

    • ("A"*1000).find("B") (*1000): -30.379747%
    • "Andrew".startswith("A") (*1000): -20.588235%
    • "Andrew".startswith("Anders") (*1000): -23.529412%
    • "Andrew".endswith("w") (*1000): -23.529412%
    • "Andrew".endswith("Andrew") (*1000): -22.857143%
    • "Andrew".endswith("Anders") (*1000): -23.529412%
    • "B" in "A"*1000 (*1000): -32.089552%

    concat:

    • "Andrew"+"Dalke" (*1000): -23.076923%

    format:

    • "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000): -49.411765%

    strip:

    • "\nHello!\n".strip() (*1000): -33.333333%
    • "Hello!\n".strip() (*1000): -35.714286%
    • "\nHello!".strip() (*1000): -28.571429%
    • "Hello\t \t".rstrip() (*1000): -33.333333%
    • "\t \tHello".rstrip() (*1000): -33.333333%
    • "Hello!\n".rstrip() (*1000): -35.714286%
    • "\nHello!".rstrip() (*1000): -35.714286%

    split:

    • dna.split("ACTAT") (*10): -21.066667%
    • ("Here are some words. "*2).split() (*1000): -22.105263%
    • "this\nis\na\ntest\n".split("\n") (*1000): -23.437500%
    • "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000): -22.429907%
    • "this\nis\na\ntest\n".rsplit("\n") (*1000): -23.437500%
    • ("A"*1000).rpartition("A") (*1000): -21.212121%

    @Lothiraldan
    Mannequin Author

    Lothiraldan mannequin commented Dec 17, 2011

    Forgot to describe my environment:
    Mac OS X 10.6.8
    GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
    CPython3.3 revision ea421c534305
    CPython3.2 revision 0b86da9d6964

    @vstinner
    Member

    See also the issue bpo-13623 for results on bytes.

    @pitrou
    Member

    pitrou commented Dec 17, 2011

    Just a note: performance reports shouldn't be assigned to the "benchmarks" category, except if the problem is in the benchmarks themselves.

    @pitrou pitrou added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed performance Performance or resource usage labels Dec 17, 2011
    @vstinner
    Member

    "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%

    I also noticed a difference between Python 3.2 and 3.3, but Python 3.3 is 13% *faster* (and not slower). This benchmark is not really representative because stringbench only tests .replace() with ASCII. replace() has to scan the result to determine its maximum character, except for ASCII, so expect a performance regression... except for ASCII.
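
    The ASCII fast path described here comes from PEP 393's flexible string representation in 3.3: a pure-ASCII result stays in the compact 1-byte-per-character form, while a replacement that introduces a wider character forces a 2- or 4-byte buffer, which is why the result must be scanned. A small illustration (CPython implementation detail; exact sizes are indicative):

    ```python
    import sys

    ascii_text = "A" * 100

    # Replacing ASCII with ASCII keeps the compact 1-byte-per-char form.
    still_ascii = ascii_text.replace("A", "B")

    # Replacing with a character above U+00FF forces a 2-byte representation,
    # so CPython must know the maximum code point of the result in advance.
    widened = ascii_text.replace("A", "\u20ac")  # U+20AC EURO SIGN

    print(sys.getsizeof(still_ascii))  # compact form: header + ~1 byte/char
    print(sys.getsizeof(widened))      # wider form: header + ~2 bytes/char
    assert sys.getsizeof(widened) > sys.getsizeof(still_ascii)
    ```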

    @python-dev
    Mannequin

    python-dev mannequin commented Dec 18, 2011

    New changeset c802bfc8acfc by Victor Stinner in branch 'default':
    Issue bpo-13621: Optimize str.replace(char1, char2)
    http://hg.python.org/cpython/rev/c802bfc8acfc

    @vstinner
    Member

    I also noticed a difference between Python 3.2 and 3.3,
    but Python 3.3 is 13% *faster* (and not slower).

    Oops, I misused the timeit module; there is a regression.

    New changeset c802bfc8acfc by Victor Stinner in branch 'default':
    Issue bpo-13621: Optimize str.replace(char1, char2)

    ./python -m timeit -s 'f=open("/tmp/README"); t=f.read(); f.close(); t.encode("ascii")' 't.replace("\n", " ")'

    Python 3.2: 6.44 usec
    Python 3.3 before: 11.6 usec
    Python 3.3 after: 2.77 usec

    @vstinner
    Member

    "Andrew"+"Dalke" (*1000): -23.076923%

    ./python -m timeit '"Andrew"+"Dalke"' gives me very close results with Python 3.2 (wide mode) and 3.3: something like 0.15 vs 0.151 microseconds.

    But using longer (ASCII) strings, Python 3.3 is 2.6x faster:

    $ python3.2 -m timeit -s 'a="A"*1000; b="B"*1000' 'a+b'
    1000000 loops, best of 3: 0.39 usec per loop
    $ python3.3 -m timeit -s 'a="A"*1000; b="B"*1000' 'a+b'
    10000000 loops, best of 3: 0.151 usec per loop

    @serhiy-storchaka
    Member

    But try ASCII+UCS2 or ASCII+UCS4.
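
    The point being: under PEP 393, concatenating two strings of the same internal width is a straight copy, whereas mixing widths (e.g. ASCII + UCS2) requires widening the narrower operand first. A hedged way to check this locally with the stdlib timeit module (timings will vary by build and machine):

    ```python
    import timeit

    ascii_s = "A" * 1000
    ucs2_s = "\u20ac" * 1000  # 2-byte (UCS2) representation under PEP 393

    # Same-kind concatenation: both operands are compact ASCII.
    same_kind = timeit.timeit("a + b", globals={"a": ascii_s, "b": ascii_s},
                              number=100_000)
    # Mixed-kind concatenation: the ASCII operand must be widened to UCS2.
    mixed_kind = timeit.timeit("a + b", globals={"a": ascii_s, "b": ucs2_s},
                               number=100_000)

    print(f"ASCII+ASCII: {same_kind:.4f}s  ASCII+UCS2: {mixed_kind:.4f}s")
    ```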

    @loewis
    Mannequin

    loewis mannequin commented Apr 27, 2012

    I'm closing this as "won't fix". The only way to get back the exact performance of 3.2 is to revert to the 3.2 implementation, which is clearly not an option. I don't consider performance regressions in micro-benchmarks inherently a bug.

    If there is a specific regression that people think constitutes a real problem, a separate bug report should be submitted.

    @loewis loewis mannequin closed this as completed Apr 27, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022