classification
Title: Bytes performance regression in python3.3 vs python3.2
Type: performance
Stage:
Components: Benchmarks
Versions: Python 3.2, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: collinwinter
Nosy List: Boris.FELD, collinwinter, ezio.melotti, flox, python-dev, vstinner
Priority: normal
Keywords: patch

Created on 2011-12-17 18:22 by Boris.FELD, last changed 2011-12-18 00:29 by vstinner. This issue is now closed.

Files
File name Uploaded Description
stringbench_log_cpython3.2 Boris.FELD, 2011-12-17 18:22 Stringbenchmark log for cpython3.2
compare.py Boris.FELD, 2011-12-17 18:23 Script used to compute diff between two runs
stringbench_log_cpython3.3 Boris.FELD, 2011-12-17 18:25 String benchmark log for cpython3.3
bytes_find.patch vstinner, 2011-12-17 23:33
bytes_find-2.patch vstinner, 2011-12-18 00:03
Messages (7)
msg149689 - Author: Boris FELD (Boris.FELD) * Date: 2011-12-17 18:22
Hello everyone, I just ran stringbench on the python3.2 and python3.3 development versions, and some bytes tests run slower on python3.3 than on python3.2.

I have attached the raw output of both runs. I also extracted the most interesting data (all the tests with a performance regression of more than 20%):
- (b"A"*1000).rfind(b"A") (*1000): -70.103093%
- (b"A"*1000).find(b"B") (*1000): -48.372093%
- (b"A"*1000).rindex(b"A") (*1000): -68.888889%
- s=b"ABC"*33; (s+b"E"+(b"D"+s)*500).rfind(s+b"E") (*100): -28.982301%
- (b"C"+b"AB"*300).rfind(b"CA") (*1000): -29.565217%
- (b"AB"*1000).index(b"AB") (*1000): -68.539326%
- b"Andrew".endswith(b"w") (*1000): -21.212121%
- (b"A"*1000).index(b"A") (*1000): -71.111111%
- (b"BC"+b"AB"*300).rfind(b"BC") (*1000): -42.788462%
- b"Andrew".startswith(b"Andrew") (*1000): -20.588235%
- (b"AB"*1000).find(b"AB") (*1000): -69.318182%
- (b"AB"*1000).rfind(b"AB") (*1000): -69.791667%
- (b"A"*1000).rfind(b"B") (*1000): -37.988827%
- (b"AB"*300+b"C").index(b"BC") (*1000): -28.750000%
- b"B" in b"A"*1000 (*1000): -24.479167%
- (b"AB"*300+b"CA").find(b"CA") (*1000): -33.673469%
- (b"AB"*1000).rindex(b"AB") (*1000): -67.777778%
- (b"C"+b"AB"*300).rindex(b"CA") (*1000): -29.017857%
- (b"AB"*300+b"C").find(b"BC") (*1000): -28.451883%
- b"Andrew".startswith(b"A") (*1000): -21.212121%
- b"Andrew".startswith(b"Anders") (*1000): -21.212121%
- (b"A"*1000).partition(b"B") (*1000): -30.656934%
- (b"AB"*1000).rfind(b"CA") (*1000): -20.603015%
- (b"AB"*1000).rfind(b"BC") (*1000): -35.645472%
- (b"A"*1000).find(b"A") (*1000): -70.454545%
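
To re-check a single entry from the list above without running the whole stringbench suite, something like the following `timeit` sketch may be enough. The helper names `time_stmt` and `percent_change`, and the sign convention (negative means the newer run is slower), are assumptions made for illustration; they are not taken from stringbench or from the attached compare.py, whose exact formula is not shown in this issue.

```python
import timeit


def time_stmt(stmt, number=1000, repeat=5):
    """Return the best time in seconds for `number` executions of stmt."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat))


def percent_change(old_seconds, new_seconds):
    """Hypothetical metric: negative when the new run is slower than the old."""
    return (old_seconds - new_seconds) / new_seconds * 100.0


if __name__ == "__main__":
    # One of the statements reported above, run 1000 times per measurement.
    elapsed = time_stmt('(b"A"*1000).find(b"A")')
    print("best of 5: %.6f s" % elapsed)
```

Running the same script under both interpreters and feeding the two timings to `percent_change` gives a rough per-test comparison; taking the minimum over several repeats reduces noise from other processes.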

My environment is:
Mac OS X 10.6.8
GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
CPython3.3 revision ea421c534305
CPython3.2 revision 0b86da9d6964
msg149691 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-17 18:42
Grouped results.

find (first):

- (b"A"*1000).find(b"A")    : -70%
- (b"A"*1000).rfind(b"A")   : -70%
- (b"A"*1000).index(b"A")   : -71%
- (b"A"*1000).rindex(b"A")  : -68%

- (b"AB"*1000).index(b"AB") : -68%
- (b"AB"*1000).rindex(b"AB"): -67%
- (b"AB"*1000).find(b"AB")  : -69%
- (b"AB"*1000).rfind(b"AB") : -69%

- b"Andrew".startswith(b"Andrew"): -20%
- b"Andrew".startswith(b"A")     : -21%
- b"Andrew".startswith(b"Anders"): -21%

- b"Andrew".endswith(b"w"): -21%

find (last):

- (b"AB"*300+b"CA").find(b"CA") : -33%
- (b"C"+b"AB"*300).rindex(b"CA"): -29%
- (b"AB"*300+b"C").find(b"BC")  : -28%
- (b"AB"*300+b"C").index(b"BC") : -28%
- (b"C"+b"AB"*300).rfind(b"CA") : -29%
- (b"BC"+b"AB"*300).rfind(b"BC"): -42%
- s=b"ABC"*33; (s+b"E"+(b"D"+s)*500).rfind(s+b"E"): -28%

find (not found):

- (b"A"*1000).find(b"B")    : -48%
- (b"A"*1000).rfind(b"B")   : -37%
- (b"AB"*1000).rfind(b"CA") : -20%
- (b"AB"*1000).rfind(b"BC") : -35%

others:

- b"B" in b"A"*1000           : -24%
- (b"A"*1000).partition(b"B") : -30%
msg149693 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-17 18:43
See also issue #13621 for results on Unicode.
msg149718 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-17 23:33
> (b"A"*1000).find(b"A")    : -70%

This one is a performance regression introduced by #12170. The attached patch checks the object type before attempting a conversion to size_t, instead of attempting the conversion first and catching an exception.
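
The difference between the two strategies can be modelled in pure Python. This is only a sketch of the idea, not the C code from the patch: the function names are hypothetical, and it assumes (as introduced by #12170) that the argument is either an integer byte value or a bytes object.

```python
import operator


def _check_byte(value):
    """Reject integers that do not fit in a single byte."""
    if not 0 <= value <= 255:
        raise ValueError("byte must be in range(0, 256)")
    return value


def parse_try_first(sub):
    """Try the integer conversion first; a bytes argument costs an exception."""
    try:
        value = operator.index(sub)
    except TypeError:
        return ("buffer", bytes(sub))
    return ("byte", _check_byte(value))


def parse_check_first(sub):
    """Check the type first, so the common bytes case never raises."""
    if isinstance(sub, int):
        return ("byte", _check_byte(sub))
    return ("buffer", bytes(sub))


print(parse_try_first(b"A"))    # goes through the except branch
print(parse_check_first(b"A"))  # same result, no exception raised
print(parse_check_first(65))
```

Both functions return the same result for these inputs; the second one simply avoids raising and discarding an exception on every call with a bytes argument, which is the common case in the benchmark.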
msg149720 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-18 00:00
bytes_find.patch only works for Python int, not for objects with an __index__ method. My new patch (bytes_find-2.patch) uses PyNumber_Check() instead of PyLong_Check() to be more generic. It also fixes a different issue: on an overflow error, it now raises the same ValueError as bytes.find(-1).
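
From the Python side, the behaviour described here can be checked with a short script. This is a sketch that assumes a CPython build containing the fix; the `ByteIndex` class is a hypothetical helper written for this example.

```python
class ByteIndex:
    """Hypothetical helper: not an int, but convertible through __index__."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


data = b"ABC"

# A plain int and an object with __index__ should be treated the same way.
print(data.find(66))
print(data.find(ByteIndex(66)))

# An out-of-range value and an overflowing value should both give ValueError.
for bad in (-1, 2**100):
    try:
        data.find(bad)
    except ValueError as exc:
        print(type(exc).__name__, "for", bad)
```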
msg149721 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-18 00:17
New changeset 75648db1b3f3 by Victor Stinner in branch 'default':
Issue #13623: Fix a performance regression introduced by issue #12170 in
http://hg.python.org/cpython/rev/75648db1b3f3
msg149725 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-18 00:29
I checked stringbench: there is no longer any performance regression (no difference of more than 20%).
History
Date User Action Args
2011-12-18 00:29:18 vstinner set status: open -> closed
resolution: fixed
messages: + msg149725
2011-12-18 00:17:54 python-dev set nosy: + python-dev
messages: + msg149721
2011-12-18 00:03:27 vstinner set files: + bytes_find-2.patch
2011-12-18 00:03:17 vstinner set files: - bytes_find-2.patch
2011-12-18 00:00:05 vstinner set files: + bytes_find-2.patch
messages: + msg149720
2011-12-17 23:33:02 vstinner set files: + bytes_find.patch
keywords: + patch
messages: + msg149718
2011-12-17 19:02:28 ezio.melotti set nosy: + ezio.melotti
2011-12-17 18:56:22 vstinner set nosy: + flox
2011-12-17 18:43:06 vstinner set messages: + msg149693
2011-12-17 18:42:05 vstinner set nosy: + vstinner
messages: + msg149691
2011-12-17 18:25:58 Boris.FELD set files: + stringbench_log_cpython3.3
2011-12-17 18:25:46 Boris.FELD set files: - iobench_log_python3.3
2011-12-17 18:23:38 Boris.FELD set files: + compare.py
2011-12-17 18:23:24 Boris.FELD set files: + iobench_log_python3.3
2011-12-17 18:22:48 Boris.FELD create