New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode performance regression in python3.3 vs python3.2 #57830
Comments
Hello everyone, I juste tried to launch the stringbench on python3.2 and python3.3 dev versions and some unicode tests run slower in python3.3 than in python3.2. I cc the two raw output of both runs. I also extracted most interesting data (all the tests with more than 20% of performance regression):
|
Thanks, this is a known issue. I'm not too worried, since they are fairly artificial. In the cases I've looked at, I don't think anything can be done about that. |
Sorted and grouped results. "replace", "find" and "concat" should be easy to fix, "format" is a little bit more complex, "strip" and "split" depends on "find" performance and require to scan the substring to ensure that the result is canonical (except if inputs are all ASCII, which is the case in these examples). replace:
find:
concat:
format:
strip:
split:
|
Forgot to describe my environment: |
See also the issue bpo-13623 for results on bytes. |
Just a note: performance reports shouldn't be assigned to the "benchmarks" category, except if the problem is in the benchmarks themselves. |
I also noticed a difference between Python 3.2 and 3.3, but Python 3.3 is 13% *faster* (and not slower). This benchmark is not really representative because stringbench only tests .replace() with ASCII. Replace requires to scan the result to check the next maximum character, except for ASCII. So expect a performance regression... except for ASCII. |
New changeset c802bfc8acfc by Victor Stinner in branch 'default': |
Oops, I misused the timeit module, there is a regression.
./python -m timeit -s 'f=open("/tmp/README"); t=f.read(); f.close(); t.encode("ascii")' 't.replace("\n", " ")' Python 3.2: 6.44 usec |
/python -m timeit '"Andrew"+"Dalke"' gives me very close results with Python 3.2 (wide mode) and 3.3. Somethings like 0.15 vs 0.151 microseconds. But using longer (ASCII) strings, Python 3.3 is 2.6x faster: $ python3.2 -m timeit -s 'a="A"*1000; b="B"*1000' 'a+b'
1000000 loops, best of 3: 0.39 usec per loop
$ python3.3 -m timeit -s 'a="A"*1000; b="B"*1000' 'a+b'
10000000 loops, best of 3: 0.151 usec per loop |
But try ASCII+UCS2 or ASCII+UCS4. |
I'm closing this as "won't fix". The only way to get back the exact performance of 3.2 is to restore to the 3.2 implementation, which clearly is no option. I don't consider performance regressions in micro benchmarks inherently as a bug. If there is a specific regression which people think constitutes a real problem, a separate bug report should be submitted. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: