Message171378
Quoting Steven D'Aprano on c.l.p. (comp.lang.python):
"But add a call to replace, and things are very different:
[steve@ando ~]$ python2.7 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 9.3 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 5.43 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 18.3 usec per loop
Three times slower, even for pure-ASCII strings. I get comparable results for Unicode. Notice how slow Python 2.7 is:
[steve@ando ~]$ python2.7 -m timeit -s "s = u'你'*1000" "s.replace(u'你', u'a')"
10000 loops, best of 3: 65.6 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
100000 loops, best of 3: 2.79 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
10000 loops, best of 3: 23.7 usec per loop
Even with the performance regression, it is still over twice as fast as Python 2.7.
Nevertheless, I think there is something here. The consequences are nowhere near as dramatic as jmf claims, but it does seem that replace() has taken a serious performance hit. Perhaps it is unavoidable, but perhaps not.
If anyone else can confirm similar results, I think this should be raised as a performance regression."
Quoting Serhiy Storchaka in response:
"Yes, I confirm, it's a performance regression. It should be avoidable.
Almost any PEP393 performance regression can be avoided. At least for
such corner case. Just no one has yet optimized this case."
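Not part of the original report, but a minimal reproduction sketch for anyone wanting to confirm: the script below (Python 3 only; under 2.7 the non-ASCII literals would need the u prefix) mirrors the command-line timeit runs quoted above using the timeit module, so comparable numbers can be collected under whichever interpreter runs it.

import sys
import timeit

# The two cases quoted above: an all-ASCII string and a non-ASCII (BMP) string,
# each 1000 characters long, with every character replaced.
CASES = {
    "ascii":     ("s = 'b' * 1000",  "s.replace('b', 'a')"),
    "non-ascii": ("s = '你' * 1000", "s.replace('你', 'a')"),
}

def main():
    print("Python", sys.version.split()[0])
    for name, (setup, stmt) in CASES.items():
        number = 100000
        # repeat() returns total times for `number` calls; report the best,
        # per call, in microseconds, as "python -m timeit" does.
        best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=number))
        print("%-10s %.2f usec per loop" % (name, best / number * 1e6))

if __name__ == "__main__":
    main()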
Date                | User        | Action | Args
2012-09-27 16:03:07 | BreamoreBoy | set    | recipients: + BreamoreBoy
2012-09-27 16:03:07 | BreamoreBoy | set    | messageid: <1348761787.62.0.977797373088.issue16061@psf.upfronthosting.co.za>
2012-09-27 16:03:07 | BreamoreBoy | link   | issue16061 messages
2012-09-27 16:03:06 | BreamoreBoy | create |