Issue 13126: find() slower than rfind()

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57335

classification

Title:	find() slower than rfind()
Type:	performance	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	flox, pitrou, python-dev, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2011-10-07 19:46 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
findperf.patch	pitrou, 2011-10-07 19:46		review

Messages (7)
msg145136 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-10-07 19:46
With some gcc versions, str.find() is slower than str.rfind(): - 11.22 0.0 s="ABC"33; ((s+"D")500+s+"E").find(s+"E") (100) - 4.56 0.0 s="ABC"33; ((s+"D")500+"E"+s).find("E"+s) (100) - 6.71 0.0 s="ABC"33; (s+"E") in ((s+"D")300+s+"E") (100) - 11.22 0.0 s="ABC"33; ((s+"D")500+s+"E").index(s+"E") (100) - 11.52 0.0 s="ABC"33; ((s+"D")500+s+"E").partition(s+"E") (100) - 8.79 0.0 s="ABC"33; ("E"+s+("D"+s)500).rfind("E"+s) (100) - 3.86 0.0 s="ABC"33; (s+"E"+("D"+s)500).rfind(s+"E") (100) - 8.80 0.0 s="ABC"33; ("E"+s+("D"+s)500).rindex("E"+s) (100) - 9.80 0.0 s="ABC"33; ("E"+s+("D"+s)500).rpartition("E"+s) (100) - 9.83 0.0 s="ABC"33; ("E"+s+("D"+s)500).rsplit("E"+s, 1) (100) - 11.56 0.0 s="ABC"33; ((s+"D")500+s+"E").split(s+"E", 1) (*100) Attached patch seems to fix it.
msg157846 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-04-09 14:29
I checked one example on a 32-bit system (you have a 64-bit?)), because I was afraid pessimization because of a lack of registers. str.find() is faster than str.rfind(), but the patch makes it even faster. But I would like to see the script and the results of benchmarking of the 1/2/3/20-character ascii/ucs1/ucs2/ucs4-substring in ascii/ucs1/ucs2/ucs4-string, in all possible combinations. May be, such benchmark scripts already exist?
msg157849 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-04-09 15:16
> But I would like to see the script and the results of benchmarking of > the 1/2/3/20-character ascii/ucs1/ucs2/ucs4-substring in ascii/ucs1 > /ucs2/ucs4-string, in all possible combinations. May be, such benchmark > scripts already exist? stringbench (the tool which produced those results) now exists in Tools/stringbench/stringbench.py.
msg157853 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-04-09 15:44
> stringbench (the tool which produced those results) now exists in Tools/stringbench/stringbench.py. Thank you, yesterday they were not.
msg157872 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-04-09 17:57
I used stringbench and self-writen script (see issue13165) for comparison and saw no convincing difference. The difference to str.find does not exceed accidental deviations for other functions which are not affected by the patch. Apparently, the accuracy of stringbench is not enough for a reliable measurement. ========== late match, 100 characters +8.6 +8.8 s="ABC"33; ((s+"D")500+s+"E").find(s+"E") (100) +0.1 +0.2 s="ABC"33; ((s+"D")500+"E"+s).find("E"+s) (100) +7.9 +7.4 s="ABC"33; (s+"E") in ((s+"D")300+s+"E") (100) +7.2 +7.3 s="ABC"33; ((s+"D")500+s+"E").index(s+"E") (100) +8.0 +7.9 s="ABC"33; ((s+"D")500+s+"E").partition(s+"E") (100) -4.3 -4.3 s="ABC"33; ("E"+s+("D"+s)500).rfind("E"+s) (100) -4.9 -6.9 s="ABC"33; (s+"E"+("D"+s)500).rfind(s+"E") (100) -3.0 -3.0 s="ABC"33; ("E"+s+("D"+s)500).rindex("E"+s) (100) -3.7 -4.2 s="ABC"33; ("E"+s+("D"+s)500).rpartition("E"+s) (100) -4.0 -2.6 s="ABC"33; ("E"+s+("D"+s)500).rsplit("E"+s, 1) (100) +28.0 +6.5 s="ABC"33; ((s+"D")500+s+"E").split(s+"E", 1) (*100)
msg186251 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-04-07 22:05
I still see a difference between find and rfind, even if the different is low (11%). $ ./python -m timeit -s 's="ABC"33; a=((s+"D")500+s+"E"); b=s+"E"' 'a.find(b)' 10000 loops, best of 3: 93.6 usec per loop $ ./python -m timeit -s 's="ABC"33; a=("E"+s+("D"+s)500); b="E"+s' 'a.rfind(b)' 10000 loops, best of 3: 84.3 usec per loop Patched Python: $ ./python -m timeit -s 's="ABC"33; a=((s+"D")500+s+"E"); b=s+"E"' 'a.find(b)' 10000 loops, best of 3: 84.7 usec per loop $ ./python -m timeit -s 's="ABC"33; a=("E"+s+("D"+s)500); b="E"+s' 'a.rfind(b)' 10000 loops, best of 3: 84.5 usec per loop I'm unable to explain why GCC (4.7 in my case) produces faster code with the patch, but the patch is simple and does not make the code (much) harder to understand. So Antoine, please go ahead and apply it.
msg186253 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-04-07 22:27
New changeset c5e2ea9e3aa7 by Victor Stinner in branch 'default': Close #13126: "Simplify" FASTSEARCH() code to help the compiler to emit more http://hg.python.org/cpython/rev/c5e2ea9e3aa7

History
Date	User	Action	Args
2022-04-11 14:57:22	admin	set	github: 57335
2013-04-07 22:27:13	python-dev	set	status: open -> closed nosy: + python-dev messages: + msg186253 resolution: fixed stage: patch review -> resolved
2013-04-07 22:05:29	vstinner	set	messages: + msg186251 versions: + Python 3.4, - Python 3.3
2012-04-09 17:57:15	serhiy.storchaka	set	messages: + msg157872
2012-04-09 15:44:49	serhiy.storchaka	set	messages: + msg157853
2012-04-09 15:16:21	pitrou	set	messages: + msg157849
2012-04-09 14:29:25	serhiy.storchaka	set	messages: + msg157846
2012-04-07 17:25:48	serhiy.storchaka	set	nosy: + serhiy.storchaka
2011-11-16 16:41:28	flox	set	priority: low -> normal stage: patch review
2011-10-09 21:09:13	flox	set	nosy: + flox
2011-10-07 19:46:35	pitrou	create