Message 237137 - Python tracker

Author	serhiy.storchaka
Recipients	serhiy.storchaka
Date	2015-03-03.13:53:28
Message-id	<1425390810.95.0.0960267231917.issue23573@psf.upfronthosting.co.za>

Content
Currently str.find() and similar methods can make a copy of self or of the string being searched for if the two have different kinds (character widths in the internal representation). In some cases this is redundant because the result can be known before trying to search: a longer string can't be found in a shorter string, and a wider string can't be found in a narrower string. The proposed patch avoids creating temporary widened copies in such corner cases. It also adds special cases for searching for 1-character strings.
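
The two early exits described above can be modelled in pure Python. This is only an illustrative sketch, not the patch itself (the real change is in CPython's C implementation). The helper names kind() and find_with_shortcuts() are made up for this example, and kind() assumes the 1/2/4-byte storage widths of PEP 393.

```python
def kind(s: str) -> int:
    """Approximate the storage width (bytes per code point) of a str.

    Hypothetical helper: assumes PEP 393 widths, where the widest
    code point in the string determines the kind.
    """
    if not s:
        return 1
    widest = ord(max(s))
    if widest < 0x100:
        return 1
    if widest < 0x10000:
        return 2
    return 4


def find_with_shortcuts(haystack: str, needle: str) -> int:
    """Model of str.find() with the two early exits (hypothetical name)."""
    # A longer string can't be found in a shorter string.
    if len(needle) > len(haystack):
        return -1
    # A wider needle holds a code point the narrower haystack cannot store.
    if kind(needle) > kind(haystack):
        return -1
    # Otherwise fall back to the ordinary search.
    return haystack.find(needle)
```

In both shortcut cases the answer is -1 without either string ever being widened, which is the copy the patch aims to avoid.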

Some sample microbenchmark results:

$ ./python -m timeit -s "a = 'x'; b = 'x\U00012345'" -- "b.find(a)"
Unpatched: 1000000 loops, best of 3: 1.92 usec per loop
Patched:   1000000 loops, best of 3: 1.03 usec per loop

$ ./python -m timeit -s "a = 'x'; b = 'x\U00012345'" -- "a in b"
Unpatched: 1000000 loops, best of 3: 0.543 usec per loop
Patched:   1000000 loops, best of 3: 0.25 usec per loop

$ ./python -m timeit -s "a = '\U00012345'; b = 'x'*1000" -- "b.find(a)"
Unpatched: 100000 loops, best of 3: 4.58 usec per loop
Patched:   1000000 loops, best of 3: 0.969 usec per loop

$ ./python -m timeit -s "a = 'x'*1000; b = '\U00012345'" -- "b.find(a)"
Unpatched: 100000 loops, best of 3: 3.77 usec per loop
Patched:   1000000 loops, best of 3: 0.97 usec per loop

$ ./python -m timeit -s "a = 'x'*1000; b = '\U00012345'" -- "a in b"
Unpatched: 100000 loops, best of 3: 2.4 usec per loop
Patched:   1000000 loops, best of 3: 0.225 usec per loop

History
Date	User	Action	Args
2015-03-03 13:53:31	serhiy.storchaka	set	recipients: + serhiy.storchaka
2015-03-03 13:53:30	serhiy.storchaka	set	messageid: <1425390810.95.0.0960267231917.issue23573@psf.upfronthosting.co.za>
2015-03-03 13:53:30	serhiy.storchaka	link	issue23573 messages
2015-03-03 13:53:30	serhiy.storchaka	create