classification
Title: The optimization of string search can cause pessimization
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Ma Lin, azsorkin, inada.naoki, louielu, ned.deily, pitrou, python-dev, serhiy.storchaka, vstinner, xiang.zhang
Priority: low Keywords: patch

Created on 2015-08-07 06:11 by serhiy.storchaka, last changed 2017-03-30 06:21 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
find_char.patch serhiy.storchaka, 2015-11-13 22:14 review
find_char_false_positives.patch serhiy.storchaka, 2015-11-14 14:10 review
Pull Requests
URL Status Linked Edit
PR 505 merged serhiy.storchaka, 2017-03-06 09:32
Messages (17)
msg248179 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-08-07 06:11
Search in strings is highly optimized for common case. However for some input data the search in non-ascii string becomes unexpectedly slow. Compare:

$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"є" in s'
100000 loops, best of 3: 11.7 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"Є" in s'
1000 loops, best of 3: 769 usec per loop

It's because the lowest byte of the code of Ukrainian capital letter Є (U+0404) matches the highest byte of codes of most Cyrillic letters (U+04xx). There are similar issues with some other scripts.

I think we should use more robust optimization.
msg250410 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-10 19:20
> I think we should use more robust optimization.

What do you propose? I don't understand the purpose of the issue. Do you want to remove the optimization?
msg250419 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-10 20:39
I was going to provide another optimization (I had an idea), but it is not so easy as looked to me at first glance. This issue exists rather as a reminder to me. I should make a second attempt.
msg254627 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-13 22:14
Proposed patch makes the degenerate case less hard while preserves the optimization for common case.

$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("є")'
1000 loops, best of 3: 330 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("є")'
1000 loops, best of 3: 325 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
100 loops, best of 3: 7.81 msec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("Є")'
100 loops, best of 3: 8.5 msec per loop

$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("є")'
1000 loops, best of 3: 317 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("є")'
1000 loops, best of 3: 327 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
1000 loops, best of 3: 1.1 msec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("Є")'
1000 loops, best of 3: 964 usec per loop

The slowdown is decreased from 25 times to 3 times.

The idea is that after memchr found false positive, make a tens iterations of simple loop before calling memchr again. This splits the cost of the memchr call with a tens of characters.

The patch also makes a little refactoring. STRINGLIB(fastsearch_memchr_1char) now is renamed and split on two functions STRINGLIB(find_char) and STRINGLIB(rfind_char) with simpler interface. All preconditional checks are moved into these functions. These functions now are directly used in other files.
msg254653 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-14 12:51
> The patch also makes a little refactoring. STRINGLIB(fastsearch_memchr_1char) now is renamed and split on two functions STRINGLIB(find_char) and STRINGLIB(rfind_char) with simpler interface.

Could you please push this part of the change? It looks good to me.

--

The C library provides a wmemchr() function which can be used to search for a wchar_t character inside a wchar_t* string.

* for 16-bit wchar_t (ex: Windows, AIX), it can be used for Python UCS2 strings
* for 32-bit wchar_t (ex: Linux, Mac OS X), it can be used for Python UCS4 strings

I don't know if the type is signed or not. If the type is signed, we have to ensure that wmemchr() works as expected.

--

I didn't review carefully your new heuristic to search for a character.

Did you try to not use memchr() but a simple C loop for UCS-2 and UCS-4?
msg254659 - (view) Author: Roundup Robot (python-dev) Date: 2015-11-14 13:48
New changeset 1412be96faf0 by Serhiy Storchaka in branch 'default':
Issue #24821: Refactor STRINGLIB(fastsearch_memchr_1char) and split it on
https://hg.python.org/cpython/rev/1412be96faf0
msg254661 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-14 14:10
> Could you please push this part of the change? It looks good to me.

Done. The patch that make an optimization now looks much simpler.

> The C library provides a wmemchr() function which can be used to search for a wchar_t character inside a wchar_t* string.

Yes, I know. But this is C99 function, hence we should add configure test for it, and I don't know if it exists on Windows. I did not test its performance.

> Did you try to not use memchr() but a simple C loop for UCS-2 and UCS-4?

A simple C loop is 25-50% slower. My patch tries to use memchr(), but if it founds false positive, runs a simple C loop for limited length. This makes a performance only insignificantly worse in the case of few false positives, but makes it much better in cases of often or grouped together false positives. The common case when there are no false positives is not affected.

MEMCHR_CUT_OFF is heuristic parameter. It is equal to the number of characters for which memchr()/memrchr() and a simple loop work the same time. On my machine it is 10-15 for UCS1 and 20-50 for UCS2 and UCS4 (depending on direction and character width).
msg269020 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-21 20:14
Ping.
msg274614 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-06 19:51
Could you please make a review Victor?
msg276001 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-12 08:46
I would commit the patch before beta 1, but:

1) The issue can be considered not as a new feature, but as a fix of performance bug (just we don't will to backport the fix to 3.5). I think the patch can be committed at the beta stage.

2) I would want to tune magic constants. I experimented on my two computers, and thing they are good enough, but would be nice to test them on other computers. I need to write automatic benchmark for running on foreign computers. This will take a time.
msg276005 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2016-09-12 09:01
If it has waited this long and is truly low priority, I think it can wait a little longer until 3.7.
msg290772 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-29 07:11
Ukrainian "Є" is not the only character suffered from this issue. I suppose much CJK characters are suffered too. The problem is occurred when the lower byte of searched character matches the upper byte of most  characters in the text. For example, searching "乎" (U+4e4e) in the sequence of characters U+4eXX (but not containing U+4e4e and U+4e4f):

$ ./python -m perf timeit -s 's = "一丁丂七丄丅丆万丈三上下丌不与丏丐丑丒专且丕世丗丘丙业丛东丝丞丟丠両丢丣两严並丧丨丩个丫丬中丮丯丰丱串丳临丵丶丷丸丹为主丼丽举丿乀乁乂乃乄久乆乇么义乊之乌乍乐乑乒乓乔乕乖乗乘乙乚乛乜九乞也习乡乢乣乤乥书乧乨乩乪乫乬乭乮乯买乱乲乳乴乵乶乷乸乹乺乻乼乽乾乿亀亁亂亃亄亅了亇予争 亊事二亍于亏亐云互亓五井亖亗亘亙亚些亜亝亞亟亠亡亢亣交亥亦产亨亩亪享京亭亮亯亰亱亲亳亴亵亶亷亸亹人亻亼亽亾亿什仁仂仃仄仅仆仇仈仉今介仌仍从仏仐仑仒仓仔仕他仗付仙仚仛仜 仝仞仟仠仡仢代令以仦仧仨仩仪仫们仭仮仯仰仱仲仳仴仵件价仸仹仺任仼份仾仿"*100' -- 's.find("乎")'

Unpatched:  Median +- std dev: 761 us +- 108 us
Patched:    Median +- std dev: 117 us +- 9 us

For comparison, searching "乏" (U+4e4f) in the same sequence:

$ ./python -m perf timeit -s 's = "一丁丂七丄丅丆万丈三上下丌不与丏丐丑丒专且丕世丗丘丙业丛东丝丞丟丠両丢丣两严並丧丨丩个丫丬中丮丯丰丱串丳临丵丶丷丸丹为主丼丽举丿乀乁乂乃乄久乆乇么义乊之乌乍乐乑乒乓乔乕乖乗乘乙乚乛乜九乞也习乡乢乣乤乥书乧乨乩乪乫乬乭乮乯买乱乲乳乴乵乶乷乸乹乺乻乼乽乾乿亀亁亂亃亄亅了亇予争 亊事二亍于亏亐云互亓五井亖亗亘亙亚些亜亝亞亟亠亡亢亣交亥亦产亨亩亪享京亭亮亯亰亱亲亳亴亵亶亷亸亹人亻亼亽亾亿什仁仂仃仄仅仆仇仈仉今介仌仍从仏仐仑仒仓仔仕他仗付仙仚仛仜 仝仞仟仠仡仢代令以仦仧仨仩仪仫们仭仮仯仰仱仲仳仴仵件价仸仹仺任仼份仾仿"*100' -- 's.find("乏")'

Unpatched:  Median +- std dev: 12.8 us +- 1.4 us
Patched:    Median +- std dev: 12.6 us +- 1.7 us

Sorry, I don't know Chinese or Japanese and can't provide more realistic examples.
msg290774 - (view) Author: Louie Lu (louielu) * Date: 2017-03-29 07:52
I can now only test on Python3.6, providing much meaningful sentence,
still trying to use perf on cpython master branch.

---------------------------------------------------

$ python -m perf timeit -s 's="一件乒乓事事亏, 不乏串連产業, 万丈一争今为举, 其乎哀哉"*1000' -- 's.find("乎")'

Median +- std dev: 228 ns +- 7 ns


$ python -m perf timeit -s 's="一件乒乓事事亏, 不乏串連产業, 万丈一争今为举, 其乎哀哉"*1000' -- 's.find("乏")'  

Median +- std dev: 143 ns +- 3 ns

---------------------------------------------------

$ python -m perf timeit -s 's="一件乒乓事事亏, 不乏串連产業, 万丈一争今为举, 其哀哉"*1000' -- 's.find("乎")'  
                                                                       ^^ (missing "乎")
Median +- std dev: 100 us +- 3 us

$ python -m perf timeit -s 's="一件乒乓事事亏, 不串連产業, 万丈一争今为举, 其乎哀哉"*1000' -- 's.find("乏")'
                                             ^^ (missing "乏")
Median +- std dev: 1.67 us +- 0.05 us
msg290775 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-03-29 08:19
I can't give a "realistic" example. A more meaningful example may be:

'。', '\u3002', the Chinese period which used in almost every paragraph,
'地', '\u5730', which is a common used word.

./python3 -m perf timeit  -s 's = "你好,我叫李雷。"*1000' 's.find("地")'
Mean +- std dev: 6.65 us +- 0.27 us
Mean +- std dev: 4.08 us +- 0.22 us

It works! :-)
msg290779 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-29 12:22
Would it completely kill performances to remove the optimization?
msg290781 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-29 13:32
Thank you for testing Louie. Thank you for your review and testing Xiang.

> Would it completely kill performances to remove the optimization?

Yes. The optimization is not used if the lowest byte is 0. You can try to search "\0" for getting the result without the optimization. On my netbook the optimization speeds up searching by 5.5 times.

Search with optimization:
$ ./python -m perf timeit -s 's = "АБВГД"*10**5' -- 's.find("Ж")'
Median +- std dev: 296 us +- 33 us

Search without optimization:
$ ./python -m perf timeit -s 's = "АБВГД"*10**5' -- 's.find("\0")'
Median +- std dev: 1.65 ms +- 0.10 ms

Search an unlucky character (unpatched):
$ ./python -m perf timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
Median +- std dev: 14.7 ms +- 1.8 ms

Search an unlucky character (patched):
$ ./python -m perf timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
Median +- std dev: 2.17 ms +- 0.24 ms
msg290819 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 06:11
New changeset 0a58f72762353768c7d26412e627ff196aac6c4e by Serhiy Storchaka in branch 'master':
bpo-24821: Fixed the slowing down to 25 times in the searching of some (#505)
https://github.com/python/cpython/commit/0a58f72762353768c7d26412e627ff196aac6c4e
History
Date User Action Args
2017-03-30 06:21:11serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2017-03-30 06:11:12serhiy.storchakasetmessages: + msg290819
2017-03-29 13:32:52serhiy.storchakasetmessages: + msg290781
2017-03-29 12:22:45vstinnersetmessages: + msg290779
2017-03-29 08:31:18Ma Linsetnosy: + Ma Lin
2017-03-29 08:19:53xiang.zhangsetmessages: + msg290775
2017-03-29 07:52:03louielusetnosy: + louielu
messages: + msg290774
2017-03-29 07:11:23serhiy.storchakasetnosy: + inada.naoki, xiang.zhang
messages: + msg290772
2017-03-06 09:32:05serhiy.storchakasetpull_requests: + pull_request413
2017-03-06 09:17:38serhiy.storchakasetversions: + Python 3.7, - Python 3.6
2016-09-12 09:01:47ned.deilysetmessages: + msg276005
2016-09-12 08:47:12serhiy.storchakasetnosy: + ned.deily
2016-09-12 08:46:56serhiy.storchakasetmessages: + msg276001
2016-09-06 19:51:12serhiy.storchakasetmessages: + msg274614
2016-06-21 20:14:58serhiy.storchakasetmessages: + msg269020
2015-11-14 14:10:59serhiy.storchakasetfiles: + find_char_false_positives.patch

messages: + msg254661
2015-11-14 13:48:56python-devsetnosy: + python-dev
messages: + msg254659
2015-11-14 12:51:44vstinnersetmessages: + msg254653
2015-11-13 22:14:44serhiy.storchakasetfiles: + find_char.patch
messages: + msg254627

keywords: + patch
resolution: remind -> (no value)
stage: needs patch -> patch review
2015-10-06 19:01:03azsorkinsetnosy: + azsorkin
2015-09-10 20:39:39serhiy.storchakasetresolution: remind
messages: + msg250419
2015-09-10 19:20:15vstinnersetmessages: + msg250410
2015-08-07 06:11:26serhiy.storchakacreate