msg248179 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-08-07 06:11 |
Search in strings is highly optimized for common case. However for some input data the search in non-ascii string becomes unexpectedly slow. Compare:
$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"є" in s'
100000 loops, best of 3: 11.7 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"Є" in s'
1000 loops, best of 3: 769 usec per loop
It's because the lowest byte of the code of Ukrainian capital letter Є (U+0404) matches the highest byte of codes of most Cyrillic letters (U+04xx). There are similar issues with some other scripts.
I think we should use more robust optimization.
|
msg250410 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2015-09-10 19:20 |
> I think we should use more robust optimization.
What do you propose? I don't understand the purpose of the issue. Do you want to remove the optimization?
|
msg250419 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-09-10 20:39 |
I was going to provide another optimization (I had an idea), but it is not so easy as looked to me at first glance. This issue exists rather as a reminder to me. I should make a second attempt.
|
msg254627 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-11-13 22:14 |
Proposed patch makes the degenerate case less hard while preserves the optimization for common case.
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("є")'
1000 loops, best of 3: 330 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("є")'
1000 loops, best of 3: 325 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
100 loops, best of 3: 7.81 msec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("Є")'
100 loops, best of 3: 8.5 msec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("є")'
1000 loops, best of 3: 317 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("є")'
1000 loops, best of 3: 327 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
1000 loops, best of 3: 1.1 msec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("Є")'
1000 loops, best of 3: 964 usec per loop
The slowdown is decreased from 25 times to 3 times.
The idea is that after memchr found false positive, make a tens iterations of simple loop before calling memchr again. This splits the cost of the memchr call with a tens of characters.
The patch also makes a little refactoring. STRINGLIB(fastsearch_memchr_1char) now is renamed and split on two functions STRINGLIB(find_char) and STRINGLIB(rfind_char) with simpler interface. All preconditional checks are moved into these functions. These functions now are directly used in other files.
|
msg254653 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2015-11-14 12:51 |
> The patch also makes a little refactoring. STRINGLIB(fastsearch_memchr_1char) now is renamed and split on two functions STRINGLIB(find_char) and STRINGLIB(rfind_char) with simpler interface.
Could you please push this part of the change? It looks good to me.
--
The C library provides a wmemchr() function which can be used to search for a wchar_t character inside a wchar_t* string.
* for 16-bit wchar_t (ex: Windows, AIX), it can be used for Python UCS2 strings
* for 32-bit wchar_t (ex: Linux, Mac OS X), it can be used for Python UCS4 strings
I don't know if the type is signed or not. If the type is signed, we have to ensure that wmemchr() works as expected.
--
I didn't review carefully your new heuristic to search for a character.
Did you try to not use memchr() but a simple C loop for UCS-2 and UCS-4?
|
msg254659 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2015-11-14 13:48 |
New changeset 1412be96faf0 by Serhiy Storchaka in branch 'default':
Issue #24821: Refactor STRINGLIB(fastsearch_memchr_1char) and split it on
https://hg.python.org/cpython/rev/1412be96faf0
|
msg254661 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-11-14 14:10 |
> Could you please push this part of the change? It looks good to me.
Done. The patch that make an optimization now looks much simpler.
> The C library provides a wmemchr() function which can be used to search for a wchar_t character inside a wchar_t* string.
Yes, I know. But this is C99 function, hence we should add configure test for it, and I don't know if it exists on Windows. I did not test its performance.
> Did you try to not use memchr() but a simple C loop for UCS-2 and UCS-4?
A simple C loop is 25-50% slower. My patch tries to use memchr(), but if it founds false positive, runs a simple C loop for limited length. This makes a performance only insignificantly worse in the case of few false positives, but makes it much better in cases of often or grouped together false positives. The common case when there are no false positives is not affected.
MEMCHR_CUT_OFF is heuristic parameter. It is equal to the number of characters for which memchr()/memrchr() and a simple loop work the same time. On my machine it is 10-15 for UCS1 and 20-50 for UCS2 and UCS4 (depending on direction and character width).
|
msg269020 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-06-21 20:14 |
Ping.
|
msg274614 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-09-06 19:51 |
Could you please make a review Victor?
|
msg276001 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-09-12 08:46 |
I would commit the patch before beta 1, but:
1) The issue can be considered not as a new feature, but as a fix of performance bug (just we don't will to backport the fix to 3.5). I think the patch can be committed at the beta stage.
2) I would want to tune magic constants. I experimented on my two computers, and thing they are good enough, but would be nice to test them on other computers. I need to write automatic benchmark for running on foreign computers. This will take a time.
|
msg276005 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2016-09-12 09:01 |
If it has waited this long and is truly low priority, I think it can wait a little longer until 3.7.
|
msg290772 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-03-29 07:11 |
Ukrainian "Є" is not the only character suffered from this issue. I suppose much CJK characters are suffered too. The problem is occurred when the lower byte of searched character matches the upper byte of most characters in the text. For example, searching "乎" (U+4e4e) in the sequence of characters U+4eXX (but not containing U+4e4e and U+4e4f):
$ ./python -m perf timeit -s 's = "一丁丂七丄丅丆万丈三上下丌不与丏丐丑丒专且丕世丗丘丙业丛东丝丞丟丠両丢丣两严並丧丨丩个丫丬中丮丯丰丱串丳临丵丶丷丸丹为主丼丽举丿乀乁乂乃乄久乆乇么义乊之乌乍乐乑乒乓乔乕乖乗乘乙乚乛乜九乞也习乡乢乣乤乥书乧乨乩乪乫乬乭乮乯买乱乲乳乴乵乶乷乸乹乺乻乼乽乾乿亀亁亂亃亄亅了亇予争 亊事二亍于亏亐云互亓五井亖亗亘亙亚些亜亝亞亟亠亡亢亣交亥亦产亨亩亪享京亭亮亯亰亱亲亳亴亵亶亷亸亹人亻亼亽亾亿什仁仂仃仄仅仆仇仈仉今介仌仍从仏仐仑仒仓仔仕他仗付仙仚仛仜 仝仞仟仠仡仢代令以仦仧仨仩仪仫们仭仮仯仰仱仲仳仴仵件价仸仹仺任仼份仾仿"*100' -- 's.find("乎")'
Unpatched: Median +- std dev: 761 us +- 108 us
Patched: Median +- std dev: 117 us +- 9 us
For comparison, searching "乏" (U+4e4f) in the same sequence:
$ ./python -m perf timeit -s 's = "一丁丂七丄丅丆万丈三上下丌不与丏丐丑丒专且丕世丗丘丙业丛东丝丞丟丠両丢丣两严並丧丨丩个丫丬中丮丯丰丱串丳临丵丶丷丸丹为主丼丽举丿乀乁乂乃乄久乆乇么义乊之乌乍乐乑乒乓乔乕乖乗乘乙乚乛乜九乞也习乡乢乣乤乥书乧乨乩乪乫乬乭乮乯买乱乲乳乴乵乶乷乸乹乺乻乼乽乾乿亀亁亂亃亄亅了亇予争 亊事二亍于亏亐云互亓五井亖亗亘亙亚些亜亝亞亟亠亡亢亣交亥亦产亨亩亪享京亭亮亯亰亱亲亳亴亵亶亷亸亹人亻亼亽亾亿什仁仂仃仄仅仆仇仈仉今介仌仍从仏仐仑仒仓仔仕他仗付仙仚仛仜 仝仞仟仠仡仢代令以仦仧仨仩仪仫们仭仮仯仰仱仲仳仴仵件价仸仹仺任仼份仾仿"*100' -- 's.find("乏")'
Unpatched: Median +- std dev: 12.8 us +- 1.4 us
Patched: Median +- std dev: 12.6 us +- 1.7 us
Sorry, I don't know Chinese or Japanese and can't provide more realistic examples.
|
msg290774 - (view) |
Author: Louie Lu (louielu) * |
Date: 2017-03-29 07:52 |
I can now only test on Python3.6, providing much meaningful sentence,
still trying to use perf on cpython master branch.
---------------------------------------------------
$ python -m perf timeit -s 's="一件乒乓事事亏, 不乏串連产業, 万丈一争今为举, 其乎哀哉"*1000' -- 's.find("乎")'
Median +- std dev: 228 ns +- 7 ns
$ python -m perf timeit -s 's="一件乒乓事事亏, 不乏串連产業, 万丈一争今为举, 其乎哀哉"*1000' -- 's.find("乏")'
Median +- std dev: 143 ns +- 3 ns
---------------------------------------------------
$ python -m perf timeit -s 's="一件乒乓事事亏, 不乏串連产業, 万丈一争今为举, 其哀哉"*1000' -- 's.find("乎")'
^^ (missing "乎")
Median +- std dev: 100 us +- 3 us
$ python -m perf timeit -s 's="一件乒乓事事亏, 不串連产業, 万丈一争今为举, 其乎哀哉"*1000' -- 's.find("乏")'
^^ (missing "乏")
Median +- std dev: 1.67 us +- 0.05 us
|
msg290775 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2017-03-29 08:19 |
I can't give a "realistic" example. A more meaningful example may be:
'。', '\u3002', the Chinese period which used in almost every paragraph,
'地', '\u5730', which is a common used word.
./python3 -m perf timeit -s 's = "你好,我叫李雷。"*1000' 's.find("地")'
Mean +- std dev: 6.65 us +- 0.27 us
Mean +- std dev: 4.08 us +- 0.22 us
It works! :-)
|
msg290779 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-03-29 12:22 |
Would it completely kill performances to remove the optimization?
|
msg290781 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-03-29 13:32 |
Thank you for testing Louie. Thank you for your review and testing Xiang.
> Would it completely kill performances to remove the optimization?
Yes. The optimization is not used if the lowest byte is 0. You can try to search "\0" for getting the result without the optimization. On my netbook the optimization speeds up searching by 5.5 times.
Search with optimization:
$ ./python -m perf timeit -s 's = "АБВГД"*10**5' -- 's.find("Ж")'
Median +- std dev: 296 us +- 33 us
Search without optimization:
$ ./python -m perf timeit -s 's = "АБВГД"*10**5' -- 's.find("\0")'
Median +- std dev: 1.65 ms +- 0.10 ms
Search an unlucky character (unpatched):
$ ./python -m perf timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
Median +- std dev: 14.7 ms +- 1.8 ms
Search an unlucky character (patched):
$ ./python -m perf timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
Median +- std dev: 2.17 ms +- 0.24 ms
|
msg290819 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-03-30 06:11 |
New changeset 0a58f72762353768c7d26412e627ff196aac6c4e by Serhiy Storchaka in branch 'master':
bpo-24821: Fixed the slowing down to 25 times in the searching of some (#505)
https://github.com/python/cpython/commit/0a58f72762353768c7d26412e627ff196aac6c4e
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:19 | admin | set | github: 69009 |
2017-03-30 06:21:11 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2017-03-30 06:11:12 | serhiy.storchaka | set | messages:
+ msg290819 |
2017-03-29 13:32:52 | serhiy.storchaka | set | messages:
+ msg290781 |
2017-03-29 12:22:45 | vstinner | set | messages:
+ msg290779 |
2017-03-29 08:31:18 | malin | set | nosy:
+ malin
|
2017-03-29 08:19:53 | xiang.zhang | set | messages:
+ msg290775 |
2017-03-29 07:52:03 | louielu | set | nosy:
+ louielu messages:
+ msg290774
|
2017-03-29 07:11:23 | serhiy.storchaka | set | nosy:
+ methane, xiang.zhang messages:
+ msg290772
|
2017-03-06 09:32:05 | serhiy.storchaka | set | pull_requests:
+ pull_request413 |
2017-03-06 09:17:38 | serhiy.storchaka | set | versions:
+ Python 3.7, - Python 3.6 |
2016-09-12 09:01:47 | ned.deily | set | messages:
+ msg276005 |
2016-09-12 08:47:12 | serhiy.storchaka | set | nosy:
+ ned.deily
|
2016-09-12 08:46:56 | serhiy.storchaka | set | messages:
+ msg276001 |
2016-09-06 19:51:12 | serhiy.storchaka | set | messages:
+ msg274614 |
2016-06-21 20:14:58 | serhiy.storchaka | set | messages:
+ msg269020 |
2015-11-14 14:10:59 | serhiy.storchaka | set | files:
+ find_char_false_positives.patch
messages:
+ msg254661 |
2015-11-14 13:48:56 | python-dev | set | nosy:
+ python-dev messages:
+ msg254659
|
2015-11-14 12:51:44 | vstinner | set | messages:
+ msg254653 |
2015-11-13 22:14:44 | serhiy.storchaka | set | files:
+ find_char.patch messages:
+ msg254627
keywords:
+ patch resolution: remind -> (no value) stage: needs patch -> patch review |
2015-10-06 19:01:03 | azsorkin | set | nosy:
+ azsorkin
|
2015-09-10 20:39:39 | serhiy.storchaka | set | resolution: remind messages:
+ msg250419 |
2015-09-10 19:20:15 | vstinner | set | messages:
+ msg250410 |
2015-08-07 06:11:26 | serhiy.storchaka | create | |