Message248179
Substring search in strings is highly optimized for the common case. However, for some input data, searching in a non-ASCII string becomes unexpectedly slow. Compare:
$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"є" in s'
100000 loops, best of 3: 11.7 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"Є" in s'
1000 loops, best of 3: 769 usec per loop
This happens because the lowest byte of the code point of the Ukrainian capital letter Є (U+0404) matches the highest byte of the code points of most Cyrillic letters (U+04xx). Similar issues exist with some other scripts.
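The byte coincidence can be illustrated with a small sketch. It assumes that the haystack is stored with two bytes per character and that the search first scans for the low byte of the needle; UTF-16-LE is used here only as a stand-in for such a layout, and the helper name `low_byte_candidates` is hypothetical. It counts how many bytes of the haystack equal the needle's low byte, i.e. how many positions a byte-level scan would have to inspect and reject.

```python
def low_byte_candidates(needle: str, haystack: str) -> int:
    """Count bytes in a 2-byte-per-character view of haystack that equal
    the low byte of the needle's code point."""
    low = ord(needle) & 0xFF
    # Assumption: UTF-16-LE models a 2-byte internal representation.
    # This holds only while every character is in the BMP (no surrogates).
    data = haystack.encode("utf-16-le")
    return data.count(bytes([low]))


s = "АБВГД" * 10  # U+0410..U+0414: high byte 0x04, low bytes 0x10..0x14
print(low_byte_candidates("є", s))  # U+0454: low byte 0x54
print(low_byte_candidates("Є", s))  # U+0404: low byte 0x04
```

For this 50-character haystack the sketch prints 0 for є and 50 for Є: every character contributes one false candidate through its high byte, and none of them is a real match.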
I think we should use a more robust optimization.
Date | User | Action | Args
2015-08-07 06:11:26 | serhiy.storchaka | set | recipients: + serhiy.storchaka, pitrou, vstinner
2015-08-07 06:11:26 | serhiy.storchaka | set | messageid: <1438927886.02.0.407236930114.issue24821@psf.upfronthosting.co.za>
2015-08-07 06:11:25 | serhiy.storchaka | link | issue24821 messages
2015-08-07 06:11:24 | serhiy.storchaka | create |