classification
Title: Make str.count one character for latin1 string faster
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: serhiy.storchaka, vstinner, xiang.zhang
Priority: low Keywords: patch

Created on 2016-12-09 14:38 by xiang.zhang, last changed 2022-04-11 14:58 by admin.

Files
File name                             Uploaded
latin1-str-count-one-character.patch  xiang.zhang, 2016-12-09 14:38
Messages (5)
msg282784 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-12-09 14:38
I made an attempt to make str.count faster for a single character in latin1 strings. The benchmarks show a speedup that depends on the length of the string.

# short one, a little regression
./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefg"' 's.count("a")'
python: ..................... 190 ns +- 1 ns
python3: ..................... 197 ns +- 2 ns

Median +- std dev: [python] 190 ns +- 1 ns -> [python3] 197 ns +- 2 ns: 1.04x slower

# a little longer
./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefghi"' 's.count("a")'
python: ..................... 192 ns +- 15 ns
python3: ..................... 184 ns +- 2 ns

Median +- std dev: [python] 192 ns +- 15 ns -> [python3] 184 ns +- 2 ns: 1.05x faster

# longer
./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefghihijklmnopqrstuvwxyz~!@##$%^&*()-=_+{}|"' 's.count("a")'
python: ..................... 236 ns +- 12 ns
python3: ..................... 209 ns +- 9 ns

Median +- std dev: [python] 236 ns +- 12 ns -> [python3] 209 ns +- 9 ns: 1.13x faster

# very long
./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefgHijk"*100' 's.count("z")'
python: ..................... 1.02 us +- 0.00 us
python3: ..................... 610 ns +- 1 ns

Median +- std dev: [python] 1.02 us +- 0.00 us -> [python3] 610 ns +- 1 ns: 1.67x faster

And since str.replace may also go through the code path involving count, it's somewhat affected:

./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefghihijklmnopqrstuvwxyz~!@##$%^&*()-=_+{}|"' 's.replace("a", "ccc")'
python: ..................... 318 ns +- 31 ns
python3: ..................... 298 ns +- 19 ns

Median +- std dev: [python] 318 ns +- 31 ns -> [python3] 298 ns +- 19 ns: 1.07x faster

For UCS2 and UCS4, the benchmark results are not compelling, so those paths are left untouched.
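
The patch itself is attached above rather than inlined. As a rough illustration of the kind of word-at-a-time (SWAR) counting under discussion (an assumption about the approach, not the patch's actual code), here is a minimal sketch for a 64-bit build:

/* Minimal sketch of word-at-a-time single-byte counting; an assumed
   illustration of the general technique, not the patch's code. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static size_t
count_char_sketch(const unsigned char *s, size_t n, unsigned char ch)
{
    const uint64_t ones = 0x0101010101010101ULL;  /* 0x01 in every byte */
    const uint64_t lows = 0x7F7F7F7F7F7F7F7FULL;  /* 0x7F in every byte */
    const uint64_t mask = ones * ch;              /* ch replicated 8 times */
    size_t count = 0, i = 0;

    for (; i + 8 <= n; i += 8) {
        uint64_t w, x, t;
        memcpy(&w, s + i, 8);    /* portable unaligned load */
        x = w ^ mask;            /* bytes equal to ch become 0x00 */
        t = (x & lows) + lows;   /* bit 7 set iff the low 7 bits are nonzero */
        t = ~(t | x | lows);     /* exactly 0x80 left in each matching byte */
        while (t) {              /* count the set bits: one per match */
            count++;
            t &= t - 1;
        }
    }
    for (; i < n; i++)           /* scalar tail for the trailing bytes */
        count += (s[i] == ch);
    return count;
}

The per-byte test is exact: the XOR turns matching bytes into 0x00, and the mask arithmetic leaves a single 0x80 bit per zero byte, so clearing set bits one at a time counts the matches.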
msg282785 - Author: STINNER Victor (vstinner) (Python committer) Date: 2016-12-09 14:53
Hum, Serhiy is better placed than me to review such hardcore C code :-) I'll let him review the patch ;-)

> short one, a little regression

In stringlib, the usual solution is to use a threshold: use a dummy loop for fewer than N bytes, otherwise use the ultra-optimized loop. Serhiy even implemented a "dynamic" threshold in some functions, adjusted when too many false positives are found. I don't recall where.
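
A hypothetical shape of such a dispatch, reusing the count_char_sketch from the sketch above (the function names and the cutoff value are illustrative, not taken from the patch or from stringlib):

/* Hypothetical threshold dispatch; names and cutoff are illustrative. */
#define SIMPLE_LOOP_CUTOFF 10   /* guessed value; would need tuning */

static size_t
count_char_dispatch(const unsigned char *s, size_t n, unsigned char ch)
{
    if (n < SIMPLE_LOOP_CUTOFF) {
        /* plain loop: its lower constant overhead wins on short strings */
        size_t count = 0, i;
        for (i = 0; i < n; i++)
            count += (s[i] == ch);
        return count;
    }
    /* long input: the word-at-a-time loop sketched in msg282784 above */
    return count_char_sketch(s, n, ch);
}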

"And since str.replace may also go through the code path involving count, it's somewhat affected: (...) 1.07x faster"

I'm not really excited by optimizing str.count, since I don't think that this function is commonly used. But if str.replace is made faster, I'm interested :-)

I understand that count() is only used when the old and new patterns of str.replace() have a different length.
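
That follows from how such a replace has to allocate: when the pattern lengths differ, the result buffer must be sized before copying, roughly as below (an illustrative sketch with hypothetical names, not CPython's actual code):

#include <stddef.h>

/* Each replacement changes the length by (new_len - old_len), so the
   number of occurrences must be known before allocating the result. */
static size_t
replaced_size(size_t self_len, size_t old_len, size_t new_len,
              size_t occurrences)
{
    ptrdiff_t delta = (ptrdiff_t)new_len - (ptrdiff_t)old_len;
    return (size_t)((ptrdiff_t)self_len + (ptrdiff_t)occurrences * delta);
}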
msg282800 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-12-09 19:33
> I understand that count() is only used when the old and new patterns of str.replace() have a different length.

Yes. I thought it wouldn't help much, since str.replace performs many other operations. But for long strings it looks good:

./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefghihijklmnopqrstuvwxyz~!@##$%^&*()-=_+{}|"*100' 's.replace("a", "bc")'
python: ..................... 7.36 us +- 0.04 us
python3: ..................... 4.91 us +- 0.04 us

Median +- std dev: [python] 7.36 us +- 0.04 us -> [python3] 4.91 us +- 0.04 us: 1.50x faster  # 50% ??!! how?

And this patch also applies to bytes, since str and bytes share the underlying code.
msg282803 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-12-09 20:46
The code looks too complex. I think it is not worth complicating the code so much to optimize just a few non-critical string operations.

Many optimizations were rejected in the past due to high cost and low benefit.
msg283027 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-12-12 16:49
> The code looks too complex.

It is, if you look at the patch alone. But the technique is already used across stringlib; the only new element is the bitwise operation.

str/bytes.count is not critical, but optimizing it does no harm, and the effect is especially significant for long data.

I am not in favour of rejecting it now, but I have set the priority to low.
History
Date                 User              Action  Args
2022-04-11 14:58:40  admin             set     github: 73107
2016-12-12 16:49:06  xiang.zhang       set     priority: normal -> low; messages: + msg283027
2016-12-09 20:46:43  serhiy.storchaka  set     messages: + msg282803
2016-12-09 19:33:28  xiang.zhang       set     messages: + msg282800
2016-12-09 14:53:28  vstinner          set     messages: + msg282785
2016-12-09 14:38:24  xiang.zhang       create