Message361284
I am still not sure whether we should add a new API just to avoid the cache.
* PyUnicode_AsUTF8String: when we need a bytes object, or want to avoid the cache.
* PyUnicode_AsUTF8AndSize: when we need a C string and the cache is acceptable (a short usage sketch of both follows this list).
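For reference, here is a minimal sketch of how the two existing calls are used from C; the helper name dump_utf8 is only for illustration, and error handling is kept short:

    #include <Python.h>
    #include <stdio.h>

    /* Illustration only: convert `obj` to UTF-8 in both ways. */
    static int
    dump_utf8(PyObject *obj)
    {
        /* No cache: returns a new bytes object owned by the caller. */
        PyObject *bytes = PyUnicode_AsUTF8String(obj);
        if (bytes == NULL) {
            return -1;
        }
        printf("bytes: %s\n", PyBytes_AS_STRING(bytes));
        Py_DECREF(bytes);

        /* Cached: returns a pointer into the unicode object; the UTF-8
           data stays alive (and cached) as long as `obj` does. */
        Py_ssize_t size;
        const char *utf8 = PyUnicode_AsUTF8AndSize(obj, &size);
        if (utf8 == NULL) {
            return -1;
        }
        printf("c string: %s (%zd bytes)\n", utf8, size);
        return 0;
    }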
With PR-18327, PyUnicode_AsUTF8AndSize becomes 10+% faster than the master branch, and about the same speed as PyUnicode_AsUTF8String.
## vs master
$ ./python -m pyperf timeit --compare-to=../cpython/python --python-names master:patched -s 'from _testcapi import unicode_bench_asutf8 as b' -- 'b(1000, "hello", "こんにちは")'
master: ..................... 96.6 us +- 3.3 us
patched: ..................... 83.3 us +- 0.3 us
Mean +- std dev: [master] 96.6 us +- 3.3 us -> [patched] 83.3 us +- 0.3 us: 1.16x faster (-14%)
## vs AsUTF8String
$ ./python -m pyperf timeit -s 'from _testcapi import unicode_bench_asutf8 as b' -- 'b(1000, "hello", "こんにちは")'
.....................
Mean +- std dev: 83.2 us +- 0.2 us
$ ./python -m pyperf timeit -s 'from _testcapi import unicode_bench_asutf8string as b' -- 'b(1000, "hello", "こんにちは")'
.....................
Mean +- std dev: 81.9 us +- 2.1 us
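The unicode_bench_asutf8 / unicode_bench_asutf8string helpers used above live in _testcapi. Roughly, such a helper can be written along these lines (a sketch only, not the exact PR-18327 code; it rebuilds the string each iteration so a UTF-8 cache from a previous iteration cannot be reused):

    #include <Python.h>

    /* Sketch of a _testcapi-style benchmark helper; the actual
       unicode_bench_asutf8 may differ.  Called as b(loops, str1, str2)
       from pyperf timeit. */
    static PyObject *
    unicode_bench_asutf8(PyObject *self, PyObject *args)
    {
        Py_ssize_t loops;
        PyObject *str1, *str2;
        if (!PyArg_ParseTuple(args, "nUU", &loops, &str1, &str2)) {
            return NULL;
        }
        for (Py_ssize_t i = 0; i < loops; i++) {
            /* Concatenate to get a fresh string each iteration, so the
               UTF-8 cache on a previous string is not reused. */
            PyObject *s = PyUnicode_Concat(str1, str2);
            if (s == NULL) {
                return NULL;
            }
            Py_ssize_t size;
            if (PyUnicode_AsUTF8AndSize(s, &size) == NULL) {
                Py_DECREF(s);
                return NULL;
            }
            Py_DECREF(s);
        }
        Py_RETURN_NONE;
    }

The asutf8string variant would call PyUnicode_AsUTF8String on the fresh string instead and Py_DECREF the resulting bytes object.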
## vs AsUTF8String (ASCII)
If we cannot accept the cache, PyUnicode_AsUTF8String is slower than PyUnicode_AsUTF8 when the unicode is an ASCII string. PyUnicode_GetUTF8Buffer helps only in this case (a usage sketch follows the numbers below).
$ ./python -m pyperf timeit -s 'from _testcapi import unicode_bench_asutf8 as b' -- 'b(1000, "hello", "world")'
.....................
Mean +- std dev: 37.5 us +- 1.7 us
$ ./python -m pyperf timeit -s 'from _testcapi import unicode_bench_asutf8string as b' -- 'b(1000, "hello", "world")'
.....................
Mean +- std dev: 46.4 us +- 1.6 us
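For clarity, a rough usage sketch of the proposed PyUnicode_GetUTF8Buffer; the exact signature shown here is an assumption (a function filling a Py_buffer that the caller releases). The point is that for ASCII strings the buffer can point straight into the string's data, so no bytes object is allocated and nothing is cached:

    #include <Python.h>
    #include <stdio.h>

    /* Rough sketch; assumes the proposed API has roughly the shape
       int PyUnicode_GetUTF8Buffer(PyObject *unicode, Py_buffer *view)
       and that the caller releases the buffer when done. */
    static int
    print_utf8(PyObject *obj)
    {
        Py_buffer view;
        if (PyUnicode_GetUTF8Buffer(obj, &view) < 0) {
            return -1;
        }
        printf("%.*s\n", (int)view.len, (const char *)view.buf);
        PyBuffer_Release(&view);
        return 0;
    }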