classification
Title: Convert memoryview to Argument Clinic
Type: performance Stage: resolved
Components: Argument Clinic, Interpreter Core Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: larry, serhiy.storchaka, skrah
Priority: normal Keywords: patch

Created on 2020-07-09 19:48 by serhiy.storchaka, last changed 2020-07-19 11:44 by skrah. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 21421 merged serhiy.storchaka, 2020-07-09 19:50
Messages (5)
msg373425 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-07-09 19:48
The proposed PR converts Objects/memoryobject.c to Argument Clinic.

Advantages:

* Highly optimized code is used to parse arguments instead of slow PyArg_ParseTupleAndKeywords().
* All future improvements of argument parsing (better performance or errors handling) will be automatically applied to memoryobject.

Previously Argument Clinic was used for memoryview.hex().
msg373887 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-07-18 08:12
New changeset 80a50368c0e4dc9d56af0ce748dea35c9d96d23f by Serhiy Storchaka in branch 'master':
bpo-41262: Convert memoryview to Argument Clinic. (GH-21421)
https://github.com/python/cpython/commit/80a50368c0e4dc9d56af0ce748dea35c9d96d23f
msg373892 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-07-18 09:04
I cannot detect a speedup in test_buffer, which is a heavy user of memoryviews:

# before:
>>> a = [3.742, 3.589, 3.542, 3.495, 3.481, 3.620, 3.773, 3.755, 3.701, 3.661]
>>> sum(a) / 10
3.6358999999999995

# after
>>> b = [3.63, 3.596, 3.475, 3.43, 3.792, 3.58, 3.810, 3.52, 3.55, 3.690]
>>> sum(b) / 10
3.6072999999999995



A microbenchmark shows a speedup:

# before:
$ ./python -m timeit -s "b = b'1234567890'" "memoryview(b)"
2000000 loops, best of 5: 116 nsec per loop

# after:

./python -m timeit -s "b = b'1234567890'" "memoryview(b)"
5000000 loops, best of 5: 98 nsec per loop



As the original author, I'm not sure why I should put up with the
less readable code for such a gain.  For decimal I'm using the pi
benchmark, which, while small, is at least a real math function in
pure Python. 

Do you have other benchmarks?
msg373895 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-07-18 11:59
I do not have other benchmarks. memoryview was just one of few builtins which still use PyArg_ParseTupleAndKeywords() and I know how inefficient it is. Since Argument Clinic was already used for memoryview.hex() I did not see problems with converting the rest of memoryview methods to Argument Clinic.

Microbenchmarks:

$ ./python -m pyperf timeit -q --compare-to=../cpython-release/python -s "b = b'abcdefgh'" "memoryview(b)"
Mean +- std dev: [/home/serhiy/py/cpython-release/python] 187 ns +- 6 ns -> [/home/serhiy/py/cpython-release2/python] 144 ns +- 4 ns: 1.30x faster (-23%)

$ ./python -m pyperf timeit -q --compare-to=../cpython-release/python -s "b = b'abcdefgh'" "memoryview(object=b)"
Mean +- std dev: [/home/serhiy/py/cpython-release/python] 312 ns +- 7 ns -> [/home/serhiy/py/cpython-release2/python] 210 ns +- 6 ns: 1.49x faster (-33%)

$ ./python -m pyperf timeit -q --compare-to=../cpython-release/python -s "m = memoryview(b'abcdefgh')" "m.cast('I')"
Mean +- std dev: [/home/serhiy/py/cpython-release/python] 167 ns +- 4 ns -> [/home/serhiy/py/cpython-release2/python] 104 ns +- 2 ns: 1.60x faster (-38%)

$ ./python -m pyperf timeit -q --compare-to=../cpython-release/python -s "m = memoryview(b'abcdefgh')" "m.cast(format='I')"
Mean +- std dev: [/home/serhiy/py/cpython-release/python] 331 ns +- 6 ns -> [/home/serhiy/py/cpython-release2/python] 153 ns +- 7 ns: 2.16x faster (-54%)

$ ./python -m pyperf timeit -q --compare-to=../cpython-release/python -s "m = memoryview(b'abcdefgh')" "m.cast('B', (2, 2, 2))"
Mean +- std dev: [/home/serhiy/py/cpython-release/python] 229 ns +- 5 ns -> [/home/serhiy/py/cpython-release2/python] 154 ns +- 4 ns: 1.48x faster (-32%)

$ ./python -m pyperf timeit -q --compare-to=../cpython-release/python -s "m = memoryview(b'abcdefgh')" "m.cast(format='B', shape=(2, 2, 2))"
Mean +- std dev: [/home/serhiy/py/cpython-release/python] 494 ns +- 11 ns -> [/home/serhiy/py/cpython-release2/python] 198 ns +- 5 ns: 2.49x faster (-60%)

$ ./python -m pyperf timeit -q --compare-to=../cpython-release/python -s "m = memoryview(b'abcdefgh')" "m.tobytes()"
Mean +- std dev: [/home/serhiy/py/cpython-release/python] 90.3 ns +- 3.2 ns -> [/home/serhiy/py/cpython-release2/python] 65.8 ns +- 1.9 ns: 1.37x faster (-27%)

$ ./python -m pyperf timeit -q --compare-to=../cpython-release/python -s "m = memoryview(b'abcdefgh')" "m.tobytes('C')"
Mean +- std dev: [/home/serhiy/py/cpython-release/python] 183 ns +- 6 ns -> [/home/serhiy/py/cpython-release2/python] 129 ns +- 4 ns: 1.41x faster (-29%)

$ ./python -m pyperf timeit -q --compare-to=../cpython-release/python -s "m = memoryview(b'abcdefgh')" "m.tobytes(order='C')"
Mean +- std dev: [/home/serhiy/py/cpython-release/python] 340 ns +- 11 ns -> [/home/serhiy/py/cpython-release2/python] 174 ns +- 5 ns: 1.96x faster (-49%)
msg373962 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-07-19 11:44
Thanks, I can confirm that the speedup for memoryview.cast() is large in microbenchmarks.

Still, if the speedup cannot be seen in programs like test_buffer,
the amount of code that's being added for a speedup < 5% in real
world programs is quite substantial (if you count the generated
code and the fast_call machinery).

But we can leave it in.  Preemptively, if someone sees this and
thinks the flood gates are open for _decimal:  No, it won't be
converted to AC.
History
Date User Action Args
2020-07-19 11:44:46skrahsetstatus: open -> closed

messages: + msg373962
2020-07-18 11:59:08serhiy.storchakasetmessages: + msg373895
2020-07-18 09:04:07skrahsetstatus: closed -> open

messages: + msg373892
2020-07-18 08:12:58serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-07-18 08:12:08serhiy.storchakasetmessages: + msg373887
2020-07-09 19:50:12serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request20570
2020-07-09 19:49:12serhiy.storchakasetnosy: + larry
components: + Interpreter Core, Argument Clinic, - Extension Modules
2020-07-09 19:48:54serhiy.storchakacreate