This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: redundant iteration over digits in _PyLong_AsUnsignedLongMask
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Oren Milman, mark.dickinson, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2016-06-11 21:20 by Oren Milman, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
proposedPatches.diff Oren Milman, 2016-06-11 21:20 proposed patches diff file
patchedCPythonTestOutput.txt Oren Milman, 2016-06-11 21:20 test output of CPython with my patches (tested on my PC)
CPythonTestOutput.txt Oren Milman, 2016-06-11 21:21 test output of CPython without my patches (tested on my PC)
capiWrapperModule.c Oren Milman, 2016-06-17 12:35 a simple C extension I used for micro-benchmarking
badMicroBenchProposedPatches.diff Oren Milman, 2016-06-17 12:36 diff file of the proposed patches with the (overall) bad micro-benchmark results
proposedPatches.diff Oren Milman, 2016-06-17 12:37 proposed patches in Modules/_testcapimodule diff file
Pull Requests
URL Status Linked
PR 392 merged Oren Milman, 2017-03-02 09:06
Messages (8)
msg268274 - (view) Author: Oren Milman (Oren Milman) * Date: 2016-06-11 21:20
------------ current state ------------
1. In Objects/longobject.c in _PyLong_AsUnsignedLongMask, in case v is a multiple-digit int, _PyLong_AsUnsignedLongMask iterates over all of its digits (going from the most to the least significant digit) and does (for each digit) 'x = (x << PyLong_SHIFT) | v->ob_digit[i];'.
However, iterating over digits other than the 'ceil(sizeof(x) * 8.0 / PyLong_SHIFT)' least significant digits is redundant, because their bits would be shifted out of x anyway.

Looking at past changes, the iteration over all of the digits of a multiple-digit v has remained essentially the same since _PyLong_AsUnsignedLongMask was added, in revision 28652.

2. In Modules/_testcapimodule.c, the function comment of test_k_code, and an error string inside that function, contain mistakes.

Looking at past changes, these mistakes have been there since test_k_code was added, in revision 28652.


------------ proposed changes ------------
1. In _PyLong_AsUnsignedLongMask, in case v is a multiple-digit int, iterate only over the 'Py_MIN(i, sizeof(x) * 8 / PyLong_SHIFT + 1)' least significant digits.

Obviously, 'sizeof(x) * 8 / PyLong_SHIFT + 1' might be off by one in case CPython is compiled with a PyLong_SHIFT other than 15 or 30. I suspect it's overkill, but I added an assert, just in case.

2. Fix the function comment of test_k_code, and an error string inside that function.


------------ alternative changes ------------
1. I have also considered iterating only over the 'Py_MIN(i, (Py_ssize_t)ceil(sizeof(x) * 8.0 / PyLong_SHIFT))' least significant digits.
Even though this number of digits is guaranteed to be accurate, it cannot be calculated at compile time, so the optimization would probably end up as a net performance penalty.


------------ diff ------------
The proposed patches diff file is attached.


------------ tests ------------
I built the patched CPython for x86, and played with it a little. Everything seemed to work as usual. 

In addition, I ran 'python_d.exe -m test -j3' (on my 64-bit Windows 10) with and without the patches, and got essentially the same output.
The outputs of both runs are attached.
msg268581 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-14 19:33
The accurate number of digits is (sizeof(x) * 8 + PyLong_SHIFT - 1) / PyLong_SHIFT or (sizeof(x) * 8 - 1) / PyLong_SHIFT + 1.

There is a similar code in _PyLong_AsUnsignedLongLongMask().

The changes to Modules/_testcapimodule.c look unrelated to the implementation of _PyLong_AsUnsignedLongMask.

We need some micro-benchmarking results. The patch speeds up _PyLong_AsUnsignedLongLongMask() for very large integers, but in most cases this function is called for much smaller integers, and the patch adds an additional check. We should ensure that this doesn't slow down the function.
msg268586 - (view) Author: Oren Milman (Oren Milman) * Date: 2016-06-14 20:44
Ah, that's a cool alternative to dividing and taking the ceiling. I'll change my patch accordingly.

I'll also write the patch for _PyLong_AsUnsignedLongLongMask, and work on some micro-benchmarking for it and for _PyLong_AsUnsignedLongMask.

Indeed, _testcapimodule.c is not really related. Should I open a separate issue just for that?
(I am often worried about opening many small issues... Please let me know if I shouldn't be.)

Now that you mention it, a similar 'while (--i >= 0)' loop can be found in some other functions:
1. PyLong_AsLongAndOverflow
2. PyLong_AsSsize_t
3. PyLong_AsUnsignedLong
4. PyLong_AsSize_t
5. PyLong_AsLongLongAndOverflow
I suspect we could optimize these 5 by first determining whether the Python int has too many bits, instead of shifting and checking for an overflow on each iteration. 
I would definitely give it a try when I am done with _PyLong_AsUnsignedLongLongMask and _PyLong_AsUnsignedLongMask :)
msg268721 - (view) Author: Oren Milman (Oren Milman) * Date: 2016-06-17 12:35
I did some micro-benchmarking, and it looks like bad news for my patch.

I wrote a simple C extension in order to call PyLong_AsUnsignedLongMask and PyLong_AsUnsignedLongLongMask from Python code (attached).
Then I ran the following micro-benchmarks using both the default CPython branch and my patched CPython:
1. small ints:
python.exe -m timeit -s "import capiWrapper" "print('bug') if any(i != capiWrapper.asUnsignedLongMaskWrapper(i) for i in range(10 ** 6)) else None"
python.exe -m timeit -s "import capiWrapper" "print('bug') if any(i != capiWrapper.asUnsignedLongLongMaskWrapper(i) for i in range(10 ** 6)) else None"
with the following results:
    asUnsignedLongMaskWrapper:
        default: 312 msec, patched: 353 msec
    asUnsignedLongLongMaskWrapper:
        default: 319 msec, patched: 356 msec

2. big ints:
python.exe -m timeit -s "import capiWrapper; bigInt = 10 ** 1000" "print('bug') if any((i & 0xffffffff) != capiWrapper.asUnsignedLongMaskWrapper(i) for i in range(bigInt, bigInt + 10000)) else None"
python.exe -m timeit -s "import capiWrapper; bigInt = 10 ** 1000" "print('bug') if any((i & 0xffffffffffffffff) != capiWrapper.asUnsignedLongLongMaskWrapper(i) for i in range(bigInt, bigInt + 10000)) else None"
with the following results:
    when bigInt = 10 ** 1000:
        asUnsignedLongMaskWrapper:
            default: 23.1 msec, patched: 21.5 msec
        asUnsignedLongLongMaskWrapper:
            default: 24.1 msec, patched: 21.7 msec

    when bigInt = 10 ** 150:
        asUnsignedLongMaskWrapper:
            default: 7.72 msec, patched: 7.82 msec
        asUnsignedLongLongMaskWrapper:
            default: 8.03 msec, patched: 7.99 msec


To sum it up, my patch degrades performance for ints smaller than (approximately) 10 ** 150, and improves performance for bigger ints. 

It seems to me that it isn't worth it.
I attached the patches diff file anyway, for the sake of documentation.

==========================================================================

That leaves us with the proposed changes in Modules/_testcapimodule.c.
The diff file for these (hopefully my final diff file for this issue) is also attached.
msg288788 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-03-02 09:13
I created a pull request (https://github.com/python/cpython/pull/392)
to fix the mistakes in _testcapimodule, but didn't mention this issue
in the pull request's title, as the issue mentioned these mistakes
only as a side note.
msg288791 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-02 10:02
> To sum it up, my patch degrades performance for ints smaller than (approximately) 10 ** 150, and improves performance for bigger ints. 

IMHO the Python language mostly handles integers smaller than one million, so the optimization is not a good idea :-) I hit the same issue when I tried to modify Python to use GMP internally for long integers.
msg288796 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-03-02 11:42
The pull request was merged, so I guess we can close this issue.
msg288798 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-02 11:53
Ok, thanks.
History
Date User Action Args
2022-04-11 14:58:32 admin set github: 71485
2017-03-02 11:53:27 vstinner set status: open -> closed
resolution: fixed
messages: + msg288798

stage: resolved
2017-03-02 11:42:07 Oren Milman set messages: + msg288796
2017-03-02 10:02:55 vstinner set nosy: + vstinner
messages: + msg288791
2017-03-02 09:13:22 Oren Milman set messages: + msg288788
2017-03-02 09:06:25 Oren Milman set pull_requests: + pull_request323
2016-09-23 07:44:26 Oren Milman set versions: + Python 3.7, - Python 3.6
2016-06-17 12:37:39 Oren Milman set files: + proposedPatches.diff
2016-06-17 12:36:39 Oren Milman set files: + badMicroBenchProposedPatches.diff
2016-06-17 12:35:25 Oren Milman set files: + capiWrapperModule.c

messages: + msg268721
2016-06-14 20:44:50 Oren Milman set messages: + msg268586
2016-06-14 19:33:08 serhiy.storchaka set messages: + msg268581
versions: + Python 3.6
2016-06-11 21:21:49 Oren Milman set files: + CPythonTestOutput.txt
2016-06-11 21:21:11 serhiy.storchaka set nosy: + mark.dickinson, serhiy.storchaka
2016-06-11 21:20:55 Oren Milman set files: + patchedCPythonTestOutput.txt
2016-06-11 21:20:13 Oren Milman create