Issue 21420
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2014-05-02 21:54 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files

File name | Uploaded | Description | Edit
---|---|---|---
pow2.patch | vstinner, 2014-05-02 21:54 | review |
bench_pow2.py | vstinner, 2014-05-02 21:54 | |
pow2_specialized.patch | vstinner, 2014-05-03 12:45 | review |

Messages (8)

msg217800 - Author: STINNER Victor (vstinner) | Date: 2014-05-02 21:54
The algorithm for 2 ** n (long_pow) is slower than the 1 << n algorithm (long_lshift). I propose to compute x**y as 1<<y when x==2 and y >= 0. The attached patch implements this idea. According to my microbenchmark, it is 4x faster for small powers (2**0 .. 2**1024) and up to 340x faster for a large power (2**(2**28)).
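A minimal Python-level sketch of the idea (the actual change lives in C, in long_pow() in Objects/longobject.c; the function name pow_fast and the timing loop below are illustrative only and are not the attached patch or bench_pow2.py):

```python
import timeit

def pow_fast(x, y):
    """What the proposed fast path amounts to, expressed in Python."""
    if x == 2 and y >= 0:
        return 1 << y       # single shift instead of binary exponentiation
    return x ** y           # fall back to the normal algorithm

n = 2 ** 16
assert pow_fast(2, n) == 2 ** n == 1 << n   # identity holds for any n >= 0

# Rough timing of the two spellings (numbers vary by machine and version):
print(timeit.timeit("2 ** n", globals={"n": n}, number=1000))
print(timeit.timeit("1 << n", globals={"n": n}, number=1000))
```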

msg217801 - Author: STINNER Victor (vstinner) | Date: 2014-05-02 21:54
Results of bench_pow2.py on Linux.

Common platform:
CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Platform: Linux-3.13.9-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug
Timer: time.perf_counter
Python unicode implementation: PEP 393
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Bits: int=32, long=64, long long=64, size_t=64, void*=64

Platform of campaign orig:
Date: 2014-05-02 23:45:29
SCM: hg revision=62438d1b11c7 tag=tip branch=default date="2014-05-02 23:26 +0200"
Timer precision: 49 ns
Python version: 3.5.0a0 (default:62438d1b11c7, May 2 2014, 23:45:22) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

Platform of campaign pow2:
Python version: 3.5.0a0 (default:62438d1b11c7+, May 2 2014, 23:47:50) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
SCM: hg revision=62438d1b11c7+ tag=tip branch=default date="2014-05-02 23:26 +0200"
Timer precision: 51 ns
Date: 2014-05-02 23:48:03

---------------+--------------+----------------
Tests          | orig         | pow2
---------------+--------------+----------------
2 ** 0         | 40 ns (*)    | 44 ns (+9%)
2 ** 1         | 264 ns (*)   | 62 ns (-76%)
2 ** 3         | 270 ns (*)   | 62 ns (-77%)
2 ** 5         | 269 ns (*)   | 62 ns (-77%)
2 ** 10        | 278 ns (*)   | 63 ns (-78%)
2 ** 15        | 299 ns (*)   | 62 ns (-79%)
2 ** 20        | 287 ns (*)   | 62 ns (-78%)
2 ** 25        | 302 ns (*)   | 62 ns (-79%)
2 ** 28        | 293 ns (*)   | 62 ns (-79%)
2 ** (2 ** 0)  | 264 ns (*)   | 62 ns (-77%)
2 ** (2 ** 1)  | 264 ns (*)   | 62 ns (-76%)
2 ** (2 ** 3)  | 264 ns (*)   | 62 ns (-77%)
2 ** (2 ** 5)  | 286 ns (*)   | 68 ns (-76%)
2 ** (2 ** 10) | 740 ns (*)   | 68 ns (-91%)
2 ** (2 ** 15) | 93.1 us (*)  | 180 ns (-100%)
2 ** (2 ** 20) | 3.94 ms (*)  | 3.89 us (-100%)
2 ** (2 ** 25) | 154 ms (*)   | 213 us (-100%)
2 ** (2 ** 28) | 1.36 sec (*) | 4.01 ms (-100%)
---------------+--------------+----------------
Total          | 1.52 sec (*) | 4.23 ms (-100%)
---------------+--------------+----------------

msg217802 - Author: STINNER Victor (vstinner) | Date: 2014-05-02 21:57
Oh, the patch implements another micro-optimization: "(x << 0) is x" becomes True. I included it in the same patch to reduce the overhead for 2 ** 0.
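To make that identity shortcut concrete, a small illustration (whether the second check prints True depends on the interpreter build: small ints are cached anyway, and only a build with this shortcut returns the original object for a large int):

```python
x = 10 ** 6          # large enough to fall outside the small-int cache
y = x << 0
print(y == x)        # always True: shifting by zero doesn't change the value
print(y is x)        # True only if "<< 0" short-circuits and returns the
                     # original object instead of building a new int
```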

msg217820 - Author: Stefan Behnel (scoder) | Date: 2014-05-03 10:17
LGTM, I can't see a case where this might go wrong (errors and type checks are handled before the added code). It also seems a sufficiently common case to optimise internally. The 2**n spelling is easier to read and to get right than the shifting, so it's good to make both similarly fast and remove an excuse for micro-optimisations in Python code.

msg217823 - Author: STINNER Victor (vstinner) | Date: 2014-05-03 12:45
pow2_specialized.patch: here is a different approach, a little bit faster. I wrote a dedicated long_pow2() function which computes 2**k.

Arguments in favor of pow2_specialized.patch instead of pow2.patch:

- it's a little bit faster;
- it doesn't touch long_lshift(), so there is no risk of a performance regression;
- it makes it easier to implement the issue #21419 optimization: use calloc() instead of malloc() for very large powers of 2, to avoid allocating physical memory pages for the zeros.

Results of bench_pow2.py comparing the original Python, pow2.patch and pow2_specialized.patch:

Common platform:
Platform: Linux-3.13.9-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Python unicode implementation: PEP 393
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter
CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

Platform of campaign orig:
SCM: hg revision=62438d1b11c7 tag=tip branch=default date="2014-05-02 23:26 +0200"
Date: 2014-05-03 14:42:56
Timer precision: 46 ns
Python version: 3.5.0a0 (default:62438d1b11c7, May 3 2014, 14:41:50) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

Platform of campaign pow2:
SCM: hg revision=62438d1b11c7+ tag=tip branch=default date="2014-05-02 23:26 +0200"
Python version: 3.5.0a0 (default:62438d1b11c7+, May 3 2014, 14:41:13) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
Date: 2014-05-03 14:41:21
Timer precision: 51 ns

Platform of campaign pow2_specialized:
SCM: hg revision=62438d1b11c7+ tag=tip branch=default date="2014-05-02 23:26 +0200"
Date: 2014-05-03 14:40:26
Python version: 3.5.0a0 (default:62438d1b11c7+, May 3 2014, 14:40:18) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
Timer precision: 46 ns

---------------+--------------------+--------------+-----------------
Tests          | orig               | pow2         | pow2_specialized
---------------+--------------------+--------------+-----------------
2 ** 0         | 39 ns (*)          | 44 ns (+11%) | 41 ns
2 ** 1         | 264 ns (+531%)     | 62 ns (+48%) | 42 ns (*)
2 ** 3         | 269 ns (+554%)     | 62 ns (+50%) | 41 ns (*)
2 ** 5         | 269 ns (+560%)     | 62 ns (+51%) | 41 ns (*)
2 ** 10        | 279 ns (+410%)     | 63 ns (+15%) | 55 ns (*)
2 ** 15        | 299 ns (+460%)     | 62 ns (+16%) | 53 ns (*)
2 ** 20        | 287 ns (+426%)     | 62 ns (+14%) | 55 ns (*)
2 ** 25        | 302 ns (+466%)     | 62 ns (+16%) | 53 ns (*)
2 ** 28        | 293 ns (+438%)     | 62 ns (+14%) | 55 ns (*)
2 ** (2 ** 0)  | 264 ns (+549%)     | 62 ns (+52%) | 41 ns (*)
2 ** (2 ** 1)  | 264 ns (+524%)     | 62 ns (+48%) | 42 ns (*)
2 ** (2 ** 3)  | 263 ns (+548%)     | 62 ns (+52%) | 41 ns (*)
2 ** (2 ** 5)  | 283 ns (+346%)     | 68 ns (+8%)  | 63 ns (*)
2 ** (2 ** 10) | 715 ns (+950%)     | 68 ns (*)    | 69 ns
2 ** (2 ** 15) | 89.6 us (+51109%)  | 175 ns (*)   | 177 ns
2 ** (2 ** 20) | 3.89 ms (+100603%) | 3.86 us (*)  | 3.86 us
2 ** (2 ** 25) | 152 ms (+66557%)   | 240 us (+5%) | 228 us (*)
2 ** (2 ** 28) | 1.35 sec (+33851%) | 3.99 ms (*)  | 3.99 ms
---------------+--------------------+--------------+-----------------
Total          | 1.51 sec (+35664%) | 4.23 ms      | 4.22 ms (*)
---------------+--------------------+--------------+-----------------
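A conceptual model of what a dedicated power-of-two constructor computes (the real long_pow2() in the patch is C code in Objects/longobject.c; the helper name pow2 below is only for illustration, and the zero-filled buffer plays the role that calloc() would play in the issue #21419 follow-up):

```python
def pow2(k):
    """Build 2**k as an integer whose only set bit is bit k (k >= 0)."""
    buf = bytearray(k // 8 + 1)      # zero-filled buffer, like calloc()
    buf[k // 8] = 1 << (k % 8)       # set the single non-zero byte
    return int.from_bytes(buf, "little")

assert pow2(0) == 1
assert pow2(1000) == 2 ** 1000 == 1 << 1000
```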

msg217863 - Author: Mark Dickinson (mark.dickinson) | Date: 2014-05-04 09:34
Victor, can you demonstrate any cases of real code where this optimization makes a significant difference?

There are many, many tiny optimisations we *could* be making in Objects/longobject.c; each of those potential optimisations adds to the cost of maintaining the code, detracts from readability, and potentially even slows down the common cases fractionally. In general, I think we should only be applying this sort of optimization when there's a clear benefit to real-world code. I don't think this one crosses that line.

In the (I suspect rare) cases where a piece of real-world code is slowed down significantly due to a non-optimized 2**n, the code author still has the option of replacing that piece of code with 1<<n manually. And in some cases, that's probably the wrong optimization anyway: an expression like `x * 2**n` would be better hand-optimized to `x << n`.

IOW, I'm -1 on making this change.
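For reference, a quick check of the manual rewrite mentioned above (an illustrative snippet, not part of the thread's patches):

```python
# x * 2**n == x << n for any int x and any n >= 0; Python's << is an
# arithmetic shift, so the identity also holds for negative x.
x, n = 123456789, 100
assert x * 2 ** n == x << n
assert (-x) * 2 ** n == (-x) << n
```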

msg217897 - Author: Antoine Pitrou (pitrou) | Date: 2014-05-04 22:03
I agree with Mark: there doesn't seem to be a strong point in favour of this optimization.

msg218360 - Author: STINNER Victor (vstinner) | Date: 2014-05-12 20:46
My main motivation was to use calloc() (issue #21419) for 2**n. It looks like using calloc() for this is only useful in very rare cases (very large values of n, where the result is larger than 1 MB). I'm closing the issue.

History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:58:03 | admin | set | github: 65619
2014-05-12 20:46:21 | vstinner | set | status: open -> closed; resolution: rejected; messages: + msg218360
2014-05-11 12:01:29 | Arfrever | set | nosy: + Arfrever
2014-05-04 22:03:15 | pitrou | set | messages: + msg217897
2014-05-04 09:34:29 | mark.dickinson | set | messages: + msg217863
2014-05-03 12:45:15 | vstinner | set | files: + pow2_specialized.patch; messages: + msg217823
2014-05-03 10:17:54 | scoder | set | nosy: + scoder; messages: + msg217820
2014-05-02 21:57:58 | vstinner | set | nosy: + mark.dickinson
2014-05-02 21:57:35 | vstinner | set | nosy: + pitrou, serhiy.storchaka, josh.r; messages: + msg217802
2014-05-02 21:54:54 | vstinner | set | files: + bench_pow2.py; messages: + msg217801
2014-05-02 21:54:22 | vstinner | create |