Use __builtin_clzl for bits_in_digit if available #73968
Comments
Baseline performance (9e6ac83) vs. upcoming PR:

$ ./python -m timeit "12345678 == 12345678.0"
10000000 loops, best of 5: 34.3 nsec per loop

$ ./python -m timeit "1 == 1.0"
10000000 loops, best of 5: 34.4 nsec per loop

$ ./python -m timeit "(1234578987654321).bit_length()"
10000000 loops, best of 5: 36.4 nsec per loop
Thanks for the idea, and the PR! To be useful, this would need a bit of tweaking: we can't assume that a digit always fits in an unsigned long. I think it would also make sense to move the detection of the existence of __builtin_clzl into the configure machinery. We also need to trade off the additional complication in the implementation against the benefits: though we certainly care about speed, speed at all costs isn't the target here; readability, portability and maintainability of the source count, too. It'll probably be easier to weigh those factors once there's an implementation that addresses the above points.
I think we can assume that a digit is no larger than unsigned long; otherwise PyLong_AsLong() and the like wouldn't work.
True enough. It looks as though someone (cough) already documented that restriction, too: https://github.com/mdickinson/cpython/blob/v3.6.0/Include/longintrepr.h#L28-L30
Thanks for the review. I pushed a change to check whether clz can be used. Btw, mathmodule.c currently duplicates the function: https://github.com/python/cpython/blob/master/Modules/mathmodule.c#L1317. It might be worth factoring it out, but I don't know where to put it.
Oops. Actually, clz should commonly be enough, and platforms where clz and clzl differ (i.e. where unsigned int and unsigned long have different widths) should be rare.
The change is an optimization, so it requires a benchmark, which you provided. Ok. But the speedup is a few nanoseconds on a function which currently takes 40 nanoseconds. It's not really what I would call significant:

$ ./python -m perf compare_to ref.json patch.json --table
+---------------------+---------+-----------------------------+
| Benchmark | ref | patch |
+=====================+=========+=============================+
| int==float 8 digits | 39.2 ns | 37.9 ns: 1.03x faster (-3%) |
+---------------------+---------+-----------------------------+
| int==float 1 digit | 38.0 ns | 37.9 ns: 1.00x faster (-0%) |
+---------------------+---------+-----------------------------+
| int.bit_length() | 42.0 ns | 41.2 ns: 1.02x faster (-2%) |
+---------------------+---------+-----------------------------+

(See attached bench_bit_length.py script.)

So I'm not really convinced that the change is useful. Is bit_length() used in hot loops?
By the way, I checked the bytecode to make sure that the compiler doesn't optimize that. I'm surprised that it's not replaced with True! Is there a reason to perform the test at runtime?
There are no good reasons to optimize this case at compile time. It is a very obscure way of writing True.
I don't think a 3% improvement on a micro-benchmark is worth it (this will translate into 0% improvement on real-world workloads). Are there other uses of _Py_bit_length() that show bigger benefits?
I concur with Antoine and suggest rejecting this issue.
In some microbenchmarks this can give up to 15%:

$ ./python.patched -m perf timeit -q --compare-to=./python.default -s "a = list(map(float, range(10000)))" "12345 in a"
Mean +- std dev: [python.default] 1.28 ms +- 0.11 ms -> [python.patched] 1.12 ms +- 0.07 ms: 1.15x faster (-13%)
But even a 15% speedup in particular microbenchmarks looks too small to me for such a complex change.
Closing.
If this issue is closed as "not a big performance improvement", maybe the XXX comment in mathmodule.c should be taken out?
I'm in favor of documenting in comments decisions not to micro-optimize.
I reopen the issue since PR 20739 was created. |
Thanks Niklas Fiekas for your tenacity :-) |