
Author gregory.p.smith
Recipients gregory.p.smith
Date 2022-01-16.23:55:01
Message-id <1642377301.36.0.261547807602.issue46406@roundup.psfhosted.org>
In-reply-to
Content
The PR was directly inspired by Mark Dickinson's code in the email thread, which used __asm__ to get the instruction he wanted.  There is usually a way to make the compiler actually do what you intend.  This appears to be it.

Interestingly, experimenting on godbolt.org with small code snippets, rather than the entire longobject.c, to check various compilers' output does not always yield as nice a result.  (clang 11+ showed promise there, but this change benefits gcc equally well in real-world CPython microbenchmark timeit tests.)  https://godbolt.org/z/63eWPczjx was my playground code.
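
For readers who want the shape of the loop under discussion, here is a minimal Python model of dividing a multi-digit integer by a single digit. It assumes 30-bit digits stored least-significant first; the names `SHIFT`, `to_digits`, `from_digits`, and `divrem1_model` are illustrative only, and this is not the C code from the PR. The point it shows is that every step needs both the quotient and the remainder of the same double-width dividend, which a single x86 divide instruction can produce.

```python
SHIFT = 30                  # assumed digit width in bits
MASK = (1 << SHIFT) - 1


def to_digits(value):
    """Split a non-negative int into little-endian SHIFT-bit digits."""
    digits = []
    while value:
        digits.append(value & MASK)
        value >>= SHIFT
    return digits or [0]


def from_digits(digits):
    """Reassemble an int from little-endian SHIFT-bit digits."""
    value = 0
    for d in reversed(digits):
        value = (value << SHIFT) | d
    return value


def divrem1_model(digits, n):
    """Divide little-endian digits by one digit n.

    Returns (quotient_digits, remainder).
    """
    assert 0 < n <= MASK
    out = [0] * len(digits)
    remainder = 0
    # Walk from the most significant digit down, carrying the remainder.
    for i in range(len(digits) - 1, -1, -1):
        dividend = (remainder << SHIFT) | digits[i]
        out[i], remainder = divmod(dividend, n)
    return out, remainder
```

Calling `divrem1_model(to_digits(10**1000), 17)` and passing the quotient digits to `from_digits` should agree with `10**1000 // 17`.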

```
$ ./b-clang13/python -m timeit -n 1500000 -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
1500000 loops, best of 5: 450 nsec per loop
$ ./b-clang13-new-basic-divrem1/python -m timeit -n 1500000 -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
1500000 loops, best of 5: 375 nsec per loop
$ ./b-gcc9/python -m timeit -n 1500000 -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
1500000 loops, best of 5: 448 nsec per loop
$ ./b-gcc9-new-basic-divrem1/python -m timeit -n 1500000 -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
1500000 loops, best of 5: 370 nsec per loop
```

That's on an AMD Zen 3 (x86_64).  I also tested with other divisors; 17 is not specialized by the compiler.  These were not --enable-optimizations builds, though the results remain similar on those for non-specialized divisors (x//10 becomes a specialized one when using -fprofile-values on gcc9).
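
To put the timeit numbers above in relative terms, the arithmetic can be sketched as follows. The timings are copied from the output above; the helper name `percent_faster` is made up for illustration.

```python
# Nanoseconds per loop (before, after), copied from the timeit output.
timings = {
    "clang13": (450, 375),
    "gcc9": (448, 370),
}


def percent_faster(before_ns, after_ns):
    """Reduction in time per loop, as a percentage of the old time."""
    return 100.0 * (before_ns - after_ns) / before_ns


for compiler, (before, after) in timings.items():
    print(f"{compiler}: {percent_faster(before, after):.1f}% less time per loop")
```

Both compilers land at roughly a 17% reduction in time per loop for this one microbenchmark.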

Performance tests using other architectures forthcoming.
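
For rerunning the same measurement on another machine from a script rather than the shell, a rough equivalent using the standard `timeit` module might look like this. The function name `time_floor_div` and its defaults are assumptions chosen to mirror the command lines above, not part of the PR.

```python
import timeit

# Same setup string as the command-line runs above.
SETUP = "x = 10**1000; r = x//10; assert r == 10**999, r"


def time_floor_div(divisor=17, number=1_500_000, repeat=5):
    """Return the best time per loop, in nanoseconds, for x // divisor."""
    runs = timeit.repeat(stmt=f"x//{divisor}", setup=SETUP,
                         number=number, repeat=repeat)
    # Each entry is the total seconds for `number` loops; keep the best run.
    return min(runs) / number * 1e9
```

Running `time_floor_div()` under each interpreter build gives a figure comparable to the "best of 5" line that `python -m timeit` prints.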

A pyperformance suite run on a benchmark-stable host is worthwhile. I don't actually expect this to show up as significant in most things there; we'll see.

Regardless, the new code is no more difficult to maintain than the previous code.
History
Date                 User             Action  Args
2022-01-16 23:55:01  gregory.p.smith  set     recipients: + gregory.p.smith
2022-01-16 23:55:01  gregory.p.smith  set     messageid: <1642377301.36.0.261547807602.issue46406@roundup.psfhosted.org>
2022-01-16 23:55:01  gregory.p.smith  link    issue46406 messages
2022-01-16 23:55:01  gregory.p.smith  create