classification
Title: Drop support for 15-bit PyLong digits?
Type: Stage: patch review
Components: Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: mark.dickinson, scoder, terry.reedy, tim.peters
Priority: normal Keywords: patch

Created on 2021-10-22 10:21 by mark.dickinson, last changed 2022-01-14 18:55 by mark.dickinson.

Pull Requests
URL Status Linked Edit
PR 30306 closed mark.dickinson, 2021-12-30 19:54
PR 30497 merged mark.dickinson, 2022-01-09 11:28
Messages (11)
msg404746 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-10-22 10:21
Looking at issue #35037 (which is a compatibility issue having to do with PYLONG_BITS_IN_DIGIT), I'm wondering whether it would make sense to drop support for 15-bit PyLong digits altogether. This would simplify some of the code, eliminate a configuration option, and eliminate the scope for ABI mismatches like that occurring in #35037.

There were a couple of reasons that we kept support for 15-bit digits when 30-bit digits were introduced, back in #4258.

- At the time, we wanted to use `long long` for the `twodigits` type with 30-bit digits, and we couldn't guarantee that all platforms we cared about would have `long long` or another 64-bit integer type available.

- It wasn't clear whether there were systems where using 30-bit digits in place of 15-bit digits might cause a performance regression.

Now that we can safely rely on C99 support on all platforms we care about, the existence of a 64-bit integer type isn't an issue (and indeed, we already rely on the existence of such a type in dtoa.c and elsewhere in the codebase).

As to performance, I doubt that there are still platforms where using 15-bit digits gives a performance advantage, but I admit I haven't checked.
msg404794 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2021-10-22 16:45
+1 "in theory". But I too don't know whether any platforms would be adversely affected, or how to find out :-(
msg404852 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-10-23 05:52
How about: create a fake test file test/test_xintperf with test case and test method(s) that run timeit with a suite of int operations and print report.  Create 'draft' PRs for main and 3.10 (and 3.9?).  Run just this test on buildbots.  (I believe I have seen this done.)

Improvement: write script to find buildbot results and consolidate.

Is there already a template for something like this?  If not, make one.
msg409365 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-12-30 12:45
I posted a request for information on usage of 15-bit digits to python-dev: https://mail.python.org/archives/list/python-dev@python.org/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/
msg409385 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-12-30 19:57
Terry:

> create a fake test file test/test_xintperf [...]

Sounds reasonable, though I'm not sure I know what exact timings I'd want to try. Maybe some of the stock integer-heavy Python benchmarks (pidigits, etc.).

I realised that I have no idea whether any of the buildbots actually use 15-bit digits. I've created PR GH-30306 to find out.
msg409415 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-12-31 11:32
> I've created PR GH-30306 to find out.

Results: we have two Gentoo/x86 buildbots, and a 32-bit Windows build in GitHub Actions: those machines use 15-bit digits, as a result of the logic in pyport.h that chooses 15-bit digits if SIZEOF_VOID_P < 8. Everything else seems to be using 30-bit digits.

GPS pointed out in the python-dev discussion that the other platform we should be thinking about is 32-bit ARM.
msg410139 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2022-01-09 11:29
First step in GH-30497, which changes the default to 30-bit digits unconditionally, instead of having the default be platform dependent.
msg410421 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2022-01-12 18:44
Adding Stefan Behnel to the nosy, since Cython is one of the few projects that might be directly affected by this change. Stefan: can you see any potential problems with changing the default to 30 here?
msg410425 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2022-01-12 19:51
Cython should be happy with whatever CPython uses (as long as CPython's header files agree with CPython's build ;-) ).

I saw the RasPi benchmarks on the ML. That would have been my suggested trial platform as well.
https://mail.python.org/archives/list/python-dev@python.org/message/5RJGI6THWCDYTTEPXMWXU7CK66RQUTD4/

The results look ok. Maybe the slowdown for pickling is really the increased data size of integers. And it's visible that some compute-heavily benchmarks like pyaes did get a little slower. I doubt that they represent a real use case on such a platform, though. Doing any kind of number crunching on a RasPi without NumPy would appear like a rather strange adventure.

That said, if we decide to keep 15-bit digits in the end, I wonder if "SIZEOF_VOID_P" is the right decision point. It seems more of a "has reasonably fast 64-bit multiply or not" kind of decision – however that translates into code. I'm sure there are 32-bit platforms that would actually benefit from 30-bit digits today.

If we find a platform that would be fine with 30-bits but lacks a fast 64-bit multiply, then we could still try to add a platform specific value size check for smaller numbers. Since those are common case, branch prediction might help us more often than not.

But then, I wonder how much complexity this is even worth, given that the goal is to reduce the complexity. Platform maintainers can still decide to configure the digit size externally for the time being, if it makes a difference for them. Maybe switching off 15-bits by default is just good enough for the next couple of years to come. :)
msg410511 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2022-01-13 19:17
Thanks, Stefan. I think I'm going to go ahead with the first step of making 30-bit digits the default, then, but leaving the 15-bit digit option present.

> That said, if we decide to keep 15-bit digits in the end, I wonder if "SIZEOF_VOID_P" is the right decision point. It seems more of a "has reasonably fast 64-bit multiply or not" kind of decision

Agreed. And most platforms we care about _do_ seem to have such an instruction, so "30-bit digits unless the builder explicitly indicates otherwise - e.g., via configure options or pyconfig.h edits" seems reasonable.

My other worry is division. It's less important than multiplication in the sense that I'd expect division operations to be rarer than multiplications in typical code, but the potential impact for code that _does_ make heavy use of division is greater. With 30-bit digits, all the longobject.c source actually *needs* is a 64-bit-by-32-bit unsigned division for cases where the result is guaranteed to fit in a uint32_t. If we're on x86, there's an instruction for that (DIVL), so you'd think that we'd be fine. But without using inline assembly, it seems impossible to persuade current versions of either of GCC or Clang[*] to generate that DIVL instruction - instead, they both want to do a 64-bit-by-64-bit division, and on x86 that involves making a call to a dedicated __udivti3 intrinsic, which is potentially multiple times slower than a simple DIVL.

The division problem affects x64 as well: GCC and Clang currently generate a DIVQ instruction when all we need is a DIVL.

> If we find a platform that would be fine with 30-bits but lacks a fast 64-bit multiply, then we could still try to add a platform specific value size check for smaller numbers. Since those are common case, branch prediction might help us more often than not.

Yes, I think that could work, both for multiplication and division.

[*] Visual Studio 2019 _does_ apparently provide a _udiv64 intrinsic, which we should possibly be attempting to use: https://docs.microsoft.com/en-us/cpp/intrinsics/udiv64?view=msvc-170
msg410589 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2022-01-14 18:55
New changeset 025cbe7a9b5d3058ce2eb8015d3650e396004545 by Mark Dickinson in branch 'main':
bpo-45569: Change PYLONG_BITS_IN_DIGIT default to 30 (GH-30497)
https://github.com/python/cpython/commit/025cbe7a9b5d3058ce2eb8015d3650e396004545
History
Date User Action Args
2022-01-14 18:55:08mark.dickinsonsetmessages: + msg410589
2022-01-13 19:17:01mark.dickinsonsetmessages: + msg410511
2022-01-12 19:51:45scodersetmessages: + msg410425
2022-01-12 18:44:49mark.dickinsonsetnosy: + scoder
messages: + msg410421
2022-01-09 11:29:38mark.dickinsonsetmessages: + msg410139
2022-01-09 11:28:56mark.dickinsonsetstage: patch review
pull_requests: + pull_request28703
2021-12-31 11:32:42mark.dickinsonsetmessages: + msg409415
2021-12-30 19:57:22mark.dickinsonsetmessages: + msg409385
stage: patch review -> (no value)
2021-12-30 19:54:47mark.dickinsonsetkeywords: + patch
stage: patch review
pull_requests: + pull_request28519
2021-12-30 12:45:11mark.dickinsonsetmessages: + msg409365
2021-10-23 05:52:58terry.reedysetnosy: + terry.reedy
messages: + msg404852
2021-10-22 16:45:19tim.peterssetnosy: + tim.peters
messages: + msg404794
2021-10-22 10:21:47mark.dickinsoncreate