Author mark.dickinson
Recipients mark.dickinson, scoder, terry.reedy, tim.peters
Date 2022-01-13.19:17:01
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1642101421.73.0.930037420695.issue45569@roundup.psfhosted.org>
In-reply-to
Content
Thanks, Stefan. I think I'm going to go ahead with the first step of making 30-bit digits the default, then, but leaving the 15-bit digit option present.

> That said, if we decide to keep 15-bit digits in the end, I wonder if "SIZEOF_VOID_P" is the right decision point. It seems more of a "has reasonably fast 64-bit multiply or not" kind of decision

Agreed. And most platforms we care about _do_ seem to have such an instruction, so "30-bit digits unless the builder explicitly indicates otherwise - e.g., via configure options or pyconfig.h edits" seems reasonable.

My other worry is division. It's less important than multiplication in the sense that I'd expect division operations to be rarer than multiplications in typical code, but the potential impact for code that _does_ make heavy use of division is greater. With 30-bit digits, all the longobject.c source actually *needs* is a 64-bit-by-32-bit unsigned division for cases where the result is guaranteed to fit in a uint32_t. If we're on x86, there's an instruction for that (DIVL), so you'd think that we'd be fine. But without using inline assembly, it seems impossible to persuade current versions of either of GCC or Clang[*] to generate that DIVL instruction - instead, they both want to do a 64-bit-by-64-bit division, and on x86 that involves making a call to a dedicated __udivti3 intrinsic, which is potentially multiple times slower than a simple DIVL.

The division problem affects x64 as well: GCC and Clang currently generate a DIVQ instruction when all we need is a DIVL.

> If we find a platform that would be fine with 30-bits but lacks a fast 64-bit multiply, then we could still try to add a platform specific value size check for smaller numbers. Since those are common case, branch prediction might help us more often than not.

Yes, I think that could work, both for multiplication and division.

[*] Visual Studio 2019 _does_ apparently provide a _udiv64 intrinsic, which we should possibly be attempting to use: https://docs.microsoft.com/en-us/cpp/intrinsics/udiv64?view=msvc-170
History
Date User Action Args
2022-01-13 19:17:01mark.dickinsonsetrecipients: + mark.dickinson, tim.peters, terry.reedy, scoder
2022-01-13 19:17:01mark.dickinsonsetmessageid: <1642101421.73.0.930037420695.issue45569@roundup.psfhosted.org>
2022-01-13 19:17:01mark.dickinsonlinkissue45569 messages
2022-01-13 19:17:01mark.dickinsoncreate