[Victor Stinner]
> I saw that you choosed to use the base 2^30 for marshal. For a better
> portability (be able to use .pyc generated without your patch), you
> may keep the base 2^15. I implemented that in my GMP patch (manual
> conversion from/to base 2^15).
Agreed. I'll fix this so that the .pyc format is unchanged. Thanks!
> And why 30 bits and not 31 bits, or 63 bits, or 120 bits?
Mostly laziness: the change from 15 to 30 bits turned out to be extraordinarily easy. Note that the longobject.c part of
the patch does almost nothing except adding a bunch of casts here and there.
31 bits would involve rewriting the powering algorithm, which assumes that PyLong_SHIFT is divisible by 5. It would gain
very little over 30 bits, and if Pernici Mario's optimizations are considered (issue 3944) multiplication would likely be
slower with 31-bit digits than with 30-bit digits.
63 (or 62, or 60) bits is simply too large right now: you'd need access to a hardware 64 x 64 -> 128 bit multiply to make
this worth it, and I'd guess there are many fewer platforms that make this easy, though I don't really know. I know it's
possible on gcc/x86_64 by making use of the (undocumented) __uint128_t type. But even where this type is available, base
2**63 might still be slower than base 2**30. I've done some experiments with multiprecision *decimal* arithmetic in C that
showed that even on a 64-bit machine, using base 10**9 (i.e. approx 30 bits) was significantly faster than using base
10**18.
120 bits? Does GMP even go this far? I guess part of the attraction is that it's a multiple of 8...
The other obvious choices to consider would be 32 bits (or 16 bits, or 64 bits). This is possible, and may even be worth
it, but it would be a *much* bigger effort; most of the algorithms would need to be rewritten. One problem is again the
mismatch between C and assembler: detecting overflow when adding two 32-bit unsigned integers is trivial in x86 assembler,
but it's not so obvious how to write portable C code that has a decent chance of being compiled optimally on a wide variety
of compilers.
I guess my feeling is simply that the 15-bit to 30-bit change seems incredibly easy and cheap: very little code change, and
hence low risk of accidental breakage. So if there are indeed significant efficiency benefits (still to be determined) then
it looks like a clear win to me.
[Gregory Smith]
> (i haven't reviewed your autoconf stuff yet)
The changes to configure and pyconfig.h are just from rerunning autoconf and autoheader; it's only configure.in that should
need looking at. |