Message 75520 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	mark.dickinson
Recipients	christian.heimes, gregory.p.smith, mark.dickinson, vstinner
Date	2008-11-05.10:53:48
SpamBayes Score	5.5522253e-13
Marked as misclassified	No
Message-id	<1225882491.92.0.261829844236.issue4258@psf.upfronthosting.co.za>
In-reply-to

Content
[Victor Stinner] > I saw that you choosed to use the base 2^30 for marshal. For a better > portability (be able to use .pyc generated without your patch), you > may keep the base 2^15. I implemented that in my GMP patch (manual > conversion from/to base 2^15). Agreed. I'll fix this so that the .pyc format is unchanged. Thanks! > And why 30 bits and not 31 bits, or 63 bits, or 120 bits? Mostly laziness: the change from 15 to 30 bits turned out to be extraordinarily easy. Note that the longobject.c part of the patch does almost nothing except adding a bunch of casts here and there. 31 bits would involve rewriting the powering algorithm, which assumes that PyLong_SHIFT is divisible by 5. It would gain very little over 30 bits, and if Pernici Mario's optimizations are considered (issue 3944) multiplication would likely be slower with 31-bit digits than with 30-bit digits. 63 (or 62, or 60) bits is simply too large right now: you'd need access to a hardware 64 x 64 -> 128 bit multiply to make this worth it, and I'd guess there are many fewer platforms that make this easy, though I don't really know. I know it's possible on gcc/x86_64 by making use of the (undocumented) __uint128_t type. But even where this type is available, base 263 might still be slower than base 230. I've done some experiments with multiprecision decimal arithmetic in C that showed that even on a 64-bit machine, using base 109 (i.e. approx 30 bits) was significantly faster than using base 1018. 120 bits? Does GMP even go this far? I guess part of the attraction is that it's a multiple of 8... The other obvious choices to consider would be 32 bits (or 16 bits, or 64 bits). This is possible, and may even be worth it, but it would be a much bigger effort; most of the algorithms would need to be rewritten. One problem is again the mismatch between C and assembler: detecting overflow when adding two 32-bit unsigned integers is trivial in x86 assembler, but it's not so obvious how to write portable C code that has a decent chance of being compiled optimally on a wide variety of compilers. I guess my feeling is simply that the 15-bit to 30-bit change seems incredibly easy and cheap: very little code change, and hence low risk of accidental breakage. So if there are indeed significant efficiency benefits (still to be determined) then it looks like a clear win to me. [Gregory Smith] > (i haven't reviewed your autoconf stuff yet) The changes to configure and pyconfig.h are just from rerunning autoconf and autoheader; it's only configure.in that should need looking at.

[Victor Stinner]
> I saw that you choosed to use the base 2^30 for marshal. For a better 
> portability (be able to use .pyc generated without your patch), you 
> may keep the base 2^15. I implemented that in my GMP patch (manual 
> conversion from/to base 2^15).

Agreed.  I'll fix this so that the .pyc format is unchanged. Thanks!

> And why 30 bits and not 31 bits, or 63 bits, or 120 bits?

Mostly laziness: the change from 15 to 30 bits turned out to be extraordinarily easy.  Note that the longobject.c part of 
the patch does almost nothing except adding a bunch of casts here and there.

31 bits would involve rewriting the powering algorithm, which assumes that PyLong_SHIFT is divisible by 5.  It would gain 
very little over 30 bits, and if Pernici Mario's optimizations are considered (issue 3944) multiplication would likely be 
slower with 31-bit digits than with 30-bit digits.

63 (or 62, or 60) bits is simply too large right now:  you'd need access to a hardware 64 x 64 -> 128 bit multiply to make 
this worth it, and I'd guess there are many fewer platforms that make this easy, though I don't really know.  I know it's 
possible on gcc/x86_64 by making use of the (undocumented) __uint128_t type.  But even where this type is available, base 
2**63 might still be slower than base 2**30. I've done some experiments with multiprecision *decimal* arithmetic in C that 
showed that even on a 64-bit machine, using base 10**9 (i.e. approx 30 bits) was significantly faster than using base 
10**18.

120 bits?  Does GMP even go this far?  I guess part of the attraction is that it's a multiple of 8...

The other obvious choices to consider would be 32 bits (or 16 bits, or 64 bits).  This is possible, and may even be worth 
it, but it would be a *much* bigger effort; most of the algorithms would need to be rewritten.  One problem is again the 
mismatch between C and assembler:  detecting overflow when adding two 32-bit unsigned integers is trivial in x86 assembler, 
but it's not so obvious how to write portable C code that has a decent chance of being compiled optimally on a wide variety 
of compilers. 

I guess my feeling is simply that the 15-bit to 30-bit change seems incredibly easy and cheap: very little code change, and 
hence low risk of accidental breakage.  So if there are indeed significant efficiency benefits (still to be determined) then 
it looks like a clear win to me.

[Gregory Smith]
> (i haven't reviewed your autoconf stuff yet)

The changes to configure and pyconfig.h are just from rerunning autoconf and autoheader;  it's only configure.in that should 
need looking at.

History
Date	User	Action	Args
2008-11-05 10:54:52	mark.dickinson	set	recipients: + mark.dickinson, gregory.p.smith, vstinner, christian.heimes
2008-11-05 10:54:51	mark.dickinson	set	messageid: <1225882491.92.0.261829844236.issue4258@psf.upfronthosting.co.za>
2008-11-05 10:53:51	mark.dickinson	link	issue4258 messages
2008-11-05 10:53:49	mark.dickinson	create