[Greg]
> It would seem to me that a more suitable implementation would be to
> store numbers as 32*k-bit 2's complement integers; I've used this in
> C++ libraries. Compared to the rad-15 sign/magnitude approach, it may
> seem trickier to do carries but in practice it's not that big a deal.
Do you know of any publicly available code that takes this approach,
while remaining simple and portable?
For 32-bit limbs (on a 32-bit platform, say, with no C99 support and no
uint64_t available), how *do* you deal with carries? You can write
portable C89 that does the correct thing, but you then have to cross
your fingers and hope that the compiler can turn it into sensible
assembler, and it often can't (in a moment I'll post an example of this
that I ran into just yesterday). And even if your compiler can
optimize this effectively, what about all the other compilers out there?
The alternative is to use inline assembler for selected platforms; at
that point, you lose simplicity.
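To make the carry question concrete, here's a minimal C89 sketch of the
"cross your fingers" approach I mean (the function and names are mine,
and it assumes unsigned int is exactly 32 bits): unsigned overflow wraps
by definition, so the carry out of each limb addition can be recovered
with a comparison.

```c
#include <stddef.h>

/* Add the n-limb numbers a and b into r; limbs are 32-bit unsigned
   ints, least significant limb first.  Returns the carry out of the
   top limb. */
unsigned int add_n(unsigned int *r, const unsigned int *a,
                   const unsigned int *b, size_t n)
{
    unsigned int carry = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned int t = a[i] + carry;
        carry = (t < carry);        /* 1 iff a[i] + carry wrapped */
        t += b[i];
        carry += (t < b[i]);        /* 1 iff t + b[i] wrapped */
        r[i] = t;
    }
    return carry;
}
```

This is the kind of code I mean: correct and portable, but whether the
compiler turns those comparisons back into an add-with-carry chain is
anybody's guess.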
The choice of sign-magnitude versus two's complement is an orthogonal
issue, I think, and I'm not convinced of the value of two's complement.
Certainly
two's complement would be more convenient for the bitwise operations
under discussion, and probably also works well for addition and
subtraction. But it's less convenient for negation, multiplication,
division, power, conversion to and from human-readable strings, etc.
It's worth noting that GMP uses sign-magnitude, so I can't believe there
could be much performance gain in moving to two's complement.
(Actually, I'm not sure I've seen any arbitrary-precision arithmetic
package that uses two's complement. Has anyone here?)
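To illustrate the negation point: with sign-magnitude, negation is a
single flag flip, while with a multi-limb two's complement
representation it takes a pass over every limb. A sketch of the latter
(my own code, assuming 32-bit unsigned int limbs, least significant
first):

```c
#include <stddef.h>

/* Negate an n-limb two's-complement number in place: x = ~x + 1,
   propagating the +1 through the limbs. */
void negate_twos(unsigned int *x, size_t n)
{
    unsigned int carry = 1;
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned int t = ~x[i] + carry;
        carry = (t < carry);   /* carry keeps propagating while limbs were 0 */
        x[i] = t;
    }
}
```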
Here's the example promised earlier: yesterday I wanted to add two
128-bit unsigned integers a and b, each represented as a pair of
uint64_t values, on a 64-bit machine. In portable C99, the code looks
something like:
#include <stdint.h>

/* *sumhigh:*sumlow = ahigh:alow + bhigh:blow */
void
add_128(uint64_t *sumhigh, uint64_t *sumlow,
        uint64_t ahigh, uint64_t alow,
        uint64_t bhigh, uint64_t blow)
{
    alow += blow;
    ahigh += bhigh + (alow < blow);
    *sumlow = alow;
    *sumhigh = ahigh;
}
Ideally, the compiler would manage to optimize this to a simple 'addq,
adcq' pair of instructions. But gcc-4.4 -O3, on OS X 10.6/x86_64 gives
me:
_add_128:
LFB0:
leaq (%r9,%rcx), %rcx
xorl %eax, %eax
leaq (%r8,%rdx), %rdx
pushq %rbp
LCFI0:
cmpq %r9, %rcx
movq %rcx, (%rsi)
setb %al
movq %rsp, %rbp
LCFI1:
addq %rax, %rdx
movq %rdx, (%rdi)
leave
ret
(Here alow and blow are in rcx and r9, ahigh and bhigh are in rdx and
r8, and rsi and rdi are sumlow and sumhigh.)
How do you write the C code in such a way that gcc produces the right
result? I eventually gave up and just put the assembler in directly.
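For what it's worth, a hedged sketch of what that inline assembler can
look like with GCC's extended asm on x86-64 (my reconstruction, not the
exact code I ended up with):

```c
#include <stdint.h>

/* x86-64 GCC/Clang only: emit the addq/adcq pair directly. */
void
add_128_asm(uint64_t *sumhigh, uint64_t *sumlow,
            uint64_t ahigh, uint64_t alow,
            uint64_t bhigh, uint64_t blow)
{
    __asm__ ("addq %2, %0\n\t"   /* alow += blow, sets the carry flag */
             "adcq %3, %1"       /* ahigh += bhigh + carry */
             : "+r" (alow), "+r" (ahigh)
             : "r" (blow), "r" (bhigh)
             : "cc");
    *sumlow = alow;
    *sumhigh = ahigh;
}
```

This generates exactly the two instructions you want, but of course it
buys that at the cost of portability, which was my whole complaint.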