Author mark.dickinson
Recipients alexandre.vassalotti, amaury.forgeotdarc, christian.heimes, eric.smith, gvanrossum, jaredgrubb, mark.dickinson, nascheme, noam, pitrou, preston, rhettinger, tim.peters
Date 2009-04-19.08:44:14
> Is there a way to use SSE when available and x86 when it's not.

I guess it's possible in theory, but I don't know of any way to do this in 
practice.  I suppose one could trap the SIGILL generated by the attempted 
use of an SSE2 instruction on a non-supported platform---is this how 
things used to work for 386s without the 387?  That would make a sequence 
of floating-point instructions on non-SSE2 x86 horribly slow, though.

Antoine: as Raymond said, the advantage of SSE2 for numeric work is 
accuracy, predictability, and consistency across platforms.  The SSE2 
instructions finally put an end to all the problems arising from the 
mismatch between the precision of the x87 floating-point registers (64 
bits) and the precision of a C double (53 bits).  Those difficulties 
include (1) unpredictable rounding of intermediate values from 64-bit 
precision to 53-bit precision, due to spilling of temporaries from FPU 
registers to memory, and (2) double-rounding.  The arithmetic of Python 
itself is largely immune to the former, but not the latter.  (And of 
course the register spilling still causes headaches for various bits of 

Those difficulties can be *mostly* dealt with by setting the x87 rounding 
precision to double (instead of extended), though this doesn't fix the 
exponent range, so one still ends up with double-rounding on underflow.  
The catch is that one can't mess with the x87 state globally, as various 
library functions (especially libm functions) might depend on it being in 
whatever the OS considers to be the default state.

There's a very nice paper by David Monniaux that covers all this:  
definitely recommended reading after you've finished Goldberg's "What 
Every Computer Scientist...".  It can currently be found at:

An example: in Python (any version), try this:

>>> 1e16 + 2.9999
1.0000000000000002e+16

On OS X, Windows and FreeBSD you'll get the answer above.
(OS X gcc uses SSE2 by default; Windows and FreeBSD both
make the default x87 rounding-precision 53 bits).

On 32-bit Linux/x86 or Solaris/x86 you'll likely get the answer

1.0000000000000004e+16

instead, because Linux doesn't (usually?) change the Intel default
rounding precision of 64 bits.  Using SSE2 instead of the x87 would have 
fixed this.
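
For anyone who wants to see where the two answers come from without 
access to both kinds of machine, here is a rough sketch that imitates the 
two rounding paths using exact rational arithmetic.  The helper 
round_to_bits is made up for this illustration (it isn't part of Python 
or of any patch on this issue), and it only models the precision of the 
significand, not the exponent range.

```python
from fractions import Fraction

def round_to_bits(x, bits):
    """Round a positive Fraction to `bits` significant bits, ties-to-even.

    Hypothetical helper for illustration only; ignores exponent limits.
    """
    # Locate e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - bits + 1)  # spacing of representable values near x
    return round(x / ulp) * ulp          # round() on a Fraction rounds ties to even

# Exact value of the sum; Fraction(2.9999) is the exact value of that double.
exact = Fraction(10**16) + Fraction(2.9999)

# One rounding, straight to 53 bits (what SSE2 does).
once = round_to_bits(exact, 53)

# Two roundings: first to 64 bits (x87 extended), then to 53 bits.
twice = round_to_bits(round_to_bits(exact, 64), 53)

print(float(once))   # 1.0000000000000002e+16
print(float(twice))  # 1.0000000000000004e+16
```

The 64-bit intermediate comes out as exactly 1e16 + 3, which lies halfway 
between two adjacent doubles; the second rounding then breaks the tie 
towards the even significand and lands on ...004 rather than the correctly 
rounded ...002.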

</standard x87 rant>