Message 86158 - Python tracker

Author	mark.dickinson
Recipients	alexandre.vassalotti, amaury.forgeotdarc, christian.heimes, eric.smith, gvanrossum, jaredgrubb, mark.dickinson, nascheme, noam, pitrou, preston, rhettinger, tim.peters
Date	2009-04-19.08:44:14
Message-id	<1240130665.83.0.328550745196.issue1580@psf.upfronthosting.co.za>

Content

[Raymond]
> Is there a way to use SSE when available and x86 when it's not.

I guess it's possible in theory, but I don't know of any way to do this in
practice. I suppose one could trap the SIGILL generated by the attempted
use of an SSE2 instruction on a non-supported platform---is this how
things used to work for 386s without the 387? That would make a sequence
of floating-point instructions on non-SSE2 x86 horribly slow, though.

Antoine: as Raymond said, the advantage of SSE2 for numeric work is
accuracy, predictability, and consistency across platforms. The SSE2
instructions finally put an end to all the problems arising from the
mismatch between the precision of the x87 floating-point registers (64
bits) and the precision of a C double (53 bits). Those difficulties
include (1) unpredictable rounding of intermediate values from 64-bit
precision to 53-bit precision, due to spilling of temporaries from FPU
registers to memory, and (2) double-rounding. The arithmetic of Python
itself is largely immune to the former, but not the latter. (And of
course the register spilling still causes headaches for various bits of
CPython).
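
A decimal analogy may help with (2). This is only an illustrative sketch:
the value 0.13451 and the use of the decimal module are assumptions chosen
for the example, and the x87 of course works in binary. Three decimal places
stand in for the extended format, two places for the double.

```python
from decimal import Decimal, ROUND_HALF_EVEN

x = Decimal("0.13451")

# Round once, straight to two decimal places.
once = x.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

# Round twice: first to three places (the "extended" format), then to two.
intermediate = x.quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN)
twice = intermediate.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(once)          # 0.13
print(intermediate)  # 0.135
print(twice)         # 0.14
```

The first rounding manufactures an exact tie (0.135) that wasn't there in
the original value, and the second rounding then resolves that tie the
wrong way.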

Those difficulties can be *mostly* dealt with by setting the x87 rounding
precision to double (instead of extended), though this doesn't fix the
exponent range, so one still ends up with double-rounding on underflow.
The catch is that one can't mess with the x87 state globally, as various
library functions (especially libm functions) might depend on it being in
whatever the OS considers to be the default state.

There's a very nice paper by David Monniaux that covers all this:
definitely recommended reading after you've finished Goldberg's "What
Every Computer Scientist...". It can currently be found at:

http://hal.archives-ouvertes.fr/hal-00128124/en/

An example: in Python (any version), try this:

>>> 1e16 + 2.9999
10000000000000002.0

On OS X, Windows and FreeBSD you'll get the answer above.
(OS X gcc uses SSE2 by default; Windows and FreeBSD both
make the default x87 rounding-precision 53 bits).

On 32-bit Linux/x86 or Solaris/x86 you'll likely get the answer

10000000000000004.0

instead, because Linux doesn't (usually?) change the Intel default
rounding precision of 64 bits. Using SSE2 instead of the x87 would have
fixed this.
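
Both answers can be reproduced on any platform by simulating the rounding
with exact rational arithmetic. What follows is a minimal sketch, not a
description of how the FPU is implemented: round_to_bits is a hypothetical
helper written for this illustration, and it assumes a positive input and
round-half-to-even at both precisions.

```python
from fractions import Fraction


def round_to_bits(x, bits):
    """Round a positive Fraction to `bits` significant bits, ties to even.

    Hypothetical helper for illustration only; it ignores exponent range,
    so it models precision but not overflow or underflow.
    """
    # Initial guess for the exponent e such that
    # 2**(bits - 1) <= x / 2**e < 2**bits, then correct it if needed.
    e = x.numerator.bit_length() - x.denominator.bit_length() - bits
    while x / Fraction(2) ** e >= 2 ** bits:
        e += 1
    while x / Fraction(2) ** e < 2 ** (bits - 1):
        e -= 1
    # round() on a Fraction returns an int and rounds ties to even.
    return round(x / Fraction(2) ** e) * Fraction(2) ** e


# Fraction(float) is exact, so this is the true sum of the two doubles.
exact = Fraction(1e16) + Fraction(2.9999)

# One rounding, straight to double precision (SSE2, or x87 set to 53 bits).
direct = round_to_bits(exact, 53)

# Two roundings: to x87 extended precision first, then to double on store.
extended = round_to_bits(exact, 64)
double_rounded = round_to_bits(extended, 53)

print(int(direct))          # 10000000000000002
print(int(extended))        # 10000000000000003
print(int(double_rounded))  # 10000000000000004
```

The exact sum is a shade under ...03, so a single rounding goes down to
...02. At 64-bit precision the spacing between neighbours near 1e16 is
2**-10, so the first rounding lands exactly on ...03; that is a tie at
53-bit precision, and ties-to-even sends it up to ...04.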

</standard x87 rant>

History
Date	User	Action	Args
2009-04-19 08:44:26	mark.dickinson	set	recipients: + mark.dickinson, gvanrossum, tim.peters, nascheme, rhettinger, amaury.forgeotdarc, pitrou, eric.smith, christian.heimes, alexandre.vassalotti, noam, jaredgrubb, preston
2009-04-19 08:44:25	mark.dickinson	set	messageid: <1240130665.83.0.328550745196.issue1580@psf.upfronthosting.co.za>
2009-04-19 08:44:24	mark.dickinson	link	issue1580 messages
2009-04-19 08:44:22	mark.dickinson	create