Issue 9069: test_float failure on Solaris


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53315

classification

Title:	test_float failure on Solaris
Type:	behavior	Stage:	needs patch
Components:	Extension Modules	Versions:	Python 2.6

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:	mark.dickinson	Nosy List:	drkirkby, mark.dickinson, skrah
Priority:	normal	Keywords:

Created on 2010-06-24 17:10 by mark.dickinson, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
opensolaris_python_buildlog.txt	mark.dickinson, 2010-06-26 08:52
build-with_socket-failure.txt	drkirkby, 2010-06-26 11:28	Complete build, but the test suite cannot be run. I know this is due to the failure to build _socket, since if I patch that, the test suite can be run.
pyconfig.h	drkirkby, 2010-06-26 11:29	pyconfig.h header file created for a 64-bit build.

Messages (37)
msg108532 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-24 17:10
Comment from David Kirkby in issue 8265; moved here because it looks like a separate problem. I'm seeing this failure on both Solaris 10 (SPARC processor) in 32-bit mode and OpenSolaris 06/2009 (Intel Xeon) in 64-bit mode using Python 2.6.4. So it is not just an ARM Linux issue. See http://trac.sagemath.org/sage_trac/ticket/9297 http://trac.sagemath.org/sage_trac/ticket/9299 Note, Solaris supports both a 32 and 64-bit ABI. Not sure if that is relevant, but I see "ABI" in the title, so perhaps it might be.
msg108533 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-24 17:11
And the text of the failure (from the first link David provides):

    test test_float failed -- Traceback (most recent call last):
      File "/export/home/drkirkby/sage-4.4.4.alpha1/spkg/build/python-2.6.4.p9/src/Lib/test/test_float.py", line 765, in test_roundtrip
        self.identical(-x, roundtrip(-x))
      File "/export/home/drkirkby/sage-4.4.4.alpha1/spkg/build/python-2.6.4.p9/src/Lib/test/test_float.py", line 375, in identical
        self.fail('%r not identical to %r' % (x, y))
    AssertionError: -0.0 not identical to 0.0
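The failing assertion compares a float with its hex round trip and treats 0.0 and -0.0 as different values. A minimal sketch of that kind of check follows; the helper names `roundtrip` and `identical` are borrowed from the traceback, but the bodies are an illustrative assumption, not the code in test_float.py.

```python
import math


def roundtrip(x):
    """Convert a float to its hexadecimal string and back again."""
    return float.fromhex(x.hex())


def identical(x, y):
    """Return True only if x and y are the same float, sign of zero included."""
    if math.isnan(x) or math.isnan(y):
        return math.isnan(x) and math.isnan(y)
    if x != y:
        return False
    # 0.0 == -0.0 is True, so the signs have to be compared explicitly.
    return math.copysign(1.0, x) == math.copysign(1.0, y)


# An ordinary equality test cannot see the difference that the test cares about.
print(-0.0 == 0.0)
print(identical(-0.0, 0.0))
print(identical(-0.0, roundtrip(-0.0)))
```

On a build affected by this issue, the last check is the one that would be expected to report a mismatch.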
msg108534 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-24 17:14
David, would it be possible for you to provide the results of:

    >>> float.hex(-0.0)
    >>> float.fromhex('-0x0.0p+0')

on those platforms, so that we can tell whether it's the float -> hex conversion or the hex -> float conversion that's losing the sign of the zero?
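The two requested checks can also be wrapped in a small script that reports each direction separately. This is only a sketch; the function name `diagnose_negative_zero` and the dictionary keys are made up for illustration.

```python
import math


def diagnose_negative_zero():
    """Report which direction of the hex conversion keeps the sign of -0.0."""
    # float -> hex: the string should carry a leading minus sign.
    to_hex_ok = (-0.0).hex() == "-0x0.0p+0"

    # hex -> float: the parsed value should be a zero with the sign bit set.
    parsed = float.fromhex("-0x0.0p+0")
    from_hex_ok = parsed == 0.0 and math.copysign(1.0, parsed) == -1.0

    return {"float_to_hex": to_hex_ok, "hex_to_float": from_hex_ok}


print(diagnose_negative_zero())
```

A `False` under `float_to_hex` with `True` under `hex_to_float` would match the behaviour reported later in this thread.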
msg108538 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-24 18:13
David, please could you also tell me whether HAVE_COPYSIGN is defined for those builds of Python? It should be in pyconfig.h in the top level of the build directory, if it is. And (just to double check), at configure time, there should be a line in the output of the configure script that looks like

    checking for copysign... yes

Do you get 'yes' or 'no' there?

If I had to guess, I'd say that it's the float -> hex conversion that's going wrong (so that (-0.0).hex() produces '0x0.0p+0' instead of '-0x0.0p+0'), and that this is caused by either a buggy system copysign function, or by the system copysign function not being found and Python using a buggy workaround.
msg108542 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-24 18:47
Hi Mark,

Here's the info on the two systems - first the SPARC system, secondly the Intel Xeon system.

1) SPARC

* Sun Blade 2000, with 2 x UltraSPARC III+ 1200 MHz processors
* 8 GB RAM
* Solaris 10 update 8 10/09 release (This is the latest release of Solaris 10).

    drkirkby@swan:~$ cat /etc/release
    Solaris 10 10/09 s10s_u8wos_08a SPARC
    Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
    Use is subject to license terms.
    Assembled 16 September 2009
    drkirkby@swan:~$ uname -a
    SunOS swan 5.10 Generic_141444-09 sun4u sparc SUNW,Sun-Blade-1000

    Python 2.6.4 (r264:75706, Jun 24 2010, 10:39:29)
    [GCC 4.4.4] on sunos5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> float.hex(-0.0)
    '0x0.0p+0'
    >>> float.fromhex('-0x0.0p+0')
    -0.0

When configure runs, I see:

    "checking for copysign... yes"

In pyconfig.h I have:

    /* Define to 1 if you have the `copysign' function. */
    #define HAVE_COPYSIGN 1

======================================================
======================================================

2) Intel Xeon system.

* Sun Ultra 27, quad core 3.33 GHz Intel Xeon processor
* 12 GB RAM
* OpenSolaris 06/2009, updated to build 134
* 64-bit installation.
* Note, this is the native operating system on this machine, so VirtualBox is not used.

    drkirkby@hawk:~$ cat /etc/release
    OpenSolaris Development snv_134 X86
    Copyright 2010 Sun Microsystems, Inc. All Rights Reserved.
    Use is subject to license terms.
    Assembled 01 March 2010
    drkirkby@hawk:~$ uname -a
    SunOS hawk 5.11 snv_134 i86pc i386 i86pc

    Python 2.6.4 (r264:75706, Jun 24 2010, 17:38:56)
    [GCC 4.4.4] on sunos5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> float.hex(-0.0)
    '0x0.0p+0'
    >>> float.fromhex('-0x0.0p+0')
    -0.0
    >>>

When configure runs, I see:

    "checking for copysign... yes"

In pyconfig.h I have:

    /* Define to 1 if you have the `copysign' function. */
    #define HAVE_COPYSIGN 1

If you feel access to the SPARC system could help you debug this (or any of the other test failures I get), I can get you access to a 16-core Sun T5240 machine which was donated by Sun. I can't provide such easy access to the Xeon system, though you can install OpenSolaris as a virtual machine in VirtualBox quite easily - it's a free download.

Dave
msg108543 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-24 19:02
Thanks for the details. So the relevant code (see the float_hex function in Objects/floatobject.c) looks like this:

    if (x == 0.0) {
        if(copysign(1.0, x) == -1.0)
            return PyString_FromString("-0x0.0p+0");
        else
            return PyString_FromString("0x0.0p+0");
    }

This should produce the correct string for -0.0 (because -0.0 compares equal to 0.0, and then copysign(1.0, x) should be -1.0); I'm reasonably confident that the C code is correct, since the tests pass on all the other platforms that get tested regularly.

So a buggy system copysign function looks like a possibility. Another more likely possibility occurs to me, though: and that's that there's a buggy compiler optimization going on: the compiler sees that we're in an 'x == 0.0' branch, and decides that it can substitute '0.0' for 'x' everywhere in the 'if' block. But this is just guessing.

Do you still get these failures in a debug build of Python (i.e., by passing --with-pydebug to the configure script)?
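The reasoning in this message can be restated in Python as a rough model. `hex_of_zero` mirrors the zero branch quoted from float_hex, and `hex_of_zero_with_substitution` is a hypothetical model of the suspected optimization (x replaced by the constant 0.0 inside the branch); neither function exists in CPython, and the second one only illustrates the guess, not what any compiler is known to do.

```python
import math


def hex_of_zero(x):
    """Mirror the zero branch: the sign comes from copysign, not from ==."""
    if x != 0.0:
        raise ValueError("this sketch only covers zeros")
    if math.copysign(1.0, x) == -1.0:
        return "-0x0.0p+0"
    return "0x0.0p+0"


def hex_of_zero_with_substitution(x):
    """Hypothetical model: inside the 'x == 0.0' branch, x is replaced by 0.0."""
    if x != 0.0:
        raise ValueError("this sketch only covers zeros")
    # The substitution is value-preserving under ==, but it discards the sign bit.
    return hex_of_zero(0.0)


print(hex_of_zero(-0.0))
print(hex_of_zero_with_substitution(-0.0))
```

The substitution looks harmless because -0.0 == 0.0, which is why such a transformation would only show up in code that inspects the sign of a zero.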
msg108544 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-24 19:07
Just to clarify something, in case you notice something does not look quite right. The link I provided to the build failure on the SPARC machine http://trac.sagemath.org/sage_trac/ticket/9297 was a Sun Blade 1000. It is not the same machine from which I just copied the output, which was a Sun Blade 2000. The two machines are pretty similar though - the motherboards, processors, disks, RAM are interchangeable. In fact, 'uname' shows Sun-Blade-1000 on both of them. I think the only real difference between them is that the Blade 2000 looks a bit nicer, and is officially supported with faster CPUs.

The link I provided to the failure on the Xeon machine http://trac.sagemath.org/sage_trac/ticket/9299 is the same machine where I just posted the output.

If you need an account on a SPARC, it will be a more modern Sun T5240 with 32 GB RAM.

Dave
msg108546 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-24 20:02
I'll take a look at this in an hour or two. I'll restrict the testing to the Xeon machine, as it is a zillion times quicker than the old SPARCs.

What comes to my mind is that perhaps 'copysign' is only defined in C99. Solaris header files are pretty strict about what gets defined and not defined depending on the mode of compilation. The compiler option -std=c99 is not being passed, yet the man page for copysign on my OpenSolaris laptop (yet another system) says:

    drkirkby@laptop:~$ man copysign

    Mathematical Library Functions                      copysign(3M)

    NAME
         copysign, copysignf, copysignl - number manipulation function

    SYNOPSIS
         c99 [ flag... ] file... -lm [ library... ]
         #include <math.h>

         double copysign(double x, double y);

         float copysignf(float x, float y);

         long double copysignl(long double x, long double y);

    DESCRIPTION
         These functions produce a value with the magnitude of x and
         the sign of y.
msg108550 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-24 20:54
Using the compiler option -std=c99 allows this test to pass. Perhaps adding the macro AC_PROG_CC_C99 to autoconf to add the right compiler option might be a solution. I know Solaris headers are often quite strict, and will not define something in a header file if the right things are not defined to indicate C99.

I would add, there is quite a serious problem on Solaris with _socket failing to build. http://bugs.python.org/issue8852 Unless one uses that workaround, which is not committed to the Python source code yet, one cannot run any tests of Python.
msg108552 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-24 21:30
Thanks for the update. So I'm confused: when -std=c99 isn't given, where is the build finding the copysign function from? That is, why isn't there a link error when building Python? (I'm attempting to install OpenSolaris in Parallels at the moment, but it may take more time than I have available at the moment...)
msg108564 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-24 23:24
Hi Mark,

Since 'copysign' is in the maths library, I would not expect the link phase to fail. Solaris does not ship with different maths libraries for C99 (one just links to libm). However, I would not be surprised if the behavior was ill defined if the compiler is not C99. Certainly header files behave differently on Solaris depending on the mode of the compiler. For example, trying to use the INFINITY macro when the compiler is not C99 seems to work on Linux, but fails on Solaris unless you force C99 mode with gcc -std=c99.

The following bit of code gives the same results whether one uses 'gcc' or 'gcc -std=c99' on OpenSolaris or Linux. However, if one uses 'gcc -ansi' then the behavior is totally different.

    drkirkby@hawk:~$ cat cs.c
    #include <stdio.h>
    #include <math.h>

    int main(int argc, char **argv) {
      double x, y;
      /* Set x and y differently if a command line argument is given.
         This will avoid the compiler optimising the values out, as
         they will not be known in advance. */
      if (argc==1) {
        /* This will stop compiler optimising 0.0 out x */
        x=1.0;
        y=0.0;
      } else {
        x=2.0;
        y=-0.0;
      }
      printf("copysign(%lf,%lf)=%lf\n", x, y, copysign(x, y));
    }

    drkirkby@hawk:~$ gcc -lm cs.c
    drkirkby@hawk:~$ ./a.out
    copysign(1.000000,0.000000)=1.000000
    drkirkby@hawk:~$ ./a.out z
    copysign(2.000000,-0.000000)=-2.000000
    drkirkby@hawk:~$ gcc -lm -std=c99 cs.c
    drkirkby@hawk:~$ ./a.out
    copysign(1.000000,0.000000)=1.000000
    drkirkby@hawk:~$ ./a.out z
    copysign(2.000000,-0.000000)=-2.000000

Note how -ansi screws it up completely:

    drkirkby@hawk:~$ gcc -lm -ansi cs.c
    drkirkby@hawk:~$ ./a.out
    copysign(1.000000,0.000000)=0.000000
    drkirkby@hawk:~$ ./a.out z
    copysign(2.000000,-0.000000)=0.000000

I also tried it on a Sun SPARC running a recent version of Solaris (2009 release). Again the results are the same.

I then tried it on a Solaris box running the first release of Solaris 10 (03/2005). Then one gets even stranger behavior with -ansi, where the results are almost right, but with poor rounding errors.

    drkirkby@redstart:~$ gcc -ansi -lm cs.c
    drkirkby@redstart:~$ ./a.out
    copysign(1.000000,0.000000)=1.000001
    drkirkby@redstart:~$ ./a.out d
    copysign(2.000000,-0.000000)=-2.000002

But in C99 mode, it works fine.

    drkirkby@redstart:~$ gcc -std=c99 -lm cs.c
    drkirkby@redstart:~$ ./a.out
    copysign(1.000000,0.000000)=1.000000
    drkirkby@redstart:~$ ./a.out d
    copysign(2.000000,-0.000000)=-2.000000

So I draw two conclusions.

1) 'copysign' is in the maths library, so a program which tries to link to 'copysign' will succeed.

2) The behavior of 'copysign' is ill defined unless the compiler is a C99 compiler.

I don't think you should use copysign unless the compiler is C99. Trying to come up with a test for 'copysign' working is probably an impossible task, as it is undefined. So you could try 99 different values of x and y and they all work, but it's anyone's guess what will happen with the 100th set of values.

Dave
msg108571 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-25 00:02
Just to clarify the hostnames and hardware used, in case you look at the results here or the links to the Sage maths bug tracker and are not sure what is what. Note some are Solaris and some are OpenSolaris. Some have SPARC and some have Intel processors. All machines are 64-bit, but note that by default executables are created 32-bit on Solaris and OpenSolaris.

* hawk = Sun Ultra 27, 3.33 GHz quad core Xeon, OpenSolaris 06/2009, but updated to the latest build of OpenSolaris.
* laptop = Sony laptop, 2.0 GHz Intel CPU core2 duo, OpenSolaris 06/2009.
* swan = Sun Blade 2000, 2 x 1200 MHz SPARC processors, Solaris 10 10/2009 release (Latest release of Solaris 10 at the time I'm writing this)
* redstart = Sun Blade 1000, 2 x 900 MHz SPARC processors, Solaris 10 03/2005 (First Solaris 10 release)

Although I've not shown the results from them, if I do show any others, likely candidates will be:

* sage = x86 Linux box (Ubuntu I think) 24 cores.
* t2 = Sun T5240, T2+ SPARC processors, 16 cores 1167 MHz, Solaris 10 05/2009 (A recent, but not the very latest release of Solaris 10)
* bsd = OS X box of some sort.
* hpbox = HP C3600 running HP-UX 11.11B, PA-RISC processors.
* chaffinch = Virtual machine running Solaris 10 10/2009. (Runs as a guest operating system in VirtualBox)

Sometimes having access to different hardware can be useful, but it can get confusing if someone sees a lot of different host names!

Dave
msg108579 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-25 07:11
Did you have a chance to try a debug build of Python and see if the problem persists there?

I'm failing to reproduce this in OpenSolaris 2009.06, running in Parallels on a MacBook Pro (non-debug 32-bit build of Python):

    dickinsm@eratosthenes:~/release26-maint$ uname -a
    SunOS eratosthenes 5.11 snv_111b i86pc i386 i86pc Solaris
    dickinsm@eratosthenes:~/release26-maint$ cat /etc/release
    OpenSolaris 2009.06 snv_111b X86
    Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
    Use is subject to license terms.
    Assembled 07 May 2009
    dickinsm@eratosthenes:~/release26-maint$ ./python
    Python 2.6.5+ (release26-maint:82213, Jun 25 2010, 00:52:22)
    [GCC 3.4.3 (csl-sol210-3_4-20050802)] on sunos5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> float.hex(-0.0)
    '-0x0.0p+0'
    >>> import sys; sys.maxsize
    2147483647

The most noticeable difference from the machines you describe here is the compiler. (Did you build gcc 4.4.4 by hand on these machines, or is there a package I can download and install somewhere?)

I'd still like to understand how the -std=c99 compiler option affects copysign; it might help inform a workaround. The library function itself can't know how you compiled Python, surely? Can you work out what's going on from the relevant header files?
msg108585 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-25 11:18
So perhaps the cause is simply that copysign isn't being declared for David's Python builds? If that were the case, I'd expect to see some gcc warnings in the Python build output, something like:

    warning: implicit declaration of function `copysign'

David, are there any such warnings?

Looking at /usr/include/math.h in my OpenSolaris VM, I see (with irrelevant bits omitted):

    #if defined(__EXTENSIONS__) || defined(_XOPEN_SOURCE) || \
        !defined(_STRICT_STDC) && !defined(_POSIX_C_SOURCE)
    #if defined(__EXTENSIONS__) || !defined(_XOPEN_SOURCE)
    extern double copysign __P((double, double));
    #endif
    #endif

Assuming that this is the cause, it would be interesting to know which of these defines differs between my OpenSolaris VM and David's machines (e.g., the 'hawk' machine, since this seems closest in spec to what I'm working with).
msg108586 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-25 11:31
David, my pyconfig.h file contains:

    /* Defined on Solaris to see additional function prototypes. */
    #define __EXTENSIONS__ 1

Does yours?
msg108630 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-25 21:03
Now that I've finally managed to get gcc 4.4.4 installed on OpenSolaris...

.. I'm still failing to reproduce this bug. :(

    dickinsm@eratosthenes:~/release26-maint$ uname -a
    SunOS eratosthenes 5.11 snv_134 i86pc i386 i86pc Solaris
    dickinsm@eratosthenes:~/release26-maint$ cat /etc/release
    OpenSolaris Development snv_134 X86
    Copyright 2010 Sun Microsystems, Inc. All Rights Reserved.
    Use is subject to license terms.
    Assembled 01 March 2010
    dickinsm@eratosthenes:~/release26-maint$ ./python
    Python 2.6.4 (release26-maint:75706, Jun 25 2010, 21:44:19)
    [GCC 4.4.4] on sunos5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> float.hex(-0.0)
    '-0x0.0p+0'
    >>> import sys; sys.maxint
    2147483647

As far as I can tell, this setup is almost identical to David's 'hawk' machine (same gcc version, same OpenSolaris build, same Python source revision).

I'm not really sure where I can go from here.

Stefan, you're not able to reproduce this by any chance, are you?
msg108639 - (view)	Author: Stefan Krah (skrah) *	Date: 2010-06-25 22:27
Mark Dickinson <report@bugs.python.org> wrote:
> Now that I've finally managed to get gcc 4.4.4 installed on OpenSolaris...
>
> .. I'm still failing to reproduce this bug. :(
>
> dickinsm@eratosthenes:~/release26-maint$ uname -a
> SunOS eratosthenes 5.11 snv_134 i86pc i386 i86pc Solaris
> dickinsm@eratosthenes:~/release26-maint$ cat /etc/release
> OpenSolaris Development snv_134 X86
> Copyright 2010 Sun Microsystems, Inc. All Rights Reserved.
> Use is subject to license terms.
> Assembled 01 March 2010
> dickinsm@eratosthenes:~/release26-maint$ ./python
> Python 2.6.4 (release26-maint:75706, Jun 25 2010, 21:44:19)
> [GCC 4.4.4] on sunos5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> float.hex(-0.0)
> '-0x0.0p+0'
> >>> import sys; sys.maxint
> 2147483647
>
> As far as I can tell, this setup is almost identical to David's 'hawk' machine (same gcc version, same OpenSolaris build, same Python source revision).
>
> I'm not really sure where I can go from here.
>
> Stefan, you're not able to reproduce this by any chance, are you?

No, I'm getting the same results as you (OpenSolaris/qemu/32-bit/gcc).

I wonder if it's somehow related to issue 7281, which was resolved by changing float_repr_style. But as I said, I cannot reproduce it with either gcc or suncc. On the other hand, who knows how the FPU is emulated in qemu.

The C standard is diplomatic as usual:

"On implementations that represent a signed zero but do not treat negative zero consistently in arithmetic operations, the copysign functions regard the sign of zero as positive."

David, one of your comments implied to me that you have managed to compile Python with -std=c99. For me, this fails instantly. What options did you use?
msg108654 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-25 23:59
Hi,

I had hoped to devote more time to this, but have not been able to. I will do so at the weekend.

I would add I was building 64-bit, so adding the compiler flag -m64 on 'hawk' at least some of the time. Depending on your hardware, assuming you have installed OpenSolaris as a virtual machine in VirtualBox, it may be a 32 or 64-bit version of OpenSolaris. You need specific instructions from the processor for a 64-bit version, and Sony in their infinite wisdom have disabled them on my Vaio laptop, so whilst I can install OpenSolaris as a 64-bit host operating system, any attempt to install a 64-bit guest will fail.

If I don't choose to compile as C99, then I need to add the compiler flag -DHAVE_DECL_ISFINITE=0. Otherwise I see:

    Objects/object.c:1036: warning: implicit declaration of function 'isinf'
    Undefined                       first referenced
     symbol                             in file
    isfinite                            ./libpython2.6.so
    ld: fatal: symbol referencing errors. No output written to python

Again, the Solaris man page says:

    Mathematical Library Functions                      isfinite(3M)

    NAME
         isfinite - test for finite value

    SYNOPSIS
         c99 [ flag... ] file... -lm [ library... ]
         #include <math.h>

         int isfinite(real-floating x);

implying this is a C99 function.

This conflicting behavior could be the result of what linker or assembler is being used. On SPARC, I use the Sun linker and assembler. On OpenSolaris I use the Sun linker, but the GNU assembler.

I would have thought it was better to test this out with small bits of test code like I posted, rather than the complete Python source code.

It might be better if I just create you an account on 'hawk'. Drop me an email at david <dot> kirkby {at} onetel |dot| net if you want.

I can also get you an account at the University of Washington if you want on a Sun T5240 SPARC. I've not verified the problem on that machine, but I can do so. Just drop me an email with a preferred user name and I'll sort it out.

The SPARC is very slow - despite it being a current model of a high end server. It is designed for a different sort of task to developing software. The CPUs are pretty slow (1167 MHz) and pretty dumb, but there are 128 hardware threads. In order to get any useful performance from the T5240, the code needs to be highly parallel or have lots of processes like on busy web servers. That is what 't2' is designed for - a high end web server. But 'hawk' is a pretty high spec PC which I run 24/7.

Dave
msg108690 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-26 08:08
> I would add I was building 64-bit, so adding the compiler flag -m64 on 'hawk' at least some of the time.

Ah; that may be relevant. Can you tell us exactly what command line you're using to build Python, and the values of any relevant environment variables?

> I would have thought it was better to test this out with small bits
> of test code like I posted, rather than the complete Python
> source code.

Unfortunately, the small bits of code don't really help: I get the same results as you for those, and I understand why those are failing: copysign isn't declared (as you'll see if you add -Wall to your compilation line) so the compiler assumes it returns type 'int'. This shouldn't happen with Python because its configure script defines __EXTENSIONS__, which ensures that copysign is declared when math.h is included. Can you still reproduce the strange copysign results with your small examples when __EXTENSIONS__ is #define'd?

> This conflicting behavior could be the result of what linker or
> assembler is being used. On SPARC, I use Sun linker and assembler.
> On OpenSolaris I use the Sun linker, but the GNU assembler.

I have the same setup (Sun linker, GNU assembler):

    dickinsm@eratosthenes:~/release26-maint$ gcc-4.4 -v
    Using built-in specs.
    Target: i386-pc-solaris2.11
    Configured with: ../gcc-4.4.4/configure --prefix=/usr/local --program-suffix=-4.4 --with-mpfr-include=/usr/include/mpfr --with-gmp-include=/usr/include/gmp --with-as=/usr/bin/gas --with-gnu-as --with-ld=/usr/bin/ld --without-gnu-ld --enable-shared --enable-multilib --enable-languages=c,c++,objc
    Thread model: posix
    gcc version 4.4.4 (GCC)
msg108694 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-26 08:52
David, I'm still missing some easy answers that would really help. Please could you answer the question about whether __EXTENSIONS__ is defined in your pyconfig.h; it would help determine what we should be investigating. There shouldn't be any need for the -std=c99 option: the Python configure script defines __EXTENSIONS__ exactly to make these c99 functions available. So either that isn't happening on your machines, in which case we should be looking for a problem with the configure script, or it is happening, in which case copysign is being properly declared on your machine and we have to look elsewhere (library mismatch? compiler optimization bug?) for the cause of failure. Still no joy with an -m64 build. I've attached a transcript showing the precise steps I used. [I'm tempted to close this issue as 'works for me'; I'm not seeing any test_math failure in OpenSolaris, and neither is the Solaris/SPARC buildbot.]
msg108695 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-26 08:56
Stefan, thanks for the feedback. I don't think this is related to issue 7281. I thought we'd determined that that issue had nothing to do with copysign itself, and everything to do with what the signbit of the NaN returned by float("nan") happens to be.
msg108698 - (view)	Author: Stefan Krah (skrah) *	Date: 2010-06-26 09:44
Mark Dickinson <report@bugs.python.org> wrote:
> I don't think this is related to issue 7281. I thought we'd determined that that issue
> had nothing to do with copysign itself, and everything to do with what the signbit of the
> NaN returned by float("nan") happens to be.

Yes, that's right. Also, to avoid spreading misinformation: The copysign(1.0, float("nan")) behavior is actually the same after changing float_repr_style.
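The distinction drawn in this exchange, between what copysign does and what sign bit a particular value happens to carry, can be checked by reading the sign bit directly. The sketch below assumes IEEE 754 doubles; `sign_bit` is a helper invented for illustration, and the sign of the NaN produced by float("nan") is platform-dependent, so no particular output is claimed for it.

```python
import math
import struct


def sign_bit(x):
    """Return the IEEE 754 sign bit (0 or 1) of a Python float."""
    # Pack as a big-endian double, then reinterpret the 8 bytes as an integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63


nan = float("nan")

# copysign simply reports whatever sign bit its second argument carries.
print("sign bit of -0.0:", sign_bit(-0.0))
print("sign bit of float('nan'):", sign_bit(nan))
print("copysign(1.0, nan):", math.copysign(1.0, nan))
```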
msg108706 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-26 11:04
Hi, __EXTENSIONS__ is defined to 1. Give me an hour, and I'll attach a log.
msg108711 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-26 11:28
Here's a build done the same way as you. This gives the same result as you here. But an attempt to run the test suite fails because of _socket. I need to patch that in order that I can run the test suite. See http://bugs.python.org/issue8852
msg108712 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-26 11:29
Here's the header file that gets created
msg108714 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-26 12:07
Thanks for the logs. So if you apply the issue8852 patch, and run the test suite, does test_float still fail?
msg108718 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-26 12:25
Sorry, I seem to have wasted a lot of your time here.

Python was built from a script which applied some patches - including the one that allows _socket to build. Without that (which is not committed to Python, and I'm told it might not be done this year), it's impossible to run the test suite. It looks like one of the patches has messed up test_float.

Unfortunately, if I just apply the patch at issue8852, then the test suite hangs at:

    test_posix

It's used 48 minutes of CPU time on a 3.33 GHz Xeon as I write, passing most tests up to test_posix, but hanging there.

Sorry to have wasted your time. BTW, are you able to run the full test suite, or does the test suite simply not run at all?

Dave
msg108723 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-26 13:24
I see the same _socket build failure as you do; but with the issue 8852 patch, I can run the test suite. It did spend quite a while in test_posix, but the test eventually finished (and failed). I didn't get to the end of the test run, unfortunately, because I ran out of disk space. (4 virtual machines on a small laptop is pushing it a bit. :) It's possible that the test_posix failure was due to running out of space, I guess. Anyway, that's a separate issue.
msg108725 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-26 13:26
    test_float
    test test_float failed -- Traceback (most recent call last):
      File "/export/home/drkirkby/Python-2.7rc2/Lib/test/test_float.py", line 1297, in test_roundtrip
        self.identical(-x, roundtrip(-x))
      File "/export/home/drkirkby/Python-2.7rc2/Lib/test/test_float.py", line 907, in identical
        self.fail('%r not identical to %r' % (x, y))
    AssertionError: -0.0 not identical to 0.0

    test_fnmatch
    test_fork1
msg108726 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-26 13:28
Sorry, I missed out the comment there. This is failing for me, in both 32 and 64-bit builds with Python-2.7rc2.
msg108728 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-26 14:56
Okay---this one I can reproduce. :)

I'm going to call it a gcc optimization bug. Specifically, it seems to be a bug involving gcc's builtin version of the copysign function. When I build a current svn trunk checkout (r82245) with:

    CC='gcc-4.4 -m64' ./configure && make

I get the wrong result:

    Python 2.7rc2+ (trunk:82245, Jun 26 2010, 05:35:07)
    [GCC 4.4.4] on sunos5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> (-0.0).hex()
    '0x0.0p+0'

But when building with either:

    CC='gcc-4.4 -m64 -fno-builtin-copysign' ./configure && make

or

    CC='gcc-4.4 -m64' ./configure --with-pydebug && make

I get the expected results.

If I have time I'll investigate further and see if I can generate the bug from smaller code. At any rate, I don't think this is something that can sensibly be fixed in Python itself, so I think this issue should be closed, and a bug filed upstream if necessary.

I also can't see a good reason why this bug would be specific to OpenSolaris. Does anyone have gcc-4.4.4 available to test this on OS X, Linux or *BSD?
msg108730 - (view)	Author: David Kirkby (drkirkby)	Date: 2010-06-26 15:36
I'm glad you can reproduce it! I can understand you wanting to close it in this case. I've no problem with that. To me at least, it does not seem anywhere near as serious as the other problem.

I will try it on Linux though. I have access to a reasonably decent (24 core) Linux box, so I'll try it on that. But it means a lot of messing around, downloading mpir, mpfr, possibly newer versions of the GNU binutils etc. Not a 5 minute job.

The weather here in the UK is too nice to sit behind a computer screen!

Dave
msg108734 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-26 17:11
Here's some minimal failing code.

    // Compile with:
    //   gcc-4.4 -m64 -fno-inline -g -O3 copysign_bug.c -o copysign_bug

    #include <math.h>
    #include <stdio.h>

    int copysign_bug(double x) {
        if (x && (x * 0.5 == x))
            return 1;
        if (copysign(1.0, x) < 0.0)
            return 2;
        else
            return 3;
    }

    int main(void) {
        double x;
        x = -0.0;
        printf("copysign_bug(%.17g) = %d\n", x, copysign_bug(x));
        x = 0.0;
        printf("copysign_bug(%.17g) = %d\n", x, copysign_bug(x));
        return 0;
    }

This produces the output:

    copysign_bug(-0) = 3
    copysign_bug(0) = 3

I would expect to see:

    copysign_bug(-0) = 2
    copysign_bug(0) = 3

I've reported this at: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44683
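For comparison, the same branch structure can be written in Python, where no C optimizer is involved. This is an illustrative restatement of the expected behaviour, not a reproduction of the gcc bug; the name `copysign_bug_expected` is made up here.

```python
import math


def copysign_bug_expected(x):
    """Python restatement of the C function, giving the values one would expect."""
    # Nonzero values unchanged by halving (the infinities) take the first branch.
    if x and (x * 0.5 == x):
        return 1
    # For zeros, only the sign bit separates the remaining two branches.
    if math.copysign(1.0, x) < 0.0:
        return 2
    return 3


for value in (-0.0, 0.0):
    print("copysign_bug(%r) = %d" % (value, copysign_bug_expected(value)))
```

The first condition is false for both zeros, so the result for -0.0 depends entirely on copysign seeing the original sign bit.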
msg108735 - (view)	Author: Stefan Krah (skrah) *	Date: 2010-06-26 17:17
Mark, gcc-4.4 on Fedora 12 is ok:

    [stefan@fedora-amd64 trunk]$ ./python
    Python 2.7rc2+ (trunk:82245M, Jun 26 2010, 13:09:14)
    [GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> (-0.0).hex()
    '-0x0.0p+0'

As a general remark, from what I hear on the gmp-bugs list, the newer gcc versions often seem to have problems on Solaris.
msg108736 - (view)	Author: Stefan Krah (skrah) *	Date: 2010-06-26 17:26
Fedora 12: copysign-bug varies wildly ((GCC) 4.4.3 20100127 (Red Hat 4.4.3-4)):

    [stefan@fedora-amd64 trunk]$ gcc -O0 copysign_bug.c -o copysign_bug
    [stefan@fedora-amd64 trunk]$ ./copysign_bug
    copysign_bug(-0) = 2
    copysign_bug(0) = 3
    [stefan@fedora-amd64 trunk]$ gcc -O2 copysign_bug.c -o copysign_bug
    [stefan@fedora-amd64 trunk]$ ./copysign_bug
    copysign_bug(-0) = 3
    copysign_bug(0) = 3
    [stefan@fedora-amd64 trunk]$ gcc -O3 copysign_bug.c -o copysign_bug
    [stefan@fedora-amd64 trunk]$ ./copysign_bug
    copysign_bug(-0) = 2
    copysign_bug(0) = 3
    [stefan@fedora-amd64 trunk]$ gcc -O3 -fno-inline copysign_bug.c -o copysign_bug
    [stefan@fedora-amd64 trunk]$ ./copysign_bug
    copysign_bug(-0) = 3
    copysign_bug(0) = 3
msg108737 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-26 17:52
Thanks Stefan. The bug apparently exists in gcc-4.5 on OS X as well. I'll update the gcc bug report.

    newton:~ dickinsm$ gcc-mp-4.5 -fno-inline -O3 copysign_bug.c -o copysign_bug && ./copysign_bug
    copysign_bug(-0) = 3
    copysign_bug(0) = 3
msg108803 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-06-27 18:28
The gcc optimization bug was fixed (with impressive speed!) upstream. I'm going to close this as 'won't fix'. It's probably possible to find a workaround, but the issue is minor, apparently only affects one platform, and relates to a little-used method. For people who want this fixed in their own Python builds, a fairly unintrusive workaround is to add '-fno-builtin-copysign' to the compiler flags, or to compile with version <4.4 of gcc until the next bugfix release of gcc 4.4 or 4.5. David: thanks for reporting this, and for all the followup.

History
Date	User	Action	Args
2022-04-11 14:57:02	admin	set	github: 53315
2010-06-27 18:28:47	mark.dickinson	set	status: open -> closed resolution: wont fix messages: + msg108803
2010-06-26 17:52:40	mark.dickinson	set	messages: + msg108737
2010-06-26 17:26:04	skrah	set	messages: + msg108736
2010-06-26 17:17:22	skrah	set	messages: + msg108735
2010-06-26 17:11:34	mark.dickinson	set	messages: + msg108734
2010-06-26 15:36:13	drkirkby	set	messages: + msg108730
2010-06-26 14:57:00	mark.dickinson	set	messages: + msg108728
2010-06-26 13:28:02	drkirkby	set	messages: + msg108726
2010-06-26 13:26:36	drkirkby	set	messages: + msg108725
2010-06-26 13:24:56	mark.dickinson	set	messages: + msg108723
2010-06-26 12:25:34	drkirkby	set	messages: + msg108718
2010-06-26 12:07:37	mark.dickinson	set	messages: + msg108714
2010-06-26 11:29:58	drkirkby	set	files: + pyconfig.h messages: + msg108712
2010-06-26 11:28:42	drkirkby	set	files: + build-with_socket-failure.txt messages: + msg108711
2010-06-26 11:04:58	drkirkby	set	messages: + msg108706
2010-06-26 09:44:44	skrah	set	messages: + msg108698
2010-06-26 08:56:11	mark.dickinson	set	messages: + msg108695
2010-06-26 08:52:55	mark.dickinson	set	files: + opensolaris_python_buildlog.txt messages: + msg108694
2010-06-26 08:08:52	mark.dickinson	set	messages: + msg108690
2010-06-25 23:59:42	drkirkby	set	messages: + msg108654
2010-06-25 22:27:30	skrah	set	messages: + msg108639
2010-06-25 21:03:35	mark.dickinson	set	nosy: + skrah messages: + msg108630
2010-06-25 11:31:24	mark.dickinson	set	messages: + msg108586
2010-06-25 11:18:34	mark.dickinson	set	messages: + msg108585
2010-06-25 07:11:52	mark.dickinson	set	messages: + msg108579
2010-06-25 00:02:43	drkirkby	set	messages: + msg108571
2010-06-24 23:24:00	drkirkby	set	messages: + msg108564
2010-06-24 21:30:56	mark.dickinson	set	messages: + msg108552
2010-06-24 20:54:53	drkirkby	set	messages: + msg108550
2010-06-24 20:02:17	drkirkby	set	messages: + msg108546
2010-06-24 19:07:26	drkirkby	set	messages: + msg108544
2010-06-24 19:02:56	mark.dickinson	set	messages: + msg108543
2010-06-24 18:47:46	drkirkby	set	messages: + msg108542
2010-06-24 18:13:44	mark.dickinson	set	messages: + msg108538
2010-06-24 17:14:05	mark.dickinson	set	messages: + msg108534
2010-06-24 17:11:41	mark.dickinson	set	messages: + msg108533
2010-06-24 17:10:46	mark.dickinson	create