
classification
Title: test_float failure on Solaris
Type: behavior
Stage: needs patch
Components: Extension Modules
Versions: Python 2.6
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To: mark.dickinson
Nosy List: drkirkby, mark.dickinson, skrah
Priority: normal
Keywords:

Created on 2010-06-24 17:10 by mark.dickinson, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
opensolaris_python_buildlog.txt mark.dickinson, 2010-06-26 08:52
build-with_socket-failure.txt drkirkby, 2010-06-26 11:28 Complete build, but the test suite cannot be run. I know this is due to the failure to build _socket, since if I patch that, the test suite can be run.
pyconfig.h drkirkby, 2010-06-26 11:29 pyconfig.h header file created for a 64-bit build.
Messages (37)
msg108532 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-24 17:10
Comment from David Kirkby in issue 8265;  moved here because it looks like a separate problem.

I'm seeing this failure on both Solaris 10 (SPARC processor) in 32-bit mode and OpenSolaris 06/2009 (Intel Xeon) in 64-bit mode using Python 2.6.4. So it is not just an ARM Linux issue. 

See 

http://trac.sagemath.org/sage_trac/ticket/9297
http://trac.sagemath.org/sage_trac/ticket/9299

Note that Solaris supports both a 32-bit and a 64-bit ABI. I'm not sure whether that is relevant, but I see "ABI" in the title, so perhaps it is.
msg108533 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-24 17:11
And the text of the failure (from the first link David provides):

test test_float failed -- Traceback (most recent call last):
  File "/export/home/drkirkby/sage-4.4.4.alpha1/spkg/build/python-2.6.4.p9/src/Lib/test/test_float.py", line 765, in test_roundtrip
    self.identical(-x, roundtrip(-x))
  File "/export/home/drkirkby/sage-4.4.4.alpha1/spkg/build/python-2.6.4.p9/src/Lib/test/test_float.py", line 375, in identical
    self.fail('%r not identical to %r' % (x, y))
AssertionError: -0.0 not identical to 0.0
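
For reference, a minimal sketch of what the failing assertion checks. It assumes that `roundtrip` converts float -> hex string -> float and that `identical` compares the sign of zero as well as the value; both helpers below are illustrative reconstructions in Python 3 syntax, not copied from test_float.py.

```python
import math

def roundtrip(x):
    # Assumed shape of the test helper: float -> hex string -> float.
    return float.fromhex(float.hex(x))

def identical(x, y):
    # Illustrative: True if both are NaN, or if they are equal and carry
    # the same sign (which is what separates -0.0 from 0.0).
    if math.isnan(x) or math.isnan(y):
        return math.isnan(x) and math.isnan(y)
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)

print(identical(-0.0, 0.0))              # False: equal values, different sign
print(identical(-0.0, roundtrip(-0.0)))  # True on an unaffected build
```

On an affected build the second call would be expected to print False, which corresponds to the AssertionError in the traceback.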
msg108534 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-24 17:14
David, would it be possible for you to provide the results of:

>>> float.hex(-0.0)
>>> float.fromhex('-0x0.0p+0')

on those platforms, so that we can tell whether it's the float -> hex conversion or the hex -> float conversion that's losing the sign of the zero?
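
The two directions can also be checked in one go. The following is a rough sketch in Python 3 syntax; `is_negative` and `diagnose` are hypothetical names, and the sign is read from the IEEE 754 bit pattern so that the check itself does not rely on copysign.

```python
import struct

def is_negative(x):
    # The top bit of the big-endian double encoding is the sign bit.
    return (struct.pack('>d', x)[0] & 0x80) != 0

def diagnose():
    # Check each conversion direction independently.
    to_hex_ok = float.hex(-0.0) == '-0x0.0p+0'
    from_hex_ok = is_negative(float.fromhex('-0x0.0p+0'))
    return to_hex_ok, from_hex_ok

print(diagnose())  # (True, True) on an unaffected build
```

A result of (False, True) would point at the float -> hex conversion, and (True, False) at the hex -> float conversion.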
msg108538 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-24 18:13
David, please could you also tell me whether HAVE_COPYSIGN is defined for those builds of Python?  It should be in pyconfig.h in the top level of the build directory, if it is.

And (just to double check), at configure time, there should be a line in the output of the configure script that looks like

checking for copysign... yes

Do you get 'yes' or 'no' there?

If I had to guess, I'd say that it's the float -> hex conversion that's going wrong (so that (-0.0).hex() produces '0x0.0p+0' instead of '-0x0.0p+0'), and that this is caused by either a buggy system copysign function, or by the system copysign function not being found and Python using a buggy workaround.
msg108542 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-24 18:47
Hi Mark, 

Here's the info on the two systems - first the SPARC system, secondly the Intel Xeon system.

1) SPARC

 * Sun Blade 2000, with 2 x UltraSPARC III+ 1200 MHZ processors
 * 8 GB RAM 
 * Solaris 10 update 8 10/09 release (This is the latest release of Solaris 10).

drkirkby@swan:~$ cat /etc/release
                      Solaris 10 10/09 s10s_u8wos_08a SPARC
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 16 September 2009
drkirkby@swan:~$ uname -a
SunOS swan 5.10 Generic_141444-09 sun4u sparc SUNW,Sun-Blade-1000

Python 2.6.4 (r264:75706, Jun 24 2010, 10:39:29) 
[GCC 4.4.4] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> float.hex(-0.0)
'0x0.0p+0'
>>> float.fromhex('-0x0.0p+0')
-0.0

When configure runs, I see:
"checking for copysign... yes"

In pyconfig.h I have:

/* Define to 1 if you have the `copysign' function. */
#define HAVE_COPYSIGN 1

======================================================
======================================================

2) Intel Xeon system. 

* Sun Ultra 27, quad core 3.33 GHz Intel Xeon processor
* 12 GB RAM
* OpenSolaris 06/2009, updated to build 134
* 64-bit installation. 
* Note, this is the native operating system on this machine, so VirtualBox is not used.  

drkirkby@hawk:~$ cat /etc/release
                       OpenSolaris Development snv_134 X86
           Copyright 2010 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 01 March 2010
drkirkby@hawk:~$ uname -a
SunOS hawk 5.11 snv_134 i86pc i386 i86pc

Python 2.6.4 (r264:75706, Jun 24 2010, 17:38:56) 
[GCC 4.4.4] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> float.hex(-0.0)
'0x0.0p+0'
>>> float.fromhex('-0x0.0p+0')
-0.0
>>> 

When configure runs, I see:

"checking for copysign... yes"

In pyconfig.h I have:
/* Define to 1 if you have the `copysign' function. */
#define HAVE_COPYSIGN 1


If you feel access to the SPARC system could help you debug this (or any of the other test failures I get), I can get you access to a 16-core Sun T5240 which was donated by Sun. 

I can't provide such easy access to the Xeon system, though you can install OpenSolaris as a virtual machine in VirtualBox quite easily; it's a free download. 

Dave
msg108543 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-24 19:02
Thanks for the details.  So the relevant code (see the float_hex function in Objects/floatobject.c) looks like this:

    if (x == 0.0) {
        if(copysign(1.0, x) == -1.0)
            return PyString_FromString("-0x0.0p+0");
        else
            return PyString_FromString("0x0.0p+0");
    }

This *should* produce the correct string for -0.0 (because -0.0 compares equal to 0.0, and then copysign(1.0, x) should be -1.0);  I'm reasonably confident that the C code is correct, since the tests pass on all the other platforms that get tested regularly.

So a buggy system copysign function looks like a possibility.  Another more likely possibility occurs to me, though: and that's that there's a buggy compiler optimization going on:  the compiler sees that we're in an 'x == 0.0' branch, and decides that it can substitute '0.0' for 'x' everywhere in the 'if' block.  But this is just guessing.

Do you still get these failures in a debug build of Python (i.e., by passing --with-pydebug to the configure script)?
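
To illustrate why substituting 0.0 for x inside an 'x == 0.0' branch would not be a valid transformation, here is a small Python sketch (using the standard math module, not the C code in question): the two zeros compare equal but are still distinguishable.

```python
import math

x = -0.0
print(x == 0.0)                        # True: the two zeros compare equal
print(math.copysign(1.0, x))           # -1.0
print(math.copysign(1.0, 0.0))         # 1.0
print(math.atan2(0.0, x) == math.pi)   # True, whereas atan2(0.0, 0.0) is 0.0
```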
msg108544 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-24 19:07
Just to clarify something, in case you notice something does not look quite right. 

The link I provided to the build failure on the SPARC machine

http://trac.sagemath.org/sage_trac/ticket/9297

was a Sun Blade 1000. It is *not* the same machine from which I just copied the output, which was a Sun Blade 2000. The two machines are pretty similar though: the motherboards, processors, disks and RAM are interchangeable. In fact, 'uname' shows Sun-Blade-1000 on both of them. I think the only real difference between them is that the Blade 2000 looks a bit nicer, and is officially supported with faster CPUs. 

The link I provided to the failure on the Xeon machine

http://trac.sagemath.org/sage_trac/ticket/9299

is the same machine where I just posted the output. 

If you need an account on a SPARC, it will be a more modern Sun T5240 with 32 GB RAM. 

Dave
msg108546 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-24 20:02
I'll take a look at this in an hour or two. I'll restrict the testing to the Xeon machine, as it is a zillion times quicker than the old SPARCs. 

What comes to my mind, is that perhaps 'copysign' is only defined in C99.

Solaris header files are pretty strict about what gets defined and not defined depending on the mode of compilation. The compiler option -std=c99 is not being passed yet the man page for copysign on my OpenSolaris laptop (yet another system) says:

drkirkby@laptop:~$ man copysign

Mathematical Library Functions                       copysign(3M)

NAME
     copysign, copysignf, copysignl - number manipulation function

SYNOPSIS
     c99 [ flag... ] file... -lm [ library... ]
     #include <math.h>

     double copysign(double x, double y);

     float copysignf(float x, float y);

     long double copysignl(long double x, long double y);

DESCRIPTION
     These functions produce a value with the magnitude of x  and
     the sign of y.
msg108550 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-24 20:54
Using the compiler option -std=c99 allows this test to pass. 

Perhaps adding the macro 

AC_PROG_CC_C99

to autoconf to add the right compiler option might be a solution. I know Solaris headers are often quite strict, and will not define something in a header file if the right things are not defined to indicate C99. 

I would add, there is quite a serious problem on Solaris with _socket failing to build. 

http://bugs.python.org/issue8852

Unless one uses that workaround, which is not committed to the python source code yet, one can not run any tests of python.
msg108552 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-24 21:30
Thanks for the update.

So I'm confused: when -std=c99 isn't given, where is the build finding the copysign function from?  That is, why isn't there a link error when building Python?

(I'm attempting to install OpenSolaris in Parallels at the moment, but it may take more time than I have available at the moment...)
msg108564 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-24 23:24
Hi Mark, 

Since 'copysign' is in the maths library, I would not expect the link phase to fail. Solaris does not ship with different maths libraries for C99 (one just links to libm). 

However, I would not be surprised if the behavior was ill defined if the compiler is not C99. Certainly header files behave differently on Solaris depending on the mode of the compiler. For example, trying to use the INFINITY macro when the compiler is not C99 seems to work on Linux, but fails on Solaris unless you force C99 mode with gcc -std=c99. 

The following bit of code gives the same results whether one uses 'gcc' or 'gcc -std=c99' on OpenSolaris or Linux. However, if one uses 'gcc -ansi' then the behavior is totally different. 

drkirkby@hawk:~$ cat cs.c
#include <stdio.h>
#include <math.h>


int main(int argc, char **argv) { 
   double x, y;

   /* Set x and y differently if a command line argument is given. 
   This will avoid the compiler optimising the values out, as they
   will not be known in advance. */ 
   if (argc==1) { /* This will stop compiler optimising 0.0 out x */
      x=1.0; 
      y=0.0;
   } else {
      x=2.0;
      y=-0.0;
   }
   printf("copysign(%lf,%lf)=%lf\n", x, y, copysign(x, y));
}

drkirkby@hawk:~$ gcc -lm cs.c 
drkirkby@hawk:~$ ./a.out 
copysign(1.000000,0.000000)=1.000000
drkirkby@hawk:~$ ./a.out  z
copysign(2.000000,-0.000000)=-2.000000
drkirkby@hawk:~$ gcc -lm -std=c99 cs.c 
drkirkby@hawk:~$ ./a.out 
copysign(1.000000,0.000000)=1.000000
drkirkby@hawk:~$ ./a.out  z
copysign(2.000000,-0.000000)=-2.000000

Note how -ansi screws it up completely

drkirkby@hawk:~$ gcc -lm -ansi cs.c 
drkirkby@hawk:~$ ./a.out  
copysign(1.000000,0.000000)=0.000000
drkirkby@hawk:~$ ./a.out  z
copysign(2.000000,-0.000000)=0.000000

I also tried it on a Sun SPARC running a recent version of Solaris (2009 release). Again the results are the same. 

I then tried it on a Solaris box running the first release of Solaris 10 (03/2005). Then one gets even stranger behavior if one defines -ansi, where the results are almost right, but with poor rounding errors. 

drkirkby@redstart:~$ gcc -ansi -lm cs.c
drkirkby@redstart:~$ ./a.out
copysign(1.000000,0.000000)=1.000001
drkirkby@redstart:~$ ./a.out d
copysign(2.000000,-0.000000)=-2.000002

But in C99 mode, it works fine. 

drkirkby@redstart:~$ gcc -std=c99  -lm cs.c
drkirkby@redstart:~$ ./a.out
copysign(1.000000,0.000000)=1.000000
drkirkby@redstart:~$ ./a.out d
copysign(2.000000,-0.000000)=-2.000000

So I draw two conclusions. 


1) 'copysign' is in the maths library, so a program which tries to link to 'copysign' will succeed. 

2) The behavior of 'copysign' is ill defined unless the compiler is a C99 compiler. 

I don't think you should use copysign unless the compiler is C99. Trying to come up with a test for 'copysign' working is probably an impossible task, as its behavior is undefined. You could try 99 different values of x and y and find they all work, but it's anyone's guess what will happen with the 100th set of values. 

Dave
msg108571 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-25 00:02
Just to clarify the hostnames and hardware used, in case you look at the results here or the links to the Sage maths bug tracker and are not sure what is what. 

Note some are Solaris and some are OpenSolaris. Some have SPARC and some have Intel processors. All machines are 64-bit, but note that by default executables are created 32-bit on Solaris and OpenSolaris. 

 * hawk = Sun Ultra 27, 3.33 GHz quad core Xeon, OpenSolaris 06/2009, but updated to the latest build of OpenSolaris. 
 * laptop = Sony laptop, 2.0 GHz Intel CPU core2 duo, OpenSolaris 06/2009. 
 * swan = Sun Blade 2000, 2 x 1200 MHz SPARC processors, Solaris 10 10/2009 release (Latest release of Solaris 10 at the time I'm writing this)
 * redstart = Sun Blade 1000, 2 x 900 MHz SPARC processors, Solaris 10 03/2005 (First Solaris 10 release)

Although I've not shown the results from them, if I do show any others, likely candidates will be

 * sage = x86 Linux box (Ubuntu I think) 24 cores. 
 * t2 = Sun T5240, T2+ SPARC processors, 16 cores 1167 MHz, Solaris 10 05/2009 (A recent, but not the very latest release of Solaris 10)
 * bsd = OS X box of some sort. 
 * hpbox = HP C3600 running HP-UX 11.11B, PA-RISC processors. 
 * chaffinch = Virtual machine running Solaris 10 10/2009. (Runs as a guest operating system in VirtualBox) 

Sometimes having access to different hardware can be useful, but it can get confusing if someone sees a lot of different host names! 

Dave
msg108579 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-25 07:11
Did you have a chance to try a debug build of Python and see if the problem persists there?

I'm failing to reproduce this in OpenSolaris 2009.06, running in Parallels on a MacBook Pro (non-debug 32-bit build of Python):

dickinsm@eratosthenes:~/release26-maint$ uname -a
SunOS eratosthenes 5.11 snv_111b i86pc i386 i86pc Solaris
dickinsm@eratosthenes:~/release26-maint$ cat /etc/release
                         OpenSolaris 2009.06 snv_111b X86
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                              Assembled 07 May 2009
dickinsm@eratosthenes:~/release26-maint$ ./python
Python 2.6.5+ (release26-maint:82213, Jun 25 2010, 00:52:22) 
[GCC 3.4.3 (csl-sol210-3_4-20050802)] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> float.hex(-0.0)
'-0x0.0p+0'
>>> import sys; sys.maxsize
2147483647

The most noticeable difference from the machines you describe here is the compiler.  (Did you build gcc 4.4.4 by hand on these machines, or is there a package I can download and install somewhere?)

I'd still like to understand *how* the -std=c99 compiler option affects copysign; it might help inform a workaround. The library function itself can't know how you compiled Python, surely? Can you work out what's going on from the relevant header files?
msg108585 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-25 11:18
So perhaps the cause is simply that copysign isn't being declared for David's Python builds?  If that were the case, I'd expect to see some gcc warnings in the Python build output, something like:

    warning: implicit declaration of function `copysign'

David, are there any such warnings?

Looking at /usr/include/math.h in my OpenSolaris VM, I see (with irrelevant bits omitted):

#if defined(__EXTENSIONS__) || defined(_XOPEN_SOURCE) || \
        !defined(_STRICT_STDC) && !defined(_POSIX_C_SOURCE)

#if defined(__EXTENSIONS__) || !defined(_XOPEN_SOURCE)

extern double copysign __P((double, double));

#endif

#endif

Assuming that this is the cause, it would be interesting to know which of these defines differs between my OpenSolaris VM and David's machines.
(e.g., the 'hawk' machine, since this seems closest in spec to what I'm working with).
msg108586 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-25 11:31
David, my pyconfig.h file contains:

/* Defined on Solaris to see additional function prototypes. */
#define __EXTENSIONS__ 1

Does yours?
msg108630 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-25 21:03
Now that I've finally managed to get gcc 4.4.4 installed on OpenSolaris...

.. I'm still failing to reproduce this bug. :(


dickinsm@eratosthenes:~/release26-maint$ uname -a
SunOS eratosthenes 5.11 snv_134 i86pc i386 i86pc Solaris
dickinsm@eratosthenes:~/release26-maint$ cat /etc/release 
                       OpenSolaris Development snv_134 X86
           Copyright 2010 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 01 March 2010
dickinsm@eratosthenes:~/release26-maint$ ./python
Python 2.6.4 (release26-maint:75706, Jun 25 2010, 21:44:19) 
[GCC 4.4.4] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> float.hex(-0.0)
'-0x0.0p+0'
>>> import sys; sys.maxint
2147483647

As far as I can tell, this setup is almost identical to David's 'hawk' machine (same gcc version, same OpenSolaris build, same Python source revision).

I'm not really sure where I can go from here.

Stefan, you're not able to reproduce this by any chance, are you?
msg108639 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-06-25 22:27
Mark Dickinson <report@bugs.python.org> wrote:
> Now that I've finally managed to get gcc 4.4.4 installed on OpenSolaris...
>
> .. I'm still failing to reproduce this bug. :(
>
>
> dickinsm@eratosthenes:~/release26-maint$ uname -a
> SunOS eratosthenes 5.11 snv_134 i86pc i386 i86pc Solaris
> dickinsm@eratosthenes:~/release26-maint$ cat /etc/release
>                        OpenSolaris Development snv_134 X86
>            Copyright 2010 Sun Microsystems, Inc.  All Rights Reserved.
>                         Use is subject to license terms.
>                              Assembled 01 March 2010
> dickinsm@eratosthenes:~/release26-maint$ ./python
> Python 2.6.4 (release26-maint:75706, Jun 25 2010, 21:44:19)
> [GCC 4.4.4] on sunos5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> float.hex(-0.0)
> '-0x0.0p+0'
> >>> import sys; sys.maxint
> 2147483647
>
> As far as I can tell, this setup is almost identical to David's 'hawk' machine (same gcc version, same OpenSolaris build, same Python source revision).
>
> I'm not really sure where I can go from here.
>
> Stefan, you're not able to reproduce this by any chance, are you?

No, I'm getting the same results as you (OpenSolaris/qemu/32-bit/gcc). I wonder
if it's somehow related to issue 7281, which was resolved by changing float_repr_style.

But as I said, I cannot reproduce it with either gcc or suncc. On the other hand,
who knows how the FPU is emulated in qemu.


The C standard is diplomatic as usual:

"On implementations that represent a signed zero but do not treat negative zero
 consistently in arithmetic operations, the copysign functions regard the sign
 of zero as positive."


David, one of your comments implied for me that you have managed to compile
Python with -std=c99. For me, this fails instantly. What options did you use?
msg108654 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-25 23:59
Hi,
I had hoped to devote more time to this, but have not been able to. I will do at the weekend. 

I would add I was building 64-bit, so adding the compiler flag -m64 on 'hawk' at least some of the time. Depending on your hardware, assuming you have installed OpenSolaris as a virtual machine in VirtualBox, it may be a 32-bit or 64-bit version of OpenSolaris. You need specific instructions from the processor for a 64-bit version, and Sony in their infinite wisdom have disabled them on my Vaio laptop, so whilst I can install OpenSolaris as a 64-bit host operating system, any attempt to install a 64-bit guest will fail. 

If I don't choose to compile as C99, then I need to add the compiler flag -DHAVE_DECL_ISFINITE=0. 

Otherwise I see:

Objects/object.c:1036: warning: implicit declaration of function 'isinf'

Undefined			first referenced
 symbol  			    in file
isfinite                            ./libpython2.6.so
ld: fatal: symbol referencing errors. No output written to python

Again, the Solaris man page says:

Mathematical Library Functions                       isfinite(3M)

NAME
     isfinite - test for finite value

SYNOPSIS
     c99 [ flag... ] file... -lm [ library... ]
     #include <math.h>

     int isfinite(real-floating x);

implying this is a C99 function. 

This conflicting behavior could be the result of what linker or assembler is being used. On SPARC, I use Sun linker and assembler. On OpenSolaris I use  the Sun linker, but the GNU assembler. 

I would have thought it was better to test this out with small bits of test code like I posted, rather than the complete Python source code. 

It might be better if I just create you an account on 'hawk'. Drop me an email at david <dot> kirkby {at} onetel |dot| net if you want. 

I can also get you an account at the University of Washington if you want on a Sun T5240 SPARC. I've not verified the problem on that machine, but I can do so. Just drop me an email with a preferred user name and I'll sort it out. 

The SPARC is very slow - despite it being a current model of a high end server. It is designed for a different sort of task to developing software. The CPUs are pretty slow (1167 MHz) and pretty dumb, but there are 128 hardware threads. In order to get any useful performance from the T5240, the code needs to be highly parallel or have lots of processes like on busy web servers. That is what 't2' is designed for - a high end web server. 

But 'hawk' is a pretty high spec PC which I run 24/7. 

Dave
msg108690 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-26 08:08
> I would add I was building 64-bit, so adding the compiler flag -m64 on 'hawk' at least some of the time.

Ah; that may be relevant. Can you tell us exactly what command line you're using to build Python, and the values of any relevant environment variables?

> I would have thought it was better to test this out with small bits
> of test code like I posted, rather than the complete Python 
> source code. 

Unfortunately, the small bits of code don't really help: I get the same results as you for those, and I understand why those are failing:  copysign isn't declared (as you'll see if you add -Wall to your compilation line) so the compiler assumes it returns type 'int'.  This shouldn't happen with Python because its configure script defines __EXTENSIONS__, which ensures that copysign *is* declared when math.h is included.  Can you still reproduce the strange copysign results with your small examples when __EXTENSIONS__ is #define'd?

> This conflicting behavior could be the result of what linker or
> assembler is being used. On SPARC, I use Sun linker and assembler.
> On OpenSolaris I use  the Sun linker, but the GNU assembler.

I have the same setup (Sun linker, GNU assembler):

dickinsm@eratosthenes:~/release26-maint$ gcc-4.4 -v
Using built-in specs.
Target: i386-pc-solaris2.11
Configured with: ../gcc-4.4.4/configure --prefix=/usr/local --program-suffix=-4.4 --with-mpfr-include=/usr/include/mpfr --with-gmp-include=/usr/include/gmp --with-as=/usr/bin/gas --with-gnu-as --with-ld=/usr/bin/ld --without-gnu-ld --enable-shared --enable-multilib --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.4 (GCC)
msg108694 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-26 08:52
David, I'm still missing some easy answers that would really help.  Please could you answer the question about whether __EXTENSIONS__ is defined in your pyconfig.h; it would help determine what we should be investigating.

There shouldn't be any need for the -std=c99 option: the Python configure script defines __EXTENSIONS__ exactly to make these c99 functions available.  So either that isn't happening on your machines, in which case we should be looking for a problem with the configure script, or it *is* happening, in which case copysign is being properly declared on your machine and we have to look elsewhere (library mismatch? compiler optimization bug?) for the cause of failure.

Still no joy with an -m64 build.  I've attached a transcript showing the precise steps I used.

[I'm tempted to close this issue as 'works for me';  I'm not seeing any test_math failure in OpenSolaris, and neither is the Solaris/SPARC buildbot.]
msg108695 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-26 08:56
Stefan, thanks for the feedback.

I don't *think* this is related to issue 7281.  I thought we'd determined that that issue had nothing to do with copysign itself, and everything to do with what the signbit of the NaN returned by float("nan") happens to be.
msg108698 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-06-26 09:44
Mark Dickinson <report@bugs.python.org> wrote:
> I don't *think* this is related to issue 7281. I thought we'd determined that that issue
> had nothing to do with copysign itself, and everything to do with what the signbit of the
> NaN returned by float("nan") happens to be.

Yes, that's right. Also, to avoid spreading misinformation: The copysign(1.0, float("nan"))
behavior is actually the same after changing float_repr_style.
msg108706 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-26 11:04
Hi,

__EXTENSIONS__ is defined to 1. 


Give me an hour, and I'll attach a log.
msg108711 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-26 11:28
Here's a build done the same way as you. This gives the same result as you here. 

But an attempt to run the test suite fails because of _socket. I need to patch that in order that I can run the test suite. 

See http://bugs.python.org/issue8852
msg108712 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-26 11:29
Here's the header file that gets created
msg108714 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-26 12:07
Thanks for the logs.

So if you apply the issue8852 patch, and run the test suite, does test_float still fail?
msg108718 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-26 12:25
Sorry, I seem to have wasted a lot of your time here. 

Python was built from a script which applied some patches, including the one that allows _socket to build. Without that patch (which is not committed to Python, and I'm told it might not be this year), it's impossible to run the test suite. 

It looks like one of the patches has messed up test_float. 

Unfortunately, if I just apply the patch at issue8852, then the test suite hangs at:

test_posix

It's used 48 minutes of CPU time on a 3.33 GHz Xeon as I write, passing most tests up to test_posix, but hanging there. 

Sorry to have wasted your time. 


BTW, are you able to run the full test suite, or does the test suite simply not run at all? 

Dave
msg108723 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-26 13:24
I see the same _socket build failure as you do;  but with the issue 8852 patch, I can run the test suite.

It did spend quite a while in test_posix, but the test eventually finished (and failed).  I didn't get to the end of the test run, unfortunately, because I ran out of disk space.  (4 virtual machines on a small laptop is pushing it a bit. :)

It's possible that the test_posix failure was due to running out of space, I guess.  Anyway, that's a separate issue.
msg108725 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-26 13:26
test_float
test test_float failed -- Traceback (most recent call last):
  File "/export/home/drkirkby/Python-2.7rc2/Lib/test/test_float.py", line 1297, in test_roundtrip
    self.identical(-x, roundtrip(-x))
  File "/export/home/drkirkby/Python-2.7rc2/Lib/test/test_float.py", line 907, in identical
    self.fail('%r not identical to %r' % (x, y))
AssertionError: -0.0 not identical to 0.0

test_fnmatch
test_fork1
msg108726 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-26 13:28
Sorry, I missed out the comment there. 

This is failing for me, in both 32 and 64-bit builds with Python-2.7rc2
msg108728 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-26 14:56
Okay---this one I *can* reproduce. :)

I'm going to call it a gcc optimization bug.  Specifically, it seems to be a bug involving gcc's builtin version of the copysign function.

When I build a current svn trunk checkout (r82245) with:

  CC='gcc-4.4 -m64' ./configure && make

I get the wrong result:

Python 2.7rc2+ (trunk:82245, Jun 26 2010, 05:35:07) 
[GCC 4.4.4] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> (-0.0).hex()
'0x0.0p+0'

But when building with either:

    CC='gcc-4.4 -m64 -fno-builtin-copysign' ./configure && make

or

    CC='gcc-4.4 -m64' ./configure --with-pydebug && make

I get the expected results.  If I have time I'll investigate further and see if I can generate the bug from smaller code.  At any rate, I don't think this is something that can sensibly be fixed in Python itself, so I think this issue should be closed, and a bug filed upstream if necessary.

I also can't see a good reason why this bug would be specific to OpenSolaris.  Does anyone have gcc-4.4.4 available to test this on OS X, Linux or *BSD?
msg108730 - (view) Author: David Kirkby (drkirkby) Date: 2010-06-26 15:36
I'm glad you can reproduce it! 

I can understand you wanting to close it in this case. I've no problem with that. 

To me at least, it does not seem anywhere near as serious as the other problem. 

I will try it on Linux though. I have access to a reasonably decent (24 core) Linux box, so I'll try it on that. But it means a lot of messing around, downloading mpir, mpfr, possibly newer versions of the GNU binutils etc. Not a 5 minute job. The weather here in the UK is too nice to sit behind a computer screen!

Dave
msg108734 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-26 17:11
Here's some minimal failing code.


// Compile with:
// gcc-4.4 -m64 -fno-inline -g -O3 copysign_bug.c -o copysign_bug

#include <math.h>
#include <stdio.h>

int copysign_bug(double x)
{
  if (x && (x * 0.5 == x))
    return 1;
  if (copysign(1.0, x) < 0.0)
    return 2;
  else
    return 3;
}

int main(void) {
  double x;
  x = -0.0;
  printf("copysign_bug(%.17g) = %d\n", x, copysign_bug(x));

  x = 0.0;
  printf("copysign_bug(%.17g) = %d\n", x, copysign_bug(x));

  return 0;
}


This produces the output:

copysign_bug(-0) = 3
copysign_bug(0) = 3

I would expect to see:

copysign_bug(-0) = 2
copysign_bug(0) = 3

I've reported this at:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44683
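
To spell out why 2 is the expected result for -0.0, here is a rough Python rendering of the same logic. It is only a sketch of the intended semantics, not a reproduction of the gcc problem: it assumes math.copysign behaves correctly in the interpreter running it.

```python
import math


def copysign_bug(x):
    """Python rendering of the intended logic of the C function above."""
    # Branch 1: x is nonzero and unchanged by halving (e.g. an infinity).
    # Both zeros are falsy, so -0.0 never takes this branch.
    if x and (x * 0.5 == x):
        return 1
    # Branch 2: the sign bit is set, which includes -0.0.
    if math.copysign(1.0, x) < 0.0:
        return 2
    # Branch 3: everything else, including +0.0.
    return 3


if __name__ == "__main__":
    for value in (-0.0, 0.0):
        print("copysign_bug(%r) = %d" % (value, copysign_bug(value)))
```

The first test only exists to send nonzero values down a different path; for either zero it is false, so the result depends entirely on the copysign call.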
msg108735 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-06-26 17:17
Mark, gcc-4.4 on Fedora 12 is ok:

[stefan@fedora-amd64 trunk]$ ./python 
Python 2.7rc2+ (trunk:82245M, Jun 26 2010, 13:09:14) 
[GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> (-0.0).hex()
'-0x0.0p+0'



As a general remark, from what I hear on the gmp-bugs list, the
newer gcc versions often seem to have problems on Solaris.
msg108736 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-06-26 17:26
Fedora 12: copysign-bug varies wildly ((GCC) 4.4.3 20100127 (Red Hat 4.4.3-4)):

[stefan@fedora-amd64 trunk]$ gcc  -O0 copysign_bug.c -o copysign_bug
[stefan@fedora-amd64 trunk]$ ./copysign_bug 
copysign_bug(-0) = 2
copysign_bug(0) = 3

[stefan@fedora-amd64 trunk]$ gcc  -O2 copysign_bug.c -o copysign_bug
[stefan@fedora-amd64 trunk]$ ./copysign_bug 
copysign_bug(-0) = 3
copysign_bug(0) = 3

[stefan@fedora-amd64 trunk]$ gcc  -O3 copysign_bug.c -o copysign_bug
[stefan@fedora-amd64 trunk]$ ./copysign_bug 
copysign_bug(-0) = 2
copysign_bug(0) = 3

[stefan@fedora-amd64 trunk]$ gcc  -O3 -fno-inline copysign_bug.c -o copysign_bug
[stefan@fedora-amd64 trunk]$ ./copysign_bug 
copysign_bug(-0) = 3
copysign_bug(0) = 3
msg108737 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-26 17:52
Thanks Stefan.

The bug apparently exists in gcc-4.5 on OS X as well.  I'll update the gcc bug report.

newton:~ dickinsm$ gcc-mp-4.5 -fno-inline -O3 copysign_bug.c -o copysign_bug && ./copysign_bug
copysign_bug(-0) = 3
copysign_bug(0) = 3
msg108803 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-27 18:28
The gcc optimization bug was fixed (with impressive speed!) upstream.

I'm going to close this as 'won't fix'.  It's probably possible to find a workaround, but the issue is minor, apparently only affects one platform, and relates to a little-used method.  For people who want this fixed in their own Python builds, a fairly unintrusive workaround is to add '-fno-builtin-copysign' to the compiler flags, or to compile with a gcc version older than 4.4 until the next bugfix release of gcc 4.4 or 4.5.

David: thanks for reporting this, and for all the followup.
History
Date | User | Action | Args
2022-04-11 14:57:02 | admin | set | github: 53315
2010-06-27 18:28:47 | mark.dickinson | set | status: open -> closed; resolution: wont fix; messages: + msg108803
2010-06-26 17:52:40 | mark.dickinson | set | messages: + msg108737
2010-06-26 17:26:04 | skrah | set | messages: + msg108736
2010-06-26 17:17:22 | skrah | set | messages: + msg108735
2010-06-26 17:11:34 | mark.dickinson | set | messages: + msg108734
2010-06-26 15:36:13 | drkirkby | set | messages: + msg108730
2010-06-26 14:57:00 | mark.dickinson | set | messages: + msg108728
2010-06-26 13:28:02 | drkirkby | set | messages: + msg108726
2010-06-26 13:26:36 | drkirkby | set | messages: + msg108725
2010-06-26 13:24:56 | mark.dickinson | set | messages: + msg108723
2010-06-26 12:25:34 | drkirkby | set | messages: + msg108718
2010-06-26 12:07:37 | mark.dickinson | set | messages: + msg108714
2010-06-26 11:29:58 | drkirkby | set | files: + pyconfig.h; messages: + msg108712
2010-06-26 11:28:42 | drkirkby | set | files: + build-with_socket-failure.txt; messages: + msg108711
2010-06-26 11:04:58 | drkirkby | set | messages: + msg108706
2010-06-26 09:44:44 | skrah | set | messages: + msg108698
2010-06-26 08:56:11 | mark.dickinson | set | messages: + msg108695
2010-06-26 08:52:55 | mark.dickinson | set | files: + opensolaris_python_buildlog.txt; messages: + msg108694
2010-06-26 08:08:52 | mark.dickinson | set | messages: + msg108690
2010-06-25 23:59:42 | drkirkby | set | messages: + msg108654
2010-06-25 22:27:30 | skrah | set | messages: + msg108639
2010-06-25 21:03:35 | mark.dickinson | set | nosy: + skrah; messages: + msg108630
2010-06-25 11:31:24 | mark.dickinson | set | messages: + msg108586
2010-06-25 11:18:34 | mark.dickinson | set | messages: + msg108585
2010-06-25 07:11:52 | mark.dickinson | set | messages: + msg108579
2010-06-25 00:02:43 | drkirkby | set | messages: + msg108571
2010-06-24 23:24:00 | drkirkby | set | messages: + msg108564
2010-06-24 21:30:56 | mark.dickinson | set | messages: + msg108552
2010-06-24 20:54:53 | drkirkby | set | messages: + msg108550
2010-06-24 20:02:17 | drkirkby | set | messages: + msg108546
2010-06-24 19:07:26 | drkirkby | set | messages: + msg108544
2010-06-24 19:02:56 | mark.dickinson | set | messages: + msg108543
2010-06-24 18:47:46 | drkirkby | set | messages: + msg108542
2010-06-24 18:13:44 | mark.dickinson | set | messages: + msg108538
2010-06-24 17:14:05 | mark.dickinson | set | messages: + msg108534
2010-06-24 17:11:41 | mark.dickinson | set | messages: + msg108533
2010-06-24 17:10:46 | mark.dickinson | create |