Classification
Title: test_float crashes with assertion failure on Ubuntu buildbot.
Type: crash
Stage:
Components:
Versions: Python 3.1

Process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: mark.dickinson
Nosy List: doko, eric.smith, mark.dickinson
Priority: high
Keywords:

Created on 2010-07-09 21:01 by mark.dickinson, last changed 2010-07-12 19:31 by mark.dickinson. This issue is now closed.

Messages (9)
msg109796 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-07-09 21:01
The Ubuntu i386 buildbot was crashing in test_float in the 3.1 branch; it looks as though _Py_dg_dtoa is producing invalid results.

I've made a couple of checkins to try to diagnose the failure (r82752 and r82754); here's some of the resulting output from http://www.python.org/dev//buildbot/builders/i386%20Ubuntu%203.1/builds/870 :

test_float
Unexpected failure in format_float_short. Arguments: d = 9999, format_code = 101, mode = 2, precision = 3
digits == :
Unexpected failure in format_float_short. Arguments: d = 0.096000000000000002, format_code = 102, mode = 3, precision = 2
digits == :

':' is the ASCII character after '9', so this is a classic case of the digit '9' being incremented to the next ASCII character instead of wrapping to '0' with a carry. I don't know why this is happening on this particular buildbot and on no others that I've noticed.
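
The symptom is easy to check in isolation. The following is a minimal C sketch, not code from CPython: both function names are hypothetical, and it assumes an ASCII execution character set. It shows that incrementing '9' without a carry yields ':', and that the format_code values 101 and 102 in the output above are the character codes for 'e' and 'f'.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only: increment a decimal digit
   character without handling the carry out of '9'. */
char bump_digit_unchecked(char c)
{
    assert(c >= '0' && c <= '9');
    return (char)(c + 1);   /* for '9' this is ':' in ASCII, not a digit */
}

/* Hypothetical helper: map the integer format_code printed in the
   diagnostics back to its character (101 is 'e', 102 is 'f' in ASCII). */
char format_code_char(int format_code)
{
    return (char)format_code;
}
```

Calling bump_digit_unchecked('9') returns ':', which matches the "digits == :" lines in the buildbot output.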

This machine is one where double rounding *is* typically a problem (according to its configure output), so it should be using the _Py_{set,get}_387controlword functions to control the FPU precision;  perhaps something's going wrong with this step.
msg109797 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-07-09 21:02
The py3k branch on the same machine seems fine, as does the release27-maint branch.
msg109798 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-07-09 21:22
I've just noticed that the 3.1 buildbot is compiling with -O2, while the 2.7 and 3.2 bots are using -O0;  this would explain the different results.

The possibility that this might be a compiler optimization bug makes me a little happier.

Matthias, you wouldn't happen to know what version of gcc is being used by this buildslave, would you?  The machine in question is:

http://www.python.org/dev//buildbot/buildslaves/klose-ubuntu-i386
msg109801 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-07-09 22:22
Reverted the temporary debugging commits in r82755, so test_float is now crashing again on that buildbot.

I've made yet another temporary commit (r82756) to get the configure script to tell me what gcc version is being used on that machine.  Once I've got that info, I can have a go at reproducing this locally.  If that fails, I'll create a branch and try to debug the buildbot remotely via repeated test runs on that branch.

... and the results from the configure script are in: it's an experimental prerelease version of gcc!  This makes me even more suspicious that it's a compiler bug.

checking gcc version... Using built-in specs.
COLLECT_GCC=/usr/lib/gcc-snapshot/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc-snapshot/libexec/gcc/i686-linux-gnu/4.6.0/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 20100702-0ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs --enable-languages=c,ada,c++,java,fortran,objc,obj-c++ --prefix=/usr/lib/gcc-snapshot --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --disable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --with-plugin-ld=ld.gold --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.6-snap/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.6-snap --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.6-snap --with-arch-directory=i386 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=yes --build=i686-linux-gnu --host=i686-linux-gnu --target=i686-linux-gnu
Thread model: posix
gcc version 4.6.0 20100702 (experimental) [trunk revision 161740] (Ubuntu 20100702-0ubuntu1)
msg109829 - Author: Matthias Klose (doko) * (Python committer) Date: 2010-07-10 07:26
> what version of gcc is being used by this buildslave

You already found out, but it's mentioned at the top of the test stdio. I'll update the compiler and recheck.
msg109831 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-07-10 07:34
On Sat, Jul 10, 2010 at 8:26 AM, Matthias Klose <report@bugs.python.org> wrote:
> you already found out, but it's mentioned at the top of the test stdio.

Ah yes, so it is.  Thank you.  I thought I remembered seeing it
somewhere in the buildbot output in the past;  I obviously didn't look
hard enough.

> I'll update the compiler and recheck.

Thank you again!
msg109906 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-07-10 18:37
I managed to reproduce this with Ubuntu 10.10 and the gcc-snapshot package.  I've shrunk it to the following (no floating-point in sight!):


/* file dtoa.c */

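static char s0[12];

/* Reduced from _Py_dg_dtoa: write the digits "99", then propagate a
   carry from the last digit towards the first. */
char *_Py_dg_dtoa()
{
  char *s = s0;
  *s++ = '9';
  *s++ = '9';
  /* Walk backwards over trailing '9's. */
  while(*--s == '9')
    if (s == s0) {
      /* Reached the first digit: reset it so the increment below gives '1'. */
      *s = '0';
      break;
    }
  ++*s;
  return s0;
}

int main(void) {
  char *s;
  s = _Py_dg_dtoa();
  /* Exit status is 0 (success) only if the first digit was carried to '1'. */
  return s[0] != '1';
}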

/* end file dtoa.c */


dickinsm@ubuntu:~/bug$ /usr/lib/gcc-snapshot/bin/gcc -O0 dtoa.c && ./a.out && echo "Success" || echo "Failure"
Success
dickinsm@ubuntu:~/bug$ /usr/lib/gcc-snapshot/bin/gcc -O1 dtoa.c && ./a.out && echo "Success" || echo "Failure"
Failure
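
For reference, the behaviour the reduced test case expects can be written as a standalone carry-propagation routine. This is a sketch under stated assumptions, not the CPython implementation: `round_up_digits` and `round_up_string` are hypothetical names, the exponent parameter `k` is added for illustration, and the buffer is assumed to be non-empty, to contain only decimal digits, and to have room for a terminating NUL at `end`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of CPython): round a buffer of decimal
   digits up by one unit in the last place.
     s0  : first digit
     end : one past the last digit (end > s0, and *end must be writable)
     k   : decimal exponent, incremented when the carry runs off the front
   Trailing '9's that would become zeros are dropped. Returns a pointer
   one past the last remaining digit, where a NUL is written. */
char *round_up_digits(char *s0, char *end, int *k)
{
    char *s = end;
    assert(s0 != NULL && k != NULL && end > s0);
    while (*--s == '9') {
        if (s == s0) {
            /* Every digit was '9': the result is a single '1'. */
            ++*k;
            *s = '0';
            break;
        }
    }
    ++*s;        /* '0' becomes '1' after a full carry, else digit + 1 */
    ++s;
    *s = '\0';
    return s;
}

/* Convenience wrapper for a non-empty NUL-terminated digit string.
   Returns the exponent adjustment (0 or 1). */
int round_up_string(char *digits)
{
    int k = 0;
    round_up_digits(digits, digits + strlen(digits), &k);
    return k;
}
```

With a correct compiler, rounding up "9999" leaves "1" with an exponent adjustment of 1, and rounding up "1299" leaves "13" with no adjustment. The miscompiled loop in the report instead left the first character as ':'.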
msg109908 - Author: Matthias Klose (doko) * (Python committer) Date: 2010-07-10 18:51
I updated the compiler on the buildbot to trunk 20100709, and your reduced testcase is fixed. I will update the gcc-snapshot package in maverick later.
msg110126 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-07-12 19:31
And I see that test_float is passing again.  Thanks, Matthias!
History
Date                 User            Action  Args
2010-07-12 19:31:31  mark.dickinson  set     messages: + msg110126
2010-07-10 18:51:09  doko            set     status: open -> closed; resolution: fixed; messages: + msg109908
2010-07-10 18:37:41  mark.dickinson  set     messages: + msg109906
2010-07-10 07:34:47  mark.dickinson  set     messages: + msg109831
2010-07-10 07:26:22  doko            set     messages: + msg109829
2010-07-09 22:22:34  mark.dickinson  set     messages: + msg109801
2010-07-09 21:22:16  mark.dickinson  set     nosy: + doko; messages: + msg109798
2010-07-09 21:02:59  mark.dickinson  set     messages: + msg109797
2010-07-09 21:01:51  mark.dickinson  create