This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_buffer fails on ppc64le: memoryview pack_single() is miscompiled
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: skrah Nosy List: David.Edelsohn, cstratak, fweimer, mark.dickinson, serhiy.storchaka, skrah, vstinner
Priority: normal Keywords: patch

Created on 2019-01-16 16:51 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
ppc64_float32_bug.py vstinner, 2019-01-16 16:51
pack_single.patch vstinner, 2019-01-16 16:51
Pull Requests
URL Status Linked Edit
PR 11593 closed vstinner, 2019-01-17 14:27
PR 11593 closed vstinner, 2019-01-17 14:27
PR 11593 closed vstinner, 2019-01-17 14:27
Messages (21)
msg333774 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-16 16:51
The bug was first reported on Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=1540995

======================================================================
FAIL: test_memoryview_struct_module (test.test_buffer.TestBufferProtocol)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/Python-3.6.4/Lib/test/test_buffer.py", line 2540, in test_memoryview_struct_module
    self.assertEqual(m[1], nd[1])
AssertionError: -21.099998474121094 != -21.100000381469727

The problem is the conversion from C double (64-bit float) and C float (32-bit float). There are 2 implementations:

* Objects/memoryobject.c: pack_single() and unpack_single()
* Modules/_struct.c: nu_float() and np_float()

Attached ppc64_float32_bug.py is the simplified test case to trigger the bug.

The result depends on the compiler optimization level:

* gcc -O0: -21.100000381469727 == -21.100000381469727, OK
* gcc -O1: -21.099998474121094 != -21.100000381469727, ERROR
* (I guess that higher optimization level also trigger the bug)

The problem is that the pack_single() function is "miscompiled" (or "too optimized").

Adding "volatile" to PACK_SINGLE() prevents the unsafe compiler optimization and fix the issue for me: try attached pack_single_volatile.patch.


=== -O1 assembler code with the bug ===

PACK_SINGLE(ptr, d, float);


r30 = ptr
(gdb) p $vs63.v2_double
$17 = {0, -21.100000000000001}

=> 0x00000000100a1178 <pack_single+1988>:       stxsspx vs63,0,r30

(gdb) p /x (*ptr)@4
$10 = {0xcc, 0xcc, 0xa8, 0xc1}

The first byte is 0xcc: WRONG.


=== -O1 assembler code without the bug (volatile) ===

r30 = ptr

(gdb) p $f31
$1 = -21.100000000000001

=> 0x00000000100a11e4 <pack_single+2096>:       frsp    f31,f31

(gdb) p $f31
$2 = -21.100000381469727

   0x00000000100a11e8 <pack_single+2100>:       stfs    f31,152(r1)
   0x00000000100a11ec <pack_single+2104>:       lwz     r9,152(r1)

(gdb) p /x $r9
$8 = 0xc1a8cccd

   0x00000000100a11f0 <pack_single+2108>:       stw     r9,0(r30)

(gdb) p /x (*ptr)@4
$9 = {0xcd, 0xcc, 0xa8, 0xc1}

   0x00000000100a11f4 <pack_single+2112>:       li      r3,0
   0x00000000100a11f8 <pack_single+2116>:       lfd     f31,216(r1)
   0x00000000100a11fc <pack_single+2120>:       ld      r30,200(r1)

The first byte is 0xcd: GOOD.
msg333777 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-16 17:12
This is a performance sensitive function, so I prefer not to add
volatile. MSVC also had a bug with that function, but only in PGO
mode. Microsoft has fixed the issue long ago.

Do newer gcc versions have this issue?

I'm fine with wrapping the entire macro in an #ifdef for __ppc64__.

Who is using this platform? IBM Power9?
msg333778 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-16 17:18
> This is a performance sensitive function, so I prefer not to add
volatile.

I don't suggest to use volatile. I'm just pointing to the function causing the bug and I said that volatile worked for me :-)

I made my tests on:

* Red Hat Enterprise Linux release 8.0 Beta (Ootpa)
* gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3)
* Linux kernel 4.18.0-56.el8.ppc64le
* uname -p: ppc64le
msg333779 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-16 17:36
Does it also work with -fno-inline, just for ppc64? There are other
places in the sources where an inlined memcpy() could be miscompiled.
msg333782 - (view) Author: Charalampos Stratakis (cstratak) * Date: 2019-01-16 18:33
Possibly relevant:

https://fedoraproject.org/wiki/Changes/PPC64LE_Float128_Transition#Detailed_Description

But the work is not complete.
msg333847 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-17 10:40
If gcc-8.0.1-0.14.fc28.ppc64le miscompiles memcpy(), perhaps the upstream
priority in https://bugzilla.redhat.com/show_bug.cgi?id=1540995 should be
"release blocker".


CC David Edelsohn, whose PPC64 buildbot (presumably big endian) works.
msg333862 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-17 12:39
Extract of memoryview pack_signal() function:

#define PACK_SINGLE(ptr, src, type) \
    do {                                     \
        type x;                              \
        x = (type)src;                       \
        memcpy(ptr, (char *)&x, sizeof x);   \
    } while (0)


    /* floats */
    case 'f': case 'd':
        d = PyFloat_AsDouble(item);
        if (d == -1.0 && PyErr_Occurred())
            goto err_occurred;
        if (fmt[0] == 'f') {
            PACK_SINGLE(ptr, d, float);  // ##### BUG IS HERE ###
        }
        else {
            PACK_SINGLE(ptr, d, double);
        }
        break;

Pseudo-code:

double d = PyFloat_AsDouble(item);
float x;
x = (float)d;
memcpy(ptr, (char *)&x, sizeo x);
msg333866 - (view) Author: Florian Weimer (fweimer) Date: 2019-01-17 13:27
I believe this is a GCC bug, and filed <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88892>.
msg333876 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-17 14:36
> If gcc-8.0.1-0.14.fc28.ppc64le miscompiles memcpy(), perhaps the upstream priority in https://bugzilla.redhat.com/show_bug.cgi?id=1540995 should be "release blocker".

With the bug, memoryview doesn't round properly float: rounds to zero rather than rounding to nearest. It's not a critical crash or critical security vulnerability :-)

Anyway, I wrote a fix: PR 11593. My fix is to workaround the GCC compiler bug using volatile but only on ppc64. My change only impacts the 'f' type on ppc64 when compiled by GCC. Other architectures, compilers and types are unaffected and so have no impact on performances.
msg333877 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-17 14:44
The bug occurs when GCC emits a single instruction (stxsspx) for cast + memcpy.

I understand that the struct module is not affected because there is a condition branch (if) between the cast ("float x = ...") and the memcpy().

static int
np_float(char *p, PyObject *v, const formatdef *f)
{
    float x = (float)PyFloat_AsDouble(v);
    if (x == -1 && PyErr_Occurred()) {
        PyErr_SetString(StructError,
                        "required argument is not a float");
        return -1;
    }
    memcpy(p, (char *)&x, sizeof x);
    return 0;
}
msg333878 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-17 14:59
Okay, so it's not a severe bug. That leaves us with the question
what to do about it. As I said above, other call sites could be
affected, too:

_struct.c:

static int
np_float(char *p, PyObject *v, const formatdef *f)
{
    float x = (float)PyFloat_AsDouble(v);
    if (x == -1 && PyErr_Occurred()) {
        PyErr_SetString(StructError,
                        "required argument is not a float");
        return -1;
    }
    memcpy(p, (char *)&x, sizeof x);
    return 0;
}


Even if this one isn't (I can't test) I think I'd prefer a more
global fix for the affected gcc version. Does any switch like
-fno-inline or something else work? If so, we could detect the
issue in configure.ac.
msg333879 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-17 15:00
Our mails apparently crossed. :-)
msg333880 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-17 15:07
Or to put it differently, should we put a specific fix for a single
gcc version into memoryview.c? For all we know, a fix will be pushed
to Fedora 28 in the next 4 months.
msg333881 - (view) Author: Florian Weimer (fweimer) Date: 2019-01-17 15:15
We don't know yet if the GCC bug is specific to POWER.  That depends on what causes it.  Other targets my have double-to-float conversion instructions which hard-code the wrong rounding mode.
msg333882 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-17 15:20
A colleague working on clang asked me to test clang: no, clang doesn't have the bug. test_buffer pass as expected with clang -O3.

Machine code of the cast + memcpy:

(gdb) p $f31
$1 = -21.100000000000001

(gdb) disassemble $pc,$pc+40
=> 0x0000000010078fbc <pack_single+380>:	frsp    f0,f31
   0x0000000010078fc0 <pack_single+384>:	li      r3,0
   0x0000000010078fc4 <pack_single+388>:	stfsx   f0,0,r29

(gdb) stepi
0x0000000010078fc0	1824	            PACK_SINGLE(ptr, d, float);

(gdb) p $f0
$3 = -21.100000381469727

(gdb) stepi
0x0000000010078fc4	1824	            PACK_SINGLE(ptr, d, float);
(gdb) stepi
0x0000000010078fc8	1824	            PACK_SINGLE(ptr, d, float);

(gdb) p /x (*ptr)@4
$8 = {0xcd, 0xcc, 0xa8, 0xc1}

The first byte is 0xcd: GOOD.

Florian explained in the GCC bug report that "frsp" is needed and clang emits it.

"This is incorrect because stfs rounds to zero.  An frsp instruction is missing before the stfs (and would be emitted without the memcpy)."
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88892#c0
msg333884 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-17 15:26
"Or to put it differently, should we put a specific fix for a single
gcc version into memoryview.c? For all we know, a fix will be pushed
to Fedora 28 in the next 4 months."

Right now, test_buffer is skipped in the python3.spec of Fedora (recipe to build the package which compiles Python and then runs the Python test suite) on pp64le because of this bug. With my packager hat: I'm done, the bug has been identified and is already worked around in the package.

With my Python hat: I'm not sure if I'm comfortable to have a known rounding issue on memoryview because of a compiler bug. I'm not sure that the GCC bug will be fixed quickly, nor if all operating systems will quickly update GCC or backport the fix.

Well, I wrote a fix, but I don't feel able to decide if the workaround is worth it or not.

For me, the main risk is to forget to remove the workaround once a new GCC version will be released (in N months).

I wouldn't bet that such GCC bug will be fixed and released in less than 4 months.
msg333886 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-17 15:44
> For me, the main risk is to forget to remove the workaround once a new GCC version will be released (in N months).

This is why I suggested using the reproducer in configure.ac, setting something
like HAVE_GCC_MEMCPY_ROUNDING_BUG and then either a) wrap the *entire* PACK_SINGLE()
macro in an #ifdef or b) *preferably* add a global flag like -fno-inline or any
other flag that prevents the issue to CFLAGS.

b) under the condition that any such flag exists.


I may do a) in the weekend, which also addresses Florian Weimer's observation
that it is not known whether the issue is limited to POWER.
msg333887 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-17 16:31
Would it help if define PACK_SINGLE as below?

#define PACK_SINGLE(ptr, src, type) \
    do {                                     \
        union {                              \
            type value;                      \
            unsigned char raw[sizeof(type)]; \
        } x;                                 \
        x.value = (type)src;                 \
        memcpy(ptr, x.raw, sizeof(type));    \
    } while (0)
msg333888 - (view) Author: Florian Weimer (fweimer) Date: 2019-01-17 17:08
No, GCC will optimize away the union.
msg334582 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-30 17:41
The GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88892 has been fixed in the development branch ("trunk"):
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=268083

I requested a fix for GCC 8.2.
msg338383 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-19 16:41
The root issue was a bug in GCC which has been fixed. Fedora Rawhide got the new fixed GCC and so I close the issue.
History
Date User Action Args
2022-04-11 14:59:10adminsetgithub: 79933
2019-03-19 16:41:30vstinnersetstatus: open -> closed
resolution: third party
messages: + msg338383

stage: patch review -> resolved
2019-01-30 17:41:11vstinnersetmessages: + msg334582
2019-01-18 20:54:12skrahsetassignee: skrah
2019-01-17 17:08:37fweimersetmessages: + msg333888
2019-01-17 16:31:37serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg333887
2019-01-17 15:44:57skrahsetmessages: + msg333886
2019-01-17 15:26:23vstinnersetmessages: + msg333884
2019-01-17 15:20:47vstinnersetmessages: + msg333882
2019-01-17 15:15:49fweimersetmessages: + msg333881
2019-01-17 15:07:55skrahsetmessages: + msg333880
2019-01-17 15:00:28skrahsetmessages: + msg333879
2019-01-17 14:59:12skrahsetmessages: + msg333878
2019-01-17 14:44:14vstinnersetmessages: + msg333877
2019-01-17 14:36:57vstinnersetmessages: + msg333876
2019-01-17 14:28:14vstinnersetstage: patch review
pull_requests: + pull_request11293
2019-01-17 14:28:04vstinnersetstage: (no value)
pull_requests: + pull_request11292
2019-01-17 14:27:54vstinnersetstage: (no value)
pull_requests: + pull_request11291
2019-01-17 13:27:44fweimersetmessages: + msg333866
2019-01-17 12:43:38fweimersetnosy: + fweimer
2019-01-17 12:39:01vstinnersetmessages: + msg333862
2019-01-17 10:40:52skrahsetnosy: + David.Edelsohn
messages: + msg333847
2019-01-16 18:33:58cstrataksetmessages: + msg333782
2019-01-16 17:36:33skrahsetmessages: + msg333779
2019-01-16 17:18:12vstinnersetmessages: + msg333778
2019-01-16 17:12:19skrahsetmessages: + msg333777
2019-01-16 17:06:30cstrataksetnosy: + cstratak
2019-01-16 16:51:34vstinnersetfiles: + pack_single.patch
keywords: + patch
2019-01-16 16:51:18vstinnercreate