msg333774 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-01-16 16:51 |
The bug was first reported on Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=1540995
======================================================================
FAIL: test_memoryview_struct_module (test.test_buffer.TestBufferProtocol)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/builddir/build/BUILD/Python-3.6.4/Lib/test/test_buffer.py", line 2540, in test_memoryview_struct_module
self.assertEqual(m[1], nd[1])
AssertionError: -21.099998474121094 != -21.100000381469727
The problem is the conversion from C double (64-bit float) and C float (32-bit float). There are 2 implementations:
* Objects/memoryobject.c: pack_single() and unpack_single()
* Modules/_struct.c: nu_float() and np_float()
Attached ppc64_float32_bug.py is the simplified test case to trigger the bug.
The result depends on the compiler optimization level:
* gcc -O0: -21.100000381469727 == -21.100000381469727, OK
* gcc -O1: -21.099998474121094 != -21.100000381469727, ERROR
* (I guess that higher optimization level also trigger the bug)
The problem is that the pack_single() function is "miscompiled" (or "too optimized").
Adding "volatile" to PACK_SINGLE() prevents the unsafe compiler optimization and fix the issue for me: try attached pack_single_volatile.patch.
=== -O1 assembler code with the bug ===
PACK_SINGLE(ptr, d, float);
r30 = ptr
(gdb) p $vs63.v2_double
$17 = {0, -21.100000000000001}
=> 0x00000000100a1178 <pack_single+1988>: stxsspx vs63,0,r30
(gdb) p /x (*ptr)@4
$10 = {0xcc, 0xcc, 0xa8, 0xc1}
The first byte is 0xcc: WRONG.
=== -O1 assembler code without the bug (volatile) ===
r30 = ptr
(gdb) p $f31
$1 = -21.100000000000001
=> 0x00000000100a11e4 <pack_single+2096>: frsp f31,f31
(gdb) p $f31
$2 = -21.100000381469727
0x00000000100a11e8 <pack_single+2100>: stfs f31,152(r1)
0x00000000100a11ec <pack_single+2104>: lwz r9,152(r1)
(gdb) p /x $r9
$8 = 0xc1a8cccd
0x00000000100a11f0 <pack_single+2108>: stw r9,0(r30)
(gdb) p /x (*ptr)@4
$9 = {0xcd, 0xcc, 0xa8, 0xc1}
0x00000000100a11f4 <pack_single+2112>: li r3,0
0x00000000100a11f8 <pack_single+2116>: lfd f31,216(r1)
0x00000000100a11fc <pack_single+2120>: ld r30,200(r1)
The first byte is 0xcd: GOOD.
|
msg333777 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2019-01-16 17:12 |
This is a performance sensitive function, so I prefer not to add
volatile. MSVC also had a bug with that function, but only in PGO
mode. Microsoft has fixed the issue long ago.
Do newer gcc versions have this issue?
I'm fine with wrapping the entire macro in an #ifdef for __ppc64__.
Who is using this platform? IBM Power9?
|
msg333778 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-01-16 17:18 |
> This is a performance sensitive function, so I prefer not to add
volatile.
I don't suggest to use volatile. I'm just pointing to the function causing the bug and I said that volatile worked for me :-)
I made my tests on:
* Red Hat Enterprise Linux release 8.0 Beta (Ootpa)
* gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3)
* Linux kernel 4.18.0-56.el8.ppc64le
* uname -p: ppc64le
|
msg333779 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2019-01-16 17:36 |
Does it also work with -fno-inline, just for ppc64? There are other
places in the sources where an inlined memcpy() could be miscompiled.
|
msg333782 - (view) |
Author: Charalampos Stratakis (cstratak) * |
Date: 2019-01-16 18:33 |
Possibly relevant:
https://fedoraproject.org/wiki/Changes/PPC64LE_Float128_Transition#Detailed_Description
But the work is not complete.
|
msg333847 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2019-01-17 10:40 |
If gcc-8.0.1-0.14.fc28.ppc64le miscompiles memcpy(), perhaps the upstream
priority in https://bugzilla.redhat.com/show_bug.cgi?id=1540995 should be
"release blocker".
CC David Edelsohn, whose PPC64 buildbot (presumably big endian) works.
|
msg333862 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-01-17 12:39 |
Extract of memoryview pack_signal() function:
#define PACK_SINGLE(ptr, src, type) \
do { \
type x; \
x = (type)src; \
memcpy(ptr, (char *)&x, sizeof x); \
} while (0)
/* floats */
case 'f': case 'd':
d = PyFloat_AsDouble(item);
if (d == -1.0 && PyErr_Occurred())
goto err_occurred;
if (fmt[0] == 'f') {
PACK_SINGLE(ptr, d, float); // ##### BUG IS HERE ###
}
else {
PACK_SINGLE(ptr, d, double);
}
break;
Pseudo-code:
double d = PyFloat_AsDouble(item);
float x;
x = (float)d;
memcpy(ptr, (char *)&x, sizeo x);
|
msg333866 - (view) |
Author: Florian Weimer (fweimer) |
Date: 2019-01-17 13:27 |
I believe this is a GCC bug, and filed <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88892>.
|
msg333876 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-01-17 14:36 |
> If gcc-8.0.1-0.14.fc28.ppc64le miscompiles memcpy(), perhaps the upstream priority in https://bugzilla.redhat.com/show_bug.cgi?id=1540995 should be "release blocker".
With the bug, memoryview doesn't round properly float: rounds to zero rather than rounding to nearest. It's not a critical crash or critical security vulnerability :-)
Anyway, I wrote a fix: PR 11593. My fix is to workaround the GCC compiler bug using volatile but only on ppc64. My change only impacts the 'f' type on ppc64 when compiled by GCC. Other architectures, compilers and types are unaffected and so have no impact on performances.
|
msg333877 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-01-17 14:44 |
The bug occurs when GCC emits a single instruction (stxsspx) for cast + memcpy.
I understand that the struct module is not affected because there is a condition branch (if) between the cast ("float x = ...") and the memcpy().
static int
np_float(char *p, PyObject *v, const formatdef *f)
{
float x = (float)PyFloat_AsDouble(v);
if (x == -1 && PyErr_Occurred()) {
PyErr_SetString(StructError,
"required argument is not a float");
return -1;
}
memcpy(p, (char *)&x, sizeof x);
return 0;
}
|
msg333878 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2019-01-17 14:59 |
Okay, so it's not a severe bug. That leaves us with the question
what to do about it. As I said above, other call sites could be
affected, too:
_struct.c:
static int
np_float(char *p, PyObject *v, const formatdef *f)
{
float x = (float)PyFloat_AsDouble(v);
if (x == -1 && PyErr_Occurred()) {
PyErr_SetString(StructError,
"required argument is not a float");
return -1;
}
memcpy(p, (char *)&x, sizeof x);
return 0;
}
Even if this one isn't (I can't test) I think I'd prefer a more
global fix for the affected gcc version. Does any switch like
-fno-inline or something else work? If so, we could detect the
issue in configure.ac.
|
msg333879 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2019-01-17 15:00 |
Our mails apparently crossed. :-)
|
msg333880 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2019-01-17 15:07 |
Or to put it differently, should we put a specific fix for a single
gcc version into memoryview.c? For all we know, a fix will be pushed
to Fedora 28 in the next 4 months.
|
msg333881 - (view) |
Author: Florian Weimer (fweimer) |
Date: 2019-01-17 15:15 |
We don't know yet if the GCC bug is specific to POWER. That depends on what causes it. Other targets my have double-to-float conversion instructions which hard-code the wrong rounding mode.
|
msg333882 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-01-17 15:20 |
A colleague working on clang asked me to test clang: no, clang doesn't have the bug. test_buffer pass as expected with clang -O3.
Machine code of the cast + memcpy:
(gdb) p $f31
$1 = -21.100000000000001
(gdb) disassemble $pc,$pc+40
=> 0x0000000010078fbc <pack_single+380>: frsp f0,f31
0x0000000010078fc0 <pack_single+384>: li r3,0
0x0000000010078fc4 <pack_single+388>: stfsx f0,0,r29
(gdb) stepi
0x0000000010078fc0 1824 PACK_SINGLE(ptr, d, float);
(gdb) p $f0
$3 = -21.100000381469727
(gdb) stepi
0x0000000010078fc4 1824 PACK_SINGLE(ptr, d, float);
(gdb) stepi
0x0000000010078fc8 1824 PACK_SINGLE(ptr, d, float);
(gdb) p /x (*ptr)@4
$8 = {0xcd, 0xcc, 0xa8, 0xc1}
The first byte is 0xcd: GOOD.
Florian explained in the GCC bug report that "frsp" is needed and clang emits it.
"This is incorrect because stfs rounds to zero. An frsp instruction is missing before the stfs (and would be emitted without the memcpy)."
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88892#c0
|
msg333884 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-01-17 15:26 |
"Or to put it differently, should we put a specific fix for a single
gcc version into memoryview.c? For all we know, a fix will be pushed
to Fedora 28 in the next 4 months."
Right now, test_buffer is skipped in the python3.spec of Fedora (recipe to build the package which compiles Python and then runs the Python test suite) on pp64le because of this bug. With my packager hat: I'm done, the bug has been identified and is already worked around in the package.
With my Python hat: I'm not sure if I'm comfortable to have a known rounding issue on memoryview because of a compiler bug. I'm not sure that the GCC bug will be fixed quickly, nor if all operating systems will quickly update GCC or backport the fix.
Well, I wrote a fix, but I don't feel able to decide if the workaround is worth it or not.
For me, the main risk is to forget to remove the workaround once a new GCC version will be released (in N months).
I wouldn't bet that such GCC bug will be fixed and released in less than 4 months.
|
msg333886 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2019-01-17 15:44 |
> For me, the main risk is to forget to remove the workaround once a new GCC version will be released (in N months).
This is why I suggested using the reproducer in configure.ac, setting something
like HAVE_GCC_MEMCPY_ROUNDING_BUG and then either a) wrap the *entire* PACK_SINGLE()
macro in an #ifdef or b) *preferably* add a global flag like -fno-inline or any
other flag that prevents the issue to CFLAGS.
b) under the condition that any such flag exists.
I may do a) in the weekend, which also addresses Florian Weimer's observation
that it is not known whether the issue is limited to POWER.
|
msg333887 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-01-17 16:31 |
Would it help if define PACK_SINGLE as below?
#define PACK_SINGLE(ptr, src, type) \
do { \
union { \
type value; \
unsigned char raw[sizeof(type)]; \
} x; \
x.value = (type)src; \
memcpy(ptr, x.raw, sizeof(type)); \
} while (0)
|
msg333888 - (view) |
Author: Florian Weimer (fweimer) |
Date: 2019-01-17 17:08 |
No, GCC will optimize away the union.
|
msg334582 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-01-30 17:41 |
The GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88892 has been fixed in the development branch ("trunk"):
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=268083
I requested a fix for GCC 8.2.
|
msg338383 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-03-19 16:41 |
The root issue was a bug in GCC which has been fixed. Fedora Rawhide got the new fixed GCC and so I close the issue.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:10 | admin | set | github: 79933 |
2019-03-19 16:41:30 | vstinner | set | status: open -> closed resolution: third party messages:
+ msg338383
stage: patch review -> resolved |
2019-01-30 17:41:11 | vstinner | set | messages:
+ msg334582 |
2019-01-18 20:54:12 | skrah | set | assignee: skrah |
2019-01-17 17:08:37 | fweimer | set | messages:
+ msg333888 |
2019-01-17 16:31:37 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg333887
|
2019-01-17 15:44:57 | skrah | set | messages:
+ msg333886 |
2019-01-17 15:26:23 | vstinner | set | messages:
+ msg333884 |
2019-01-17 15:20:47 | vstinner | set | messages:
+ msg333882 |
2019-01-17 15:15:49 | fweimer | set | messages:
+ msg333881 |
2019-01-17 15:07:55 | skrah | set | messages:
+ msg333880 |
2019-01-17 15:00:28 | skrah | set | messages:
+ msg333879 |
2019-01-17 14:59:12 | skrah | set | messages:
+ msg333878 |
2019-01-17 14:44:14 | vstinner | set | messages:
+ msg333877 |
2019-01-17 14:36:57 | vstinner | set | messages:
+ msg333876 |
2019-01-17 14:28:14 | vstinner | set | stage: patch review pull_requests:
+ pull_request11293 |
2019-01-17 14:28:04 | vstinner | set | stage: (no value) pull_requests:
+ pull_request11292 |
2019-01-17 14:27:54 | vstinner | set | stage: (no value) pull_requests:
+ pull_request11291 |
2019-01-17 13:27:44 | fweimer | set | messages:
+ msg333866 |
2019-01-17 12:43:38 | fweimer | set | nosy:
+ fweimer
|
2019-01-17 12:39:01 | vstinner | set | messages:
+ msg333862 |
2019-01-17 10:40:52 | skrah | set | nosy:
+ David.Edelsohn messages:
+ msg333847
|
2019-01-16 18:33:58 | cstratak | set | messages:
+ msg333782 |
2019-01-16 17:36:33 | skrah | set | messages:
+ msg333779 |
2019-01-16 17:18:12 | vstinner | set | messages:
+ msg333778 |
2019-01-16 17:12:19 | skrah | set | messages:
+ msg333777 |
2019-01-16 17:06:30 | cstratak | set | nosy:
+ cstratak
|
2019-01-16 16:51:34 | vstinner | set | files:
+ pack_single.patch keywords:
+ patch |
2019-01-16 16:51:18 | vstinner | create | |