classification
Title: test_ctypes failing on Linux SPARC64
Type: behavior
Stage:
Components: ctypes
Versions: Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: kelledin-3, martin.panter
Priority: normal
Keywords:

Created on 2018-09-22 17:56 by kelledin-3, last changed 2018-09-30 18:22 by kelledin-3.

Messages (5)
msg326102 - Author: Frank Schaefer (kelledin-3) Date: 2018-09-22 17:56
Python 3.6.6 on Linux 4.16.18 SPARC64 fails test_ctypes.  Specifically, the failure appears to involve the _testfunc_large_struct_update_value() or _testfunc_reg_struct_update_value() calls:

0:00:44 load avg: 46.24 [137/407/1] test_ctypes failed -- running: test_socket (44 sec), test_subprocess (35 sec), test_venv (43 sec), test_normalization (43 sec), test_signal (43 sec), test_multiprocessing_spawn (43 sec), test_concurrent_futures (43 sec), test_email (34 sec), test_cmd_line_script (43 sec), test_tools (43 sec), test_pickletools (34 sec), test_zipfile (30 sec), test_multiprocessing_fork (33 sec), test_pyclbr (31 sec), test_math (42 sec), test_calendar (35 sec), test_datetime (33 sec), test_distutils (30 sec)
test test_ctypes failed -- Traceback (most recent call last):
  File "/usr/src/dist/new/Python-3.6.6/Lib/ctypes/test/test_structures.py", line 416, in test_pass_by_value
    self.assertEqual(s.first, 0xdeadbeef)
AssertionError: 195948557 != 3735928559

Obviously, the "0xbadf00d" field setting is propagating back up through something that's supposed to be passed-by-value, and the test case quite rightly picks up on it.  I suspect this bug exists in 2.7.15 as well (2.7 just doesn't have the testcase to catch it).
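
For illustration, here is a minimal sketch of the two behaviours the assertion distinguishes. It does not call into the _ctypes_test library or libffi; it only emulates a callee that writes to its own copy versus one that writes through a pointer. The structure layout (three c_ulong fields) and the helper names callee_by_value and callee_by_reference are assumptions made for the example; only the field name "first" and the two sentinel values come from the traceback above.

```python
import ctypes

# Assumed layout: only `first` appears in the traceback; the other two
# fields are illustrative padding to make the struct "large".
class X(ctypes.Structure):
    _fields_ = [
        ("first", ctypes.c_ulong),
        ("second", ctypes.c_ulong),
        ("third", ctypes.c_ulong),
    ]

def callee_by_value(s):
    """Emulate a correct by-value call: the callee writes to its own copy."""
    local = X.from_buffer_copy(s)
    local.first = 0x0badf00d
    return local

def callee_by_reference(s):
    """Emulate the faulty behaviour: the callee writes through a pointer."""
    ctypes.pointer(s).contents.first = 0x0badf00d

s = X(0xdeadbeef, 0, 0)

callee_by_value(s)
value_result = s.first        # caller's struct is untouched

callee_by_reference(s)
reference_result = s.first    # caller's struct was overwritten

print(hex(value_result), hex(reference_result))
```

The two decimal numbers in the AssertionError are these sentinels: 195948557 is 0x0badf00d and 3735928559 is 0xdeadbeef, so the failing run matches the by-reference case.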
 
This is built with gcc-8.2.0, glibc-2.27, kernel 4.16.18, CFLAGS="-O1 -mcpu=v9 -mtune=v9".  (FYI I had to turn down optimization to resolve another test failure, hence the "-O1".)

I'm guessing SPARC64 calling conventions are still passing certain large values by reference, and libffi isn't dealing with this?  I'm still investigating.  I've tried it with and without --with-system-libffi, with no difference (my system libffi is 3.2.1).
msg326504 - Author: Frank Schaefer (kelledin-3) Date: 2018-09-26 20:41
Further details:

I cloned a libffi snapshot from a few days ago to see whether the behavior differed.  So far the test fails the same way with the updated libffi.

I'll also contact libffi upstream to see what they can suggest here.
msg326636 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-09-28 13:39
Seems to be a common theme on various 64-bit ABIs. There is already a fix for Python’s Windows copy of the FFI library (Issue 29565), and a “hack” for Arm and x86 Windows (again!): Issue 30353.
msg326656 - Author: Frank Schaefer (kelledin-3) Date: 2018-09-28 20:01
FYI the libffi bug report is open here:

https://github.com/libffi/libffi/issues/451

As noted in the bug report, this issue actually doesn't appear to impact ARM64 (or ARM32 GNUEABI/GNUEABIHF).
msg326740 - Author: Frank Schaefer (kelledin-3) Date: 2018-09-30 18:22
Well, after perusing the ctypes callproc.c code, I found the hacks referenced by martin.panter and tried activating them with some SPARC64 #ifdefs:

--- python3.6-3.6.6.orig/Modules/_ctypes/callproc.c
+++ python3.6-3.6.6/Modules/_ctypes/callproc.c
@@ -1041,6 +1041,7 @@ GetComError(HRESULT errcode, GUID *riid,
 #endif
 
 #if (defined(__x86_64__) && (defined(__MINGW64__) || defined(__CYGWIN__))) || \
+    (defined(__sparc_v9__) || (defined(__sparc__) && defined(__arch64__))) || \
     defined(__aarch64__)
 #define CTYPES_PASS_BY_REF_HACK
 #define POW2(x) (((x & ~(x - 1)) == x) ? x : 0)


This is based on #ifdef checks in libffi, but somewhat more generalized.  The good news is, this appears to resolve all test_ctypes failures.  So I'm guessing this is necessary on Linux/SPARC64, though I can't tell if it's necessary for Solaris/SPARC64.  I don't even know what built-in compiler defines get turned on for Solaris, though someone else might.
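
As a side note on the context lines of the hunk: the patch only widens the condition that defines CTYPES_PASS_BY_REF_HACK, and the POW2 macro that follows is unchanged. A rough Python port of that macro, for anyone who wants to see what it evaluates to, might look like the sketch below. The function name pow2 and the list of sample sizes are made up for the example, and how callproc.c goes on to use POW2 is not shown in the diff, so it is not modelled here.

```python
def pow2(x):
    """Port of POW2(x): return x if x is a power of two, otherwise 0."""
    # x & ~(x - 1) isolates the lowest set bit of x; the result equals x
    # only when x has a single bit set.
    return x if (x & ~(x - 1)) == x else 0

# Hypothetical argument sizes in bytes, chosen only for illustration.
sizes = [1, 2, 3, 4, 8, 12, 16, 24]
results = {size: pow2(size) for size in sizes}

for size, value in results.items():
    print(size, value)
```

Sizes 1, 2, 4, 8 and 16 map to themselves, while 3, 12 and 24 map to 0.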

It might also be advisable to backport this to Python 2.7, but obviously we should also backport the additional ctypes tests if we do that.

My biggest concern is whether these hacks only affect performance, or whether they could also degrade functionality.
History
Date                 User           Action  Args
2018-09-30 18:22:03  kelledin-3     set     messages: + msg326740
2018-09-28 20:01:31  kelledin-3     set     messages: + msg326656
2018-09-28 13:39:09  martin.panter  set     nosy: + martin.panter; messages: + msg326636
2018-09-26 20:41:56  kelledin-3     set     messages: + msg326504
2018-09-22 17:56:12  kelledin-3     create