
classification
Title: Issue with ctypes in AIX
Type: behavior Stage:
Components: ctypes Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Ayappan, BTaskaya, David.Edelsohn, Michael.Felt, T.Rex, amaury.forgeotdarc, belopolsky, eryksun, meador.inge, ronaldoussoren, sanket, vinay.sajip
Priority: normal Keywords:

Created on 2019-10-29 09:17 by Ayappan, last changed 2022-04-11 14:59 by admin.

Messages (33)
msg355632 - (view) Author: (Ayappan) Date: 2019-10-29 09:17
There seems to be a behavioral issue with ctypes in AIX. 
The specific symptom is that passing structures containing arrays to a C function by value appears to be broken.
Consider the program below:

#!/usr/bin/env python3

from ctypes import *

libc = CDLL('libc.a(shr_64.o)')


class MemchrArgsHack(Structure):
    _fields_ = [("s", c_char_p), ("c", c_ulong), ("n", c_ulong)]


memchr_args_hack = MemchrArgsHack()
memchr_args_hack.s = b"abcdef"
memchr_args_hack.c = ord('d')
memchr_args_hack.n = 7


class MemchrArgsHack2(Structure):
    _fields_ = [("s", c_char_p), ("c_n", c_ulong * 2)]


memchr_args_hack2 = MemchrArgsHack2()
memchr_args_hack2.s = b"abcdef"
memchr_args_hack2.c_n[0] = ord('d')
memchr_args_hack2.c_n[1] = 7

print(
    CFUNCTYPE(c_char_p, c_char_p, c_uint, c_ulong,
              c_void_p)(('memchr', libc))(b"abcdef", c_uint(ord('d')),
                                          c_ulong(7), None))
print(
    CFUNCTYPE(c_char_p, MemchrArgsHack,
              c_void_p)(('memchr', libc))(memchr_args_hack, None))
print(
    CFUNCTYPE(c_char_p, MemchrArgsHack2,
              c_void_p)(('memchr', libc))(memchr_args_hack2, None))

This one uses memchr from the C library and passes it structures that would map to the registers that correspond to the arguments of memchr. This works for the first structure type in the reproducer; however, using the second structure type (which should be treated the same way) does not work.

In the failing case, the last register that should be used for the structure is not populated from the structure. The reproducer passes an extra argument so that the register is instead populated from that argument for the failing case (instead of working because the register still contains the correct value from a previous call).

The output should be the same for all three calls, but we get:
b'def'
b'def'
None

The last line is None, because memchr got 0 (the None passed on the call) for its length argument.
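
The expectation that both structure types "should be treated the same way" rests on them having the same memory layout. A minimal sketch to check that from Python, without calling into libc; the class names ThreeFields and PointerPlusArray are illustrative stand-ins for MemchrArgsHack and MemchrArgsHack2:

```python
import ctypes


class ThreeFields(ctypes.Structure):
    # Same fields as MemchrArgsHack in the reproducer.
    _fields_ = [("s", ctypes.c_char_p),
                ("c", ctypes.c_ulong),
                ("n", ctypes.c_ulong)]


class PointerPlusArray(ctypes.Structure):
    # Same fields as MemchrArgsHack2 in the reproducer.
    _fields_ = [("s", ctypes.c_char_p),
                ("c_n", ctypes.c_ulong * 2)]


ulong_size = ctypes.sizeof(ctypes.c_ulong)

# Compare total size and the offsets of the two integer slots.
same_size = ctypes.sizeof(ThreeFields) == ctypes.sizeof(PointerPlusArray)
same_offsets = (
    ThreeFields.c.offset == PointerPlusArray.c_n.offset
    and ThreeFields.n.offset == PointerPlusArray.c_n.offset + ulong_size
)

print("same size:", same_size)
print("same offsets:", same_offsets)
```

Identical layout in memory does not by itself guarantee identical argument passing; that depends on the descriptor handed to libffi and on the platform ABI.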
msg356588 - (view) Author: (Ayappan) Date: 2019-11-14 09:55
Any update on this ?
msg361324 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-02-03 19:43
How was Python compiled?  With GCC? Which version of GCC?
I assume that Python was built as a 64 bit application based on libc loading the 64 bit member shr_64.o.

Does the testcase work in 32 bit mode?
Does the testcase work if Python is compiled by XLC?

This likely is an incompatibility in libffi with libffi loading the registers incorrectly for the call into libc.a(shr_64.o).

It seems rather fragile to pass a struct that is supposed to have the same parameter layout as the function signature.
msg361374 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-02-04 21:44
Is this a legal use of Python ctypes?  I don't see anything in the Python documentation that one can call a ctypes function with an argument list that does not match the function signature and expect it to work.  Maybe this works on x86 by accident, but that doesn't mean that it is guaranteed to work everywhere and is permitted.
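
For contrast, a call that matches the declared signature of memchr might look like the sketch below. It assumes a POSIX-like system where ctypes.util.find_library("c") either finds the C library or returns None; it does not exercise the struct-by-value path at all.

```python
import ctypes
import ctypes.util

# find_library may return None; on POSIX, CDLL(None) then resolves symbols
# from the running process, which normally includes libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

memchr = libc.memchr
# C prototype: void *memchr(const void *s, int c, size_t n);
memchr.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
memchr.restype = ctypes.c_void_p

buf = ctypes.create_string_buffer(b"abcdef")  # 7 bytes including the NUL
addr = memchr(buf, ord("d"), ctypes.sizeof(buf))

# A c_void_p restype yields an int address, or None for a NULL result.
offset = None if addr is None else addr - ctypes.addressof(buf)
print(offset)
```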
msg361433 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-02-05 14:58
The bug report implies a different bug than what is being reported.  The bug is not related to calling a LIBC function with an argument list that does not match the function signature.

The true issue is that a Python ctypes structure definition on AIX that contains an array as in the example does not create an argument list that matches the AIX ABI for argument passing.  An example that directly uses libffi seems to work, but invoking libffi via Python ctypes does not.

In other words, Python ctypes structures created with _fields_ equivalent to

struct {
  const char *s;
  unsigned long d;
  size_t n;
}

should produce the same argument list as

struct {
  const char *s;
  unsigned long c_n[2];
}

but the version with the array does not.

libffi passes arrays as pointers, so Python ctypes converts arrays passed by value into libffi structs.  This occurs in cpython/Modules/_ctypes/stgdict.c .  It is likely that ctypes is not generating the correct libffi descriptor.

The memchr example visually demonstrates the incorrect argument list, but is not intended to be correct, safe use of ctypes.
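
The equivalence described above can be illustrated in pure Python by expanding array fields into their elements and comparing the resulting sequences of scalar types. This is only a model of the idea; leaf_types and the two class names are hypothetical helpers, not part of ctypes or of stgdict.c.

```python
import ctypes


class ThreeFields(ctypes.Structure):
    _fields_ = [("s", ctypes.c_char_p),
                ("c", ctypes.c_ulong),
                ("n", ctypes.c_ulong)]


class PointerPlusArray(ctypes.Structure):
    _fields_ = [("s", ctypes.c_char_p),
                ("c_n", ctypes.c_ulong * 2)]


def leaf_types(ctype):
    """Yield scalar ctypes types in memory order, expanding structs and arrays."""
    if issubclass(ctype, ctypes.Structure):
        for field in ctype._fields_:
            yield from leaf_types(field[1])
    elif issubclass(ctype, ctypes.Array):
        for _ in range(ctype._length_):
            yield from leaf_types(ctype._type_)
    else:
        yield ctype


print([t.__name__ for t in leaf_types(ThreeFields)])
print([t.__name__ for t in leaf_types(PointerPlusArray)])
```

Both structures expand to the same three scalars, which is why the same argument list is expected for each.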
msg374173 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-24 13:13
On Fedora32/PPC64LE (5.7.9-200.fc32.ppc64le), with little change:
  libc = CDLL('/usr/lib64/libc.so.6')
I get the correct answer:
b'def'
b'def'
b'def'
# python3 --version
Python 3.8.3
libffi : 3.1-24


On Fedora32/x86_64 (5.7.9-200.fc32.x86_64), with a little change:
  libc = CDLL('/usr/lib64/libc-2.31.so')
that crashes:
b'def'
Segmentation fault (core dumped)
# python3 --version
Python 3.8.3
libffi : 3.1-24


AIX : libffi-3.2.1
On AIX 7.2, with Python 3.8.5 compiled with XLC v13, in 64bit:
b'def'
b'def'
None
On AIX 7.2, with Python 3.8.5 compiled with GCC 8.4, in 64bit:
b'def'
b'def'
None

On AIX 7.2, with Python 3.8.5 compiled with XLC v13, in 32bit:
  ( libc = CDLL('libc.a(shr.o)') )
b'def'
b'def'
b'def'
On AIX 7.2, with Python 3.8.5 compiled with GCC 8.4, in 32bit:
b'def'
b'def'
b'def'

Preliminary conclusions:
 - this is a 64bit issue on AIX and it is independent of the compiler
 - it is worse on Fedora/x86_64
 - it works perfectly on Fedora/PPC64LE
what a mess.
msg374174 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-24 13:36
Fedora32/x86_64

[root@destiny10 tmp]# gdb /usr/bin/python3.8 core
...
Core was generated by `python3 ./Pb.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f898a02a1d8 in __memchr_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install python3-3.8.3-2.fc32.x86_64
(gdb) where
#0  0x00007f898a02a1d8 in __memchr_sse2 () from /lib64/libc.so.6
#1  0x00007f898982caf0 in ffi_call_unix64 () from /lib64/libffi.so.6
#2  0x00007f898982c2ab in ffi_call () from /lib64/libffi.so.6
#3  0x00007f8989851ef1 in _ctypes_callproc.cold () from /usr/lib64/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so
#4  0x00007f898985ba2f in PyCFuncPtr_call () from /usr/lib64/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so
#5  0x00007f8989d6c7a1 in _PyObject_MakeTpCall () from /lib64/libpython3.8.so.1.0
#6  0x00007f8989d69111 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#7  0x00007f8989d62ec4 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#8  0x00007f8989dde109 in PyEval_EvalCodeEx () from /lib64/libpython3.8.so.1.0
#9  0x00007f8989dde0cb in PyEval_EvalCode () from /lib64/libpython3.8.so.1.0
#10 0x00007f8989dff028 in run_eval_code_obj () from /lib64/libpython3.8.so.1.0
#11 0x00007f8989dfe763 in run_mod () from /lib64/libpython3.8.so.1.0
#12 0x00007f8989cea81b in PyRun_FileExFlags () from /lib64/libpython3.8.so.1.0
#13 0x00007f8989cea19d in PyRun_SimpleFileExFlags () from /lib64/libpython3.8.so.1.0
#14 0x00007f8989ce153c in Py_RunMain.cold () from /lib64/libpython3.8.so.1.0
#15 0x00007f8989dd1bf9 in Py_BytesMain () from /lib64/libpython3.8.so.1.0
#16 0x00007f8989fb7042 in __libc_start_main () from /lib64/libc.so.6
#17 0x0000557a1f3c407e in _start ()
msg374175 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-24 13:55
On AIX:

root@castor4## gdb /opt/freeware/bin/python3
...
(gdb) run -m pdb Pb.py
...
(Pdb) n
b'def'
> /home2/freeware/src/packages/BUILD/Python-3.8.5/32bit/Pb.py(35)<module>()
-> print(
(Pdb) n
> /home2/freeware/src/packages/BUILD/Python-3.8.5/32bit/Pb.py(36)<module>()
-> CFUNCTYPE(c_char_p, MemchrArgsHack2,
(Pdb)
Thread 2 received signal SIGINT, Interrupt.
[Switching to Thread 1]
0x090000000016426c in __fd_select () from /usr/lib/libc.a(shr_64.o)
(gdb) b ffi_call
Breakpoint 1 at 0x1217918
(gdb) c
...
(Pdb) n

Thread 2 hit Breakpoint 1, 0x0900000001217918 in ffi_call () from /opt/freeware/lib/libffi.a(libffi.so.6)
(gdb) where
#0  0x0900000001217918 in ffi_call () from /opt/freeware/lib/libffi.a(libffi.so.6)
#1  0x0900000001217780 in ffi_prep_cif_machdep () from /opt/freeware/lib/libffi.a(libffi.so.6)
#2  0x0900000001216fb8 in ffi_prep_cif_var () from /opt/freeware/lib/libffi.a(libffi.so.6)
......

(gdb) b memchr
Breakpoint 2 at 0x9000000001b0d60
(gdb) c
Continuing.

Thread 2 hit Breakpoint 2, 0x09000000001b0d60 in memchr () from /usr/lib/libc.a(shr_64.o)
(gdb) i register
r0             0x9000000001b0d60        648518346343124320
r1             0xfffffffffffc8d0        1152921504606832848
r2             0x9001000a008e8b8        648535941212334264
r3             0xa000000003669e0        720575940382845408
r4             0x64     100
r5             0x0      0
r6             0x9001000a04ee730        648535941216921392
r7             0x0      0
...
(gdb) x/s $r3
0xa000000003669e0:      "abcdef"

So:
 - the string is passed as r3.
 - r4 contains "d" = 0x64=100
 - but the size 7 is missing

Anyway, it seems that ffi does not pass the pointer, but values. However, the length 7 is missing. Not in r5, and nowhere in the other registers.
msg374178 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-24 14:04
On Fedora/x86_64, in order to get the core, one must do:
  coredumpctl -o /tmp/core dump /usr/bin/python3.8
msg374180 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-24 14:06
On Fedora/PPC64LE, where it is OK, the same debug with gdb gives:

(gdb) where
#0  0x00007ffff7df03b0 in __memchr_power8 () from /lib64/libc.so.6
#1  0x00007fffea167680 in ?? () from /lib64/libffi.so.6
#2  0x00007fffea166284 in ffi_call () from /lib64/libffi.so.6
#3  0x00007fffea1a7fdc in _ctypes_callproc () from /usr/lib64/python3.8/lib-dynload/_ctypes.cpython-38-ppc64le-linux-gnu.so
..........
(gdb) i register
r0             0x7fffea167614      140737120728596
r1             0x7fffffffc490      140737488340112
r2             0x7fffea187f00      140737120861952
r3             0x7fffea33a140      140737122640192
r4             0x6464              25700
r5             0x7                 7
r6             0x0                 0
r7             0x7fffea33a147      140737122640199
r8             0x7fffea33a140      140737122640192

(gdb) x/s 0x7fffea33a140
0x7fffea33a140: "abcdef"

r3: string
r4 : 0x6464 : "d" ??
r5: 7 : length of the string !!!
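
One possible reading of the 0x6464 in r4 (an assumption, not checked against the glibc source): memchr is specified to convert its c argument to unsigned char, so only the low byte is significant, and that byte is still "d".

```python
r4 = 0x6464

# memchr compares against (unsigned char)c, i.e. only the low 8 bits.
low_byte = r4 & 0xFF
print(hex(low_byte), chr(low_byte))
```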
msg374183 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-24 14:10
On AIX in 32bit, we have:

Thread 2 hit Breakpoint 2, 0xd01407e0 in memchr () from /usr/lib/libc.a(shr.o)
(gdb) where
#0  0xd01407e0 in memchr () from /usr/lib/libc.a(shr.o)
#1  0xd438f480 in ffi_call_AIX () from /opt/freeware/lib/libffi.a(libffi.so.6)
#2  0xd438effc in ffi_call () from /opt/freeware/lib/libffi.a(libffi.so.6)
....
(gdb) i register
r0             0xd01407e0       3490973664
r1             0x2ff20f80       804392832
r2             0xf07a3cc0       4034542784
r3             0xb024c558       2955199832
r4             0x64     100
r5             0x7      7
r6             0x0      0
...

(gdb) x/s 0xb024c558
0xb024c558:     "abcdef"

r5 is OK.
msg374184 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-24 14:16
AIX: difference between 32bit and 64bit.

After the second print, the stack is:

32bit:
#0  0xd01407e0 in memchr () from /usr/lib/libc.a(shr.o)
#1  0xd438f480 in ffi_call_AIX () from /opt/freeware/lib/libffi.a(libffi.so.6)
#2  0xd438effc in ffi_call () from /opt/freeware/lib/libffi.a(libffi.so.6)
#3  0xd14979bc in ?? ()
#4  0xd148995c in ?? ()
#5  0xd20fd5d8 in _PyObject_MakeTpCall () from /opt/freeware/lib/libpython3.8.so

64bit:
#0  0x09000000001b0d60 in memchr () from /usr/lib/libc.a(shr_64.o)
#1  0x0900000001217f00 in ffi_closure_ASM () from /opt/freeware/lib/libffi.a(libffi.so.6)
#2  0x0900000001217aac in ffi_prep_closure_loc () from /opt/freeware/lib/libffi.a(libffi.so.6)
#3  0x0900000000d30900 in ?? ()
#4  0x0900000000d22b6c in ?? ()
#5  0x0900000000ebbc18 in _PyObject_MakeTpCall () from /opt/freeware/lib64/libpython3.8.so

So, the execution does not run in the same ffi routines in 32bit and in 64bit. Bug ?

It should be interesting to do the same with Python3 and libffi built with -O0 -g maybe.
msg374192 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-24 15:04
# pwd
/opt/freeware/src/packages/BUILD/libffi-3.2.1

# grep -R ffi_closure_ASM *
powerpc-ibm-aix7.2.0.0/.libs/libffi.exp:         ffi_closure_ASM
powerpc-ibm-aix7.2.0.0/include/ffitarget.h:    void * code_pointer;       /* Pointer to ffi_closure_ASM */
src/powerpc/aix_closure.S:                .globl ffi_closure_ASM
src/powerpc/darwin_closure.S:            .globl _ffi_closure_ASM
src/powerpc/ffi_darwin.c:                 extern void ffi_closure_ASM (void);
                                          *((unsigned long *)&tramp[2]) = (unsigned long) ffi_closure_ASM; /* function  */
src/powerpc/ffitarget.h:                  void * code_pointer;  /* Pointer to ffi_closure_ASM */

# grep -R ffi_call_AIX *
powerpc-ibm-aix7.2.0.0/.libs/libffi.exp:  ffi_call_AIX
src/powerpc/aix.S:                        .globl ffi_call_AIX
src/powerpc/ffi_darwin.c:                 extern void ffi_call_AIX(extended_cif *, long, unsigned, unsigned *,

In 64bit, I see that: ffi_darwin.c  is compiled and used for building libffi.so.6 .
Same in 32bit.

The code of file src/powerpc/ffi_darwin.c seems to be able to handle both FFI_AIX and FFI_DARWIN , dynamically based on cif->abi .

The code looks VERY complex!

The hypothesis is that the 64bit code has a bug vs the 32bit version.
msg374369 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-27 09:22
On AIX 7.2, with libffi compiled with -O0 -g, I have:

1) Call to memchr thru memchr_args_hack
#0  0x09000000001b0d60 in memchr () from /usr/lib/libc.a(shr_64.o)
#1  0x09000000058487a0 in ffi_call_DARWIN () from /opt/freeware/lib/libffi.a(libffi.so.6)
#2  0x0900000005847eec in ffi_call (cif=0xfffffff, fn=0xffffca90, rvalue=0xfffffff, avalue=0xffffca80) at ../src/powerpc/ffi_darwin.c:31
#3  0x09000000058f9900 in ?? ()
#4  0x09000000058ebb6c in ?? ()
#5  0x090000000109fc18 in _PyObject_MakeTpCall () from /opt/freeware/lib64/libpython3.8.so

r3             0xa000000003659e0        720575940382841312
r4             0x64     100
r5             0x7      7
(gdb) x/s $r3
0xa000000003659e0:      "abcdef"

2) Call to memchr thru memchr_args_hack2
#0  0x09000000001b0d60 in memchr () from /usr/lib/libc.a(shr_64.o)
#1  0x09000000058487a0 in ffi_call_DARWIN () from /opt/freeware/lib/libffi.a(libffi.so.6)
#2  0x0900000005847eec in ffi_call (cif=0xfffffff, fn=0xffffca90, rvalue=0xfffffff, avalue=0xffffca80) at ../src/powerpc/ffi_darwin.c:31
#3  0x09000000058f9900 in ?? ()
#4  0x09000000058ebb6c in ?? ()
#5  0x090000000109fc18 in _PyObject_MakeTpCall () from /opt/freeware/lib64/libpython3.8.so

r3             0xa000000003659e0        720575940382841312
r4             0x64     100
r5             0x0      0

So it looks like, when libffi is compiled with -O0 -g rather than -O, ffi_call_DARWIN() is called in both cases in 64bit (memchr_args_hack and memchr_args_hack2).
However, as seen previously, it was not the case with libffi built with -O .

Moreover, we have in source code:
  switch (cif->abi)
    {
    case FFI_AIX:
      ffi_call_AIX(&ecif, -(long)cif->bytes, cif->flags, ecif.rvalue, fn,
                   FFI_FN(ffi_prep_args));
      break;
    case FFI_DARWIN:
      ffi_call_DARWIN(&ecif, -(long)cif->bytes, cif->flags, ecif.rvalue, fn,
                      FFI_FN(ffi_prep_args), cif->rtype);

Why is ffi_call_DARWIN called instead of ffi_call_AIX?

Hummm Will rebuild libffi and python both with gcc -O0 -g -gdwarf and look at details.
msg374375 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-27 13:03
After adding traces and after rebuilding Python and libffi with -O0 -g -gdwarf, it appears that, still in 64bit, the bug is still there, but that ffi_call_AIX is called now instead of ffi_call_DARWIN from ffi_call() routine of ../src/powerpc/ffi_darwin.c (lines 915...).
???

# ./Pb.py
TONY: libffi: src/powerpc/ffi_darwin.c : FFI_AIX
TONY: libffi: cif->abi: 1  -(long)cif->bytes : -144  cif->flags : 8  ecif.rvalue : fffffffffffd1f0  fn: 9001000a0082640  FFI_FN(ffi_prep_args) : 9001000a0483be8
b'def'
TONY: libffi: src/powerpc/ffi_darwin.c : FFI_AIX
TONY: libffi: cif->abi: 1  -(long)cif->bytes : -144  cif->flags : 8  ecif.rvalue : fffffffffffd220  fn: 9001000a0082640  FFI_FN(ffi_prep_args) : 9001000a0483be8
b'def'
TONY: libffi: src/powerpc/ffi_darwin.c : FFI_AIX
TONY: libffi: cif->abi: 1  -(long)cif->bytes : -144  cif->flags : 8  ecif.rvalue : fffffffffffd220  fn: 9001000a0082640  FFI_FN(ffi_prep_args) : 9001000a0483be8
None

In 32bit, with the same build environment, different code is run, since the traces are not printed.

Thus, 32bit and 64bit are managed very differently.
msg374389 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-27 15:50
Fedora32/x86_64 : Python v3.8.5 has been built.
Issue is still there, but different in debug or optimized mode.
Thus, the change made in https://bugs.python.org/issue22273 did not fix this issue.

./Pb-3.8.5-debug.py :
#!/opt/freeware/src/packages/BUILD/Python-3.8.5/build/debug/python
...

./Pb-3.8.5-optimized.py :
#!/opt/freeware/src/packages/BUILD/Python-3.8.5/build/optimized/python


BUILD=debug
export LD_LIBRARY_PATH=/opt/freeware/src/packages/BUILD/Python-3.8.5/build/debug:/usr/lib64:/usr/lib
export PYTHONPATH=/opt/freeware/src/packages/BUILD/Python-3.8.5/build/debug/Modules
./Pb-3.8.5-debug.py
b'def'
None
None

BUILD=optimized
export LD_LIBRARY_PATH=/opt/freeware/src/packages/BUILD/Python-3.8.5/build/optimized:/usr/lib64:/usr/lib
export PYTHONPATH=/opt/freeware/src/packages/BUILD/Python-3.8.5/build/optimized/Modules
+ ./Pb-3.8.5-optimized.py
b'def'
Pb-3.8.5.sh: line 6: 103569 Segmentation fault      (core dumped) ./Pb-3.8.5-$BUILD.py
msg374392 - (view) Author: Tony Reix (T.Rex) Date: 2020-07-27 15:58
Fedora32/x86_64 : Python v3.8.5 : optimized : uint type.

If, instead of using ulong type, the Pb.py program makes use of uint, the issue is different: see below.
This means that the issue depends on the length of the data.

BUILD=optimized
TYPE=int
export LD_LIBRARY_PATH=/opt/freeware/src/packages/BUILD/Python-3.8.5/build/optimized:/usr/lib64:/usr/lib
export PYTHONPATH=/opt/freeware/src/packages/BUILD/Python-3.8.5/build/optimized/Modules
./Pb-3.8.5-int-optimized.py
b'def'
None
None

# cat ./Pb-3.8.5-int-optimized.py
#!/opt/freeware/src/packages/BUILD/Python-3.8.5/build/optimized/python

# #!/opt/freeware/src/packages/BUILD/Python-3.8.5/python
#       #!/usr/bin/env python3

from ctypes import *

libc = CDLL('/usr/lib64/libc-2.31.so')

class MemchrArgsHack(Structure):
    _fields_ = [("s", c_char_p), ("c", c_uint), ("n", c_uint)]

memchr_args_hack = MemchrArgsHack()
memchr_args_hack.s = b"abcdef"
memchr_args_hack.c = ord('d')
memchr_args_hack.n = 7

class MemchrArgsHack2(Structure):
    _fields_ = [("s", c_char_p), ("c_n", c_uint * 2)]

memchr_args_hack2 = MemchrArgsHack2()
memchr_args_hack2.s = b"abcdef"
memchr_args_hack2.c_n[0] = ord('d')
memchr_args_hack2.c_n[1] = 7

print( CFUNCTYPE(c_char_p, c_char_p, c_uint, c_uint, c_void_p)(('memchr', libc))(b"abcdef", c_uint(ord('d')), c_uint(7), None))
print( CFUNCTYPE(c_char_p, MemchrArgsHack, c_void_p)(('memchr', libc))(memchr_args_hack, None))
print( CFUNCTYPE(c_char_p, MemchrArgsHack2, c_void_p)(('memchr', libc))(memchr_args_hack2, None))
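
Since the behaviour changes between c_ulong and c_uint, it may help to print the size of each variant on the platform under test. A small sketch (the class names are illustrative):

```python
import ctypes


class ULongVariant(ctypes.Structure):
    _fields_ = [("s", ctypes.c_char_p), ("c_n", ctypes.c_ulong * 2)]


class UIntVariant(ctypes.Structure):
    _fields_ = [("s", ctypes.c_char_p), ("c_n", ctypes.c_uint * 2)]


for cls in (ULongVariant, UIntVariant):
    print(cls.__name__, ctypes.sizeof(cls))
```

On an LP64 platform (8-byte pointers and 8-byte unsigned long), the arithmetic gives 8 + 2*8 = 24 bytes for the c_ulong variant and 8 + 2*4 = 16 bytes for the c_uint variant.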
msg374395 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-07-27 16:13
Tony, Please see my reply from 2020-02-05.  This is a known "bug" in Python ctypes.  This is documented in Python ctypes.  This will not be fixed.  This cannot be fixed.

Python ctypes converts the array to a structure and creates an incorrect libffi descriptor.  The call to libffi creates a "fake" descriptor (description of the arguments) that doesn't match the actual data.  It cannot work.

This should be closed as "won't fix".
msg374685 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-08-02 12:25
David, is the issue you mention in your message at 2020-02-05 reproducible with valid usage of ctypes (that is, by calling a C function with the correct signature)? 

What do you mean by "this is documented in Python ctypes"?
msg374686 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-08-02 15:30
I thought that the ctypes documentation mentioned that Arrays passed by Value are converted to Structures.  I cannot find it in the ctypes documentation at the moment.  But Modules/_ctypes/stgdict.c has a large comment about passing Arrays by Value as Structs.

* See bpo-22273. Arrays are normally treated as pointers, which is
* fine when an array name is being passed as parameter, but not when
* passing structures by value that contain arrays. On 64-bit Linux,
* small structures passed by value are passed in registers, and in
* order to do this, libffi needs to know the true type of the array
* members of structs. Treating them as pointers breaks things.

The comment proceeds to discuss 64-bit Linux, which means x86_64 Linux. Python ctypes coerces the array into a structure, which works on x86_64 Linux.  Something about the libffi descriptor created by Python ctypes does not work correctly for AIX, and it may be a fundamental difference in the alignment and padding rules for passing Arrays versus Structures.  Python ctypes assumes that Arrays and Structures are interchangeable with respect to argument passing.

And, again, the initial example was completely wrong and illegal and not expected to work because it created a data object that doesn't match the function signature, which happened to behave correctly on x86.  The initial example in the bug report repeatedly confuses people because they continually try to debug an incorrect use of ctypes.
msg374726 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-08-03 07:35
Thanks for the code reference.

I'm not a ctypes expert, but do maintain another project using libffi. The comment in stgdict.c is correct, and that code is used for all platforms.

However, code to embed arrays into a struct (as described in the comment) is only used for structs with a size smaller than 16 bytes (MAX_STRUCT_SIZE), which AFAIK is not correct.  My other project does something similar, but for all struct sizes.

That said, I haven't studied the ctypes code in detail yet.

As a quick test you could check if increasing MAX_STRUCT_SIZE to (say) 32 fixes this particular example.
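
The effect of that threshold on the structure from this report can be modelled in a few lines of Python. This is a model of the size check as described in this thread, not the actual stgdict.c logic; takes_array_path is a hypothetical helper, and the sketch assumes the comparison is "<=" (the thread does not say whether it is "<" or "<=").

```python
import ctypes


class MemchrArgsHack2(ctypes.Structure):
    _fields_ = [("s", ctypes.c_char_p), ("c_n", ctypes.c_ulong * 2)]


def has_array_field(struct_type):
    """True if any direct field of the structure is a ctypes array."""
    return any(issubclass(f[1], ctypes.Array) for f in struct_type._fields_)


def takes_array_path(struct_type, max_struct_size):
    # Model only: arrays are expanded for libffi when the struct is small enough.
    return (has_array_field(struct_type)
            and ctypes.sizeof(struct_type) <= max_struct_size)


print("sizeof:", ctypes.sizeof(MemchrArgsHack2))
for limit in (16, 32):
    print("limit", limit, "->", takes_array_path(MemchrArgsHack2, limit))
```

With 8-byte pointers and 8-byte unsigned long, the structure is 24 bytes, so under this model it falls outside a limit of 16 and inside a limit of 32.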
msg374730 - (view) Author: Tony Reix (T.Rex) Date: 2020-08-03 09:48
After more investigations, we (Damien and I) think that there are several issues in Python 3.8.5 :

1) Documentation.
  a) AFAIK, the only place in the Python ctypes documentation where it talks about how arrays in a structure are managed appears at: https://docs.python.org/3/library/ctypes.html#arrays
  b) the size of the structure in the example given here is much greater than in our case.
  c) The documentation does NOT mention that a structure <= 16 bytes and a structure greater than 16 bytes are managed differently. That's a bug in the documentation vs the code.

2) Tests
  Looking at the tests, there is NO test for our case.

3) There is a bug in Python
  About the issue here, we see with gdb that Python provides libffi with a description saying that our case is passed as pointers. However, Python does NOT provide libffi with pointers for the array c_n, but with values.

4) libffi obeys Python directives given in description, thinking that it deals with 2 pointers, and thus it pushes only 2 values in registers R3 and R4.

=====================================================
Bug in Python:
-----------------------------------------------------
1) gdb
(gdb) b ffi_call
Breakpoint 1 at 0x9000000016fab80: file ../src/powerpc/ffi_darwin.c, line 919.
(gdb) run
Starting program: /home2/freeware/bin/python3 /tmp/Pb_damien2.py

Thread 2 hit Breakpoint 1, ffi_call (cif=0xfffffffffffd108,
    fn=@0x9001000a0082640: 0x9000000001b0d60 <memchr>,
    rvalue=0xfffffffffffd1d0, avalue=0xfffffffffffd1c0)
(gdb) p *(ffi_cif *)$r3
$9 = {abi = FFI_AIX, nargs = 2, arg_types = 0xfffffffffffd1b0, rtype = 0xa00000000435cb8, bytes = 144, flags = 8}
(gdb) x/2xg 0xfffffffffffd1b0
0xfffffffffffd1b0:      0x0a0000000043ca48      0x08001000a0002a10
(gdb) p *(ffi_type *)0x0a0000000043ca48
$11 = {size = 16, alignment = 8, type = 13, elements = 0xa0000000012eed0}   <= 13==FFI_TYPE_STRUCT size == 16 on AIX!!! == 24 on Linux
(gdb) p *(ffi_type *)0x08001000a0002a10
$12 = {size = 8, alignment = 8, type = 14, elements = 0x0} <= FFI_TYPE_POINTER

(gdb) x/3xg *(long *)$r6
0xa00000000436050:      0x0a00000000152200      0x0000000000000064
0xa00000000436060:      0x0000000000000007  <= 7 is present in avalue[2]
(gdb) x/s 0x0a00000000152200
0xa00000000152200:      "abcdef"

-----------------------------------------------------
2) prints in libffi: AIX : aix_adjust_aggregate_sizes()

TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size:24 s->type:13 : FFI_TYPE_STRUCT
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() FFI_TYPE_STRUCT Before s->size:24
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() p->size: 8 s->size: 8
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() p->size: 8 s->size:16
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() After ALIGN s->size:16
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c: ffi_call: FFI_AIX
TONY: libffi: cif->abi:  1  -(long)cif->bytes : -144  cif->flags :  8  ecif.rvalue : fffffffffffd200  fn: 9001000a0227760  FFI_FN(ffi_prep_args) : 9001000a050a108
s   element  : char pointer: a00000000153d40 abcdef
c_n element 0: a Long:       100          0X64 = 100  instead of a pointer
c_n element 1: a Long:       0      libffi obeys description given by Python and pushes to R4 only what it thinks is a pointer (100 instead), and nothing in R5

====================================================================

Summary:
- Python documentation is incomplete vs the code
- Python code gives libffi a description about pointers
  but Python code provides libffi with values.
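
That the array elements are stored inline as values, not behind a pointer, can be checked from Python alone by copying the structure's memory and reading the array slots back. A minimal sketch using the same field layout as the reproducer:

```python
import ctypes


class MemchrArgsHack2(ctypes.Structure):
    _fields_ = [("s", ctypes.c_char_p), ("c_n", ctypes.c_ulong * 2)]


args = MemchrArgsHack2()
args.s = b"abcdef"
args.c_n[0] = ord("d")
args.c_n[1] = 7

raw = bytes(args)                       # copy of the structure's memory
offset = MemchrArgsHack2.c_n.offset     # where the array starts in the struct
tail = (ctypes.c_ulong * 2).from_buffer_copy(raw[offset:])

print(list(tail))
```

Both values are present in the structure's own bytes, consistent with the avalue dump above where 0x64 and 7 follow the string pointer.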
msg374747 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-08-03 17:42
The example with memchr() never will be correct because it is invalid to call a function with an argument list that doesn't match the function signature.

Your comment mentions that the AIX structure is size 16, but it looks like Python calculates the AIX structure size as 24 bytes.  Have you tried increasing the value of MAX_STRUCT_SIZE in stgdict.c to 32 or 64, corresponding to the argument registers available in 32 bit or 64 bit mode?
msg374834 - (view) Author: Tony Reix (T.Rex) Date: 2020-08-04 18:13
I do agree that the example with memchr is not correct.

About your suggestion, I've done it. With 32. And that works fine.
All 3 values are passed by value.


# cat Pb-3.8.5.py
#!/usr/bin/env python3

from ctypes import *

mine = CDLL('./MemchrArgsHack2.so')

class MemchrArgsHack2(Structure):
    _fields_ = [("s",   c_char_p),
                ("c_n", c_ulong * 2)]

memchr_args_hack2 = MemchrArgsHack2()
memchr_args_hack2.s = b"abcdef"
memchr_args_hack2.c_n[0] = ord('d')
memchr_args_hack2.c_n[1] = 7

print( "sizeof(MemchrArgsHack2): ", sizeof(MemchrArgsHack2) )

print( CFUNCTYPE(c_char_p, MemchrArgsHack2, c_void_p)           (('my_memchr', mine)) (memchr_args_hack2, None) )


# cat MemchrArgsHack2.c
#include <string.h>
#include <stdio.h>

struct MemchrArgsHack2
{
        char *s;
        unsigned long c_n[2];
};

extern char *my_memchr(struct MemchrArgsHack2 args)
{
        printf("s   element  : char pointer: %p %s\n", args.s, args.s);
        printf("c_n element 0: a Long:       %ld\n",           args.c_n[0]);
        printf("c_n element 1: a Long:       %ld\n",           args.c_n[1]);

        return(args.s +3);
}



TONY Modules/_ctypes/stgdict.c: MAX_STRUCT_SIZE=32
sizeof(MemchrArgsHack2):  24
TONY: libffi: src/powerpc/ffi_darwin.c : ffi_prep_cif_machdep()
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size:24 s->type:13 : FFI_TYPE_STRUCT
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() FFI_TYPE_STRUCT Before s->size:24
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() p->size: 8 s->size: 8
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size:16 s->type:13 : FFI_TYPE_STRUCT
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() FFI_TYPE_STRUCT Before s->size:16
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:11 : FFI_TYPE_UINT64
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() p->size: 8 s->size: 8
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:11 : FFI_TYPE_UINT64
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() p->size: 8 s->size:16
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() After ALIGN s->size:16
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() p->size:16 s->size:24
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() After ALIGN s->size:24
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c: ffi_call: FFI_AIX
TONY: libffi: cif->abi:  1  -(long)cif->bytes : -144  cif->flags :  8  ecif.rvalue : fffffffffffd210  fn: 9001000a0227760  FFI_FN(ffi_prep_args) : 9001000a050a108

s   element  : char pointer: a00000000154d40 abcdef
c_n element 0: a Long:       100
c_n element 1: a Long:       7        <<<<  Correct value appears.
b'def'

With the regular version (16), I have:

sizeof(MemchrArgsHack2):  24
TONY: libffi: src/powerpc/ffi_darwin.c : ffi_prep_cif_machdep()
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size:24 s->type:13 : FFI_TYPE_STRUCT
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() FFI_TYPE_STRUCT Before s->size:24
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() p->size: 8 s->size: 8
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() p->size: 8 s->size:16
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() After ALIGN s->size:16
TONY: libffi: src/powerpc/ffi_darwin.c : aix_adjust_aggregate_sizes() s->size: 8 s->type:14 : FFI_TYPE_POINTER
TONY: libffi: src/powerpc/ffi_darwin.c: ffi_call: FFI_AIX
TONY: libffi: cif->abi:  1  -(long)cif->bytes : -144  cif->flags :  8  ecif.rvalue : fffffffffffd210  fn: 9001000a0227760  FFI_FN(ffi_prep_args) : 9001000a050a108

s   element  : char pointer: a00000000154d40 abcdef
c_n element 0: a Long:       100
c_n element 1: a Long:       0        <<< Python pushed nothing for this.
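
The difference between the two traces comes down to size arithmetic, sketched below in Python. In the working trace the array is described as two FFI_TYPE_UINT64 elements and the recomputed struct size reaches 24; in the failing trace the second element is a single FFI_TYPE_POINTER and the size stops at 16. The variable names are illustrative, and reading that second element as "the array encoded as one pointer" is an interpretation of the log, not something libffi reports directly.

```python
import ctypes


class MemchrArgsHack2(ctypes.Structure):
    # Same layout as in the original report: a pointer plus an inline array.
    _fields_ = [("s", ctypes.c_char_p), ("c_n", ctypes.c_ulong * 2)]


# Size ctypes reports for the real layout (pointer followed by two longs).
actual_size = ctypes.sizeof(MemchrArgsHack2)

# Size obtained when the array member is described as one pointer element,
# as the failing trace suggests (two FFI_TYPE_POINTER elements in total).
described_size = ctypes.sizeof(ctypes.c_char_p) + ctypes.sizeof(ctypes.c_void_p)

# Bytes of the struct that the pointer-based description does not cover.
missing_bytes = actual_size - described_size

print(f"actual={actual_size} described={described_size} missing={missing_bytes}")
```

On an LP64 build (64-bit AIX or Linux) the three values are 24, 16 and 8: one 8-byte slot, the one holding c_n[1], falls outside the description, which matches the 0 seen in the failing trace.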
msg374837 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-08-04 18:28
As mentioned in the stgdict.c comment, this relates back to #22273 and #29565. The passing of arrays/structs is fragile, to use a euphemism. The ctypes behavior conforms to the x64 Linux ABI and x64 libffi, as does the comment from #22273:

"Structs that are larger than 32 bytes get copied to the stack (see classify_argument in ffi64.c), so we don't have to worry about classifying their elements for register passing. Thus if a new field is added for this in StgDictObject, then PyCArrayType_new should only allocate it for array types that are 32 bytes or less. Using it for larger array types would serve no point."

And now we're at the crux of the problem. I don't know what Ronald and others recommend. ctypes is choosing x64 behavior to define an inherently target-dependent and ABI-dependent design decision. The real solution requires that _ctypes/stgdict.c incorporate target-specific logic, but it's not clear that the Python community wants to go down that path.
msg374861 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-08-05 07:46
As I mentioned earlier, the code in stgdict.c seems dodgy: the libffi representation for an array in a dict should be the same regardless of the size of the dictionary.

That said, I haven't studied the ctypes code yet and am not sure if my analysis is correct.

In the end we'll need some unit tests that demonstrate the issue as well as a patch that fixes it.
msg374863 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-08-05 08:37
Relevant libffi documentation: https://github.com/libffi/libffi/blob/4661ba7928b49588aec9e6976673208c8cbf0295/doc/libffi.texi#L505
msg375146 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-08-10 19:52
+Eryksun,Vinay

The patch to address array passing described in issue #22273 introduced regressions for other targets. The 16-byte struct size is specific to the x86 ABI and its register passing convention. I appreciate that the 16-32 byte size structure causes an abort for x64, but the patch shifted the problem to other targets that now produce wrong code. The later comments in the discussion thread for issue #22273 refer to patches that disable and reenable ctypes struct tests for ARMv7 and PPC, so this regression is not a surprise.

stgdict.c currently includes a target-specific work-around for small structures that is not restricted to the one target (x64) affected.

What's the best way to proceed?
msg375493 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2020-08-15 19:32
> stgdict.c currently includes a target-specific work-around for small structures that is not restricted to the one target (x64) affected. What's the best way to proceed?

I think more data is needed to determine the best way to proceed. The original failure was for x64, but other targets may be affected too if structs below a certain size, when passed by value, are passed in registers - libffi would have incomplete information about how to pass the struct correctly, as arrays are normally encoded as pointers in libffi. Do we know for particular targets what the struct size limits are for passing by value in registers? If so, we could set the MAX_STRUCT_SIZE according to target.

I would suggest adding a test to Lib/ctypes/test/test_structures.py in the test_array_in_struct method (or an analogous test_38628 method), to get it to fail - rather than using the OP's MemchrArgsHack. Then any patches to the stgdict.c code would have to pass that test on all architectures. But, noting that test_array_in_struct passes 16-byte structures by value to a C function to verify correct passing of the struct - if these tests aren't failing on AIX now, how come? That test was failing on x64 before the stgdict.c patch was added, and started working afterwards. Unfortunately I don't have an AIX environment I can try things in :-(
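
A pure-Python variant of such a test could look like the sketch below. It does not use the C helpers in _ctypes_test; instead it assumes that routing the call through a CFUNCTYPE callback exercises the same by-value argument path, and that assumption has not been verified on AIX. The class and method names are hypothetical.

```python
import ctypes
import unittest


class PtrAndArray(ctypes.Structure):
    # Same layout as MemchrArgsHack2 from the original report.
    _fields_ = [("s", ctypes.c_char_p), ("c_n", ctypes.c_ulong * 2)]


class ArrayInStructByValue(unittest.TestCase):
    def test_roundtrip_through_callback(self):
        seen = []

        def record(arg):
            # Copy the fields out while the by-value argument is alive.
            seen.append((arg.s, arg.c_n[0], arg.c_n[1]))
            return 0

        # The struct is declared as a by-value parameter of the callback.
        proto = ctypes.CFUNCTYPE(ctypes.c_int, PtrAndArray)
        callback = proto(record)

        value = PtrAndArray()
        value.s = b"abcdef"
        value.c_n[0] = ord("d")
        value.c_n[1] = 7

        # Calling the function pointer passes the struct by value and the
        # callback receives it again on the other side.
        callback(value)
        self.assertEqual(seen, [(b"abcdef", ord("d"), 7)])
```

Saved in a file, this can be run with `python -m unittest`. Whether it fails on an affected AIX build is exactly the open question, since both directions of the call share one type description.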
msg375507 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-08-16 12:42
As mentioned before, I haven't studied the ctypes code base, but I am a bit worried about the use of MAX_STRUCT_SIZE: an array definition in a structure is always part of the struct itself and is never a pointer.
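
The point about inline storage can be checked from Python alone. The sketch below uses a hypothetical structure name and only documented ctypes calls (sizeof, addressof, and the field descriptor's offset attribute).

```python
import ctypes


class WithArray(ctypes.Structure):
    _fields_ = [("s", ctypes.c_char_p), ("c_n", ctypes.c_ulong * 2)]


value = WithArray()
value.c_n[0] = 100
value.c_n[1] = 7

array_offset = WithArray.c_n.offset
array_address = ctypes.addressof(value.c_n)

# The array begins inside the struct's own buffer, at the field offset.
inline = array_address == ctypes.addressof(value) + array_offset

# The struct is large enough to hold everything up to the end of the array.
covers_array = ctypes.sizeof(WithArray) >= array_offset + ctypes.sizeof(ctypes.c_ulong * 2)

print(inline, covers_array)  # True True
```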

I agree with Vinay that there needs to be a unittest that demonstrates the problem.
msg375509 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2020-08-16 15:06
> an array definition in a structure is always part of the struct itself and is never a pointer

True, but a problem only arises in practice when passing by value in registers. It's still an open libffi issue that doesn't look like it's going to be solved any time soon, hence the attempted workaround in ctypes.

https://github.com/libffi/libffi/issues/33
msg375511 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-08-16 15:20
Yes, it doesn't appear that it will be solved in libffi.  I don't fully understand the need for the work-around because it should gracefully overflow to the stack.  I can't tell if the issue is a problem with arguments passed by value that need to be passed partially in registers and partially in the stack.

But if the work-around is necessary, it is target- and ABI-dependent: the number of arguments passed in registers is target- and ABI-dependent.  Implementing a work-around solely based on x64 ABI is not correct.  The ctypes stgdict.c code needs to define MAX_STRUCT_SIZE based on the target, at least for the targets that experience the problem.
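
As a rough illustration of a target-dependent limit, the Python sketch below models the decision that stgdict.c makes in C. The table, the target keys and the function name are hypothetical. The x86-64 value is the 16 discussed in this thread; the 64-bit PowerPC value assumes eight 8-byte argument registers and is not taken from the thread or from stgdict.c.

```python
import ctypes

# Hypothetical table: how many bytes of a by-value struct may travel in
# general-purpose registers on each target. Values are assumptions.
ASSUMED_REGISTER_BYTES = {
    "x86_64-sysv": 16,
    "ppc64-aix": 64,
}


def array_needs_unrolling(struct_type, target):
    """Return True if the struct is small enough to be passed in registers
    on the given target, so its array members would need to be described
    element by element instead of as a pointer."""
    return ctypes.sizeof(struct_type) <= ASSUMED_REGISTER_BYTES[target]


class MemchrArgsHack2(ctypes.Structure):
    _fields_ = [("s", ctypes.c_char_p), ("c_n", ctypes.c_ulong * 2)]


for target in ASSUMED_REGISTER_BYTES:
    print(target, array_needs_unrolling(MemchrArgsHack2, target))
```

With the 24-byte MemchrArgsHack2 from the log, the assumed x86-64 limit leaves the array as it is, while the assumed PowerPC limit would unroll it.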
msg375517 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2020-08-16 16:30
> Implementing a work-around solely based on x64 ABI is not correct.

But AFAIK the test_array_in_struct test passes on AIX and exercises the workaround - why does it work if the workaround is faulty? If OTOH the test is faulty, could you update it with code that fails on AIX, as I suggested earlier?
History
Date User Action Args
2022-04-11 14:59:22 admin set github: 82809
2020-08-16 16:30:56 vinay.sajip set messages: + msg375517
2020-08-16 15:20:27 David.Edelsohn set messages: + msg375511
2020-08-16 15:06:56 vinay.sajip set messages: + msg375509
2020-08-16 12:42:01 ronaldoussoren set messages: + msg375507
versions: + Python 3.9, Python 3.10
2020-08-15 19:32:17 vinay.sajip set messages: + msg375493
2020-08-10 19:52:49 David.Edelsohn set nosy: + vinay.sajip, eryksun
messages: + msg375146
2020-08-05 08:37:44 ronaldoussoren set messages: + msg374863
2020-08-05 07:46:26 ronaldoussoren set messages: + msg374861
2020-08-04 18:28:13 David.Edelsohn set messages: + msg374837
2020-08-04 18:13:07 T.Rex set messages: + msg374834
2020-08-04 12:25:35 sanket set nosy: + sanket
2020-08-03 17:42:48 David.Edelsohn set messages: + msg374747
2020-08-03 09:48:25 T.Rex set messages: + msg374730
2020-08-03 08:40:54 ronaldoussoren set nosy: + amaury.forgeotdarc, belopolsky, meador.inge
2020-08-03 07:35:24 ronaldoussoren set messages: + msg374726
2020-08-02 15:30:05 David.Edelsohn set messages: + msg374686
2020-08-02 12:25:48 ronaldoussoren set nosy: + ronaldoussoren
messages: + msg374685
2020-07-27 16:13:21 David.Edelsohn set messages: + msg374395
2020-07-27 15:58:23 T.Rex set messages: + msg374392
2020-07-27 15:52:10 T.Rex set versions: + Python 3.8, - Python 3.7
2020-07-27 15:50:42 T.Rex set messages: + msg374389
2020-07-27 13:03:15 T.Rex set messages: + msg374375
2020-07-27 09:22:43 T.Rex set messages: + msg374369
2020-07-24 21:16:09 BTaskaya set nosy: + BTaskaya
2020-07-24 15:04:16 T.Rex set messages: + msg374192
2020-07-24 14:16:35 T.Rex set messages: + msg374184
2020-07-24 14:10:45 T.Rex set messages: + msg374183
2020-07-24 14:06:21 T.Rex set messages: + msg374180
2020-07-24 14:04:04 T.Rex set messages: + msg374178
2020-07-24 13:55:13 T.Rex set messages: + msg374175
2020-07-24 13:36:14 T.Rex set messages: + msg374174
2020-07-24 13:13:02 T.Rex set nosy: + T.Rex
messages: + msg374173
2020-02-05 14:58:40 David.Edelsohn set messages: + msg361433
2020-02-04 21:44:44 David.Edelsohn set messages: + msg361374
2020-02-03 19:43:36 David.Edelsohn set messages: + msg361324
2020-01-29 18:50:19 David.Edelsohn set nosy: + David.Edelsohn, Michael.Felt
2019-11-14 09:55:20 Ayappan set messages: + msg356588
2019-10-29 09:17:23 Ayappan create