classification
Title: ctypes change made clang fail to build
Type: Stage: resolved
Components: ctypes Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Paul Monson, lukasz.langa, miss-islington, petr.viktorin, serge-sans-paille, steve.dower, vstinner
Priority: Keywords: 3.8regression, patch

Created on 2019-06-03 09:21 by petr.viktorin, last changed 2019-08-30 14:56 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
bug.py vstinner, 2019-08-28 16:12
point.c vstinner, 2019-08-30 09:48
point.py vstinner, 2019-08-30 09:49
Pull Requests
URL Status Linked Edit
PR 13796 closed Paul Monson, 2019-06-04 01:14
PR 15612 merged vstinner, 2019-08-30 09:22
PR 15613 merged miss-islington, 2019-08-30 12:31
Messages (18)
msg344394 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2019-06-03 09:21
Hello,
I haven't investigated this fully yet, and won't be able to find time today. I'm forwarding the report here in case someone familiar with the code wants to take a look.

On Fedora, "clang" fails to build with Python 3.8, probably due this change (which was supposed to be Windows-only):
https://github.com/python/cpython/commit/32119e10b792ad7ee4e5f951a2d89ddbaf111cc5#diff-998bfefaefe2ab83d5f523e18f158fa4R413

According to serge_sans_paille: if ``self->b_ptr`` contains pointer, the ``memcpy`` creates sharing, and this is dangerous: if a ``__del__`` happens to free the original pointer, we end up with a dangling reference in ``new_ptr``. As far as I can tell, this is what happens in the clang bindings code.

Fedora discussion with link to logs: https://bugzilla.redhat.com/show_bug.cgi?id=1715016
msg344461 - (view) Author: Paul Monson (Paul Monson) * Date: 2019-06-03 18:57
The change causing the build failure comes from this commit.
https://github.com/python/cpython/pull/3806/commits/ea8a0dcea68409d37f3ad9e60e42777c76ad903b

The commit comment says: "Fix copy of structures when passed by value."

Steve, do you recall what this change fixed?
msg344493 - (view) Author: Paul Monson (Paul Monson) * Date: 2019-06-03 23:42
If I undo the changes to StructUnionType_paramfunc then test_pass_by_value (ctypes.test.test_structures.StructureTestCase) fails on x64 on Windows.  Looking at the code I don't think this is specific to Windows.

This is a test for fixing this issue: https://bugs.python.org/issue29565
msg344498 - (view) Author: Paul Monson (Paul Monson) * Date: 2019-06-03 23:55
Reading the related bugs more carefully I think the struct/union passing conventions are different on Windows x64 and Linux.  I have a fix which works for Windows but preserves the prior code for Linux.
msg344579 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 14:43
> On Fedora, "clang" fails to build with Python 3.8

Would you mind to elaborate? It works for me on Fedora 30 with clang 8.0.0:

$ ./configure CC=clang
$ make
$ ./python -m test -v test_ctypes
...
Tests result: SUCCESS

I tested the master branch of Python.
msg344581 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2019-06-04 14:47
The issue is with building clang using Python 3.8; not building Python 3.8 using clang :)
msg344583 - (view) Author: (serge-sans-paille) * Date: 2019-06-04 14:50
@vstinner: to reproduce the issue

```
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir _build
cd _build
cmake3 ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PWD/install -DPYTHON_EXECUTABLE=$HOME/sources/cpython/_build/install/bin/python3
make -j4
```
msg344589 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 15:09
It would be great to be able to write a reproducer and then convert it into a proper unit test.
msg344625 - (view) Author: Paul Monson (Paul Monson) * Date: 2019-06-04 19:03
I added a unittest to the PR that illustrates the problem.  It doesn't pass yet.
msg345394 - (view) Author: Paul Monson (Paul Monson) * Date: 2019-06-12 17:57
I did some reading about parameter passing and it's still not clear to me whether https://bugs.python.org/issue37140 is a bug in CPython or whether the clang bindings were relying on incorrect parameter passing behavior to work.

The change in https://github.com/python/cpython/pull/13796 restores the previous behavior where Windows and non-Windows builds pass structs differently.
msg346451 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-06-24 23:40
> According to serge_sans_paille: if ``self->b_ptr`` contains pointer, the ``memcpy`` creates sharing, and this is dangerous: if a ``__del__`` happens to free the original pointer, we end up with a dangling reference in ``new_ptr``. As far as I can tell, this is what happens in the clang bindings code.

We probably need a second parg->obj to keep self alive for as long as copied_self. Or pack it into a tuple.

Having a repro test for this would be ideal, especially if we can make it happen (even crash) on all platforms. The double-free issue would seem to be real, and I don't want it to crash on Windows either.
msg350667 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-28 16:12
Ok, I'm able to reproduce the crash using:

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir _build
cd _build
cmake3 ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PWD/install -DPYTHON_EXECUTABLE=$HOME/sources/cpython/_build/install/bin/python3
make -j4
make check-clang-python

The crash occurs in the test_access_specifiers() method defined in tests/cindex/test_access_specifiers.py. This test uses ctypes to call functions of the libclang.so dynamic library.

I reduced the test case from 4350 lines of Python code to 131 lines of Python which only depends on the standard library and libclang.so.

$ cd llvm-project/clang/bindings/python
$ wc -l tests/cindex/test_access_specifiers.py tests/cindex/util.py clang/cindex.py 
    41 tests/cindex/test_access_specifiers.py
    90 tests/cindex/util.py
  4219 clang/cindex.py
  4350 total

$ wc -l bug.py
131 bug.py

See attached bug.py script. Python 3.7 is fine, whereas Python 3.8 does crash:

$ python3.7 bug.py 
spelling
spelling = None
_CXString del: obj id=7f0904a6dcb0
spelling = None ---


$ python3.8 bug.py 
spelling
spelling = None
_CXString del: obj id=7f4fa6180c40
_CXString del: obj id=7f4fa6180ac0
free(): double free detected in tcache 2
Aborted (core dumped)


Notice that _CXString.__del__() is only called once in Python 3.7, but called twice in Python 3.8.
msg350862 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-30 09:22
It's a Python 3.8 regression, so I mark it as a release blocker.
msg350863 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-30 09:48
I was surprised by the following test in StructUnionType_paramfunc():

if ((size_t)self->b_size > sizeof(void*)) {
   ... copy the structure ...
} else {
   ... pass by reference (?) ...
}

So I wrote a simple C library with the structure: typedef struct Point { int x; int y; } and functions modifying this structure, structured passed by copy or structure passed by reference.

I'm surprised: ctypes works as expected :-)

Try attached point.c and point.py:

$ gcc -c point.c -fpic && gcc -shared -o libpoint.so point.o && python3 point.py
p = <Point x=1 y=2>
p = <Point x=1 y=2>
p = <Point x=0 y=0>
ok
sizeof(Point) = 16 bytes

Modify CoordType in point.c and point.py to test different sizes, on x86-64, I get:

* c_byte: 2 bytes
* c_short: 4 bytes
* c_int: 8 bytes
* c_long: 16 bytes
msg350864 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-30 10:16
Ok, it took me a while to understand the subtle ctypes internals. serge-sans-paille and me read the C code and run a debugger (gdb) on it to undertand how passing a structure by copy work in ctypes.

The root function is StructUnionType_paramfunc(): it copies the structure if it's larger than sizeof(void*) bytes.

StructUnionType_paramfunc() creates a new internal object which is stored into parg->obj. The only purpose of this object is to release the memory copy allocated by PyMem_Malloc().

This issue is a problem with the internal object: if the struture has a finalizer, the finalizer is called twice. First, it is called on the internal object which is more and less a full copy of the structure. Second, it is called on the structure (once the last reference to the structure is removed).

The code behaves as if the the finalizer is called twice. Even if it's two separated Python object, in fact the two objects contain the same structure values. For example, if the structure contains a pointer to memory block and the finalizer calls free(ptr): free(ptr) will be called twice with the same ptr value.

This surprising behavior comes from this code:

        void *new_ptr = PyMem_Malloc(self->b_size);
        if (new_ptr == NULL)
            return NULL;
        memcpy(new_ptr, self->b_ptr, self->b_size);
        copied_self = (CDataObject *)PyCData_AtAddress(
            (PyObject *)Py_TYPE(self), new_ptr);
        copied_self->b_needsfree = 1;

copied_self reuses the exact same type of the structure. If the structure has a finalizer defined in Python, it will be called.

copied_self finalized is called at the "cleanup:" label of  _ctypes_callproc():

    for (i = 0; i < argcount; ++i)
        Py_XDECREF(args[i].keep);

--

Trying to duplicate self isn't needed. All we need is to call PyMem_Free(ptr) at the _ctypes_callproc() exit.

My PR 15612 introduces a new internal StructParam_Type type which has exactly one purpose: call PyMem_Free(ptr) in its deallocator. The type has no finalizer.
msg350874 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-30 12:30
New changeset 96b4087ce784ee7434dffdf69c475f5b40543982 by Victor Stinner in branch 'master':
bpo-37140: Fix StructUnionType_paramfunc() (GH-15612)
https://github.com/python/cpython/commit/96b4087ce784ee7434dffdf69c475f5b40543982
msg350876 - (view) Author: miss-islington (miss-islington) Date: 2019-08-30 12:50
New changeset 17f61ed25a856ed673ad6f2e9782c3d5e556f151 by Miss Islington (bot) in branch '3.8':
bpo-37140: Fix StructUnionType_paramfunc() (GH-15612)
https://github.com/python/cpython/commit/17f61ed25a856ed673ad6f2e9782c3d5e556f151
msg350884 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-30 14:56
Ok, the regression is now fixed. I close the issue.

Thanks Petr Viktorin for the bug report, Paul Monson for the fix proposed fix, and serge-sans-paille for your help on investigating this complex bug.
History
Date User Action Args
2019-08-30 14:56:33vstinnersetstatus: open -> closed
priority: release blocker ->
messages: + msg350884

resolution: fixed
stage: patch review -> resolved
2019-08-30 12:50:47miss-islingtonsetnosy: + miss-islington
messages: + msg350876
2019-08-30 12:31:03miss-islingtonsetpull_requests: + pull_request15287
2019-08-30 12:30:40vstinnersetmessages: + msg350874
2019-08-30 10:16:32vstinnersetmessages: + msg350864
2019-08-30 09:49:02vstinnersetfiles: + point.py
2019-08-30 09:48:55vstinnersetkeywords: + 3.8regression
files: + point.c
messages: + msg350863

versions: + Python 3.9
2019-08-30 09:22:41vstinnersetpriority: normal -> release blocker
nosy: + lukasz.langa
messages: + msg350862

2019-08-30 09:22:14vstinnersetpull_requests: + pull_request15286
2019-08-28 16:12:51vstinnersetfiles: + bug.py

messages: + msg350667
2019-06-24 23:40:13steve.dowersetmessages: + msg346451
2019-06-12 17:57:26Paul Monsonsetmessages: + msg345394
2019-06-04 19:03:44Paul Monsonsetmessages: + msg344625
2019-06-04 15:09:47vstinnersetmessages: + msg344589
2019-06-04 14:50:54serge-sans-paillesetmessages: + msg344583
2019-06-04 14:47:34petr.viktorinsetmessages: + msg344581
2019-06-04 14:43:53vstinnersetnosy: + vstinner
messages: + msg344579
2019-06-04 01:14:05Paul Monsonsetkeywords: + patch
stage: patch review
pull_requests: + pull_request13681
2019-06-03 23:55:05Paul Monsonsetmessages: + msg344498
2019-06-03 23:42:22Paul Monsonsetmessages: + msg344493
2019-06-03 18:57:15Paul Monsonsetnosy: + steve.dower, Paul Monson
messages: + msg344461
2019-06-03 09:51:07serge-sans-paillesetnosy: + serge-sans-paille
2019-06-03 09:21:37petr.viktorincreate