msg344394 - (view) |
Author: Petr Viktorin (petr.viktorin) * |
Date: 2019-06-03 09:21 |
Hello,
I haven't investigated this fully yet, and won't be able to find time today. I'm forwarding the report here in case someone familiar with the code wants to take a look.
On Fedora, "clang" fails to build with Python 3.8, probably due this change (which was supposed to be Windows-only):
https://github.com/python/cpython/commit/32119e10b792ad7ee4e5f951a2d89ddbaf111cc5#diff-998bfefaefe2ab83d5f523e18f158fa4R413
According to serge_sans_paille: if ``self->b_ptr`` contains pointer, the ``memcpy`` creates sharing, and this is dangerous: if a ``__del__`` happens to free the original pointer, we end up with a dangling reference in ``new_ptr``. As far as I can tell, this is what happens in the clang bindings code.
Fedora discussion with link to logs: https://bugzilla.redhat.com/show_bug.cgi?id=1715016
|
msg344461 - (view) |
Author: Paul Monson (Paul Monson) * |
Date: 2019-06-03 18:57 |
The change causing the build failure comes from this commit.
https://github.com/python/cpython/pull/3806/commits/ea8a0dcea68409d37f3ad9e60e42777c76ad903b
The commit comment says: "Fix copy of structures when passed by value."
Steve, do you recall what this change fixed?
|
msg344493 - (view) |
Author: Paul Monson (Paul Monson) * |
Date: 2019-06-03 23:42 |
If I undo the changes to StructUnionType_paramfunc then test_pass_by_value (ctypes.test.test_structures.StructureTestCase) fails on x64 on Windows. Looking at the code I don't think this is specific to Windows.
This is a test for fixing this issue: https://bugs.python.org/issue29565
|
msg344498 - (view) |
Author: Paul Monson (Paul Monson) * |
Date: 2019-06-03 23:55 |
Reading the related bugs more carefully I think the struct/union passing conventions are different on Windows x64 and Linux. I have a fix which works for Windows but preserves the prior code for Linux.
|
msg344579 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-06-04 14:43 |
> On Fedora, "clang" fails to build with Python 3.8
Would you mind to elaborate? It works for me on Fedora 30 with clang 8.0.0:
$ ./configure CC=clang
$ make
$ ./python -m test -v test_ctypes
...
Tests result: SUCCESS
I tested the master branch of Python.
|
msg344581 - (view) |
Author: Petr Viktorin (petr.viktorin) * |
Date: 2019-06-04 14:47 |
The issue is with building clang using Python 3.8; not building Python 3.8 using clang :)
|
msg344583 - (view) |
Author: (serge-sans-paille) * |
Date: 2019-06-04 14:50 |
@vstinner: to reproduce the issue
```
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir _build
cd _build
cmake3 ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PWD/install -DPYTHON_EXECUTABLE=$HOME/sources/cpython/_build/install/bin/python3
make -j4
```
|
msg344589 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-06-04 15:09 |
It would be great to be able to write a reproducer and then convert it into a proper unit test.
|
msg344625 - (view) |
Author: Paul Monson (Paul Monson) * |
Date: 2019-06-04 19:03 |
I added a unittest to the PR that illustrates the problem. It doesn't pass yet.
|
msg345394 - (view) |
Author: Paul Monson (Paul Monson) * |
Date: 2019-06-12 17:57 |
I did some reading about parameter passing and it's still not clear to me whether https://bugs.python.org/issue37140 is a bug in CPython or whether the clang bindings were relying on incorrect parameter passing behavior to work.
The change in https://github.com/python/cpython/pull/13796 restores the previous behavior where Windows and non-Windows builds pass structs differently.
|
msg346451 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2019-06-24 23:40 |
> According to serge_sans_paille: if ``self->b_ptr`` contains pointer, the ``memcpy`` creates sharing, and this is dangerous: if a ``__del__`` happens to free the original pointer, we end up with a dangling reference in ``new_ptr``. As far as I can tell, this is what happens in the clang bindings code.
We probably need a second parg->obj to keep self alive for as long as copied_self. Or pack it into a tuple.
Having a repro test for this would be ideal, especially if we can make it happen (even crash) on all platforms. The double-free issue would seem to be real, and I don't want it to crash on Windows either.
|
msg350667 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-08-28 16:12 |
Ok, I'm able to reproduce the crash using:
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir _build
cd _build
cmake3 ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PWD/install -DPYTHON_EXECUTABLE=$HOME/sources/cpython/_build/install/bin/python3
make -j4
make check-clang-python
The crash occurs in the test_access_specifiers() method defined in tests/cindex/test_access_specifiers.py. This test uses ctypes to call functions of the libclang.so dynamic library.
I reduced the test case from 4350 lines of Python code to 131 lines of Python which only depends on the standard library and libclang.so.
$ cd llvm-project/clang/bindings/python
$ wc -l tests/cindex/test_access_specifiers.py tests/cindex/util.py clang/cindex.py
41 tests/cindex/test_access_specifiers.py
90 tests/cindex/util.py
4219 clang/cindex.py
4350 total
$ wc -l bug.py
131 bug.py
See attached bug.py script. Python 3.7 is fine, whereas Python 3.8 does crash:
$ python3.7 bug.py
spelling
spelling = None
_CXString del: obj id=7f0904a6dcb0
spelling = None ---
$ python3.8 bug.py
spelling
spelling = None
_CXString del: obj id=7f4fa6180c40
_CXString del: obj id=7f4fa6180ac0
free(): double free detected in tcache 2
Aborted (core dumped)
Notice that _CXString.__del__() is only called once in Python 3.7, but called twice in Python 3.8.
|
msg350862 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-08-30 09:22 |
It's a Python 3.8 regression, so I mark it as a release blocker.
|
msg350863 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-08-30 09:48 |
I was surprised by the following test in StructUnionType_paramfunc():
if ((size_t)self->b_size > sizeof(void*)) {
... copy the structure ...
} else {
... pass by reference (?) ...
}
So I wrote a simple C library with the structure: typedef struct Point { int x; int y; } and functions modifying this structure, structured passed by copy or structure passed by reference.
I'm surprised: ctypes works as expected :-)
Try attached point.c and point.py:
$ gcc -c point.c -fpic && gcc -shared -o libpoint.so point.o && python3 point.py
p = <Point x=1 y=2>
p = <Point x=1 y=2>
p = <Point x=0 y=0>
ok
sizeof(Point) = 16 bytes
Modify CoordType in point.c and point.py to test different sizes, on x86-64, I get:
* c_byte: 2 bytes
* c_short: 4 bytes
* c_int: 8 bytes
* c_long: 16 bytes
|
msg350864 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-08-30 10:16 |
Ok, it took me a while to understand the subtle ctypes internals. serge-sans-paille and me read the C code and run a debugger (gdb) on it to undertand how passing a structure by copy work in ctypes.
The root function is StructUnionType_paramfunc(): it copies the structure if it's larger than sizeof(void*) bytes.
StructUnionType_paramfunc() creates a new internal object which is stored into parg->obj. The only purpose of this object is to release the memory copy allocated by PyMem_Malloc().
This issue is a problem with the internal object: if the struture has a finalizer, the finalizer is called twice. First, it is called on the internal object which is more and less a full copy of the structure. Second, it is called on the structure (once the last reference to the structure is removed).
The code behaves as if the the finalizer is called twice. Even if it's two separated Python object, in fact the two objects contain the same structure values. For example, if the structure contains a pointer to memory block and the finalizer calls free(ptr): free(ptr) will be called twice with the same ptr value.
This surprising behavior comes from this code:
void *new_ptr = PyMem_Malloc(self->b_size);
if (new_ptr == NULL)
return NULL;
memcpy(new_ptr, self->b_ptr, self->b_size);
copied_self = (CDataObject *)PyCData_AtAddress(
(PyObject *)Py_TYPE(self), new_ptr);
copied_self->b_needsfree = 1;
copied_self reuses the exact same type of the structure. If the structure has a finalizer defined in Python, it will be called.
copied_self finalized is called at the "cleanup:" label of _ctypes_callproc():
for (i = 0; i < argcount; ++i)
Py_XDECREF(args[i].keep);
--
Trying to duplicate self isn't needed. All we need is to call PyMem_Free(ptr) at the _ctypes_callproc() exit.
My PR 15612 introduces a new internal StructParam_Type type which has exactly one purpose: call PyMem_Free(ptr) in its deallocator. The type has no finalizer.
|
msg350874 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-08-30 12:30 |
New changeset 96b4087ce784ee7434dffdf69c475f5b40543982 by Victor Stinner in branch 'master':
bpo-37140: Fix StructUnionType_paramfunc() (GH-15612)
https://github.com/python/cpython/commit/96b4087ce784ee7434dffdf69c475f5b40543982
|
msg350876 - (view) |
Author: miss-islington (miss-islington) |
Date: 2019-08-30 12:50 |
New changeset 17f61ed25a856ed673ad6f2e9782c3d5e556f151 by Miss Islington (bot) in branch '3.8':
bpo-37140: Fix StructUnionType_paramfunc() (GH-15612)
https://github.com/python/cpython/commit/17f61ed25a856ed673ad6f2e9782c3d5e556f151
|
msg350884 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-08-30 14:56 |
Ok, the regression is now fixed. I close the issue.
Thanks Petr Viktorin for the bug report, Paul Monson for the fix proposed fix, and serge-sans-paille for your help on investigating this complex bug.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:16 | admin | set | github: 81321 |
2019-08-30 14:56:33 | vstinner | set | status: open -> closed priority: release blocker -> messages:
+ msg350884
resolution: fixed stage: patch review -> resolved |
2019-08-30 12:50:47 | miss-islington | set | nosy:
+ miss-islington messages:
+ msg350876
|
2019-08-30 12:31:03 | miss-islington | set | pull_requests:
+ pull_request15287 |
2019-08-30 12:30:40 | vstinner | set | messages:
+ msg350874 |
2019-08-30 10:16:32 | vstinner | set | messages:
+ msg350864 |
2019-08-30 09:49:02 | vstinner | set | files:
+ point.py |
2019-08-30 09:48:55 | vstinner | set | keywords:
+ 3.8regression files:
+ point.c messages:
+ msg350863
versions:
+ Python 3.9 |
2019-08-30 09:22:41 | vstinner | set | priority: normal -> release blocker nosy:
+ lukasz.langa messages:
+ msg350862
|
2019-08-30 09:22:14 | vstinner | set | pull_requests:
+ pull_request15286 |
2019-08-28 16:12:51 | vstinner | set | files:
+ bug.py
messages:
+ msg350667 |
2019-06-24 23:40:13 | steve.dower | set | messages:
+ msg346451 |
2019-06-12 17:57:26 | Paul Monson | set | messages:
+ msg345394 |
2019-06-04 19:03:44 | Paul Monson | set | messages:
+ msg344625 |
2019-06-04 15:09:47 | vstinner | set | messages:
+ msg344589 |
2019-06-04 14:50:54 | serge-sans-paille | set | messages:
+ msg344583 |
2019-06-04 14:47:34 | petr.viktorin | set | messages:
+ msg344581 |
2019-06-04 14:43:53 | vstinner | set | nosy:
+ vstinner messages:
+ msg344579
|
2019-06-04 01:14:05 | Paul Monson | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request13681 |
2019-06-03 23:55:05 | Paul Monson | set | messages:
+ msg344498 |
2019-06-03 23:42:22 | Paul Monson | set | messages:
+ msg344493 |
2019-06-03 18:57:15 | Paul Monson | set | nosy:
+ steve.dower, Paul Monson messages:
+ msg344461
|
2019-06-03 09:51:07 | serge-sans-paille | set | nosy:
+ serge-sans-paille
|
2019-06-03 09:21:37 | petr.viktorin | create | |