classification
Title: Segmentation faults on ARM and ARM64
Type: crash
Stage:
Components: Build, Cross-Build, ctypes, FreeBSD
Versions: Python 3.6, Python 2.7
process
Status: open
Resolution: works for me
Dependencies:
Superseder:
Assigned To:
Nosy List: Alex.Willmer, koobs, stefanrink@yahoo.com, vstinner, zach.ware
Priority: normal
Keywords:

Created on 2018-10-11 09:43 by stefanrink@yahoo.com, last changed 2018-10-15 06:21 by koobs.

Messages (8)
msg327527 - (view) Author: Stefan Rink (stefanrink@yahoo.com) Date: 2018-10-11 09:43
When trying to run dask distributed on ARM, you end up with a segmentation fault in multiple tests.

It works on AMD64 but does not work on ARM or AARCH64, where it results in a signal 11 (SIGSEGV).

Example:
# python
Python 3.6.6 (default, Sep 29 2018, 05:50:41) 
[GCC 4.2.1 Compatible FreeBSD Clang 6.0.1 (tags/RELEASE_601/final 335540)] on freebsd12
Type "help", "copyright", "credits" or "license" for more information.
>>> from dask.distributed import Client
>>> client = Client()
Segmentation fault (core dumped)

# gdb /usr/local/bin/python3.6 python3.6.core 
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/python3.6...(no debugging symbols found)...done.
[New LWP 101213]
Core was generated by `python'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  testcancel (curthread=0x3e8) at /usr/src/lib/libthr/thread/thr_cancel.c:144
144     {
(gdb) bt
#0  testcancel (curthread=0x3e8) at /usr/src/lib/libthr/thread/thr_cancel.c:144
#1  _thr_cancel_enter (curthread=0x3e8) at /usr/src/lib/libthr/thread/thr_cancel.c:146
#2  0x00000000402a5b54 in __thr_connect (fd=3, name=0xffffffffd310, namelen=106)
    at /usr/src/lib/libthr/thread/thr_syscalls.c:179
#3  0x000000004240e23c in uuid_generate_time () from /usr/local/lib/libuuid.so.1
#4  0x0000000042382068 in ffi_call_SYSV () from /lib/libffi.so.6
#5  0x00000000423822c8 in ffi_call () from /lib/libffi.so.6
#6  0x000000004234664c in _ctypes_callproc () from /usr/local/lib/python3.6/lib-dynload/_ctypes.so
#7  0x00000000423403f0 in ?? () from /usr/local/lib/python3.6/lib-dynload/_ctypes.so
#8  0x00000000422f5720 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
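
Frames #3 to #6 of the backtrace show `uuid_generate_time()` from libuuid being reached through ctypes and libffi. The following is a minimal sketch of that call path for isolating the crash outside of dask. It assumes libuuid can be located with `ctypes.util.find_library("uuid")`; the helper names `load_uuid_generate_time` and `generate_time_uuid_bytes` are hypothetical and not part of any library. Whether this is exactly the call that dask distributed triggers is an assumption based on the backtrace.

```python
import ctypes
import ctypes.util


def load_uuid_generate_time():
    """Return libuuid's uuid_generate_time as a ctypes function, or None."""
    path = ctypes.util.find_library("uuid")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        func = lib.uuid_generate_time
    except (OSError, AttributeError):
        # Library could not be loaded, or it does not export the symbol.
        return None
    # C prototype: void uuid_generate_time(uuid_t out);
    # uuid_t is an array of 16 unsigned chars.
    func.argtypes = [ctypes.POINTER(ctypes.c_ubyte)]
    func.restype = None
    return func


def generate_time_uuid_bytes():
    """Call uuid_generate_time through ctypes and return the 16 raw bytes."""
    func = load_uuid_generate_time()
    if func is None:
        return None
    out = (ctypes.c_ubyte * 16)()
    # On an affected build, the segmentation fault would be expected here.
    func(out)
    return bytes(out)


if __name__ == "__main__":
    print(generate_time_uuid_bytes())
```

If this script alone segfaults on the affected machine, the problem is likely in the ctypes/libffi/libuuid layer rather than in dask itself.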
msg327528 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-11 10:04
Is it possible that this bug is a duplicate of bpo-29804?

Would you be able to test Python 3.7.1RC1?

Sadly, this issue says "OK, PR 1559 (in 3.7.0) for Issue30353 has been backported to 3.6 in PR 5954 for release in 3.6.5".

I know that FreeBSD uses our bundled copy of libffi. I'm not sure that the copy in the 3.6 branch contains the latest fixes for ARM64 :-( It seems to be libffi 3.1. The latest version is libffi-3.2.1 ("released on November 12, 2014": no release for 4 years?) on https://sourceware.org/libffi/
msg327530 - (view) Author: Stefan Rink (stefanrink@yahoo.com) Date: 2018-10-11 10:24
* (re)installed libffi but it was already version 3.2.1.
* Currently compiling python3.7 on both ARM and AARCH64 to test them.
This may take a while because I need to install dask again for 3.7 to test.

Didn't try this on Linux ARM yet so I'll also try that.

PS: If someone wants to try a fix but doesn't have access to ARM64 hardware, let me know; you can try it on one of my SOPINE nodes, a Pine64 or a Raspberry Pi.
msg327535 - (view) Author: Stefan Rink (stefanrink@yahoo.com) Date: 2018-10-11 15:04
On ARM32 it still crashes, but on ARM64 the upgrade to 3.7 helped!
Need to do some more testing but it looks like it's working now.

# python3.7
Python 3.7.0 (default, Sep 29 2018, 05:58:20) 
[Clang 6.0.1 (tags/RELEASE_601/final 335540)] on freebsd12
Type "help", "copyright", "credits" or "license" for more information.
>>> from dask.distributed import Client
>>> client = Client()
>>> client.cluster
LocalCluster('tcp://127.0.0.1:22415', workers=4, ncores=4)
msg327541 - (view) Author: Stefan Rink (stefanrink@yahoo.com) Date: 2018-10-11 15:57
Upgrade to 3.7 fixed the main problem for me, or at least on the hardware/arch I use.

-- On ARM32 it still failed but I don't have debugging symbols there so not so easy to troubleshoot further --
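
Without debugging symbols, a Python-level traceback can still narrow things down. The sketch below runs a snippet in a child interpreter with `-X faulthandler` so that a crash does not take down the calling process and the faulthandler dump on stderr can be captured. The helper name `run_isolated` is hypothetical; the snippet to run is whatever reproduces the crash on the affected machine.

```python
import subprocess
import sys


def run_isolated(code, timeout=120):
    """Run `code` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code -N
    means the child was terminated by signal N (for example -11 for SIGSEGV).
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Example snippet; replace it with the code that crashes.
    returncode, stderr = run_isolated("import uuid; print(uuid.uuid1())")
    print("return code:", returncode)
    print(stderr)
```

If the child dies from a fatal signal, faulthandler writes the Python traceback of the crashing thread to stderr, which shows the Python call that entered the failing C code.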
msg327543 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-11 16:02
> Upgrade to 3.7 fixed the main problem for me, or at least on the hardware/arch I use.

For Python, we should try to identify the required backport, or upgrade libffi in Python 3.6. But I'm scared by Modules/_ctypes/libffi/ since it's unclear to me if we patched it or not...

IMHO the best option for FreeBSD would be to change how FreeBSD builds Python 3.6 to use --with-system-ffi (use a recent libffi rather than Python's old bundled copy).
msg327544 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2018-10-11 16:10
IIRC, the default in 3.6 is to use `--with-system-ffi` if available on all platforms but macOS, falling back to the bundled copy only if a system copy can't be found; 3.7 removes the bundled copy.  Also, the version bundled with 3.6 is v3.2.1.

Note though that this is all from memory, I haven't actually gone to look at the code again :)
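
One way to see how a given interpreter was configured is to inspect the recorded configure arguments. This is a rough sketch, assuming a POSIX build where `sysconfig.get_config_var("CONFIG_ARGS")` is populated; the helper name `has_configure_flag` is hypothetical. A missing flag does not prove the bundled copy was used, since the build may pick the system libffi by default.

```python
import shlex
import sysconfig


def has_configure_flag(config_args, flag):
    """Return True/False if `flag` appears in the configure arguments.

    Returns None when no configure arguments were recorded
    (for example on Windows builds).
    """
    if not config_args:
        return None
    for arg in shlex.split(config_args):
        if arg == flag or arg.startswith(flag + "="):
            return True
    return False


if __name__ == "__main__":
    args = sysconfig.get_config_var("CONFIG_ARGS")
    print("CONFIG_ARGS:", args)
    print("--with-system-ffi:", has_configure_flag(args, "--with-system-ffi"))
```

To confirm which libffi is actually loaded, inspecting the `_ctypes` extension module with the platform's `ldd` is more direct; the backtrace above already shows `/lib/libffi.so.6` in use.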
msg327731 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2018-10-15 06:21
All our FreeBSD ports (lang/python??) and the packages produced from them contain a LIBFFI option, which has been enabled by default since 2015 [1][2]:

LIBFFI=on: Use libffi from ports instead of bundled version

This means that any 'default' package builds of these ports, including those in the official package repositories, will build against and install with the port/package version of libffi, and not the bundled version.

This was originally due to broken builds on i386 (see #22521 and issue23042), but also due to library policy (use external/upstream, not bundled libraries), and not wanting to use outdated/stale versions any longer.

[1] python27: https://svnweb.freebsd.org/changeset/ports/377581
[2] python34+: https://svnweb.freebsd.org/changeset/ports/378821
History
Date                 User                  Action  Args
2018-10-15 06:21:12  koobs                 set     messages: + msg327731
2018-10-11 16:10:38  zach.ware             set     messages: + msg327544
2018-10-11 16:02:33  vstinner              set     nosy: + zach.ware; messages: + msg327543
2018-10-11 15:57:01  stefanrink@yahoo.com  set     resolution: works for me; messages: + msg327541
2018-10-11 15:04:45  stefanrink@yahoo.com  set     messages: + msg327535
2018-10-11 10:24:25  stefanrink@yahoo.com  set     messages: + msg327530
2018-10-11 10:04:36  vstinner              set     nosy: + vstinner; messages: + msg327528
2018-10-11 09:43:26  stefanrink@yahoo.com  create