classification
Title: Segmentation fault with urllib.request.urlopen and threads
Type: crash
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: Anne Archibald, Manjusaka, gilles-duboscq, vstinner
Priority: normal
Keywords:

Created on 2020-03-04 18:38 by Anne Archibald, last changed 2021-02-18 18:19 by gilles-duboscq. This issue is now closed.

Files
File name Uploaded Description
urllib_segfault.py Anne Archibald, 2020-03-04 18:38
Messages (13)
msg363374 - Author: Anne Archibald (Anne Archibald) Date: 2020-03-04 18:38
This was discovered in the astropy test suite, where ThreadPoolExecutor is used to launch many urllib.request.urlopen calls concurrently. The crash occurs when the URLs are local files; I'm not sure about other URL schemes.
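
The attached urllib_segfault.py is not reproduced in this thread. As a rough sketch only, a script exercising the same pattern (many concurrent urlopen calls on file:// URLs from a ThreadPoolExecutor) might look like the following. The helper names fetch and run, the file count, and the worker count are assumptions, not taken from the attachment.

```python
import concurrent.futures
import pathlib
import platform
import sys
import tempfile
import urllib.request


def fetch(url):
    """Open one URL and return its full contents as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def run(n_files=50, n_workers=8):
    """Create n_files small local files and read them back concurrently."""
    with tempfile.TemporaryDirectory() as tmp:
        urls = []
        for i in range(n_files):
            path = pathlib.Path(tmp) / f"file_{i}.txt"
            path.write_bytes(f"payload {i}".encode())
            # as_uri() turns the absolute path into a file:// URL.
            urls.append(path.as_uri())
        # Each worker thread calls urlopen on a local file URL.
        with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as pool:
            return list(pool.map(fetch, urls))


if __name__ == "__main__":
    print(platform.platform())
    print("Python", sys.version)
    results = run()
    print(len(results), "files read")
```

On an unaffected interpreter this simply reads every file back; whether it triggers the reported crash on an affected build is not confirmed here.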

The problem appears to occur in Python 3.7, but not in Python 3.8 or in Python 3.6 (the latter tested on a different machine).

$ python urllib_segfault.py
Linux-5.3.0-29-generic-x86_64-with-Ubuntu-19.10-eoan
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
[GCC 8.3.0]
Segmentation fault (core dumped)
$ python3.8 urllib_segfault.py
Linux-5.3.0-29-generic-x86_64-with-glibc2.29
Python 3.8.0 (default, Oct 28 2019, 16:14:01) 
[GCC 9.2.1 20191008]
$ python3 urllib_segfault.py 
Linux-4.15.0-88-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0]
$

The Astropy bug report: https://github.com/astropy/astropy/issues/10008
msg364873 - Author: Manjusaka (Manjusaka) Date: 2020-03-23 18:21
I tried it with Python 3.7.3 on Ubuntu 18.04:

Linux-4.15.0-1060-aws-x86_64-with-debian-buster-sid
Python 3.7.3 (default, Mar 23 2020, 18:15:26)
[GCC 7.5.0]

There is no segmentation fault.

Would you mind sharing the core dump to help us find more details about the crash?
msg364876 - Author: Manjusaka (Manjusaka) Date: 2020-03-23 18:29
I recompiled Python 3.7.3 with the same GCC 8.3.0 and there is still no fault.

Maybe we need the core dump to figure it out.
msg364881 - Author: STINNER Victor (vstinner) (Python committer) Date: 2020-03-23 18:49
Python 3.7.3 is outdated. Can you please try a newer Python version?

I failed to reproduce the crash on Fedora 31. I tested:

vstinner@apu$ python3.7 -VV
Python 3.7.6 (default, Jan 30 2020, 09:44:41) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]

vstinner@apu$ python3.8 -VV
Python 3.8.2 (default, Feb 26 2020, 00:00:00) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]

vstinner@apu$ python3.9 -VV
Python 3.9.0a4 (default, Feb 27 2020, 00:00:00) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]

vstinner@apu$ python3-debug -VV
Python 3.7.6 (default, Jan 30 2020, 09:04:19) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]
msg364882 - Author: STINNER Victor (vstinner) (Python committer) Date: 2020-03-23 18:57
Try to get the gdb traceback (C call stack).

You may also try: python3 -X faulthandler urllib_segfault.py.

But I expect the crash to happen during Python finalization, where there is no Python frame and so the traceback would be empty.
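
Besides the -X faulthandler command-line option, faulthandler can be enabled from inside the script. The following is a minimal sketch, assuming the tracebacks should go to a log file rather than stderr; the function name dump_all_threads and the use of a temporary file are illustrative choices, not something requested in this issue.

```python
import faulthandler
import tempfile


def dump_all_threads(log_file):
    """Enable faulthandler on log_file and write the current tracebacks to it."""
    # After this call, a fatal signal such as SIGSEGV makes faulthandler write
    # the Python traceback of every thread to log_file. The file must stay
    # open and must have a real file descriptor.
    faulthandler.enable(file=log_file, all_threads=True)
    # Dump the tracebacks right now, in the same format a crash would produce.
    faulthandler.dump_traceback(file=log_file, all_threads=True)


with tempfile.TemporaryFile(mode="w+") as log:
    dump_all_threads(log)
    faulthandler.disable()  # stop using the file before it is closed
    log.seek(0)
    report = log.read()

print(report)
```

The report uses the "Current thread 0x... (most recent call first):" layout, so a dump taken at crash time can be pasted directly into a bug report.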
msg364890 - Author: Manjusaka (Manjusaka) Date: 2020-03-23 19:30
Hello Victor,

I have tried on both macOS and Ubuntu 18.04, from 3.8.2 up to the newest code in master, and can't reproduce this problem:

macOS-10.15.4-x86_64-i386-64bit
Python 3.9.0a4+ (heads/master:9a81ab107a, Mar 24 2020, 02:06:30)
[Clang 11.0.3 (clang-1103.0.32.26)]

Linux-4.15.0-1060-aws-x86_64-with-glibc2.27
Python 3.8.2 (default, Mar 23 2020, 19:21:16)
[GCC 8.3.0]

Linux-4.15.0-1060-aws-x86_64-with-glibc2.27
Python 3.9.0a4 (default, Mar 23 2020, 19:25:43)
[GCC 8.3.0]

I think it may be caused by the kernel version and the glibc version, but I don't have enough evidence to prove it.

I will try Python 3.7.6 through the newest version on Ubuntu 19.10 with kernel 5.3.0-29 and GCC 8.3.0.
msg365035 - Author: STINNER Victor (vstinner) (Python committer) Date: 2020-03-26 00:24
If someone manages to reproduce the bug, please provide *at least* the Python traceback. You can use faulthandler to get it. It would also be very useful to get the C call stack using gdb (the "where" command in gdb).

In the meantime, I am closing the issue since only the reporter reproduced the crash.

Please provide more information about your OS, OS version, how you installed Python, Python package version, etc. if you want to reopen the issue.
msg387246 - Author: Gilles Duboscq (gilles-duboscq) Date: 2021-02-18 17:13
I'm not sure it's the same issue, but we have seen stack traces that look like the one here: https://github.com/astropy/astropy/issues/9699

Current thread 0x00007fffa857e3c0 (most recent call first):
  File "/sw/lib/python3.7/socket.py", line 748 in getaddrinfo
  File "/sw/lib/python3.7/socket.py", line 707 in create_connection
  File "/sw/lib/python3.7/http/client.py", line 938 in connect
  File "/sw/lib/python3.7/http/client.py", line 966 in send
  File "/sw/lib/python3.7/http/client.py", line 1026 in _send_output
  File "/sw/lib/python3.7/http/client.py", line 1247 in endheaders
  File "/sw/lib/python3.7/http/client.py", line 1298 in _send_request
  File "/sw/lib/python3.7/http/client.py", line 1252 in request

For us it happens when using multiprocessing: the main process forks 2 processes and both use urlopen at roughly the same time. We are seeing this on Python 3.7.2 on macOS 10.14.3.
msg387249 - Author: STINNER Victor (vstinner) (Python committer) Date: 2021-02-18 17:27
"Segmentation fault with (...) threads (...) getaddrinfo"

Aha, another victim of a getaddrinfo() implementation which is not thread-safe.

See this code in Modules/socketmodule.c:

/* Lock to allow python interpreter to continue, but only allow one
   thread to be in gethostbyname or getaddrinfo */
#if defined(USE_GETHOSTBYNAME_LOCK)
static PyThread_type_lock netdb_lock;
#endif
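
The idea behind that lock (let other threads keep running, but allow only one of them inside getaddrinfo at a time) can be imitated at the Python level. This is only a sketch of the concept: the names _netdb_lock and locked_getaddrinfo are hypothetical, and it is not how socketmodule.c is implemented nor a fix proposed in this issue.

```python
import socket
import threading

# Hypothetical module-level lock, echoing the name of the C variable.
_netdb_lock = threading.Lock()


def locked_getaddrinfo(host, port, *args, **kwargs):
    """Call socket.getaddrinfo while holding a process-wide lock.

    Only one thread at a time can be inside the resolver call; other
    threads block on the lock but can otherwise keep running.
    """
    with _netdb_lock:
        return socket.getaddrinfo(host, port, *args, **kwargs)
```

A caller would use locked_getaddrinfo in place of socket.getaddrinfo. Note that this does not cover lookups made internally by other modules such as http.client, which call socket.getaddrinfo directly.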

Can you please check if your Python was built with HAVE_GETHOSTBYNAME_R?

$ python3
Python 3.9.1 (default, Jan 20 2021, 00:00:00) 
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux
>>> import sysconfig; repr(sysconfig.get_config_var('HAVE_GETHOSTBYNAME_R'))
'1'


Modules/socketmodule.c is full of #ifdef involving macOS...
msg387250 - Author: Gilles Duboscq (gilles-duboscq) Date: 2021-02-18 17:46
I get '0' so it was not built with HAVE_GETHOSTBYNAME_R.
msg387253 - Author: STINNER Victor (vstinner) (Python committer) Date: 2021-02-18 18:01
Oh wait, I removed that lock:

commit 0de437de6210c2b32b09d6c47a805b23d023bd59
Author: Victor Stinner <vstinner@python.org>
Date:   Thu May 28 17:23:39 2020 +0200

    bpo-25920: Remove socket.getaddrinfo() lock on macOS (GH-20177)
    
    On macOS, socket.getaddrinfo() no longer uses an internal lock to
    prevent race conditions when calling getaddrinfo(). getaddrinfo is
    thread-safe since macOS 10.5, whereas Python 3.9 requires macOS 10.6 or
    newer.
    
    The lock was also used on FreeBSD older than 5.3, OpenBSD older than
    201311 and NetBSD older than 4.

Please open a new issue specific to macOS: specify your macOS version, your Python version and how you installed or built Python.
msg387254 - Author: STINNER Victor (vstinner) (Python committer) Date: 2021-02-18 18:03
> For us it happens when using multiprocessing: the main process forks 2 processes and both use urlopen at roughly the same time. We are seeing this on Python 3.7.2 on macOS 10.14.3.

From what I understood, calling fork() and then continuing to execute regular Python code is no longer safe on macOS 10.14 and must no longer be used. Only fork+exec (spawning a new child process) is safe, but posix_spawn() is preferred on macOS. But I'm not a macOS expert.
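
For code that uses multiprocessing, one way to avoid a bare fork() is to request the "spawn" start method, which starts each worker as a fresh interpreter. The following is a minimal sketch under that assumption; the function names fetch_length and fetch_all and the pool size are illustrative, and nothing here was tested against the setup described in this issue.

```python
import multiprocessing
import urllib.request


def fetch_length(url):
    """Open url in the worker process and return the number of bytes read."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def fetch_all(urls, processes=2):
    """Fetch every URL using worker processes started with "spawn"."""
    # A "spawn" context launches new interpreter processes instead of
    # fork()ing the parent and continuing to run Python code in the child.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(fetch_length, urls)
```

With "spawn", the workers import the main module again, so in a script the call to fetch_all belongs under an if __name__ == "__main__": guard. Since Python 3.8, "spawn" is already the default start method on macOS.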
msg387259 - Author: Gilles Duboscq (gilles-duboscq) Date: 2021-02-18 18:19
Thanks Victor, we'll look into moving away from this pattern.
History
Date User Action Args
2021-02-18 18:19:35 gilles-duboscq set messages: + msg387259
2021-02-18 18:03:15 vstinner set messages: + msg387254
2021-02-18 18:01:09 vstinner set messages: + msg387253
2021-02-18 17:46:31 gilles-duboscq set messages: + msg387250
2021-02-18 17:27:42 vstinner set messages: + msg387249
2021-02-18 17:13:02 gilles-duboscq set nosy: + gilles-duboscq; messages: + msg387246
2020-03-26 00:24:53 vstinner set status: open -> closed; resolution: out of date; messages: + msg365035; stage: resolved
2020-03-23 19:30:26 Manjusaka set messages: + msg364890
2020-03-23 18:57:54 vstinner set messages: + msg364882
2020-03-23 18:49:38 vstinner set messages: + msg364881
2020-03-23 18:29:55 Manjusaka set messages: + msg364876
2020-03-23 18:21:44 Manjusaka set nosy: + Manjusaka; messages: + msg364873
2020-03-11 23:43:57 vstinner set nosy: + vstinner
2020-03-04 18:38:22 Anne Archibald create