msg363374 - (view) |
Author: Anne Archibald (Anne Archibald) * |
Date: 2020-03-04 18:38 |
This was discovered in the astropy test suite, where ThreadPoolExecutor is used to launch many urllib.request.urlopen calls concurrently. The crash occurs when the URLs are local files; I'm not sure about other URL schemes.
The problem appears to occur in Python 3.7 but not in Python 3.8 or in Python 3.6 (the latter on a different machine).
$ python urllib_segfault.py
Linux-5.3.0-29-generic-x86_64-with-Ubuntu-19.10-eoan
Python 3.7.3 (default, Apr 3 2019, 05:39:12)
[GCC 8.3.0]
Segmentation fault (core dumped)
$ python3.8 urllib_segfault.py
Linux-5.3.0-29-generic-x86_64-with-glibc2.29
Python 3.8.0 (default, Oct 28 2019, 16:14:01)
[GCC 9.2.1 20191008]
$ python3 urllib_segfault.py
Linux-4.15.0-88-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
$
The Astropy bug report: https://github.com/astropy/astropy/issues/10008
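The script urllib_segfault.py itself is not included in this thread. A minimal sketch of the kind of reproducer described above, assuming temporary local files fetched through file:// URLs from a thread pool. The function names, file count, and worker count are illustrative assumptions, not the original test:

```python
import concurrent.futures
import pathlib
import platform
import sys
import tempfile
import urllib.request


def fetch(url):
    """Open a URL and return the number of bytes read."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def run(n_files=50, n_workers=16):
    """Create n_files temporary files and fetch them concurrently via file:// URLs."""
    with tempfile.TemporaryDirectory() as tmp:
        urls = []
        for i in range(n_files):
            path = pathlib.Path(tmp) / f"file_{i}.txt"
            path.write_bytes(b"x" * 1024)
            urls.append(path.as_uri())  # absolute path -> file:// URL
        with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as pool:
            return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # Print the same platform and version header as in the report above.
    print(platform.platform())
    print("Python", sys.version)
    sizes = run()
    print(len(sizes), "files fetched")
```

On an unaffected build this simply prints the header and the file count; whether it triggers the crash on an affected build is not confirmed here.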
|
msg364873 - (view) |
Author: Manjusaka (Manjusaka) * |
Date: 2020-03-23 18:21 |
I tried it with Python 3.7.3 on Ubuntu 18.04:
Linux-4.15.0-1060-aws-x86_64-with-debian-buster-sid
Python 3.7.3 (default, Mar 23 2020, 18:15:26)
[GCC 7.5.0]
There is no segmentation fault.
Would you mind sharing the core dump to help us find more detail about the crash?
|
msg364876 - (view) |
Author: Manjusaka (Manjusaka) * |
Date: 2020-03-23 18:29 |
I recompiled Python 3.7.3 with the same GCC 8.3.0 and still see no fault.
Maybe we need the core dump to figure it out.
|
msg364881 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-23 18:49 |
Python 3.7.3 is outdated. Can you please try a newer Python version?
I failed to reproduce the crash on Fedora 31. I tested:
vstinner@apu$ python3.7 -VV
Python 3.7.6 (default, Jan 30 2020, 09:44:41)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]
vstinner@apu$ python3.8 -VV
Python 3.8.2 (default, Feb 26 2020, 00:00:00)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]
vstinner@apu$ python3.9 -VV
Python 3.9.0a4 (default, Feb 27 2020, 00:00:00)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]
vstinner@apu$ python3-debug -VV
Python 3.7.6 (default, Jan 30 2020, 09:04:19)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]
|
msg364882 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-23 18:57 |
Try to get the gdb traceback (C call stack).
You may also try: python3 -X faulthandler urllib_segfault.py.
But I expect a crash in Python finalization, where there is no Python frame and so an empty traceback.
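The -X faulthandler option can also be turned on from inside the script. A minimal sketch, assuming the traceback should go to a log file; the helper name enable_crash_traceback and the log file name are hypothetical:

```python
import faulthandler
import os
import tempfile


def enable_crash_traceback(log_path):
    """Enable faulthandler so fatal signals dump Python tracebacks to log_path.

    Returns the open file object. It must stay open for as long as
    faulthandler is enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps every thread, which matters for thread-pool crashes.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Hypothetical log location; any writable path works.
    path = os.path.join(tempfile.gettempdir(), "urllib_segfault_traceback.log")
    log_file = enable_crash_traceback(path)
    print("faulthandler enabled:", faulthandler.is_enabled())
    # ... run the code that crashes here ...
```

If the crash happens during interpreter finalization, the dumped traceback may still be empty, so the gdb call stack remains the more reliable source.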
|
msg364890 - (view) |
Author: Manjusaka (Manjusaka) * |
Date: 2020-03-23 19:30 |
Hello Victor
I have tried on both macOS and Ubuntu 18.04, from 3.8.2 to the newest code in master, and can't reproduce this problem.
macOS-10.15.4-x86_64-i386-64bit
Python 3.9.0a4+ (heads/master:9a81ab107a, Mar 24 2020, 02:06:30)
[Clang 11.0.3 (clang-1103.0.32.26)]
Linux-4.15.0-1060-aws-x86_64-with-glibc2.27
Python 3.8.2 (default, Mar 23 2020, 19:21:16)
[GCC 8.3.0]
Linux-4.15.0-1060-aws-x86_64-with-glibc2.27
Python 3.9.0a4 (default, Mar 23 2020, 19:25:43)
[GCC 8.3.0]
I think it may be caused by the kernel version and the glibc version, but I don't have enough evidence to prove it.
I will try Python 3.7.6 through the newest version on Ubuntu 19.10 with kernel 5.3.0-29 and GCC 8.3.0.
|
msg365035 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-26 00:24 |
If someone manages to reproduce the bug, please provide *at least* the Python traceback. You can use faulthandler to get it. It would also be very useful to get the C call stack using gdb (the "where" command in gdb).
In the meantime, I am closing the issue since only the reporter has reproduced the crash.
Please provide more information about your OS, OS version, how you installed Python, Python package version, etc. if you want to reopen the issue.
|
msg387246 - (view) |
Author: Gilles Duboscq (gilles-duboscq) |
Date: 2021-02-18 17:13 |
I'm not sure it's the same issue, but we have seen stack traces that look like the one here: https://github.com/astropy/astropy/issues/9699
Current thread 0x00007fffa857e3c0 (most recent call first):
File "/sw/lib/python3.7/socket.py", line 748 in getaddrinfo
File "/sw/lib/python3.7/socket.py", line 707 in create_connection
File "/sw/lib/python3.7/http/client.py", line 938 in connect
File "/sw/lib/python3.7/http/client.py", line 966 in send
File "/sw/lib/python3.7/http/client.py", line 1026 in _send_output
File "/sw/lib/python3.7/http/client.py", line 1247 in endheaders
File "/sw/lib/python3.7/http/client.py", line 1298 in _send_request
File "/sw/lib/python3.7/http/client.py", line 1252 in request
For us it happens when using multiprocessing: the main process forks 2 processes and both use urlopen at roughly the same time. We are seeing this on Python 3.7.2 on macOS 10.14.3.
|
msg387249 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2021-02-18 17:27 |
"Segmentation fault with (...) threads (...) getaddrinfo"
Aha, another victim of a getaddrinfo() implementation which is not thread-safe.
See this code in Modules/socketmodule.c:
/* Lock to allow python interpreter to continue, but only allow one
thread to be in gethostbyname or getaddrinfo */
#if defined(USE_GETHOSTBYNAME_LOCK)
static PyThread_type_lock netdb_lock;
#endif
Can you please check if your Python was built with HAVE_GETHOSTBYNAME_R?
$ python3
Python 3.9.1 (default, Jan 20 2021, 00:00:00)
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux
>>> import sysconfig; repr(sysconfig.get_config_var('HAVE_GETHOSTBYNAME_R'))
'1'
Modules/socketmodule.c is full of #ifdef involving macOS...
|
msg387250 - (view) |
Author: Gilles Duboscq (gilles-duboscq) |
Date: 2021-02-18 17:46 |
I get '0' so it was not built with HAVE_GETHOSTBYNAME_R.
|
msg387253 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2021-02-18 18:01 |
Oh wait, I removed that lock:
commit 0de437de6210c2b32b09d6c47a805b23d023bd59
Author: Victor Stinner <vstinner@python.org>
Date: Thu May 28 17:23:39 2020 +0200
bpo-25920: Remove socket.getaddrinfo() lock on macOS (GH-20177)
On macOS, socket.getaddrinfo() no longer uses an internal lock to
prevent race conditions when calling getaddrinfo(). getaddrinfo is
thread-safe since macOS 10.5, whereas Python 3.9 requires macOS 10.6 or
newer.
The lock was also used on FreeBSD older than 5.3, OpenBSD older than
201311 and NetBSD older than 4.
Please open a new issue specific to macOS: specify your macOS version, your Python version and how you installed or built Python.
|
msg387254 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2021-02-18 18:03 |
> For us it happens when using multiprocessing: the main process forks 2 processes and both use urlopen at roughly the same time. We are seeing this on Python 3.7.2 on macOS 10.14.3.
From what I understood, calling fork() (and then continuing to execute regular Python code) is no longer safe on macOS 10.14 and must no longer be used. Only fork+exec is safe (spawn a new child process), but posix_spawn() is preferred on macOS. But I'm not a macOS expert.
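A minimal sketch of moving away from the fork-then-urlopen pattern in multiprocessing terms, assuming the "spawn" start method is acceptable for the workload (each child is a fresh interpreter process rather than a continuation of the parent after fork()). The helper names spawn_context and fetch_all are hypothetical:

```python
import multiprocessing
import urllib.request


def fetch(url):
    """Open a URL and return the number of bytes read."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def spawn_context():
    """Return a multiprocessing context that starts children with 'spawn'."""
    return multiprocessing.get_context("spawn")


def fetch_all(urls, processes=2):
    """Fetch all URLs in worker processes started with 'spawn', not fork()."""
    # fetch must be a module-level function so the spawned children can import it.
    with spawn_context().Pool(processes=processes) as pool:
        return pool.map(fetch, urls)
```

With "spawn", fetch_all() has to be called from under an if __name__ == "__main__": guard in the main script, because each child re-imports the main module. Since Python 3.8, "spawn" is also the default start method on macOS.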
|
msg387259 - (view) |
Author: Gilles Duboscq (gilles-duboscq) |
Date: 2021-02-18 18:19 |
Thanks Victor, we'll look into moving away from this pattern.
|
Date | User | Action | Args
2022-04-11 14:59:27 | admin | set | github: 84034
2021-02-18 18:19:35 | gilles-duboscq | set | messages: + msg387259
2021-02-18 18:03:15 | vstinner | set | messages: + msg387254
2021-02-18 18:01:09 | vstinner | set | messages: + msg387253
2021-02-18 17:46:31 | gilles-duboscq | set | messages: + msg387250
2021-02-18 17:27:42 | vstinner | set | messages: + msg387249
2021-02-18 17:13:02 | gilles-duboscq | set | nosy: + gilles-duboscq; messages: + msg387246
2020-03-26 00:24:53 | vstinner | set | status: open -> closed; resolution: out of date; messages: + msg365035; stage: resolved
2020-03-23 19:30:26 | Manjusaka | set | messages: + msg364890
2020-03-23 18:57:54 | vstinner | set | messages: + msg364882
2020-03-23 18:49:38 | vstinner | set | messages: + msg364881
2020-03-23 18:29:55 | Manjusaka | set | messages: + msg364876
2020-03-23 18:21:44 | Manjusaka | set | nosy: + Manjusaka; messages: + msg364873
2020-03-11 23:43:57 | vstinner | set | nosy: + vstinner
2020-03-04 18:38:22 | Anne Archibald | create |