classification
Title: runtime/interp/thread state refactoring leads to segmentation fault
Type: crash
Stage: resolved
Components: Interpreter Core
Versions: Python 3.11
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To: eric.snow
Nosy List: Quentin.Pradet, SethMichaelLarson, eric.snow
Priority: normal
Keywords:

Created on 2022-01-09 20:48 by Quentin.Pradet, last changed 2022-01-12 15:29 by eric.snow. This issue is now closed.

Messages (12)
msg410166 - (view) Author: Quentin Pradet (Quentin.Pradet) * Date: 2022-01-09 20:48
Since https://github.com/python/cpython/commit/32a67246b0d1e08cd50fc3bfa58052cfeb515b2e, which was introduced through https://bugs.python.org/issue46008 and shipped as part of alpha 3, the urllib3 test suite reliably crashes with a segfault on Fedora 35.

I've narrowed the reproducer code down to https://github.com/pquentin/urllib3/blob/segfault/test/test_reproduce.py, but that still requires cffi, pyOpenSSL and Tornado.

The segfault either happens in the `ssl` module or the `selectors` module:

```
================================================= test session starts =================================================
platform linux -- Python 3.11.0a2+, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/q/pub/urllib3, configfile: setup.cfg
collected 1 item                                                                                                      

test/test_reproduce.py Fatal Python error: Segmentation fault

Thread 0x00007fee9a250640 (most recent call first):
  File "/home/q/pub/install/lib/python3.11/ssl.py", line 1346 in do_handshake
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/iostream.py", line 1391 in _do_ssl_handshake
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/iostream.py", line 1478 in _handle_read
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/iostream.py", line 696 in _handle_events
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 189 in _handle_events
  File "/home/q/pub/install/lib/python3.11/asyncio/events.py", line 80 in _run
  File "/home/q/pub/install/lib/python3.11/asyncio/base_events.py", line 1858 in _run_once
  File "/home/q/pub/install/lib/python3.11/asyncio/base_events.py", line 591 in run_forever
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 199 in start
  File "/home/q/pub/install/lib/python3.11/threading.py", line 968 in run
  File "/home/q/pub/install/lib/python3.11/threading.py", line 1031 in _bootstrap_inner
  File "/home/q/pub/install/lib/python3.11/threading.py", line 988 in _bootstrap

Extension modules: tornado.speedups, _brotli, _cffi_backend (total: 3)
zsh: segmentation fault (core dumped)  pytest
```

```
================================================= test session starts =================================================
platform linux -- Python 3.11.0a2+, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/q/pub/urllib3, configfile: setup.cfg
collected 1 item                                                                                                      

test/test_reproduce.py Fatal Python error: Segmentation fault

Thread 0x00007fee9a250640 (most recent call first):
  File "/home/q/pub/install/lib/python3.11/ssl.py", line 1346 in do_handshake
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/iostream.py", line 1391 in _do_ssl_handshake
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/iostream.py", line 1478 in _handle_read
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/iostream.py", line 696 in _handle_events
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 189 in _handle_events
  File "/home/q/pub/install/lib/python3.11/asyncio/events.py", line 80 in _run
  File "/home/q/pub/install/lib/python3.11/asyncio/base_events.py", line 1858 in _run_once
  File "/home/q/pub/install/lib/python3.11/asyncio/base_events.py", line 591 in run_forever
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 199 in start
  File "/home/q/pub/install/lib/python3.11/threading.py", line 968 in run
  File "/home/q/pub/install/lib/python3.11/threading.py", line 1031 in _bootstrap_inner
  File "/home/q/pub/install/lib/python3.11/threading.py", line 988 in _bootstrap

Extension modules: tornado.speedups, _brotli, _cffi_backend (total: 3)
zsh: segmentation fault (core dumped)  pytest
```

I can work on a better reproducer, but thought this was already interesting as https://github.com/python/cpython/pull/29977 states there should have been zero change in behavior.
msg410167 - (view) Author: Quentin Pradet (Quentin.Pradet) * Date: 2022-01-09 20:51
Sorry, I pasted the same crash twice. Here's the segmentation fault in `selectors`:

```
============================================================================================================= test session starts =============================================================================================================
platform linux -- Python 3.11.0a2+, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/q/pub/urllib3, configfile: setup.cfg
collected 1 item                                                                                                                                                                                                                              

test/test_reproduce.py Fatal Python error: Segmentation fault

Thread 0x00007f9683378640 (most recent call first):
  File "/home/q/pub/install/lib/python3.11/selectors.py", line 416 in select
  File "/home/q/pub/install/lib/python3.11/asyncio/base_events.py", line 1822 in _run_once
  File "/home/q/pub/install/lib/python3.11/asyncio/base_events.py", line 591 in run_forever
  File "/home/q/pub/urllib3/venv/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 199 in start
  File "/home/q/pub/install/lib/python3.11/threading.py", line 968 in run
  File "/home/q/pub/install/lib/python3.11/threading.py", line 1031 in _bootstrap_inner
  File "/home/q/pub/install/lib/python3.11/threading.py", line 988 in _bootstrap

Extension modules: tornado.speedups, _brotli, _cffi_backend (total: 3)
zsh: segmentation fault (core dumped)  pytest
```

(Those results are produced with https://github.com/python/cpython/commit/32a67246b0d1e08cd50fc3bfa58052cfeb515b2e)
msg410227 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2022-01-10 16:30
I'll look into this today.  To reproduce, I should run the urllib3 test suite?
msg410231 - (view) Author: Quentin Pradet (Quentin.Pradet) * Date: 2022-01-10 17:01
Yes, exactly. `pip install nox && nox -Rs test-3.11`. My reproducer is still too big to be useful, and I failed to reproduce this on Docker. If you can't reproduce, I'll work on the reproducer more. Thanks!
msg410252 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2022-01-10 20:28
I must be missing something.  Here is what I did:

```
$ cd cpython
$ git checkout main
$ make -j8
$ ./python -m venv ../venv-urllib3
$ cd ..
$ git clone https://github.com/urllib3/urllib3
$ cd urllib3
$ ../venv-urllib3/bin/python3 -m pip install nox
$ ../venv-urllib3/bin/nox -Rs test-3.11
nox > Running session test-3.11
nox > Session test-3.11 skipped: Python interpreter 3.11 not found.
```

I am not familiar with nox so I'm not sure how to troubleshoot this.
msg410253 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2022-01-10 20:32
It looks like the urllib3 CI is passing for 3.11.0a3: https://github.com/urllib3/urllib3/runs/4762856431 (Ubuntu 3.11-dev test-3.11).

Is there a urllib3 issue number you could point me at?
msg410254 - (view) Author: Quentin Pradet (Quentin.Pradet) * Date: 2022-01-10 20:43
We haven't opened an issue for this, but discussed it in Discord, sorry. It also does not always crash on GitHub Actions. Here's an example of a crash: https://github.com/urllib3/urllib3/runs/4740730329?check_suite_focus=true

If nox can't find your Python version and `nox --no-venv -Rs test-3.11` does not run your version of Python 3.11, you can always execute those commands directly:

pip install -r dev-requirements.txt
pip install ".[socks,secure,brotli]"
pytest

If that does not crash on your system, I'd appreciate it if you could tell me how I could help get this fixed besides simplifying the reproducer. Thanks.
msg410348 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2022-01-11 23:05
> pip install -r dev-requirements.txt
> pip install ".[socks,secure,brotli]"
> pytest

I was able to reproduce the crash with these steps.  Thanks!
msg410362 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2022-01-11 23:42
FYI, I get the crash with just:

 pytest test/contrib/test_pyopenssl.py::TestHTTPS::test_verify_none_and_bad_fingerprint
msg410370 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2022-01-12 00:33
It looks like PyThreadState.async_exc is getting set to 0x01 somewhere.  There isn't any code like that in https://github.com/python/cpython/commit/32a67246b0d1e08cd50fc3bfa58052cfeb515b2e.  However, the struct layout of PyThreadState did change slightly in that commit.

The ABI is generally not stable until the first beta (and sometimes a bit after that).  Could it be that the wheel for one or more of the dependencies was built against an earlier 3.11 release (with the previous PyThreadState layout)?

If I move fields around in PyThreadState just right, I can no longer reproduce the problem.
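
(Editorial illustration, not part of the original message.) The effect described above can be sketched with `ctypes`. The two structs below are hypothetical, heavily simplified stand-ins and do not match the real `PyThreadState` definition; the field names `interp` and `tracing` and their order are assumptions chosen only so that one field moves. Code compiled against the old layout writes to what it believes is `tracing`, but in the new layout that offset holds `async_exc`:

```python
import ctypes


# Hypothetical, heavily simplified stand-ins; NOT the real PyThreadState.
class ThreadStateOld(ctypes.Structure):
    """Layout a stale extension was compiled against (assumed)."""
    _fields_ = [
        ("interp", ctypes.c_void_p),
        ("tracing", ctypes.c_ssize_t),
        ("async_exc", ctypes.c_void_p),
    ]


class ThreadStateNew(ctypes.Structure):
    """Layout the running interpreter actually uses (assumed)."""
    _fields_ = [
        ("interp", ctypes.c_void_p),
        ("async_exc", ctypes.c_void_p),
        ("tracing", ctypes.c_ssize_t),
    ]


def simulate_stale_write():
    """Write through the old layout into memory that uses the new layout."""
    state = ThreadStateNew()
    # Reinterpret the same memory with the outdated struct definition,
    # as a binary built before the layout change effectively would.
    stale_view = ctypes.cast(
        ctypes.pointer(state), ctypes.POINTER(ThreadStateOld)
    ).contents
    stale_view.tracing = 1  # the stale code thinks it is setting "tracing"
    return state.async_exc  # the interpreter sees a bogus pointer value


if __name__ == "__main__":
    print(ThreadStateOld.tracing.offset, ThreadStateNew.async_exc.offset)
    print(simulate_stale_write())
```

In this sketch the pointer-sized `async_exc` slot ends up holding the integer 1, which is the kind of value that would crash as soon as it is dereferenced as an object pointer.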
msg410384 - (view) Author: Quentin Pradet (Quentin.Pradet) * Date: 2022-01-12 05:38
Oh my god. You're right, I had a cffi wheel compiled for Python 3.11 before that commit. But the wheel was not coming from PyPI, it was coming from pip's own cache! And we recently enabled pip's cache in GitHub Actions too.

So the wheel compiled locally for alpha 2 got reused for alpha 3, and 💥.

Sorry for the noise but thanks a lot for your help, this was really baffling. Closing!
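
(Editorial illustration, not part of the original message.) A minimal sketch of why a wheel built for alpha 2 can be picked up again on alpha 3, assuming the cache distinguishes interpreters only by a wheel-style tag such as `cp311`. The helper names `interpreter_tag` and `prerelease_aware_key` are hypothetical, and the second one is just one possible way to build a CI cache key that changes between pre-releases:

```python
import sys


def interpreter_tag(version_info=sys.version_info):
    """Wheel-style CPython tag: only major and minor are encoded."""
    return f"cp{version_info[0]}{version_info[1]}"


def prerelease_aware_key(version_info=sys.version_info):
    """Hypothetical cache key that also encodes release level and serial."""
    major, minor, micro, level, serial = version_info[:5]
    return f"cp{major}{minor}-{micro}-{level}{serial}"


if __name__ == "__main__":
    alpha2 = (3, 11, 0, "alpha", 2)
    alpha3 = (3, 11, 0, "alpha", 3)
    # Same tag for both alphas, so a cached wheel looks compatible.
    print(interpreter_tag(alpha2), interpreter_tag(alpha3))
    # Different keys, so a cache keyed this way would be rebuilt.
    print(prerelease_aware_key(alpha2), prerelease_aware_key(alpha3))
```

When testing against pre-releases, running `pip cache purge` or installing with `pip install --no-cache-dir` after upgrading the interpreter avoids reusing locally built wheels.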
msg410402 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2022-01-12 15:29
I'm glad we were able to figure it out relatively quickly...and without any big headaches. :)

And thanks for testing against the alpha releases!!!  You're having a positive impact.
History
Date | User | Action | Args
2022-01-12 15:29:19 | eric.snow | set | resolution: not a bug; messages: + msg410402
2022-01-12 05:38:40 | Quentin.Pradet | set | status: open -> closed; messages: + msg410384; stage: needs patch -> resolved
2022-01-12 00:33:36 | eric.snow | set | messages: + msg410370
2022-01-11 23:42:19 | eric.snow | set | messages: + msg410362
2022-01-11 23:05:18 | eric.snow | set | messages: + msg410348
2022-01-10 20:43:19 | Quentin.Pradet | set | messages: + msg410254
2022-01-10 20:32:33 | eric.snow | set | messages: + msg410253
2022-01-10 20:28:21 | eric.snow | set | messages: + msg410252
2022-01-10 17:01:12 | Quentin.Pradet | set | messages: + msg410231
2022-01-10 16:30:25 | eric.snow | set | assignee: eric.snow; messages: + msg410227; stage: needs patch
2022-01-09 20:51:32 | Quentin.Pradet | set | messages: + msg410167
2022-01-09 20:48:32 | Quentin.Pradet | create |