classification
Title: 3.10 b1 armhf Bus Error in hashlib test: test_gil
Type: crash
Stage: resolved
Components: Build
Versions: Python 3.10
process
Status: closed
Resolution: fixed
Dependencies: 36515
Superseder:
Assigned To:
Nosy List: Anthony Sottile, christian.heimes, hroncok, pablogsal, vstinner
Priority: release blocker
Keywords:

Created on 2021-05-05 05:31 by Anthony Sottile, last changed 2021-05-25 07:26 by christian.heimes. This issue is now closed.

Messages (10)
msg392973 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2021-05-05 05:31
Terribly sorry, I don't have much information to go on here other than the build logs from deadsnakes.

I retried this build twice and it consistently fails at this point:

https://launchpadlibrarian.net/537139233/buildlog_ubuntu-bionic-armhf.python3.10_3.10.0~b1-1+bionic2_BUILDING.txt.gz

The relevant logs from the build:

```
./python -m test --pgo --timeout=1200 || true
0:00:00 load avg: 2.37 Run tests sequentially (timeout: 20 min)
0:00:00 load avg: 2.37 [ 1/44] test_array
0:00:11 load avg: 2.23 [ 2/44] test_base64
0:00:15 load avg: 2.13 [ 3/44] test_binascii
0:00:16 load avg: 2.13 [ 4/44] test_binop
0:00:17 load avg: 2.13 [ 5/44] test_bisect
0:00:18 load avg: 2.13 [ 6/44] test_bytes
0:01:18 load avg: 1.41 [ 7/44] test_bz2 -- test_bytes passed in 1 min
0:01:25 load avg: 1.35 [ 8/44] test_cmath
0:01:28 load avg: 1.35 [ 9/44] test_codecs
0:01:52 load avg: 1.23 [10/44] test_collections
0:02:10 load avg: 1.16 [11/44] test_complex
0:02:16 load avg: 1.15 [12/44] test_dataclasses
0:02:23 load avg: 1.14 [13/44] test_datetime
0:03:18 load avg: 1.09 [14/44] test_decimal -- test_datetime passed in 54.9 sec
0:05:06 load avg: 1.01 [15/44] test_difflib -- test_decimal passed in 1 min 47 sec
0:05:24 load avg: 1.01 [16/44] test_embed
0:05:27 load avg: 1.01 [17/44] test_float
0:05:32 load avg: 1.01 [18/44] test_fstring
0:05:43 load avg: 1.00 [19/44] test_functools
0:05:47 load avg: 1.00 [20/44] test_generators
0:05:51 load avg: 1.00 [21/44] test_hashlib
Fatal Python error: Bus error

Current thread 0xf7901220 (most recent call first):
  File "/<<PKGBUILDDIR>>/Lib/test/test_hashlib.py", line 842 in test_gil
  File "/<<PKGBUILDDIR>>/Lib/unittest/case.py", line 549 in _callTestMethod
  File "/<<PKGBUILDDIR>>/Lib/unittest/case.py", line 592 in run
  File "/<<PKGBUILDDIR>>/Lib/unittest/case.py", line 652 in __call__
  File "/<<PKGBUILDDIR>>/Lib/unittest/suite.py", line 122 in run
  File "/<<PKGBUILDDIR>>/Lib/unittest/suite.py", line 84 in __call__
  File "/<<PKGBUILDDIR>>/Lib/unittest/suite.py", line 122 in run
  File "/<<PKGBUILDDIR>>/Lib/unittest/suite.py", line 84 in __call__
  File "/<<PKGBUILDDIR>>/Lib/unittest/suite.py", line 122 in run
  File "/<<PKGBUILDDIR>>/Lib/unittest/suite.py", line 84 in __call__
  File "/<<PKGBUILDDIR>>/Lib/test/support/testresult.py", line 169 in run
  File "/<<PKGBUILDDIR>>/Lib/test/support/__init__.py", line 959 in _run_suite
  File "/<<PKGBUILDDIR>>/Lib/test/support/__init__.py", line 1082 in run_unittest
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/runtest.py", line 210 in _test_module
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/runtest.py", line 246 in _runtest_inner2
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/runtest.py", line 282 in _runtest_inner
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/runtest.py", line 154 in _runtest
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/runtest.py", line 194 in runtest
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/main.py", line 423 in run_tests_sequential
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/main.py", line 521 in run_tests
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/main.py", line 694 in _main
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/main.py", line 641 in main
  File "/<<PKGBUILDDIR>>/Lib/test/libregrtest/main.py", line 719 in main
  File "/<<PKGBUILDDIR>>/Lib/test/__main__.py", line 2 in <module>
  File "/<<PKGBUILDDIR>>/Lib/runpy.py", line 86 in _run_code
  File "/<<PKGBUILDDIR>>/Lib/runpy.py", line 196 in _run_module_as_main
```
msg392974 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-05 06:26
Are you able to get a C stack trace somehow, e.g. from a core dump or by logging into the build system interactively and running gdb? The Python traceback isn't very helpful.
msg392975 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-05 06:29
Have you already tried ./python -X faulthandler -m test ... ?

Could you please also run the tests with "./python -m test -j1 ..." in a new job? This executes every test module in a subprocess. I'd like to see if other tests are segfaulting, too.
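For reference, the same handler that `-X faulthandler` installs can also be enabled from Python code. The following is only a minimal sketch: the helper name `enable_crash_tracebacks` and the log path are hypothetical and are not part of the test suite or the build in question.

```python
import faulthandler
import os
import tempfile


def enable_crash_tracebacks(log_path):
    """Install the faulthandler signal handlers, writing tracebacks to log_path.

    Comparable to starting the interpreter with -X faulthandler, except that
    the output goes to a file instead of stderr. The returned file object
    must stay open for as long as faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    # On SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL the Python traceback of
    # every thread is dumped to log_file before the process dies.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Hypothetical log location, chosen only for this illustration.
    path = os.path.join(tempfile.gettempdir(), "crash_traceback.log")
    handle = enable_crash_tracebacks(path)
    print("faulthandler enabled:", faulthandler.is_enabled())
    faulthandler.disable()
    handle.close()
```

Like the traceback in the original report, this only yields the Python-level stack; a C stack trace still needs gdb or a core dump.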
msg392983 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-05 07:39
I have talked to our Fedora release engineers. They don't see any issues with armv7hl builds of Python 3.10.0b1. All tests are passing. https://koji.fedoraproject.org/koji/taskinfo?taskID=67234161

Ubuntu's armhf and Fedora's armv7hl both have the platform triplet "arm-linux-gnueabihf". Could the issue be caused by GCC version difference or compiler flags? Fedora uses GCC arch flags -march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard.
msg392996 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-05-05 09:12
Given that this happens in hashlib, could this be related to the OpenSSL version?
msg392997 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-05 09:20
That's possible.

I suspect it's related to test_gil itself. Hash providers don't release the GIL for inputs < 2048 bytes, because hashing small inputs is faster than releasing and re-acquiring the GIL. For inputs >= 2048 bytes, hashlib creates a per-object lock and releases the GIL while holding that internal lock. The test case verifies that hashlib works correctly for larger inputs that release the GIL.
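To illustrate the behaviour described above, here is a rough sketch that updates one shared hash object from several threads with chunks larger than the 2048-byte threshold and compares the digest with a single-threaded run. It is not the actual test_gil code; the function names and the `GIL_MINSIZE` constant are assumptions made for this example.

```python
import hashlib
import threading

# Threshold quoted in the message above; treated as an assumption here.
GIL_MINSIZE = 2048


def hash_concurrently(algorithm, chunk, repeats, num_threads):
    """Update one shared hash object from several threads at once."""
    hasher = hashlib.new(algorithm)

    def worker():
        for _ in range(repeats):
            # Large updates may release the GIL, so the per-object lock is
            # what keeps concurrent updates from corrupting the state.
            hasher.update(chunk)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return hasher.hexdigest()


def hash_sequentially(algorithm, chunk, total_updates):
    """Reference digest computed in a single thread."""
    hasher = hashlib.new(algorithm)
    for _ in range(total_updates):
        hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    chunk = b"x" * (GIL_MINSIZE * 4)
    for name in ("sha256", "sha3_256"):
        threaded = hash_concurrently(name, chunk, repeats=50, num_threads=4)
        reference = hash_sequentially(name, chunk, total_updates=200)
        print(name, "match:", threaded == reference)
```

Because every thread feeds identical chunks, the order of updates does not matter, so the digests should match as long as each update is applied atomically.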
msg393000 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-05-05 10:24
In Fedora, our build passed on Fedora 32, 33, 34, and 35.

We have:

F32: gcc 10.2.1, openssl 1.1.1k
F33: gcc 10.3.1, openssl 1.1.1k
F34: gcc 11.1.1, openssl 1.1.1k
F35: gcc 11.1.1, openssl 1.1.1k

Let me know how I can help debug the difference in environment further.
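One way to compare environments across distributions is to print what the interpreter itself reports about its build. This is only a sketch: the helper name `build_environment` is made up for this example, and some values can be missing (for instance `MULTIARCH` on non-Linux platforms, or the OpenSSL version on a build without the ssl module).

```python
import platform
import sysconfig


def build_environment():
    """Collect compiler, OpenSSL and architecture details of this build."""
    try:
        import ssl
        openssl_version = ssl.OPENSSL_VERSION
    except ImportError:
        # The ssl module is optional; a build without OpenSSL lacks it.
        openssl_version = None
    return {
        "compiler": platform.python_compiler(),
        "openssl": openssl_version,
        "machine": platform.machine(),
        # Platform triplet such as "arm-linux-gnueabihf", if recorded.
        "multiarch": sysconfig.get_config_var("MULTIARCH"),
        # Compiler flags recorded when the interpreter was built, if any.
        "cflags": sysconfig.get_config_var("CFLAGS"),
    }


if __name__ == "__main__":
    for key, value in build_environment().items():
        print(f"{key}: {value}")
```

Running this with the freshly built `./python` on both the Ubuntu and Fedora builders would give a side-by-side view of GCC version, OpenSSL version, and flags.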
msg393010 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2021-05-05 14:10
Oddly enough, when I add `-X faulthandler`, it passes.

___

I did some research on the error message, and it looks like the Ubuntu maintainers found the same thing and reported it here:

https://bugs.python.org/issue36445

This points at (intentional?) misaligned accesses being a problem on ARM.

It looks like they've provided a patch as well, here: https://bugs.python.org/issue36515
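As a rough way to exercise the sha3 code with input buffers that start at different byte offsets, something like the following could be tried on the affected builder. It is only a sketch under several assumptions: the helper name `sha3_at_offsets` is hypothetical, whether a given slice really lands on a misaligned address depends on the allocator, and `hashlib.sha3_256` may be served by OpenSSL rather than the bundled `_sha3` extension.

```python
import hashlib


def sha3_at_offsets(payload, max_offset=8):
    """Hash payload through memoryview slices that start at varying offsets.

    Returns a dict mapping each offset to True when the digest matches the
    digest of the plain bytes object.
    """
    expected = hashlib.sha3_256(payload).hexdigest()
    results = {}
    for offset in range(max_offset):
        # Prepend `offset` padding bytes so the payload starts at a
        # different position inside the backing buffer each time.
        backing = bytearray(offset) + bytearray(payload)
        view = memoryview(backing)[offset:]
        results[offset] = hashlib.sha3_256(view).hexdigest() == expected
    return results


if __name__ == "__main__":
    for offset, ok in sha3_at_offsets(b"a" * 4096).items():
        print(f"offset {offset}: {'ok' if ok else 'MISMATCH'}")
```

On a build with the unaligned-access problem, the expected symptom would be a crash such as the bus error above rather than a digest mismatch.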
msg393016 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-05-05 15:51
Christian, could you look into that patch?
msg394300 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-25 07:26
The fix was merged into 3.10 almost 3 weeks ago: https://github.com/python/cpython/commit/3b2a45ff95a68acc8276b37678c98740a232f6d4
I forgot to close this bug.
History
Date                 User              Action  Args
2021-05-25 07:26:46  christian.heimes  set     status: open -> closed
                                               resolution: fixed
                                               messages: + msg394300
                                               stage: resolved
2021-05-05 18:53:28  gregory.p.smith   set     dependencies: + unaligned memory access in the _sha3 extension
2021-05-05 15:51:20  pablogsal         set     messages: + msg393016
2021-05-05 14:10:37  Anthony Sottile   set     messages: + msg393010
2021-05-05 10:24:18  hroncok           set     nosy: + hroncok
                                               messages: + msg393000
2021-05-05 09:20:21  christian.heimes  set     messages: + msg392997
2021-05-05 09:12:39  pablogsal         set     messages: + msg392996
2021-05-05 07:39:52  christian.heimes  set     messages: + msg392983
2021-05-05 07:04:40  pablogsal         set     priority: normal -> release blocker
2021-05-05 06:29:47  christian.heimes  set     nosy: + vstinner, pablogsal
                                               type: crash
                                               messages: + msg392975
2021-05-05 06:26:01  christian.heimes  set     nosy: + christian.heimes
                                               messages: + msg392974
2021-05-05 05:31:25  Anthony Sottile   create