This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_signal.test_itimer_virtual() failed on AMD64 Fedora Rawhide: Linux kernel 5.15 regression
Type: Stage: resolved
Components: Tests Versions: Python 3.11, Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: cstratak, hroncok, vstinner
Priority: normal Keywords:

Created on 2021-09-06 17:10 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
setitimer.c vstinner, 2021-09-06 18:02
Messages (10)
msg401162 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-06 17:10
AMD64 Fedora Rawhide 3.x:
https://buildbot.python.org/all/#/builders/285/builds/685

It is likely test_itimer_virtual() of test_signal which failed.

On the buildbot worker:

$ rpm -q glibc
glibc-2.34.9000-5.fc36.x86_64
$ uname -r
5.15.0-0.rc0.20210902git4ac6d90867a4.4.fc36.x86_64

test.pythoninfo:

platform.libc_ver: glibc 2.34.9000
platform.platform: Linux-5.15.0-0.rc0.20210902git4ac6d90867a4.4.fc36.x86_64-x86_64-with-glibc2.34.9000

Logs:

test test_signal crashed -- Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/test/libregrtest/runtest.py", line 340, in _runtest_inner
    refleak = _runtest_inner2(ns, test_name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/test/libregrtest/runtest.py", line 297, in _runtest_inner2
    test_runner()
    ^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/test/libregrtest/runtest.py", line 261, in _test_module
    support.run_unittest(tests)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/test/support/__init__.py", line 1123, in run_unittest
    _run_suite(suite)
    ^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/test/support/__init__.py", line 998, in _run_suite
    result = runner.run(suite)
             ^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/runner.py", line 176, in run
    test(result)
    ^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/suite.py", line 122, in run
    test(result)
    ^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/suite.py", line 122, in run
    test(result)
    ^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/suite.py", line 122, in run
    test(result)
    ^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/case.py", line 652, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/case.py", line 569, in run
    result.startTest(self)
    ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/test/support/testresult.py", line 41, in startTest
    super().startTest(test)
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/runner.py", line 54, in startTest
    self.stream.write(self.getDescription(test))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/unittest/runner.py", line 44, in getDescription
    def getDescription(self, test):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-x86_64/build/Lib/test/test_signal.py", line 739, in sig_vtalrm
    raise signal.ItimerError("setitimer didn't disable ITIMER_VIRTUAL "
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
signal.itimer_error: setitimer didn't disable ITIMER_VIRTUAL timer.


Code of the test:

    def sig_vtalrm(self, *args):
        self.hndl_called = True

        if self.hndl_count > 3:
            # it shouldn't be here, because it should have been disabled.
            raise signal.ItimerError("setitimer didn't disable ITIMER_VIRTUAL "  # <==== HERE
                "timer.")     # <======================================================= HERE
        elif self.hndl_count == 3:
            # disable ITIMER_VIRTUAL, this function shouldn't be called anymore
            signal.setitimer(signal.ITIMER_VIRTUAL, 0)

        self.hndl_count += 1

    # Issue 3864, unknown if this affects earlier versions of freebsd also                
    @unittest.skipIf(sys.platform in ('netbsd5',),                                        
        'itimer not reliable (does not mix well with threading) on some BSDs.')           
    def test_itimer_virtual(self):                                                        
        self.itimer = signal.ITIMER_VIRTUAL
        signal.signal(signal.SIGVTALRM, self.sig_vtalrm)
        signal.setitimer(self.itimer, 0.3, 0.2)

        start_time = time.monotonic()
        while time.monotonic() - start_time < 60.0:
            # use up some virtual time by doing real work
            _ = pow(12345, 67890, 10000019)
            if signal.getitimer(self.itimer) == (0.0, 0.0):
                break # sig_vtalrm handler stopped this itimer
        else: # Issue 8424
            self.skipTest("timeout: likely cause: machine too slow or load too "
                          "high")

        # virtual itimer should be (0.0, 0.0) now
        self.assertEqual(signal.getitimer(self.itimer), (0.0, 0.0))
        # and the handler should have been called
        self.assertEqual(self.hndl_called, True)
msg401170 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-06 18:02
I rewrote the test in C, see attached setitimer.c.

On Fedora 34:
---
$ rpm -q glibc
glibc-2.33-20.fc34.x86_64
$ uname -r
5.13.8-200.fc34.x86_64

$ gcc setitimer.c -o setitimer && ./setitimer
setitimer(<0.5, 0.5>)
SIGVTALRM
SIGVTALRM
SIGVTALRM
SIGVTALRM
setitimer(<0, 0>) ok
SIGVTALRM is disarmed
wait 3 seconds
setitimer(<0, 0>) ok
exit
---

On Fedora Rawhide:
---
$ rpm -q glibc
glibc-2.34.9000-5.fc36.x86_64
$ uname -r
5.15.0-0.rc0.20210902git4ac6d90867a4.4.fc36.x86_64

$ gcc setitimer.c -o setitimer && ./setitimer
setitimer(<0.5, 0.5>)
SIGVTALRM
SIGVTALRM
SIGVTALRM
SIGVTALRM
setitimer(<0, 0>) ok
SIGVTALRM
FATAL ERROR: SIGVTALRM timer not disarmed!
Abandon (core dumped)
---
msg401171 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-06 18:13
It seems like the kernel was just upgrade from 5.14 to 5.15.rc0:

vstinner@python-builder-rawhide$ date
lun. 06 sept. 2021 14:05:43 EDT

vstinner@python-builder-rawhide$ grep kernel /var/log/dnf.log
2021-09-03T06:35:58-0400 DEBUG Installed: kernel-5.15.0-0.rc0.20210901git9e9fb7655ed5.2.fc36.x86_64
...
2021-09-04T06:31:34-0400 DEBUG Installed: kernel-5.15.0-0.rc0.20210902git4ac6d90867a4.4.fc36.x86_64
...

vstinner@python-builder-rawhide$ rpm -q kernel
kernel-5.14.0-0.rc5.20210813gitf8e6dfc64f61.46.fc36.x86_64
kernel-5.15.0-0.rc0.20210901git9e9fb7655ed5.2.fc36.x86_64
kernel-5.15.0-0.rc0.20210902git4ac6d90867a4.4.fc36.x86_64

vstinner@python-builder-rawhide$ uptime
 14:06:00 up  3:14,  1 user,  load average: 2,47, 2,58, 3,11

vstinner@python-builder-rawhide$ journalctl --list-boots
-1 04d2796c1f374367add41a55f48c7dad Sat 2021-08-28 14:55:16 EDT—Mon 2021-09-06 10:50:57 EDT
 0 1a10ace470bd4016ad19fd25d248fc57 Mon 2021-09-06 10:51:11 EDT—Mon 2021-09-06 14:12:36 EDT
msg401173 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-06 18:32
It may be a Linux kernel 5.15 regression. I'm now trying to report this issue to the Linux kernel.
msg401231 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-07 10:44
The regression seems to come from this recent Linux kernel change:
https://github.com/torvalds/linux/commit/406dd42bd1ba0c01babf9cde169bb319e52f6147

If fixing the kernel takes too long, we can temporarily skip the test on Linux kernel 5.15 (or newer).
msg401521 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-09 21:20
Python 3.10 is also affected. Example with aarch64 Fedora Rawhide 3.10:
https://buildbot.python.org/all/#builders/636/builds/286
msg401707 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-13 15:45
Update: Frederic Weisbecker proposed a kernel fix:
https://lkml.org/lkml/2021/9/13/1647
msg403292 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-06 10:21
The fix is part of Linux kernel v5.15-rc3:
https://github.com/torvalds/linux/commit/8cd9da85d2bd87ce889043e7b1735723dd10eb89#diff-40ae7979f67be398216a482f7b3937f9847a79e626665082886fb63e898b5247
msg403295 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-06 11:31
I upgraded Fedora Rawhide machines:

* x86-64 now runs: 5.15.0-0.rc4.33.fc36.x86_64
* aarch64 now runs: 5.15.0-0.rc4.33.fc36.aarch64

I also disabled systemd-coredump:
---
$ cat /etc/sysctl.d/50-coredump.conf 
# Disable systemd-coredump
kernel.core_pattern=

$ cat /proc/sys/kernel/core_pattern 
---

core_pattern is now empty.

Previously, /run was out of available inodes because systemd-coredump created one file per crash in /run/systemd/propagate/.
msg403296 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-06 11:35
And the ppc64e machine runs 5.15.0-0.rc4.33.fc36.ppc64le.

This machine was stuck at kernel 5.7, I'm not sure why. I had to regenerate the grub2 configuration.

Ok, all Rawhide buildbots should now run a kernel 5.15 without the ITIMER_VIRTUAL bug. I close the issue.
History
Date User Action Args
2022-04-11 14:59:49adminsetgithub: 89282
2021-10-06 11:35:17vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg403296

stage: resolved
2021-10-06 11:31:13vstinnersetmessages: + msg403295
2021-10-06 10:21:01vstinnersetmessages: + msg403292
2021-09-13 15:45:02vstinnersetmessages: + msg401707
2021-09-09 21:20:35vstinnersettitle: test_signal.test_itimer_virtual() failed on AMD64 Fedora Rawhide 3.x: Linux kernel 5.15 regression -> test_signal.test_itimer_virtual() failed on AMD64 Fedora Rawhide: Linux kernel 5.15 regression
messages: + msg401521
versions: + Python 3.10
2021-09-09 09:12:31hroncoksetnosy: + hroncok
2021-09-07 13:48:01vstinnersettitle: test_signal.test_itimer_virtual() failed on AMD64 Fedora Rawhide 3.x -> test_signal.test_itimer_virtual() failed on AMD64 Fedora Rawhide 3.x: Linux kernel 5.15 regression
2021-09-07 10:44:08vstinnersetmessages: + msg401231
2021-09-06 18:32:17vstinnersetmessages: + msg401173
2021-09-06 18:13:05vstinnersetmessages: + msg401171
2021-09-06 18:02:37vstinnersetfiles: + setitimer.c

messages: + msg401170
2021-09-06 17:11:30vstinnersetnosy: + cstratak
2021-09-06 17:10:56vstinnercreate