Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncio: Race condition in signal handling on FreeBSD #65844

Closed
vstinner opened this issue Jun 2, 2014 · 21 comments
Closed

asyncio: Race condition in signal handling on FreeBSD #65844

vstinner opened this issue Jun 2, 2014 · 21 comments
Labels
tests Tests in the Lib/test dir topic-asyncio

Comments

@vstinner
Copy link
Member

vstinner commented Jun 2, 2014

BPO 21645
Nosy @gvanrossum, @pitrou, @vstinner, @giampaolo, @1st1

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2014-07-18.17:25:37.323>
created_at = <Date 2014-06-02.22:35:54.906>
labels = ['tests', 'expert-asyncio']
title = 'asyncio: Race condition in signal handling on FreeBSD'
updated_at = <Date 2014-09-25.03:42:00.185>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2014-09-25.03:42:00.185>
actor = 'python-dev'
assignee = 'none'
closed = True
closed_date = <Date 2014-07-18.17:25:37.323>
closer = 'vstinner'
components = ['Tests', 'asyncio']
creation = <Date 2014-06-02.22:35:54.906>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 21645
keywords = ['buildbot']
message_count = 21.0
messages = ['219647', '219650', '221954', '223049', '223052', '223053', '223145', '223146', '223148', '223184', '223227', '223321', '223326', '223340', '223343', '223346', '223350', '223369', '223372', '223423', '227503']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'pitrou', 'vstinner', 'giampaolo.rodola', 'python-dev', 'yselivanov']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue21645'
versions = ['Python 3.4']

@vstinner
Copy link
Member Author

vstinner commented Jun 2, 2014

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%203.4/builds/191/steps/test/logs/stdio

[ 8/389] test_asyncio
Timeout (1:00:00)!
Thread 0x0000000801407400 (most recent call first):
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/selectors.py", line 494 in select
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/asyncio/base_events.py", line 795 in _run_once
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/asyncio/base_events.py", line 184 in run_forever
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/asyncio/base_events.py", line 203 in run_until_complete
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/test/test_asyncio/test_streams.py", line 617 in test_read_all_from_pipe_reader
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/unittest/case.py", line 577 in run
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/unittest/case.py", line 625 in __call__
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/unittest/suite.py", line 125 in run
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/unittest/suite.py", line 87 in __call__
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/unittest/suite.py", line 125 in run
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/unittest/suite.py", line 87 in __call__
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/unittest/suite.py", line 125 in run
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/unittest/suite.py", line 87 in __call__
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/unittest/runner.py", line 168 in run
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/test/support/init.py", line 1724 in _run_suite
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/test/support/init.py", line 1758 in run_unittest
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/test/test_asyncio/init.py", line 29 in test_main
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/test/regrtest.py", line 1278 in runtest_inner
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/test/regrtest.py", line 967 in runtest
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/test/regrtest.py", line 763 in main
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/test/regrtest.py", line 1562 in main_in_temp_cwd
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/test/main.py", line 3 in <module>
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/runpy.py", line 85 in _run_code
File "/usr/home/buildbot/buildarea/3.4.krah-freebsd/build/Lib/runpy.py", line 170 in _run_module_as_main

@vstinner vstinner added the tests Tests in the Lib/test dir label Jun 2, 2014
@gvanrossum
Copy link
Member

Maybe see Tulip issue 168? That test was added to support that. Maybe the fix doesn't work for this platform?

@python-dev
Copy link
Mannequin

python-dev mannequin commented Jun 30, 2014

New changeset 69d474dab479 by Victor Stinner in branch 'default':
Issue bpo-21645: asyncio: add a watchdog in test_read_all_from_pipe_reader() for
http://hg.python.org/cpython/rev/69d474dab479

@python-dev
Copy link
Mannequin

python-dev mannequin commented Jul 14, 2014

New changeset a0e6a370755f by Victor Stinner in branch 'default':
Issue bpo-21645: Add debug code to analyze a failure on FreeBSD 9
http://hg.python.org/cpython/rev/a0e6a370755f

@python-dev
Copy link
Mannequin

python-dev mannequin commented Jul 14, 2014

New changeset dbf991650441 by Victor Stinner in branch 'default':
Issue bpo-21645: test_asyncio, log debug trace into sys.__stderr__, not in
http://hg.python.org/cpython/rev/dbf991650441

@vstinner
Copy link
Member Author

Buildbot where the issue occurs:
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%203.x/

@vstinner
Copy link
Member Author

I created a new repository just for this issue:
http://hg.python.org/sandbox/issue21645

I added a lot of debug. It looks like a race condition: the SIGCHLD signal is handled in a thread B (a C thread, not a Python), whereas the asyncio event loop is running in a thread A (the main thread, a Python thread).

Debug of the test which blocks:
---

PIPE: (7, 8)
threads [<_MainThread(MainThread, started 34384933888)>]
[pid 78351] _run_once: thread=34384933888
Execute <Handle Task._step() created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/tasks.py:74>
[pid 78351] _run_once: thread=34384933888
Execute <Handle StreamReaderProtocol.connection_made(<_UnixReadPipeTransport fd=7 polling>) created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/unix_events.py:287>
Execute <Handle Future._set_result_unless_cancelled(None) created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/unix_events.py:290>
[pid 78351] _run_once: thread=34384933888
Execute <Handle Task._wakeup(<Future finished result=None>) created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/futures.py:239>
Read pipe 7 connected: (<_UnixReadPipeTransport fd=7 polling>, <asyncio.streams.StreamReaderProtocol object at 0x806e257a8>)
[pid 78351] _run_once: thread=34384933888
Execute <Handle _raise_stop_error(<Task finished coro=<BaseEventLoop.connect_read_pipe() done, defined at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:767> created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:258 result=(<_UnixReadPipeTransport fd=7 polling>, <asyncio.streams.StreamReaderProtocol object at 0x806e257a8>)>) at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:95 created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/futures.py:239>
Execute <Handle _raise_stop_error(<Task finished coro=<BaseEventLoop.connect_read_pipe() done, defined at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:767> created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:258 result=(<_UnixReadPipeTransport fd=7 polling>, <asyncio.streams.StreamReaderProtocol object at 0x806e257a8>)>) at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:95 created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/futures.py:239>: FAIL! _StopError()
add_signal_handler(Signals.SIGCHLD, <bound method SafeChildWatcher._sig_chld of <asyncio.unix_events.SafeChildWatcher object at 0x806e25810>>, ())
PY signal.signal(Signals.SIGCHLD)
[pid 78351] _run_once: thread=34384933888
Execute <Handle Task._step() created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/tasks.py:74>
execute program '/usr/home/buildbot/buildarea/custom.krah-freebsd/build/python'
process '/usr/home/buildbot/buildarea/custom.krah-freebsd/build/python' created: pid 78390
execute program '/usr/home/buildbot/buildarea/custom.krah-freebsd/build/python': <_UnixSubprocessTransport pid=78390>
[pid 78351] _run_once: thread=34384933888
Execute <Handle SubprocessStreamProtocol.connection_made(<_UnixSubprocessTransport pid=78390>) created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_subprocess.py:121>
[pid 78351] _run_once: thread=34384933888
Execute <Handle Task._wakeup(<Future finished result=None>) created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/futures.py:239>
[pid 78351] _run_once: thread=34384933888
Execute <Handle _raise_stop_error(<Task finished coro=<create_subprocess_exec() done, defined at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/subprocess.py:208> created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:258 result=<Process 78390>>) at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:95 created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/futures.py:239>
Execute <Handle _raise_stop_error(<Task finished coro=<create_subprocess_exec() done, defined at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/subprocess.py:208> created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:258 result=<Process 78390>>) at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/base_events.py:95 created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/futures.py:239>: FAIL! _StopError()
[pid 78351] _run_once: thread=34384933888
Execute <Handle Task._step() created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/tasks.py:74>
[pid 78351] _run_once: thread=34384933888
select(None)
(1) import
(2) dump traceback
select(None) -> [(SelectorKey(fileobj=7, fd=7, events=1, data=(<Handle _UnixReadPipeTransport._read_ready() created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/selector_events.py:183>, None)), 1)]
poll took 25.964 ms: [(SelectorKey(fileobj=7, fd=7, events=1, data=(<Handle _UnixReadPipeTransport._read_ready() created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/selector_events.py:183>, None)), 1)]
Execute <Handle _UnixReadPipeTransport._read_ready() created at /usr/home/buildbot/buildarea/custom.krah-freebsd/build/Lib/asyncio/selector_events.py:183>
[pid 78351] _run_once: thread=34384933888
select(None)
(3) write into 8
(4) close 8
(5) exit
[pid=78351] main_thread=34384933888
[78351] C signal.signal(20)
[78351] C signal_handler: sig_num=20, thread=34468857856
C signal_handler: trip signal (tripped=0)
[78351] trip_signal(20)
trip_signal(20); write()
select(None) -> [(SelectorKey(fileobj=5, fd=5, events=1, data=(<Handle BaseSelectorEventLoop._read_from_self()>, None)), 1)]
poll took 4.902 ms: [(SelectorKey(fileobj=5, fd=5, events=1, data=(<Handle BaseSelectorEventLoop._read_from_self()>, None)), 1)]
Execute <Handle BaseSelectorEventLoop._read_from_self()>
[thread=34384933888] _read_from_self
_read_from_self -> b'\x14'
[thread=34384933888] _read_from_self
_read_from_self -> blocking, stop
[pid 78351] _run_once: thread=34384933888
select(None)
trip_signal(20); write() -> 1
[78351] Py_AddPendingCall(checksignals_witharg), thread=34468857856
---

@vstinner
Copy link
Member Author

I'm running tests on the buildbot "AMD64 FreeBSD 9.0 custom":
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%20custom/builds/27/steps/test/logs/stdio

@vstinner
Copy link
Member Author

(1) the thread B gets the SIGCHLD signal: it writes a byte into the "self" pipe of the event loop

C signal_handler: sig_num=20, thread=34468857856
trip_signal(20): write()

(2) in the main thread, the event loop is awaken by the write in the pipe... but there is nothing to do

_read_from_self -> b'\x14', thread=34384933888

(3) the thread B schedules the callback

Py_AddPendingCall(checksignals_witharg), thread=34468857856

(I modified the output a little bit for readability.)

IMO the problem is that asyncio relies on two events:

  • signal number written in the "self" pipe of the event loop
  • callback scheduled by the C handler

A solution would be to schedule the callback in the event loop. Since Python 3.3, the C signal handler writes the signal number, which should be enough to find and schedule the Python callback.

@vstinner vstinner changed the title test_read_all_from_pipe_reader() of test_asyncio hangs on FreeBSD 9 Race condition in signal handling on FreeBSD Jul 15, 2014
@python-dev
Copy link
Mannequin

python-dev mannequin commented Jul 16, 2014

New changeset 1ff9ce2204ee by Victor Stinner in branch 'default':
Issue bpo-21645, bpo-21985: Remove debug code
http://hg.python.org/cpython/rev/1ff9ce2204ee

@vstinner vstinner changed the title Race condition in signal handling on FreeBSD asyncio: Race condition in signal handling on FreeBSD Jul 16, 2014
@vstinner
Copy link
Member Author

"A solution would be to schedule the callback in the event loop. Since Python 3.3, the C signal handler writes the signal number, which should be enough to find and schedule the Python callback."

I wrote a patch implementing this idea to fix the issue:
https://code.google.com/p/tulip/issues/detail?id=192

@vstinner
Copy link
Member Author

I ran test_asyncio on my own FreeBSD VM, with the sandbox/issue21645 repository and I got a new (different) error on signal handling: RuntimeError('reentrant call inside') when writing into sys.stderr. To be fair: this bug was introduced by my changes in this repository adding debug traces ("logger.error("handle signal %s" % sig)").

IMO the Py_AddPendingCall() function in Python is dangerous: the callback should be "signal-safe" but this concept is not defined Python. In the C language, there is a strict list of allowed functions, in Python it's unclear: the reentrant call is just an example. It's probably easy to deadlock if you call the "wrong" function".

Py_AddPendingCall() injects arbitrary Python code between *any* other Python bytecode, it's like the evil greenlet with greenthreads.

That's why I consider that it would be safer to call _handle_signal() from the event loop, not using Py_AddPendingCall():
https://code.google.com/p/tulip/issues/detail?id=192

Output of the new bug on FreeBSD:
-----------
[1085] C signal_handler: sig_num=20, thread=34380739584
C signal_handler: trip signal (tripped=0)
[1085] trip_signal(20)
trip_signal(20); write()
trip_signal(20); write() -> 1
[1085] Py_AddPendingCall(checksignals_witharg), thread=34380739584
--- Logging error ---

Traceback (most recent call last):
  File "/usr/home/haypo/prog/python/default/Lib/logging/__init__.py", line 980, in emit
    stream.write(self.terminator)
RuntimeError: reentrant call inside <_io.BufferedWriter name='<stderr>'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/home/haypo/prog/python/default/Lib/logging/__init__.py", line 980, in emit
    stream.write(self.terminator)
  File "/usr/home/haypo/prog/python/default/Lib/asyncio/unix_events.py", line 95, in _handle_signal
    logger.error("handle signal %s" % sig)
  File "/usr/home/haypo/prog/python/default/Lib/logging/__init__.py", line 1302, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/home/haypo/prog/python/default/Lib/logging/__init__.py", line 1408, in _log
    self.handle(record)
  File "/usr/home/haypo/prog/python/default/Lib/logging/__init__.py", line 1418, in handle
    self.callHandlers(record)
  File "/usr/home/haypo/prog/python/default/Lib/logging/__init__.py", line 1480, in callHandlers
    hdlr.handle(record)
  File "/usr/home/haypo/prog/python/default/Lib/logging/__init__.py", line 852, in handle
    self.emit(record)
  File "/usr/home/haypo/prog/python/default/Lib/logging/__init__.py", line 983, in emit
    self.handleError(record)
  File "/usr/home/haypo/prog/python/default/Lib/logging/__init__.py", line 904, in handleError
    sys.stderr.write('--- Logging error ---\n')
RuntimeError: reentrant call inside <_io.BufferedWriter name='<stderr>'>
Call stack:
  File "/usr/home/haypo/prog/python/default/Lib/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/home/haypo/prog/python/default/Lib/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/home/haypo/prog/python/default/Lib/test/__main__.py", line 3, in <module>
    regrtest.main_in_temp_cwd()
  File "/usr/home/haypo/prog/python/default/Lib/test/regrtest.py", line 1562, in main_in_temp_cwd
    main()
  File "/usr/home/haypo/prog/python/default/Lib/test/regrtest.py", line 763, in main
    match_tests=ns.match_tests)
  File "/usr/home/haypo/prog/python/default/Lib/test/regrtest.py", line 978, in runtest
    display_failure=not verbose)
  File "/usr/home/haypo/prog/python/default/Lib/test/regrtest.py", line 1278, in runtest_inner
    test_runner()
  File "/usr/home/haypo/prog/python/default/Lib/test/test_asyncio/__init__.py", line 29, in test_main
    run_unittest(suite())
  File "/usr/home/haypo/prog/python/default/Lib/test/support/__init__.py", line 1758, in run_unittest
    _run_suite(suite)
  File "/usr/home/haypo/prog/python/default/Lib/test/support/__init__.py", line 1724, in _run_suite
    result = runner.run(suite)
  File "/usr/home/haypo/prog/python/default/Lib/unittest/runner.py", line 168, in run
    test(result)
  File "/usr/home/haypo/prog/python/default/Lib/unittest/suite.py", line 87, in __call__
    return self.run(*args, **kwds)
  File "/usr/home/haypo/prog/python/default/Lib/unittest/suite.py", line 125, in run
    test(result)
  File "/usr/home/haypo/prog/python/default/Lib/unittest/suite.py", line 87, in __call__
    return self.run(*args, **kwds)
  File "/usr/home/haypo/prog/python/default/Lib/unittest/suite.py", line 125, in run
    test(result)
  File "/usr/home/haypo/prog/python/default/Lib/unittest/suite.py", line 87, in __call__
    return self.run(*args, **kwds)
  File "/usr/home/haypo/prog/python/default/Lib/unittest/suite.py", line 125, in run
    test(result)
  File "/usr/home/haypo/prog/python/default/Lib/unittest/case.py", line 625, in __call__
    return self.run(*args, **kwds)
  File "/usr/home/haypo/prog/python/default/Lib/unittest/case.py", line 577, in run
    testMethod()
  File "/usr/home/haypo/prog/python/default/Lib/test/test_asyncio/test_streams.py", line 647, in test_read_all_from_pipe_reader
    self.loop.run_until_complete(proc.wait())
  File "/usr/home/haypo/prog/python/default/Lib/asyncio/base_events.py", line 265, in run_until_complete
    self.run_forever()
  File "/usr/home/haypo/prog/python/default/Lib/asyncio/base_events.py", line 238, in run_forever
    self._run_once()
  File "/usr/home/haypo/prog/python/default/Lib/asyncio/base_events.py", line 984, in _run_once
    logger.error("[pid %s] _run_once: thread=%s" % (os.getpid(), threading.get_ident()))
Message: '[pid 1085] _run_once: thread=34380739584'

@vstinner
Copy link
Member Author

IMO we must handle signals correctly when there are more than one thread. On Linux, it looks like the C signal handler is always called from the main thread. On FreeBSD, it looks like the C signal handler can be called in any thread, C thread, Python thread holding the GIL, Python thread not holding the GIL.

I found why test_read_all_from_pipe_reader() is running with more than one thread.

I reproduced the issue on my FreeBSD VM. run_test_server() and run_test_unix_server() functions of test_utils create Python threads, these functions are used in the test_asyncio.

The problem is that the PyThreadState_DeleteCurrent() function (called from t_bootstrap() of _threadmodule.c) may yield the CPU to another thread. Other Python threads are running when the line "MUTEX_UNLOCK(gil_mutex);" is executed in the drop_gil() function, before drop_gil() exits.

t_bootstrap() -> PyThreadState_DeleteCurrent() -> PyEval_ReleaseLock() -> drop_gil(NULL)

It means that Python threads are not deleted immediatly, but may be deleted "later".

@gvanrossum
Copy link
Member

I did my meditation. I now understand the race:

  • main thread is blocked in select() (or poll() etc.)
  • C-level signal handler is called in thread B and writes a byte to self-pipe
  • kernel immediately switches threads
  • main thread wakes up, reads data from self-pipe, goes back to block
  • C-level signal handles calls Py_AddPendingCall() to schedule Python-level signal handler
  • however the main thread is already back asleep

The intention if the code was that the Py_AddPendingCall() would complete before the thread switch. Perhaps a CPython fix could be to call it before writing a byte to the pipe??? But that would require someone else to meditate more on the C code.

I think your patch to Tulip is fine. I wouldn't worry about missing a signal when things are busy -- AFAIK the child process reaper will reap multiple processes.

@vstinner
Copy link
Member Author

The intention if the code was that the Py_AddPendingCall() would complete before the thread switch.

In fact I also expected Py_AddPendingCall() to be called *before* writing the signal number into the "wakeup fd". I guess that before nobody was relying on the "wakeup fd" *and* and the callback.

Perhaps a CPython fix could be to call it before writing a byte to the pipe??? But that would require someone else to meditate more on the C code.

It doesn't work for Tulip which targets Python 3.3.

Oh... I just saw that Python 2.7 calls Py_AddPendingCall() and *then* writes "\0" into wakeup fd... So Python 2.7 does not have the issue!

But yes, we may change the C signal handler in Python 3.5 because it makes more sense to call Py_AddPendingCall() to avoid this specific race condition.

I think your patch to Tulip is fine.

Good :) I will test it on FreeBSD. If it fixes the issue, I will apply it. We will see if it makes all buildbots happy.

I wouldn't worry about missing a signal when things are busy -- AFAIK the child process reaper will reap multiple processes.

Well, Py_AddPendingCall() also has a limit which is smaller and it does *not* warn when signals are dropped :-(

--

I don't know if it can be useful, but there is also the signal.pthread_sigmask() which can be used to "queue" pending signals. Unblocking signals will flush the queue. It might help to mitigate burst of signals in a short window.... I don't know if signal.pthread_sigmask() has a longer queue than Python. And it may be tricky to decide when and where the signals should be blocked and unblocked.

If a signal is blocked and the process gets the signal more than once, the signal is only raised once when the signal is unblocked. I don't know if it is an issue or not if asyncio only calls the signal handler once in this case.

@vstinner
Copy link
Member Author

I don't know if it can be useful, but there is also the signal.pthread_sigmask() which can be used to "queue" pending signals. (...)

Oh, I didn't remember that I opened the issue bpo-12060 to discuss "real time signals". The summary is that the wakeup fd should be used... Which is nice because it is the choice made by asyncio ;-)

FYI I also opened the issue bpo-21998: "asyncio: a new self-pipe should be created in the child process after fork" which is somehow related to this issue.

@gvanrossum
Copy link
Member

The more esoteric system calls you use the more you end up debugging it on
esoteric platforms. :-(

On Thu, Jul 17, 2014 at 9:17 AM, STINNER Victor <report@bugs.python.org>
wrote:

STINNER Victor added the comment:

> I don't know if it can be useful, but there is also the
signal.pthread_sigmask() which can be used to "queue" pending signals. (...)

Oh, I didn't remember that I opened the issue bpo-12060 to discuss "real time
signals". The summary is that the wakeup fd should be used... Which is nice
because it is the choice made by asyncio ;-)

FYI I also opened the issue bpo-21998: "asyncio: a new self-pipe should be
created in the child process after fork" which is somehow related to this
issue.

----------


Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue21645\>


@python-dev
Copy link
Mannequin

python-dev mannequin commented Jul 17, 2014

New changeset c4f053f1b47f by Victor Stinner in branch '3.4':
Python issue bpo-21645, Tulip issue 192: Rewrite signal handling
http://hg.python.org/cpython/rev/c4f053f1b47f

New changeset 2176496951a4 by Victor Stinner in branch 'default':
(Merge 3.4) Python issue bpo-21645, Tulip issue 192: Rewrite signal handling
http://hg.python.org/cpython/rev/2176496951a4

@vstinner
Copy link
Member Author

I commited the patch into Tulip (c149370c8027), Python 3.4 (c4f053f1b47f), Python 3.5 (2176496951a4).

I'm now waiting for the buildbot.

@vstinner
Copy link
Member Author

The test suite passed on FreeBSD 9 buildbot (3.4, 3.5 and my custom builds). I consider that the bug is now fixed.

@python-dev
Copy link
Mannequin

python-dev mannequin commented Sep 25, 2014

New changeset fe456770b454 by Yury Selivanov in branch 'default':
asyncio: Reverting 69d474dab479 as issue bpo-21645 is now closed and debug is no longer needed
https://hg.python.org/cpython/rev/fe456770b454

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir topic-asyncio
Projects
None yet
Development

No branches or pull requests

3 participants