classification
Title: asyncio: Creating many subprocess generates lots of internal BlockingIOError
Type: Stage: resolved
Components: asyncio Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Sebastian.Kreft.Deezer, giampaolo.rodola, gvanrossum, pitrou, python-dev, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2014-05-28 16:18 by Sebastian.Kreft.Deezer, last changed 2014-06-19 11:10 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
test_subprocess_error.py Sebastian.Kreft.Deezer, 2014-05-28 16:18
asyncio_read_from_self.patch vstinner, 2014-05-28 21:13 review
asyncio_read_from_self_test.patch vstinner, 2014-06-06 12:19 review
Messages (9)
msg219288 - (view) Author: Sebastian Kreft (Sebastian.Kreft.Deezer) Date: 2014-05-28 16:18
Using the asyncio.create_subprocess_exec, generates lost of internal error messages. These messages are:

Exception ignored when trying to write to the signal wakeup fd:
BlockingIOError: [Errno 11] Resource temporarily unavailable

Getting the messages depeneds on how many subprocesses are active at the same time. In my system (Debian 7, kernel 3.2.0-4-amd64, python 3.4.1), with 3 or less processes at the same time I don't see any problem, but with 4 or more I got lot of messages.

On the other hand, these error messages seem to be innocuous, as no exception seems to be raised.

Attached is a test script that shows the problem.

It is run as:
bin/python3.4 test_subprocess_error.py <MAX_PROCESSES> <ITERATIONS>

it requires to have the du command.


Let me know if there are any (conceptual) mistakes in the attached code.
msg219299 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-05-28 21:13
"Exception ignored when trying to write to the signal wakeup fd" message comes from the signal handler in Modules/signalmodule.c. The problem is that Python gets a lot of SIGCHLD signals (the test scripts creates +300 processes per second on my computer). The producer (signal handler writing the signal number into the "self" pipe) is faster than the consumer (BaseSelectorEventLoop._read_from_self callback).

Attached patch should reduce the risk of seeing the message "Exception ignored when trying to write to the signal wakeup fd". The patch reads all pending of the self pipe, instead of just trying to read a signal byte.

The test script doesn't write the error message anymore when the patch is applied (the script creates more than 300 processes per second).

The patch doesn't solve completly the issue. Other possible enhancements:

* Add a flag in the signal handler to notify that a signal was received, and write a single byte until the flag is reset to False. It would avoid to fill the pipe. It requires to implement a custom signal handler implemented in C, different from signal handlers of the Python module.

* Add an higher priority to callbacks of signal handlers. Asyncio doesn't support priority on callbacks right now.

* Increaze the size of the pipe. On Linux, it looks like "fcntl(fd, F_SETPIPE_SZ, size);" can be used. The maximum size is /proc/sys/fs/pipe-max-size (ex: 1 MB of my Fedora 20).
msg219301 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-05-28 21:20
BaseProactorEventLoop._loop_self_reading() uses an overlapped read of 4096 bytes. I don't understand how it wakes up the event loop. When the operation is done, _loop_self_reading() is scheduled with call_soon() by the Future object. Is it enough to wake up the event loop?

Is BaseProactorEventLoop correct?

--

Oh, I forgot to explain this part of asyncio_read_from_self.patch:

+                data = self._ssock.recv(4096)
+                if not data:
+                    break

This break "should never occur". It should only occur if _ssock is no more blocking. But it would be a bug, because this pipe is private and set to non-blocking at its creation.

I chose to add the test because it should not hurt to add it "just in case" (and to avoid an unlimited busy loop).
msg219872 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-06 11:44
Can someone please review asyncio_read_from_self.patch?
msg219873 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-06 11:44
Hum, maybe I need to add a unit test for it.
msg219875 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-06 12:19
asyncio_read_from_self_test.patch: Unit test to check that running the loop once reads all bytes. The unit test is ugly: it calls private methods, and it is completly different on UNIX (selector) and on Windows (proactor).

I would prefer to *not* add such test, and just enhance the code (apply  asyncio_read_from_self.patch).
msg220154 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-10 09:13
Can someone please review asyncio_read_from_self.patch?
msg220965 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-06-19 11:00
New changeset 46c251118799 by Victor Stinner in branch '3.4':
Closes #21595: asyncio.BaseSelectorEventLoop._read_from_self() now reads all
http://hg.python.org/cpython/rev/46c251118799

New changeset 513eea89b80a by Victor Stinner in branch 'default':
(Merge 3.4) Closes #21595: asyncio.BaseSelectorEventLoop._read_from_self() now
http://hg.python.org/cpython/rev/513eea89b80a
msg220966 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-19 11:10
I commited  asyncio_read_from_self.patch  into Tulip, Python 3.4 and 3.5. If someone is interested to work on more advanced enhancement, please open a new issue.

Oh by, a workaround is to limit the number of concurrent processes.

Without the patch, "./python test_subprocess_error.py 5 1000" (max: 5 concurrenet processes) emits a lot of "BlockingIOError: [Errno 11] Resource temporarily unavailable" message.

With the patch, I start getting messages with 140 concurrent processes, which is much better :-) IMO more than 100 concurrent processes is crazy, don't do that at home :-) I mean processes with a very short lifetime. The limit is the number of SIGCHLD per second, so the number of processes which end at the same second.
History
Date User Action Args
2014-06-19 11:10:17vstinnersetmessages: + msg220966
2014-06-19 11:00:16python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg220965

resolution: fixed
stage: resolved
2014-06-10 09:13:51vstinnersetmessages: + msg220154
2014-06-06 12:19:07vstinnersetfiles: + asyncio_read_from_self_test.patch

messages: + msg219875
2014-06-06 11:44:51vstinnersetmessages: + msg219873
2014-06-06 11:44:35vstinnersetmessages: + msg219872
2014-06-06 11:44:00vstinnersetcomponents: + asyncio
title: Creating many subprocess generates lots of internal BlockingIOError -> asyncio: Creating many subprocess generates lots of internal BlockingIOError
2014-05-28 21:20:32vstinnersetmessages: + msg219301
2014-05-28 21:13:35vstinnersetfiles: + asyncio_read_from_self.patch

nosy: + gvanrossum, pitrou, vstinner, giampaolo.rodola, yselivanov
messages: + msg219299

keywords: + patch
2014-05-28 16:18:25Sebastian.Kreft.Deezercreate