Author holger+lp
Recipients asvetlov, holger+lp, yselivanov
Date 2018-02-05.21:22:42
Message-id <1517865762.95.0.467229070634.issue32776@psf.upfronthosting.co.za>
Content
I intended to use the asyncio framework to build an end-to-end test for our software. The test would spawn somewhere between 5k and 10k processes and manage the same number of sockets.

When I built a prototype I ran into scaling issues. Instead of launching our real software, I tested it with calls to sleep 30. At some point the started processes would finish, a SIGCHLD would be delivered to Python, and then it would fail:

 Exception ignored when trying to write to the signal wakeup fd:
 BlockingIOError: [Errno 11] Resource temporarily unavailable

Using strace I saw something like:

send(5, "\0", 1, 0)                     = -1 EAGAIN (Resource temporarily unavailable)
waitpid(12218, 0xbf8592d8, WNOHANG)     = 0
waitpid(12219, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 12219
send(5, "\0", 1, 0)                     = -1 EAGAIN (Resource temporarily unavailable)
waitpid(12220, 0xbf8592d8, WNOHANG)     = 0
waitpid(12221, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 12221
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12293, si_uid=1001, si_status=0, si_utime=0, si_stime=0} ---
getpid()                                = 11832
write(5, "\21", 1)                      = -1 EAGAIN (Resource temporarily unavailable)
sigreturn({mask=[]})                    = 12221
write(2, "Exception ignored when trying to"..., 64) = 64
write(2, "BlockingIOError: [Errno 11] Reso"..., 61) = 61
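
In the trace above, both send(5, "\0", ...) and write(5, "\21", ...) fail with EAGAIN, and "\21" is octal for 17, the SIGCHLD number on Linux. Assuming fd 5 is the event loop's non-blocking wakeup socket (the trace does not state this outright), the error is what happens when that socket's buffer is full because nothing has drained it yet. A minimal sketch of that condition, using a plain socketpair rather than asyncio itself; fill_wakeup_socket is a hypothetical helper name:

```python
import socket


def fill_wakeup_socket():
    """Write single bytes to a non-blocking socket until the kernel refuses.

    Returns the number of bytes accepted before BlockingIOError (EAGAIN).
    The exact count depends on the platform's socket buffer accounting.
    """
    rsock, wsock = socket.socketpair()
    rsock.setblocking(False)
    wsock.setblocking(False)
    written = 0
    try:
        while True:
            try:
                # One byte per wakeup, like the one-byte writes in the trace.
                written += wsock.send(b"\0")
            except BlockingIOError:
                # Nobody read from rsock, so the buffer is now full.
                return written
    finally:
        rsock.close()
        wsock.close()


if __name__ == "__main__":
    print("wakeups accepted before EAGAIN:", fill_wakeup_socket())
```

The count is finite, so a reader that falls behind a burst of wakeups can run into the same error.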


Looking at the code, I see that the si_pid of the signal is ignored and instead wait(2) is called for every process. This doesn't scale well enough for my intended use case.
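
To make the cost concrete, here is a rough sketch of the polling pattern described above: one waitpid(pid, WNOHANG) call per tracked child on every pass, whether or not that child has exited. It is an illustration only, not the asyncio source; poll_each_child is a hypothetical name.

```python
import os


def poll_each_child(pids):
    """Poll every tracked PID once without blocking.

    Returns (reaped, calls): reaped maps each finished PID to its exit code
    (negative signal number if killed, None if it was not our child), and
    calls is the number of waitpid() calls made, always len(pids).
    """
    reaped = {}
    calls = 0
    for pid in list(pids):
        calls += 1
        try:
            done, status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # Not our child, or already reaped by someone else.
            reaped[pid] = None
            continue
        if done == 0:
            continue  # still running
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
        else:
            reaped[pid] = -os.WTERMSIG(status)
    return reaped, calls
```

With n live children and one pass per SIGCHLD, reaping all of them takes on the order of n * n waitpid() calls, which matches the long runs of waitpid(...) = 0 in the trace.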

I think what could be done is one of the following:

* Switch to signalfd for the event notification?
* Take si_pid and, instead of just notifying that work is pending, report the PID that exited?
* Use wait(-1,... to collect any dead child, if there can be only one SIGCHLD handler?
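
The last option can be sketched as follows, assuming it refers to os.waitpid(-1, os.WNOHANG) in a loop. reap_any_children is a hypothetical helper, not an existing asyncio API:

```python
import os


def reap_any_children():
    """Collect every child that has already exited, without naming PIDs.

    Returns a dict mapping PID to the raw wait status. The number of
    waitpid() calls is proportional to the number of exited children,
    not to the number of children still running.
    """
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped[pid] = status
    return reaped
```

The caveat in that bullet matters: waitpid(-1, ...) reaps any child of the process, including ones started by unrelated code, so it is only safe when a single component owns child reaping.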