Message 311692 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	holger+lp
Recipients	asvetlov, holger+lp, yselivanov
Date	2018-02-05.21:22:42
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1517865762.95.0.467229070634.issue32776@psf.upfronthosting.co.za>
In-reply-to

Content

I intended to use the asyncio framework to build an end-to-end test for our software. The test would spawn between 5k and 10k processes and manage the same number of sockets.

When I built a prototype, I ran into scaling issues. Instead of launching our real software, I tested with calls to sleep 30. At some point the started processes would finish, a SIGCHLD would be delivered to Python, and then it would fail with:

 Exception ignored when trying to write to the signal wakeup fd:
 BlockingIOError: [Errno 11] Resource temporarily unavailable
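The prototype itself is not part of this report. A minimal sketch of that kind of reproducer, assuming Python 3.9+ and a sleep binary on PATH (spawn_many is a hypothetical helper name), might look like this:

```python
import asyncio


async def spawn_many(count: int, seconds: str = "30") -> list[int]:
    """Start `count` copies of `sleep <seconds>` and wait for all of them.

    Hypothetical reproducer: it only approximates the prototype described
    in the report.
    """
    procs = [
        await asyncio.create_subprocess_exec("sleep", seconds)
        for _ in range(count)
    ]
    # Each child exit sends SIGCHLD to this process; how asyncio reacts
    # depends on the child-watching strategy of the running Python version.
    return await asyncio.gather(*(p.wait() for p in procs))
```

Calling asyncio.run(spawn_many(5000)) approximates the scale described above. Whether it still produces the wakeup-fd error depends on the Python version and on how its event loop watches child processes. The per-user process limit (ulimit -u) may also be hit first.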

Using strace, I saw something like this:

send(5, "\0", 1, 0)                     = -1 EAGAIN (Resource temporarily unavailable)
waitpid(12218, 0xbf8592d8, WNOHANG)     = 0
waitpid(12219, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 12219
send(5, "\0", 1, 0)                     = -1 EAGAIN (Resource temporarily unavailable)
waitpid(12220, 0xbf8592d8, WNOHANG)     = 0
waitpid(12221, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 12221
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12293, si_uid=1001, si_status=0, si_utime=0, si_stime=0} ---
getpid()                                = 11832
write(5, "\21", 1)                      = -1 EAGAIN (Resource temporarily unavailable)
sigreturn({mask=[]})                    = 12221
write(2, "Exception ignored when trying to"..., 64) = 64
write(2, "BlockingIOError: [Errno 11] Reso"..., 61) = 61
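Some context for the trace. In CPython's Unix event loop, the signal wakeup fd is the write end of a non-blocking socket pair. The C-level signal handler writes the signal number into it: the "\21" byte is octal 21, i.e. 17, which is SIGCHLD on x86 Linux. Once the loop falls behind on draining the other end, the buffer fills and every further write fails with EAGAIN. The sketch below reproduces only that buffer-full condition, without any signals. fill_wakeup_socket is a hypothetical name.

```python
import socket


def fill_wakeup_socket() -> int:
    """Count one-byte writes a non-blocking socketpair accepts before EAGAIN.

    Mimics a wakeup fd whose reading end is never drained.
    """
    reader, writer = socket.socketpair()
    writer.setblocking(False)
    written = 0
    try:
        while True:
            writer.send(b"\0")  # one-byte write, like send(5, "\0", 1, 0) in the trace
            written += 1
    except BlockingIOError as exc:
        # errno 11 (EAGAIN) on Linux: the socket buffer is full
        print(f"buffer full after {written} writes: {exc}")
    finally:
        reader.close()
        writer.close()
    return written
```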


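Looking at the code, I see that the signal's si_pid is ignored. Instead, wait(2) (waitpid with WNOHANG, as the trace shows) is called for every child process. With thousands of children, each SIGCHLD therefore costs thousands of system calls, which does not scale well enough for my intended use case.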
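A rough Python rendering of that per-child polling strategy follows, for illustration only. reap_registered is a hypothetical helper, not asyncio's actual code, and the sketch assumes Python 3.9+.

```python
import os


def reap_registered(pids: set[int]) -> dict[int, int]:
    """Poll every registered child once with waitpid(pid, WNOHANG).

    Costs one system call per registered child on every SIGCHLD,
    whether or not that child has exited.
    """
    exited = {}
    for pid in list(pids):
        try:
            got, status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # Already reaped elsewhere; stop tracking it.
            pids.discard(pid)
            continue
        if got == 0:
            continue  # still running: the "= 0" results in the trace
        pids.discard(pid)
        exited[pid] = os.waitstatus_to_exitcode(status)
    return exited
```

Every call walks the whole registry, so with N live children each SIGCHLD costs N waitpid() calls even when only one child has exited.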

I think one of the following could be done:

* Switch to signalfd for the event notification?
* Use si_pid and, instead of just signalling that there is work to do, report which PID exited?
* Use waitpid(-1, ...) to collect any dead child, if there can be only one SIGCHLD handler?
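The third option can be sketched as a loop around os.waitpid(-1, os.WNOHANG). reap_any is a hypothetical name. The sketch assumes Python 3.9+ and that this code is the only place in the process that reaps children.

```python
import os


def reap_any() -> dict[int, int]:
    """Collect every child that has already exited, with no per-PID loop.

    Each waitpid(-1, WNOHANG) call returns one dead child, so the cost grows
    with the number of exits rather than the number of live children.
    Only safe if nothing else in the process waits for its own children.
    """
    exited = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # ECHILD: no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        exited[pid] = os.waitstatus_to_exitcode(status)
    return exited
```

Looping until waitpid() returns 0 also matters for the second option. Standard signals such as SIGCHLD are not queued, so several children exiting close together can produce a single SIGCHLD carrying a single si_pid. A handler that reaped only that PID would leave the others as zombies.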

History
Date	User	Action	Args
2018-02-05 21:22:42	holger+lp	set	recipients: + holger+lp, asvetlov, yselivanov
2018-02-05 21:22:42	holger+lp	set	messageid: <1517865762.95.0.467229070634.issue32776@psf.upfronthosting.co.za>
2018-02-05 21:22:42	holger+lp	link	issue32776 messages
2018-02-05 21:22:42	holger+lp	create