Message 331014 - Python tracker


Author	methane
Recipients	Henrik Bengtsson, methane, vstinner
Date	2018-12-04.08:01:10
Message-id	<CAEfz+Ty_Oa_UDe6CtGqdxOzmtsCeNw8Qg8fyafgrFc1Eh53TNQ@mail.gmail.com>
In-reply-to	<1543886641.01.0.788709270274.issue35305@psf.upfronthosting.co.za>

Content

>
> where the *child* process (`self.pid == 0`) gets stuck while calling `_dup2(c2pwrite = 4, 1)`, which in turn calls `os.dup2(a = 4, b = 1)`.
>

Doesn't the child process get stuck while writing to stdout?

>
> It would also be interesting to understand exactly what causes the stall.  Is it indeed the pipe that gets filled up?  Is that because the kernel does *not* respect the pipe limit and just dumps all output at once (> 65,536 bytes), i.e. it is a bug?  Or is it that Python or one of its dependencies runs into a race condition because, say, it does not have a chance to set up the parent-child communication before the child (== the kernel) dumps too much data?
>

In the normal case, when the child process succeeds in calling `exec`,
`errpipe_write` is closed by its CLOEXEC flag.
The parent's `_eintr_retry_call(os.read, errpipe_read, 1048576)` then
returns b"".
After that, the parent process can read from the stdio pipes, so the
child process can write more than 65536 bytes to them.
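
To illustrate the mechanism, here is a minimal sketch of an error pipe that relies on close-on-exec. It is not the actual `subprocess` code; `spawn_with_errpipe` and the `/nonexistent/hypothetical-binary` path are names made up for this example. It assumes a POSIX system and Python 3.4+, where `os.pipe()` returns non-inheritable descriptors.

```python
import os
import sys


def spawn_with_errpipe(argv):
    """Fork and exec argv; return whatever the parent read from the error pipe.

    Hypothetical helper for illustration only, not subprocess's own code.
    """
    # On Python 3.4+ both ends are created with the close-on-exec flag set.
    errpipe_read, errpipe_write = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end of the error pipe.
        os.close(errpipe_read)
        try:
            # On success the kernel closes errpipe_write because of CLOEXEC.
            os.execvp(argv[0], argv)
        except OSError as exc:
            # exec failed: report the exception name to the parent.
            os.write(errpipe_write, type(exc).__name__.encode())
        os._exit(255)

    # Parent: close our copy of the write end, then read until EOF.
    os.close(errpipe_write)
    chunks = []
    while True:
        chunk = os.read(errpipe_read, 1048576)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(errpipe_read)
    os.waitpid(pid, 0)
    return b"".join(chunks)


# exec succeeds: the write end is closed by CLOEXEC, so the read sees EOF.
ok = spawn_with_errpipe([sys.executable, "-c", "pass"])
# exec fails: the child writes the error name before exiting.
failed = spawn_with_errpipe(["/nonexistent/hypothetical-binary"])
```

If the kernel failed to close `errpipe_write` on a successful `exec`, the `os.read` loop in the parent would not see EOF until the child exited.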

In your case, `errpipe_write` is not closed when `exec` succeeds.
That is a kernel bug.
The parent's `_eintr_retry_call(os.read, errpipe_read, 1048576)`
does not return until the child process exits.
But the child process blocks once it has written more than 65536 bytes
to stdout/stderr, because the parent is not yet reading those pipes.
The result is a deadlock.
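
The blocking half of that deadlock can be observed without spawning anything. The sketch below fills a pipe that nobody reads and counts how many bytes fit before a write would block. `pipe_capacity` is a name made up for this example, and the exact number is platform-dependent (65536 is the usual Linux default, but that is an assumption about the system, not something the code checks).

```python
import os


def pipe_capacity(chunk_size=1024):
    """Return how many bytes fit into an unread pipe before a write would block.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode turns "would block forever" into BlockingIOError.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk_size)
    except BlockingIOError:
        # The pipe buffer is full; a blocking writer would be stuck here.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


capacity = pipe_capacity()
print(capacity)
```

A child whose stdout is such a pipe stops at this point until the parent starts reading, which in the buggy case never happens.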

>
> A BROKEN DESIGN?
>
> Finally, I don't know if the fact that `/sbin/ldconfig` does not exist but you can yet call it is (i) poorly designed kernel, or (ii) a valid design in the Unix world.  I don't know the answer to this and I don't claim one is more correct than the other.  I also don't know if there are other kernels out there that does this type of interception.  If it is indeed a valid design, then one could argue that Python and other software tools should be able to handle it.  FYI, this far I've/we've only hit this issue with Python (>= 2.7.13), maybe because of pure luck.  It did not cause a problem in Python (< 2.7.13) and it does not cause a problem if we use subprocess.Popen(..., 'shell = True').  On the other hand, if one would argue that it is a poor design, then would it make sense to protect against by for instance asserting that the executable actually exists before calling it:
>

I don't know whether it is (i) or (ii).
But I don't think the assertion makes sense.  I would expect OSError
rather than RuntimeError.
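
For reference, this is what a caller sees on Python 3 when the executable really is missing. It is a small sketch; `try_run` and the `/nonexistent/hypothetical-ldconfig` path are made up for this example.

```python
import subprocess


def try_run(argv):
    """Run argv and return the exception class name if it cannot be started.

    Hypothetical helper for illustration only.
    """
    try:
        proc = subprocess.Popen(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError as exc:
        # FileNotFoundError, PermissionError, etc. are all OSError subclasses.
        return type(exc).__name__
    proc.wait()
    return None


result = try_run(["/nonexistent/hypothetical-ldconfig"])
print(result)
```

Code that already catches OSError around `Popen` handles this case without any extra existence check.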

History
Date	User	Action	Args
2018-12-04 08:01:11	methane	set	recipients: + methane, vstinner, Henrik Bengtsson
2018-12-04 08:01:11	methane	link	issue35305 messages
2018-12-04 08:01:10	methane	create