
Author gvanrossum
Recipients David.Edelsohn, db3l, gvanrossum, larry, ncoghlan, neologix, pitrou, python-dev, skrah
Date 2013-10-20.06:37:39
Message-id <1382251060.66.0.0369456211695.issue19293@psf.upfronthosting.co.za>
Content
I'm trying to let go of the AIX hang. Here's a brain dump of what I've figured out so far.

* There were a lot of red herrings in the early discussion. This hang doesn't seem to have anything to do with nonblocking connect() or sockets, nor even signals.

* Summary of what the test (test_subprocess_interactive) tries to do: it starts an echo subprocess, writes a string to it, reads the string back, writes another string to it, reads that back, and then closes the transport.

* The test hangs after seeing the first string echoed back but not the second, and in between somehow the stdin pipe is broken.

* If I read David's truss log correctly, the following things have happened:

- the parent wrote 'Python ' to the pipe for the subprocess's stdin (this is not shown in the extract but it must have happened because we see the string arrive in the subprocess)
- the echo.py subprocess started and began to read from stdin
- the subprocess read 'Python ' from its stdin
- the subprocess wrote 'Python ' back to its stdout
- poll() in the parent woke up
- the parent allocated some memory and read 'Python ' from the pipe for the subprocess's stdout

At this point, apparently, the pipe for the subprocess's stdin got closed, so the subprocess received an EOF (over and over, because the read loop in echo.py was missing an EOF test+break).
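To illustrate the looping bug: a minimal sketch of an echo loop with the EOF test in place. This is not the actual echo.py from the repo; the function name `echo_until_eof` and its parameters are made up for illustration.

```python
import os


def echo_until_eof(in_fd, out_fd, bufsize=1024):
    """Copy bytes from in_fd to out_fd until EOF; return the byte count."""
    total = 0
    while True:
        buf = os.read(in_fd, bufsize)
        if not buf:
            # os.read() returns b'' at EOF. Without this test+break the
            # loop would spin forever, reading EOF again and again.
            break
        total += len(buf)
        while buf:
            # os.write() may write fewer bytes than requested.
            written = os.write(out_fd, buf)
            buf = buf[written:]
    return total
```

A standalone echo script would call `echo_until_eof(0, 1)` to copy its stdin to its stdout.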

We also know that the parent now hangs in the last run_until_complete() call, which means that it has at least attempted to write 'The Winner'. There is no evidence of this in the truss extract, though, so the string may still be sitting in the transport's write buffer. It is also possible that David simply missed it in the endless stream of ineffective calls caused by the looping bug.
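For reference, the two-step exchange can be sketched with a timeout on each read, so that a missing echo shows up as an error instead of a hang. This is only a sketch: it uses the present-day asyncio streams API rather than the transport/protocol calls the test itself uses, and `ECHO_SRC` and `round_trip` are hypothetical names standing in for echo.py and the test body.

```python
import asyncio
import sys

# Hypothetical stand-in for echo.py: copy stdin to stdout until EOF.
ECHO_SRC = (
    "import os\n"
    "while True:\n"
    "    buf = os.read(0, 1024)\n"
    "    if not buf:\n"
    "        break\n"
    "    os.write(1, buf)\n"
)


async def round_trip(messages, timeout=10.0):
    """Write each message to an echo child and read it back."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", ECHO_SRC,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    replies = []
    try:
        for msg in messages:
            proc.stdin.write(msg)
            await proc.stdin.drain()
            # readexactly() waits for the whole message to come back;
            # wait_for() raises a timeout error rather than waiting forever.
            reply = await asyncio.wait_for(
                proc.stdout.readexactly(len(msg)), timeout)
            replies.append(reply)
    finally:
        # Closing stdin gives the child its EOF so it can exit.
        proc.stdin.close()
        await proc.wait()
    return replies
```

Calling `asyncio.run(round_trip([b'Python ', b'The Winner']))` performs the same write/read/write/read sequence as the test.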

I'm actually curious why it seems that poll() keeps returning 0 in the parent -- shouldn't it have an infinite timeout, since there's nothing left to do?
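As a reminder of the semantics in question: poll() returns 0 when it times out with no descriptor ready, which can only happen with a finite timeout. A small sketch using Python's select.poll on a POSIX system (the helper name `ready_fds` is made up):

```python
import os
import select


def ready_fds(fd, timeout_ms):
    """Return the list of readable fds among [fd] after polling."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # A timeout of None blocks until an event arrives; a finite timeout
    # (in milliseconds) can expire and yield an empty list.
    return [ready for ready, _events in poller.poll(timeout_ms)]
```

On an idle pipe, `ready_fds(r, 50)` comes back empty after 50 ms, whereas `ready_fds(r, None)` would block until something is written.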

Another theory is that one or more *connection_lost() methods on the protocol are actually being called, but the test stubbornly keeps waiting for proto.got_data[1] to be set.
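If that theory holds, a test could guard against it by waiting for either event. A rough sketch, assuming a protocol shaped loosely like the one in the test; `EchoProtocol`, its `lost` event, and `wait_for_data_or_loss` are hypothetical names, not the test's actual code:

```python
import asyncio


class EchoProtocol(asyncio.SubprocessProtocol):
    """Hypothetical protocol recording data and lost pipe connections."""

    def __init__(self):
        self.got_data = {1: asyncio.Event(), 2: asyncio.Event()}
        self.data = {1: b'', 2: b''}
        self.lost = asyncio.Event()

    def pipe_data_received(self, fd, data):
        self.data[fd] += data
        self.got_data[fd].set()

    def pipe_connection_lost(self, fd, exc):
        # Record the loss so a waiter does not block on data forever.
        self.lost.set()


async def wait_for_data_or_loss(proto, fd=1):
    """Return True if data arrived on fd, False if a pipe was lost first."""
    waiters = [
        asyncio.ensure_future(proto.got_data[fd].wait()),
        asyncio.ensure_future(proto.lost.wait()),
    ]
    _done, pending = await asyncio.wait(
        waiters, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return proto.got_data[fd].is_set()
```

With something like this in place, a closed pipe would end the wait with False instead of leaving run_until_complete() blocked.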

I'd be very interested in the truss output with the fix to echo.py in place (which is now in the repo).