Author josiahcarlson
Recipients Andrew.Boettcher, ajaksu2, akira, astrand, cvrebert, ericpruitt, eryksun, giampaolo.rodola, janzert, josiahcarlson, ooooooooo, parameter, r.david.murray, rosslagerwall, sbt, techtonik, v+python
Date 2014-04-12.00:03:58
Message-id <1397261040.2.0.656148417625.issue1191964@psf.upfronthosting.co.za>
In-reply-to
Content
I added the chunking for Windows because, in manual testing before finishing the patch, I found that large sends on Windows that don't wait for the result can periodically report zero bytes sent, even when the child process is trying to read.

Looking more closely, the zero result comes from calling ov.cancel() and then ov.getresult(), which raises an OSError, specifically:
[WinError 995] The I/O operation has been aborted because of either a thread exit or an application request
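
A minimal sketch of the accounting step that produces that zero. It assumes an `_overlapped`-style object whose `getresult()` returns the byte count and raises `OSError` with `winerror == 995` (ERROR_OPERATION_ABORTED) after `cancel()`. The helper name `bytes_sent_after_cancel` is hypothetical and this is not the patch code:

```python
ERROR_OPERATION_ABORTED = 995  # the WinError code quoted above


def bytes_sent_after_cancel(ov):
    """Cancel a pending overlapped write; return the bytes we *believe* were sent.

    `ov` is assumed to expose cancel() and getresult(), as in the message.
    Illustrative only: this shows where the accounting goes wrong.
    """
    ov.cancel()
    try:
        return ov.getresult()
    except OSError as exc:
        if getattr(exc, "winerror", None) == ERROR_OPERATION_ABORTED:
            # The write was aborted, so we record zero bytes, even though the
            # kernel may already have copied part of the buffer to the child.
            return 0
        raise
```

The `return 0` branch is where the Python-side count and the child's view can diverge.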

As a result, the Python side records zero bytes sent for some writes, yet the child process turns out to have received some of that data. How much, compared to what we thought we sent? It varies: in testing today, the child received ~3.5 MB when we thought we had sent ~3 MB.
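
To see the discrepancy, the child can report how many bytes it actually read, and the parent can compare that with its own tally. A minimal, cross-platform sketch; `write(pipe, chunk)` is a hypothetical stand-in for the write method under test and must return the byte count it claims to have sent:

```python
import subprocess
import sys

# Child program: read stdin until EOF, then print how many bytes arrived.
_CHILD_SRC = "import sys; print(len(sys.stdin.buffer.read()))"


def compare_accounting(write, chunks):
    """Return (believed_sent, child_received) for one run.

    `write(pipe, chunk)` is a hypothetical stand-in for the write method
    under test; it must return the number of bytes it claims to have sent.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    believed = 0
    for chunk in chunks:
        believed += write(proc.stdin, chunk)
    proc.stdin.close()  # EOF lets the child finish reading and report
    received = int(proc.stdout.read())
    proc.stdout.close()
    proc.wait()
    return believed, received
```

With a plain blocking write such as `lambda pipe, chunk: pipe.write(chunk)` the two numbers match; a writer that under-reports shows up as `received > believed`.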

To make a long story short(ish): using overlapped I/O with WriteFile() and Overlapped.cancel(), without pausing between attempts (via a sleep or anything else), produces a mismatch between what we think we sent and what was actually received in roughly 87% of trials with 512-byte chunks (87 of 100) and 100% of trials with 4096-byte chunks (100 of 100). This is while constantly trying to write data to the pipe; each trial is as many Popen.write_nonblocking() calls as can complete in 0.25 seconds.
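
A single trial can be sketched like this, assuming `write(chunk)` is a hypothetical stand-in for `Popen.write_nonblocking()` (from the patch under discussion, not released `subprocess`) that returns the number of bytes it reports as sent:

```python
import time


def run_trial(write, chunk, duration=0.25, pause=0.0):
    """Call `write(chunk)` back to back for `duration` seconds.

    Returns the total bytes that `write` *reported* sending; this is the
    figure compared against what the child actually received.
    `pause` is an optional sleep between attempts (0.0 = no delay).
    """
    believed = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        believed += write(chunk)
        if pause:
            time.sleep(pause)
    return believed
```

Setting `pause=0.001` gives the 1 ms variant; a trial counts as an error when the reported total differs from the child's count.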

Adding a 1 ms sleep between overlapped.WriteFile() attempts drops the error rate to 0/100 trials for 512-byte writes and 1/100 trials for 4096-byte writes. Further testing of block sizes suggests that 2048 bytes is the largest send we can push through while still getting correct results.


So, according to my tests, there is no way to cancel an overlapped I/O operation while also guaranteeing that we account exactly for the data actually sent, short of adding an implicit or explicit delay. That makes sense: we are trying to interrupt the other process's read of data that we said it could read, and we do it with a kernel call that interrupts another kernel call, one that is copying our write buffer chunk by chunk (perhaps via kernel memory) into their read buffer.

Anyway, by limiting how much we attempt to send at any one time and forcing delays between sends, we can come close to guaranteeing that child processes never receive data we can't account for. I'll try to get a patch out this weekend that incorporates these ideas, with a new test that demonstrates the issue on Windows (for those who want to verify my results).
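
The mitigation could look roughly like this: a minimal sketch, assuming a hypothetical non-blocking `write(buf)` that returns how many bytes it accepted (possibly 0). The 2048-byte cap and 1 ms pause come from the measurements above; the function name and structure are illustrative, not the upcoming patch:

```python
import time

MAX_CHUNK = 2048  # largest send that gave correct accounting in the tests
PAUSE = 0.001     # 1 ms between attempts


def write_throttled(write, data, max_chunk=MAX_CHUNK, pause=PAUSE):
    """Send `data` in slices of at most `max_chunk` bytes, pausing between them.

    `write(buf)` is a hypothetical non-blocking write returning the number of
    bytes accepted. Stops at the first short write so the caller can retry
    later. Returns the total number of bytes accounted for.
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        chunk = view[sent:sent + max_chunk]
        n = write(chunk)
        sent += n
        if n < len(chunk):
            break  # pipe full or write cut short; let the caller retry
        if sent < len(view):
            time.sleep(pause)  # give the reader time before the next attempt
    return sent
```

Keeping each attempt small bounds how much data can be in flight if a write has to be cancelled, and the pause gives the reader a chance to drain the pipe.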