classification
Title: Socket timeout can cause file-like readline() method to lose data
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.2, Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, beazley, dabeaz, flox, gregory.p.smith, loewis, martin.panter, ned.deily, pitrou, r.david.murray, rosslagerwall, roysmith
Priority: normal Keywords: patch

Created on 2009-11-14 17:16 by beazley, last changed 2015-10-27 11:52 by martin.panter.

Files
File name Uploaded Description Edit
test-issue7322.py roysmith, 2010-11-20 21:14 test case
7322_v1.patch rosslagerwall, 2010-12-31 08:14 Patch + test for issue
issue7322_new.patch rosslagerwall, 2011-01-13 09:27
i7322.patch rosslagerwall, 2011-01-14 05:21 Disallow further reads after timeout
Messages (27)
msg95245 - (view) Author: David M. Beazley (beazley) Date: 2009-11-14 17:16
Consider a socket that has had a file-like wrapper placed around it 
using makefile()

# s is a socket created previously
f = s.makefile()

Now, suppose that this socket has had a timeout placed on it.

s.settimeout(15)

If you try to read data from f, but nothing is available. You'll 
eventually get a timeout. For example:

f.readline()   # Now, just wait
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File 
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/socket.
py", line 406, in readline
    data = self._sock.recv(self._rbufsize)
socket.timeout: timed out

However, now consider the case where you're reading a line of data, but 
the receiver has only received a partial line and it's waiting for the 
rest of the data to arrive.   For example, type this:

f.readline()

Now, go to the other end of the socket connection and send a buffer with 
no newline character.  For example, send the message "Hello".

Since no newline character has been received, the readline() method will 
eventually fail with a timeout as before.   However, if you now retry 
the read operation f.readline() and send more data such as the message 
"World\n", you'll find that the "Hello" message gets lost.  In other 
words, the repeated readline() operation discards any buffers 
corresponding to previously received line data and just returns the new 
data.

Admittedly this is a corner case, but you probably don't want data to be 
discarded on a TCP connection even if a timeout occurs.

Hope that makes some sense :-).  (It helps to try it out).
msg121777 - (view) Author: Roy Smith (roysmith) Date: 2010-11-20 21:08
I'm looking into this now.  In the meantime, I've opened a marginally-related bug, issue10473
msg121779 - (view) Author: Roy Smith (roysmith) Date: 2010-11-20 21:14
Ataching a test case which demonstrates the bug.
msg121833 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2010-11-21 02:05
This would seem to be an invalid test case.  It is specifically documented that socket.makefile does not support this: "The socket must be in blocking mode (it can not have a timeout)".

http://docs.python.org/py3k/library/socket.html#socket.socket.makefile

I suppose socket.makefile could initially check the socket and throw an exception if the socket is non_blocking but the program could later change the socket to non_blocking and the same issue would presumably arise.  

Recommend closing as invalid.
msg121839 - (view) Author: Roy Smith (roysmith) Date: 2010-11-21 02:36
This is kind of ugly.  On the one hand, I'm all for adding a check in makefile() to catch it being called on a non-blocking socket.

On the other hand, you are correct that a user could change the mode leter.  Even if we added checks for this in socket.setblocking(), there's plenty of ways to get around that; it's easy to grab a raw file descriptor and do whatever you want with it behind the library's back.

On the third hand, maybe a check could be added to SocketIO.readinto() to verify that the socket was in blocking mode each time it was called?
msg121885 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2010-11-21 08:45
I see Issue7995 also addresses the issue of accept sockets inheriting nonblocking status and provides a suggested patch.
msg124957 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2010-12-31 08:14
Attached is a patch which fixes the issue.

Instead of allowing the readline method to lose data, it adds a check to SocketIO.readinto() to ensure that the socket does not have a timeout and throws an IOError if it does. Also does the same for SocketIO.write().

I think this is a better approach - just failing immediately when a readline on a nonblocking socket occurs instead of failing sometimes and losing data.
msg124962 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-31 10:47
While this patch looks conformant to the documentation, it is very likely to break code in the wild. Even in the stdlib, there are uses of makefile() + socket timeouts (e.g. in http.client and urllib). It would be better to find a way to make readline() functional even with socket timeouts.
msg126163 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-01-13 09:27
How about this?

Instead of just losing the data that's been read so far in readline(), this patch adds the data as a new field to the exception that is thrown - this way the semantics remain exactly the same but the data is not discarded when a timeout occurs, it is still accessible via the exception.
msg126170 - (view) Author: David Beazley (dabeaz) Date: 2011-01-13 14:10
Have any other programming environments ever had a feature where a socket timeout returns an exception containing partial data?    I'm not aware of one offhand and speaking as a systems programmer, something like this might be somewhat unexpected.

My concern is that in the presence of timeouts, the programmer will be forced to reassemble the message themselves from fragments returned in the exception.  However, one reason for using readline() in the first place is precisely so that you don't have to do that sort of thing.

Is there any reason why the input buffer can't be preserved across calls?   You've already got a file-like wrapper around the socket.  Just keep the unconsumed buffer in that instance.
msg126171 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-13 14:24
This is an interesting approach. The problem is that AFAICT the issue is not limited to readline. If you call e.g. read(10000) and the socket times out after having returned the first 5000 bytes, then those 5000 bytes might get lost as well (depending on specifics e.g. buffer size in the IO stack).

Generally there is no guarantee that a buffered object works "properly" when the raw IO object raises some exception intermittently; perhaps this should be fixed in a systemic way, although this would complicate things quite a bit.

Also, I don't think people try to reuse a socket after a timeout (or even try to salvage whatever data could be read before the timeout); usually they would instead abort the connection and treat the remote resource as unavailable. IMO, that's the one obvious use case for socket timeouts.
msg126172 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-13 14:27
By the way, I recently fixed the makefile() documentation:

“The socket must be in blocking mode; it can have a timeout, but the file object’s internal buffer may end up in a inconsistent state if a timeout occurs.”
(in http://docs.python.org/dev/library/socket.html#socket.socket.makefile)

I also added a small section dedicated to socket timeouts:
http://docs.python.org/dev/library/socket.html#notes-on-socket-timeouts
msg126190 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011-01-13 17:47
"""Generally there is no guarantee that a buffered object works "properly" when the raw IO object raises some exception intermittently"""

I disagree.  EINTR is a classic case of this and is something that buffering IO layers deal with all the time.  (readline is just one example of a buffering io layer)

if there is a timeout and we can't determine if there is enough data to return for readline, we should buffer it and not return.

maybe this means we need to disallow readline() with timeouts on unbuffered sockets since we can't determine if data will need to be buffered or not due to such a condition in advance.

The normal behavior for code calling readline() on a socket with a timeout is likely going to be to close it.  Anything else does not make much sense.  (someone may try, but really they're writing their I/O code wrong if they are using a socket timeout a poor form of task switching ;)
msg126192 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-13 18:06
> """Generally there is no guarantee that a buffered object works
> "properly" when the raw IO object raises some exception
> intermittently"""
> 
> I disagree.  EINTR is a classic case of this and is something that
> buffering IO layers deal with all the time.  (readline is just one
> example of a buffering io layer)

EINTR is a different matter. To handle EINTR in Python, it is enough to
call the signal handlers and then retry the system call (that's what is
done in SocketIO.readinto, although FileIO doesn't have such logic).
Only if the signal handler raises an exception (which it probably
shouldn't do, since asynchronous exceptions are very bad) do you abort
the operation.

You can't apply the same logic to a socket timeout; the timeout is
really an error condition and you certainly shouldn't retry the system
call (that would defeat the point of using a timeout). So, to handle it
in an entirely correct way, you need to add some out-of-band buffering
logic where you store the pending raw reads which have been done but
could not be returned to the user. That complicates things quite a bit,
especially given that it has to be grafted on at least two layers of the
IO stack (the raw IO layer, and the buffered IO layer). Ross' patch does
it, but incompletely (it lets the user handle the out-of-band data) and
only for readline() (while buffered read() would probably need it too).

> The normal behavior for code calling readline() on a socket with a
> timeout is likely going to be to close it.  Anything else does not
> make much sense.  (someone may try, but really they're writing their
> I/O code wrong if they are using a socket timeout a poor form of task
> switching ;)

That's my opinion too. So, instead, of doing the above surgery inside
the IO stack, the SocketIO layer could detect the timeout and disallow
further access. What do you think?
msg126197 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-01-13 20:25
> That complicates things quite a bit,
> especially given that it has to be grafted on at least two layers of the
> IO stack (the raw IO layer, and the buffered IO layer).

Also the TextIO layer I think.

> That's my opinion too. So, instead, of doing the above surgery inside
> the IO stack, the SocketIO layer could detect the timeout and disallow
> further access. What do you think?

So after a timeout occurs the file-object basically becomes worthless? Would it make sense to automatically call the close method of the file-object after this occurs?
msg126198 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-13 20:34
> > That's my opinion too. So, instead, of doing the above surgery inside
> > the IO stack, the SocketIO layer could detect the timeout and disallow
> > further access. What do you think?
> 
> So after a timeout occurs the file-object basically becomes worthless?
> Would it make sense to automatically call the close method of the
> file-object after this occurs?

Actually, we only need to forbid further reads (writes would always
work). I think we should still let the user call the close method
themselves.
msg126228 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-01-14 05:21
Attached patch disallows further reads after a timeout.
msg126296 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-14 19:51
Looks good and simple enough. I would probably shift the timeout test after the closed test, but that's almost a detail.
msg126297 - (view) Author: David Beazley (dabeaz) Date: 2011-01-14 20:00
Just wanted to say that I agree it's nonsense to continue reading on a socket that timed out (I'm not even sure what I might have been thinking when I first submitted this bug other than just experimenting with edge cases of the socket interface).    It's still probably good to precisely specify what the behavior is in any case.
msg129468 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-25 23:15
Committed in r88622 (3.3) and r88623 (3.2). The 2.7 implementation is too different for the patch to apply, so if you want to fix it too, feel free to upload a patch. Thank you!
msg146025 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-10-20 16:27
This patch has caused a non-trivial regression between 3.2 and 3.2.1.  The scenario in which I observed it is poplib.  I create a POP3 connection with a timeout.  At one point in its processing, poplib is reading lines until it gets a line '.\r\n', at which point the transaction is complete and it returns data to the caller.  If the pop server fails to terminate the transaction, we get a timeout on the read.  However, the POP server may still be alive, it may just have failed to close the transaction (servers have been observed in the wild that do this[*]).  Before this patch, one could catch the socket.timeout and recover from the failed transaction (loosing the transaction data, but that's OK because the transaction was incomplete...it would be better to get the partial transaction, but that's a poplib issue, not a socket issue).  One could then continue processing, sending new transactions to the POP server and getting responses.  After the patch, once the socket error is raised there is no way to continue poplib processing short of tearing down the connection and rebuilding it, and restarting the POP processing from the beginning.

Now, this is clearly an abnormal situation (a POP server randomly not completing its transactions), but it was observed in the wild, and does represent a regression.  I think that Antoine's idea of making readline functional despite timeouts was the better approach.

Also note that Antoine's change to the makefile documentation is wrong with this patch in place, since a timeout invalidates the makefile rather than just "leaving the internal buffers in an inconsistent state".

Backing out this patch would probably be better than leaving it in place, if a better fix can't be found.

[*] The regression was detected testing against a test POP server designed to exhibit defective behaviors that have been observed over the years by the maintainers of the test server.  I can't point to specific existing servers that exhibit the broken behavior, but it did happen in the past and no doubt someone will write a buggy POP server that has the same broken behavior some time in the future as well.
msg146038 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-20 17:38
> One could then continue processing, sending new transactions to the POP 
> server and getting responses. 

That's optimistic. You don't know how much data has been lost in readline(). Sure, again your test server, it happens to work :) But again other kinds of failing servers, your protocol session would end up confused.
So the only robust option is the following:

> tearing down the
> connection and rebuilding it, and restarting the POP processing from
> the beginning
msg146040 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-10-20 18:01
I don't think it is optimistic.  The poplib transaction pattern is: send a command, get a response.  If the response is not properly terminated, throw it away.  Send a new command, get a response.  There's no ambiguity there.  In addition, this is a common tcp client-server model, so I think it applies more widely than just poplib.

Please note that the timeout is *not* because the socket data transmission has timed out and data was lost in transit.  There are no partially filled readline buffers in this scenario.  The timeout is because the client is waiting for a *line* of data that the server never sends.  Again, this is likely to be a common failure mode in tcp client/server applications, and to my mind is exactly what the timeout parameter to the constructor is most useful for.
msg146048 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011-10-20 20:33
If the server failed to close a transaction the protocol stream is over
unless you mime relying on hope and luck. Poplib has a nasty set of server
implementation bugs to work around here.

Readline as defined today no longer suits its needs but I still strongly
believe the behavior of shutting reading down after a timeout is a good one.

One thing that would solve your common case: don't shut down reading on a
readline timeout if zero data was received into the internal line buffer.
Readline could indicate this by modifying the timeout exception being raised
to indicate if it can recover or not. The flag should only be set if it was
unrecoverable.
On Oct 20, 2011 11:01 AM, "R. David Murray" <report@bugs.python.org> wrote:

>
> R. David Murray <rdmurray@bitdance.com> added the comment:
>
> I don't think it is optimistic.  The poplib transaction pattern is: send a
> command, get a response.  If the response is not properly terminated, throw
> it away.  Send a new command, get a response.  There's no ambiguity there.
>  In addition, this is a common tcp client-server model, so I think it
> applies more widely than just poplib.
>
> Please note that the timeout is *not* because the socket data transmission
> has timed out and data was lost in transit.  There are no partially filled
> readline buffers in this scenario.  The timeout is because the client is
> waiting for a *line* of data that the server never sends.  Again, this is
> likely to be a common failure mode in tcp client/server applications, and to
> my mind is exactly what the timeout parameter to the constructor is most
> useful for.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue7322>
> _______________________________________
>
msg146049 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-10-20 20:40
Your suggestion sounds good to me.

I still think that it is a common failure mode in a client server transaction for the server to fail to send a (complete) line that the client is expecting, and vice versa, requiring a timeout, but not necessarily a "restart from scratch".  Often the client/server protocol has a useful checkpoint to restart from short of start from scratch.  In the case of many protocols, that would be "client issues a new command".
msg253437 - (view) Author: David Beazley (dabeaz) Date: 2015-10-25 21:28
This bug is still present in Python 3.5, but it occurs if you attempt to do a readline() on a socket that's in non-blocking mode.  In that case, you probably DO want to retry at a later time (unlike the timeout case).
msg253530 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-27 11:52
IMO it might make sense in some cases to disallow subsequent reading from a buffered socket reader (or probably any BufferedReader) that raises an exception (not just a timeout). But the restriction seems unnecessary for unbuffered raw readers, and it also seems to go against the “consenting adults” philosophy for David Murray’s test server case.

David Beazley: For non-blocking sockets, the documentation currently says “the socket must be in blocking mode”. I’m not sure why that restriction is necessary; maybe it could be lifted at least for raw unbuffered streams.

Maybe you could make an argument for caching the partial data in the BufferedReader if a timeout (or no more non-blocking data, or other exception) occurs. The biggest problem is that it could mean storing more than the normal buffer size. I would think this would be a new feature (for 3.6+) rather than a behavioural bug fix though.

And see also Issue 13322 about inconsistencies with buffered reading non-blocking streams in general.
History
Date User Action Args
2015-10-27 11:52:45martin.pantersetnosy: + martin.panter
messages: + msg253530
2015-10-25 21:28:19dabeazsetmessages: + msg253437
2011-10-20 20:40:43r.david.murraysetmessages: + msg146049
2011-10-20 20:33:31gregory.p.smithsetmessages: + msg146048
2011-10-20 20:03:47floxsetnosy: + flox
2011-10-20 18:01:14r.david.murraysetmessages: + msg146040
2011-10-20 17:38:34pitrousetmessages: + msg146038
2011-10-20 16:27:47r.david.murraysetstatus: closed -> open

nosy: + r.david.murray
messages: + msg146025

resolution: fixed ->
stage: resolved ->
2011-02-25 23:15:19pitrousetstatus: open -> closed
versions: + Python 3.2
nosy: loewis, beazley, gregory.p.smith, amaury.forgeotdarc, roysmith, pitrou, ned.deily, dabeaz, rosslagerwall
messages: + msg129468

resolution: fixed
stage: patch review -> resolved
2011-01-14 20:00:50dabeazsetnosy: loewis, beazley, gregory.p.smith, amaury.forgeotdarc, roysmith, pitrou, ned.deily, dabeaz, rosslagerwall
messages: + msg126297
2011-01-14 19:51:18pitrousetnosy: loewis, beazley, gregory.p.smith, amaury.forgeotdarc, roysmith, pitrou, ned.deily, dabeaz, rosslagerwall
messages: + msg126296
2011-01-14 05:21:00rosslagerwallsetfiles: + i7322.patch
nosy: loewis, beazley, gregory.p.smith, amaury.forgeotdarc, roysmith, pitrou, ned.deily, dabeaz, rosslagerwall
messages: + msg126228
2011-01-13 20:34:58pitrousetnosy: loewis, beazley, gregory.p.smith, amaury.forgeotdarc, roysmith, pitrou, ned.deily, dabeaz, rosslagerwall
messages: + msg126198
2011-01-13 20:25:02rosslagerwallsetnosy: loewis, beazley, gregory.p.smith, amaury.forgeotdarc, roysmith, pitrou, ned.deily, dabeaz, rosslagerwall
messages: + msg126197
2011-01-13 18:06:31pitrousetnosy: loewis, beazley, gregory.p.smith, amaury.forgeotdarc, roysmith, pitrou, ned.deily, dabeaz, rosslagerwall
messages: + msg126192
2011-01-13 17:47:14gregory.p.smithsetnosy: loewis, beazley, gregory.p.smith, amaury.forgeotdarc, roysmith, pitrou, ned.deily, dabeaz, rosslagerwall
messages: + msg126190
2011-01-13 14:27:50pitrousetnosy: loewis, beazley, gregory.p.smith, amaury.forgeotdarc, roysmith, pitrou, ned.deily, dabeaz, rosslagerwall
messages: + msg126172
2011-01-13 14:24:09pitrousetversions: + Python 3.3, - Python 3.1, Python 2.7, Python 3.2
nosy: + amaury.forgeotdarc

messages: + msg126171

stage: needs patch -> patch review
2011-01-13 14:10:29dabeazsetnosy: + dabeaz
messages: + msg126170
2011-01-13 09:27:59rosslagerwallsetfiles: + issue7322_new.patch
nosy: loewis, beazley, gregory.p.smith, roysmith, pitrou, ned.deily, rosslagerwall
messages: + msg126163
2010-12-31 10:47:38pitrousetassignee: gregory.p.smith ->
messages: + msg124962
nosy: loewis, beazley, gregory.p.smith, roysmith, pitrou, ned.deily, rosslagerwall
2010-12-31 08:26:53rosslagerwallsetnosy: + pitrou, loewis
2010-12-31 08:15:00rosslagerwallsetfiles: + 7322_v1.patch

nosy: + rosslagerwall
messages: + msg124957

keywords: + patch
2010-11-21 08:45:24ned.deilysetmessages: + msg121885
2010-11-21 02:36:03roysmithsetmessages: + msg121839
2010-11-21 02:05:01ned.deilysetnosy: + ned.deily
messages: + msg121833
2010-11-20 21:14:11roysmithsetfiles: + test-issue7322.py

messages: + msg121779
2010-11-20 21:08:57roysmithsetnosy: + roysmith
messages: + msg121777
2010-07-11 10:25:15BreamoreBoysetstage: needs patch
versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-11-14 20:16:10gregory.p.smithsetpriority: normal
assignee: gregory.p.smith

nosy: + gregory.p.smith
2009-11-14 17:16:31beazleycreate