classification
Title: socket's send can raise errno 35 under OS X, which causes problems in sendall
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.6
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: exarkun, mdcowles, neologix, pitrou
Priority: normal
Keywords:

Created on 2010-04-21 20:40 by mdcowles, last changed 2010-04-27 19:45 by mdcowles. This issue is now closed.

Messages (18)
msg103908 - (view) Author: Matthew Cowles (mdcowles) Date: 2010-04-21 20:40
[From a question first posted to python-help]

A socket's send function may return 0 if no bytes have been sent. Under at least OS X 10.6.2, it may also raise errno 35 (resource temporarily unavailable) if no network buffers are available. If a Python coder is using socket.send(), that's no problem: they can catch the exception and try again. But it makes socket.sendall() (which is implemented as calls to send()) not very useful. I expect that it would be fairly easy to have it check for that error number in addition to checking for an incomplete send.

As far as I'm aware, it's only OS X that ever does that.
msg103974 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-22 17:44
The problem is that if you run out of "network buffers" (I guess it's equivalent to Unix socket buffers), what do you do?
If the network or the receiving host is congested, spinning around the send call trying to resend the data isn't going to improve things; you should probably wait a little before retrying. Furthermore, if you get this type of error, it's probably because you're using non-blocking sockets. And if that's the case, you should be prepared for this type of transient error, and use socket.send instead (so that you can sleep, retry, etc. if necessary).
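
For illustration, a minimal sketch of the send()/sleep/retry approach described above. It is written for current Python 3 rather than the 2.6 of this report, and the helper name `send_with_retry` and its `delay` and `max_wait` parameters are hypothetical, not part of the socket module.

```python
import errno
import socket
import time


def send_with_retry(sock, data, delay=0.05, max_wait=30.0):
    """Send all of data, sleeping briefly whenever send() reports EAGAIN.

    Hypothetical helper: the name and parameters are assumptions made
    for this sketch.
    """
    view = memoryview(data)
    sent_total = 0
    deadline = time.monotonic() + max_wait
    while sent_total < len(view):
        try:
            # send() may accept only part of the buffer; keep the count.
            sent_total += sock.send(view[sent_total:])
        except OSError as exc:
            # EAGAIN is errno 35 on OS X; other errors are not retried.
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise
            if time.monotonic() > deadline:
                raise
            time.sleep(delay)  # give the peer time to drain the buffer
    return sent_total
```

The `max_wait` bound is a design choice of the sketch so that a peer that never reads cannot make the loop spin forever.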
msg104024 - (view) Author: Matthew Cowles (mdcowles) Date: 2010-04-23 15:43
> if you get this type of error, it's probably because you're using non-
> blocking sockets

That's what I thought at first too. But the user's sockets were set to blocking.

> spinning around the send call trying to resend the data isn't going to 
> improve things, you should probably wait a little before retrying

The user switched to using send() and adding a short delay before retrying the send solved the problem.

In fact, I think it's a little silly that OS X raises the error rather than just saying that 0 bytes were sent (which is what I suppose that other OSes do).

But I think it's also not ideal that Python's socket.sendall() can't be used with confidence under OS X because it can fail under pretty normal circumstances.
msg104202 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-26 09:22
> That's what I thought at first too. But the user's sockets were set to blocking.

That's one broken networking stack...

> In fact, I think it's a little silly that OS X raises the error rather than just saying that 0 bytes were sent (which is what I suppose that other OSes do).

"Normal" OSes just block inside the send() call whenever socket buffers are full (unless they're set to non-blocking). So you can resume sending as soon as buffer space is available, and you don't have to resort to this send()/fail/sleep/re-send() scheme...

> But I think it's also not ideal that Python's socket.sendall() can't be used with confidence under OS X because it can fail under pretty normal circumstances.

Agreed, but it's really an OS X issue here. How would you circumvent this problem anyway? Add a timeout option to sendall() as a hint to how long we should wait before retrying when errno 35 is returned? It would be really hacky...

Maybe the user could try increasing SO_SNDBUF, but this won't necessarily solve his problem...
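
As a rough sketch of what increasing SO_SNDBUF looks like from Python: the helper name `grow_send_buffer` and the 1 MiB request are assumptions for illustration. The kernel is free to round, double, or cap the requested value, so the sketch reads the size back rather than assuming it took effect.

```python
import socket


def grow_send_buffer(sock, requested=1 << 20):
    """Ask the OS for a larger send buffer and report (before, after).

    Hypothetical helper; the OS may adjust or cap the requested size.
    """
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
    after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return before, after


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(grow_send_buffer(s))  # values are platform-dependent
    s.close()
```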

@exarkun: ideas on this ?
msg104204 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-26 09:48
> That's what I thought at first too. But the user's sockets were set to 
> blocking.

If you set a timeout on a socket, it is really non-blocking internally (from the OS' point of view). So perhaps this is what you are witnessing.
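
A small sketch that makes this visible on a Unix system, assuming CPython: it inspects the O_NONBLOCK flag on the underlying file descriptor before and after settimeout(). The helper names are hypothetical, and the fcntl module is not available on Windows.

```python
import fcntl
import os
import socket


def os_level_nonblocking(sock):
    """Return True if the descriptor has O_NONBLOCK set (Unix only)."""
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)
    return bool(flags & os.O_NONBLOCK)


def timeout_effect(timeout=5.0):
    """Report O_NONBLOCK before and after setting a Python-level timeout."""
    a, b = socket.socketpair()
    try:
        before = os_level_nonblocking(a)
        a.settimeout(timeout)
        after = os_level_nonblocking(a)
        return before, after
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(timeout_effect())
```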

By the way, rather than sleeping a fixed amount of time before retrying, you could probably use select() on the socket.
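
A minimal sketch of the select()-based alternative, assuming Python 3. The helper name `send_all_select` and its `timeout` parameter are hypothetical. Because a socket reported as writable may still accept only part of the data, or none, the loop keeps track of partial sends and goes back to select() on EAGAIN.

```python
import select
import socket


def send_all_select(sock, data, timeout=10.0):
    """Send all of data, waiting in select() instead of sleeping.

    Hypothetical helper for a socket that is non-blocking at the OS level.
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        # Block until the socket is reported writable, or give up.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("socket did not become writable in time")
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            # Writable a moment ago, full again now: wait once more.
            continue
    return sent
```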
msg104205 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-26 09:54
What is the mnemonic corresponding to errno 35 under OS X?
(under Linux I get EDEADLOCK, which probably isn't the right one)
msg104227 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-04-26 12:32
> What is the mnemonic corresponding to errno 35 under OS X?

EAGAIN
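
Errno numbers are platform-specific, which is why the same number maps to different names on Linux and OS X. A short sketch for looking up the local meaning of a code (the helper name `describe_errno` is hypothetical):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


if __name__ == "__main__":
    # The result for 35 depends on the OS the script runs on.
    print(describe_errno(35))
    print(describe_errno(errno.EAGAIN))
```

Comparing against errno.EAGAIN by name, rather than against the literal 35, keeps code portable across platforms.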
msg104228 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-26 12:44
Matthew, can you confirm whether the socket had a timeout set on it?
(either through settimeout() or setdefaulttimeout())

I think this is a bug in Python's socket module.

recv()-like functions are written so as to first call select() before actually receiving data, but send()-like functions aren't. I guess blocking sends are quite rare thanks to in-kernel buffering, but we should do the correct thing and use the same logic for send() as we do for recv().
msg104230 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-26 12:48
> What is the mnemonic corresponding to errno 35 under OS X?
> (under Linux I get EDEADLOCK, which probably isn't the right one)

From the first message: "errno 35 (resource temporarily unavailable)". It's actually EAGAIN on Linux (which makes sense on a non-blocking socket).

> By the way, rather than sleeping a fixed amount of time before retrying, you could probably use select() on the socket.

select doesn't mean much for sockets you want to send to, because it will report the socket in the writable set as soon as there's one byte free in the send buffer, so you can very well block (or fail in that case) even when select reports the socket as ready for writing.

> If you set a timeout on a socket, it is really non-blocking internally (from the OS' point of view). So perhaps this is what you are witnessing.

Yes. Maybe it would help to have the sample code ?

Googling a little bit gave this: http://lists.freebsd.org/pipermail/freebsd-hackers/2004-January/005369.html

> > I have written a test program,
> > http://www.infres.enst.fr/~pook/send/server.c, that shows that send does
> > not block on FreeBSD.  It does with Linux and Solaris.
> 
> Do you know what the behaviour of Net- and/or OpenBSD is?

NetBSD is the same as FreeBSD.  I have not tested OpenBSD.
MacOS X is similar to FreeBSD in that send doesn't block, however
the send does not give an error: the packet is just thrown away.

So there seems to be an issue on some systems when you run out of socket buffers, and since select doesn't seem to work either, I guess the only option is this sleep-a-little-hoping-the-buffer-gets-drained approach.

> I think this is a bug in Python's socket module.
> recv()-like functions are written so as to first call select() before actually receiving data, but send()-like functions aren't. I guess blocking sends are quite rare thanks to in-kernel buffering, but we should do the correct thing and use the same logic for send() as we do for recv().

When I look at trunk, I see this:

do {
		timeout = internal_select(s, 1);
		n = -1;
		if (timeout)
			break;
#ifdef __VMS
		n = sendsegmented(s->sock_fd, buf, len, flags);
#else
		n = send(s->sock_fd, buf, len, flags);
#endif
		if (n < 0) {
#ifdef EINTR
			/* We must handle EINTR here as there is no way for
			 * the caller to know how much was sent otherwise.  */
			if (errno == EINTR) {
				/* Run signal handlers.  If an exception was
				 * raised, abort and leave this socket in
				 * an unknown state. */
				if (PyErr_CheckSignals())
					return NULL;
				continue;
			}
#endif
			break;
		}
		buf += n;
		len -= n;
	} while (len > 0);

we call internal_select(s, 1) (1 for writing) before sending.
But as I said, it's not reliable.
msg104232 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-26 12:58
[...]
> we call internal_select(s, 1) (1 for writing) before sending.

Oh, sorry, you are right.
msg104235 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-04-26 13:35
> But as I said, it's not reliable.

I don't see any evidence in support of this statement.  Did you notice that the FreeBSD thread you referenced is:

  * 6 years old
  * about UDP

It's not obvious to me that it's actually relevant here.

> Maybe it would help to have the sample code ?

This seems to be an excellent idea, though.  Without actually knowing what program triggers this behavior, any change is just a wild guess and probably a waste of time.
msg104240 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-26 15:22
> I don't see any evidence in support of this statement. 

From Microsoft Windows' documentation:

"The parameter writefds identifies the sockets that are to be checked for writability. If a socket is processing a connect call (nonblocking), a socket is writeable if the connection establishment successfully completes. If the socket is not processing a connect call, writability means a send, sendto, or WSASendto are guaranteed to succeed. However, they can block on a blocking socket if the len parameter exceeds the amount of outgoing system buffer space available. It is not specified how long these guarantees can be assumed to be valid, particularly in a multithreaded environment."

For the Linux kernel, a few years ago a socket would be returned as writable if it had a fixed amount of send socket buffer queue available, so if you send more than is available at that time, you're either going to block, or get an error (now the threshold is 1/2 of the used buffer space, but this doesn't change the problem).
That's why I said it's not reliable.
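
A small experiment along these lines, as a sketch only: a non-blocking socket that select() reports as writable is handed far more data than its buffer can hold, and send() accepts just part of it. The helper name `writable_but_partial` and the 16 MiB size are assumptions, and the exact number of bytes accepted depends on the platform.

```python
import select
import socket


def writable_but_partial(size=16 * 1024 * 1024):
    """Return (writable, bytes_sent) for one oversized non-blocking send."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        # An idle socket with an empty send buffer is reported writable.
        _, writable, _ = select.select([], [a], [], 1.0)
        # Nobody reads from b, so only part of the payload can be queued.
        sent = a.send(b"x" * size)
        return bool(writable), sent
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(writable_but_partial())
```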

> Did you notice that the FreeBSD thread you referenced is:
>
>   * 6 years old
>   * about UDP

So?

> It's not obvious to me that it's actually relevant here.

I'm not saying that it's the source of the problem, I'm just saying that there's a record of send() calls failing on lack of socket buffers, on blocking sockets.
msg104241 - (view) Author: Matthew Cowles (mdcowles) Date: 2010-04-26 15:47
[Replying to various posts]

[neologix]
> That's one broken networking stack...

I'm not disagreeing, but you'd have to take that up with Apple.

> How would you circumvent this problem anyway ?

The code has to go around again in the case of an incomplete send. I imagine that it could go around again if the return value was -1 and errno was 35. Doing that in Python fixed the problem for the user.

[pitrou]
> Matthew, can you confirm whether the socket had a timeout set on it?
> (either through settimeout() or setdefaulttimeout())

I double- and triple-checked that the user wasn't setting a timeout. The problem came up in the context of the ftplib module and a look at the code (it was Python 2.6.2) suggests that that module doesn't set a timeout unless you ask it to.

[neologix]
> Maybe it would help to have the sample code ?

I'm sorry but I can't quite tell which code you mean here. The original user's traceback ended with:

File "/Library/Frameworks/Python.framework/Versions/
        2.6/lib/python2.6/ftplib.py",
line 452, in storbinary
    conn.sendall(buf)
    File "<string>", line 1, in sendall
error: [Errno 35] Resource temporarily unavailable

If there's something else that would be useful and I can provide it, I'd be glad to.
msg104242 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-04-26 15:48
None of that has much relevance when the socket is in *non-blocking* mode.
msg104243 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-04-26 15:48
> If there's something else that would be useful and I can provide it, I'd be glad to.

A minimal example which reproduces the behavior. :)
msg104272 - (view) Author: Matthew Cowles (mdcowles) Date: 2010-04-26 20:29
> A minimal example which reproduces the behavior. :)

Unfortunately the problem wasn't mine originally. I'm just the guy on python-help who happened to figure out the answer. But if someone can get me access to an FTP server on the other end of a slow link, I'd be glad to do what I can <half-wink>.
msg104276 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-04-26 20:36
> But if someone can get me access to an FTP server on the other end of a slow link, I'd be glad to do what I can <half-wink>.

It's easy to get a slow FTP server.  Twisted's FTP support lets you do all kinds of customization; making a server that doesn't read very fast (or at all!) would be a snap.

Ultimately, I don't think you should actually need an FTP server to reproduce this, though.  A discard server should work just as well.
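
A minimal sketch of such a deliberately slow discard server, using only the standard library of Python 3 rather than Twisted. The class and function names, the 1 KiB chunk size, and the default delay are all assumptions chosen for illustration.

```python
import socketserver
import threading
import time


class SlowDiscardHandler(socketserver.BaseRequestHandler):
    """Read and throw away data, pausing between reads (hypothetical)."""

    delay = 0.5   # seconds to sleep after each read
    chunk = 1024  # bytes requested per recv() call

    def handle(self):
        while True:
            data = self.request.recv(self.chunk)
            if not data:
                break  # client closed the connection
            time.sleep(self.delay)


def start_slow_discard_server(host="127.0.0.1", port=0, delay=0.5):
    """Start the server in a background thread and return it."""
    handler = type("Handler", (SlowDiscardHandler,), {"delay": delay})
    server = socketserver.ThreadingTCPServer((host, port), handler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With port=0 the OS picks a free port, which can be read back from `server.server_address`. A client that sets a timeout and then calls sendall() with a large buffer against this server would be one way to try to reproduce the report.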

> Unfortunately the problem wasn't mine originally. I'm just the guy on python-help who happened to figure out the answer.

Perhaps you can encourage the OP to take a look at this issue and post some code.
msg104349 - (view) Author: Matthew Cowles (mdcowles) Date: 2010-04-27 19:45
Apologies! Further investigation indicates that the user had set a timeout in the ftplib module. I'll close this. In an ideal world, errors due to timeouts would look like they were related to timeouts. But that's a different matter entirely.
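
For comparison, a sketch of the same situation in current Python 3 (an assumption, since this report concerns 2.6): sendall() on a socket with a timeout, against a peer that never reads. The helper name and sizes are hypothetical. It reports which kind of exception the caller sees.

```python
import socket


def sendall_against_stalled_peer(timeout=0.2, size=32 * 1024 * 1024):
    """Return 'sent', 'timeout', or 'oserror <errno>' for one sendall()."""
    a, b = socket.socketpair()
    try:
        a.settimeout(timeout)
        try:
            a.sendall(b"x" * size)  # b never reads, so the buffer fills
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return "oserror %s" % exc.errno
        return "sent"
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(sendall_against_stalled_peer())
```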
History
Date                 User      Action  Args
2010-04-27 19:45:28  mdcowles  set     status: open -> closed; messages: + msg104349
2010-04-26 20:36:25  exarkun   set     messages: + msg104276
2010-04-26 20:29:59  mdcowles  set     messages: + msg104272
2010-04-26 15:48:49  exarkun   set     messages: + msg104243
2010-04-26 15:48:08  exarkun   set     messages: + msg104242
2010-04-26 15:47:24  mdcowles  set     messages: + msg104241
2010-04-26 15:22:37  neologix  set     messages: + msg104240
2010-04-26 13:35:12  exarkun   set     messages: + msg104235
2010-04-26 12:58:16  pitrou    set     messages: + msg104232
2010-04-26 12:48:02  neologix  set     messages: + msg104230
2010-04-26 12:44:17  pitrou    set     messages: + msg104228
2010-04-26 12:32:52  exarkun   set     messages: + msg104227
2010-04-26 09:54:19  pitrou    set     messages: + msg104205
2010-04-26 09:48:47  pitrou    set     nosy: + pitrou; messages: + msg104204
2010-04-26 09:22:45  neologix  set     messages: + msg104202
2010-04-23 15:43:53  mdcowles  set     messages: + msg104024
2010-04-22 17:44:22  neologix  set     nosy: + exarkun, neologix; messages: + msg103974
2010-04-21 20:40:21  mdcowles  create