This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: socket timeouts even in blocking mode
Type:
Stage:
Components: Documentation, Library (Lib), Windows
Versions: Python 2.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: georg.brandl
Nosy List: georg.brandl, gregory.p.smith, loewis, techtonik
Priority: normal
Keywords:

Created on 2009-02-17 10:06 by techtonik, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (11)
msg82311 - Author: anatoly techtonik (techtonik) Date: 2009-02-17 10:06
The code below fails with a timeout after about 20 seconds on Windows with
Python 2.5.4:

import socket
# address of server routable, but offline
server = "192.168.1.2"
s = socket.socket()
s.setblocking(1)
s.connect((server, 139))
s.close()

The output is:

Traceback (most recent call last):
  File "D:\.env\test.py", line 6, in <module>
    s.connect((server, 139))
  File "<string>", line 1, in connect
socket.error: (10060, 'Operation timed out')

If the timeout is set to 1 second, it exits almost immediately. If the timeout
is large, it waits for about 20 seconds and then exits.

I use a socket to wait for a network service to appear. The target
machine, 192.168.1.2, is on the local network but is offline.
msg82353 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2009-02-17 18:24
Why do you think this is a bug in Python?
msg82355 - Author: anatoly techtonik (techtonik) Date: 2009-02-17 19:14
Because the documentation doesn't say that Python should time out after 20
seconds in blocking mode if a socket to the remote host cannot be
opened.
msg82356 - Author: Gregory P. Smith (gregory.p.smith) (Python committer) Date: 2009-02-17 19:18
You can't use a connect() call for the purpose of waiting for your 
network to be up.  This has nothing to do with Python.  This is how all 
network APIs work regardless of OS and language.

The "timeout" is due to the network stack being unable to find the 
remote host (read up on ARP) and eventually returning an error.  You 
need to deal with that in your own code.
msg82367 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2009-02-17 20:51
> Because the documentation doesn't say that Python should time out after 20
> seconds in blocking mode if a socket to the remote host cannot be
> opened.

That's not true: the documentation says "In blocking mode, operations
block until complete." It takes 20 seconds for the connect attempt
to complete (with error 10060); therefore, you block for 20 seconds.
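
A short sketch, assuming modern Python 3 rather than the Python 2.5 used in this report, of what "blocking mode" means at the Python level: setblocking(True) is the same as settimeout(None), i.e. Python imposes no limit of its own. It says nothing about how long the operating system's TCP stack keeps trying. The helper name timeout_modes is hypothetical.

```python
import socket

def timeout_modes():
    """Return gettimeout() for blocking, timed and non-blocking modes."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setblocking(True)          # same as s.settimeout(None)
        blocking = s.gettimeout()    # None: Python imposes no limit
        s.settimeout(1.0)            # blocking, but Python gives up after 1 s
        timed = s.gettimeout()
        s.setblocking(False)         # same as s.settimeout(0.0)
        nonblocking = s.gettimeout()
    finally:
        s.close()
    return blocking, timed, nonblocking

print(timeout_modes())  # (None, 1.0, 0.0)
```

Even when gettimeout() returns None, connect() still returns as soon as the OS reports failure, which is the roughly 20-second error 10060 seen above.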
msg82370 - Author: anatoly techtonik (techtonik) Date: 2009-02-17 21:30
After rewriting my reply several times, I noticed my mistake, but it
took longer to understand the problem than one would expect of a
language that we would all like to see as easy and intuitive as
possible. That's why I would still like to see this bug report reopened.
At first it seemed that adding the missing details to the documentation
would be enough, but now I see that the problem may run deeper.

The problem:
As far as I know, the socket module is the only way to wait for a
service on server:139 to appear. The socket documentation doesn't describe
what happens if the network server is down (the server, not the network adapter).

Analysis:
In this specific timeout condition, when the server is offline, socket.connect()
can raise two different errors: "socket.error: (10060, 'Operation timed
out')" and "socket.timeout: timed out". Which one fires, and so must
be caught, depends on the visible timeout setting of the socket and on the
invisible timeout value of the underlying network library: whichever expires
first wins. For example, this code will warn you about a network timeout:

import socket
s = socket.socket()
s.settimeout(12.0)  # shorter than the ~20 s Windows OS connect timeout
try:
    s.connect(("192.168.1.2", 139))
except socket.timeout:
    print("connect timeout")

But this one won't:

import socket
s = socket.socket()
s.settimeout(120.0)  # longer than the ~20 s Windows OS connect timeout
try:
    s.connect(("192.168.1.2", 139))
except socket.timeout:
    print("connect timeout")

So, for reliable socket programming, you should catch both.
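
A minimal Python 3 sketch of catching both cases in one place. In Python 3, socket.error is an alias of OSError and socket.timeout is an OSError subclass (since Python 3.10, an alias of TimeoutError), so a single except clause can tell them apart: Python's own timeout carries no errno, while an OS-level failure does. The function try_connect and its return strings are hypothetical.

```python
import socket

def try_connect(host, port, timeout):
    """Make one TCP connection attempt and report how it ended.

    Returns "connected", "timeout" if the Python-level timeout fired,
    or "error <errno>" if the OS reported a failure (including its own
    connect timeout, e.g. 10060 on Windows or 110 on Linux).
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # socket.error is an alias of OSError in Python 3
        # Python's own timeout raises socket.timeout with errno None;
        # an OS-level failure carries a numeric errno.
        if isinstance(exc, socket.timeout) and exc.errno is None:
            return "timeout"
        return "error %s" % exc.errno
    conn.close()
    return "connected"
```

Checking errno rather than relying on the exception class alone keeps the two sources distinguishable on Python 3.10+, where an OS-level ETIMEDOUT is also raised as TimeoutError.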

Solution:
If a socket can time out when no timeout is expected, that should at least
be documented. An alternative solution would be to document socket.error 10060
and merge it into the socket.timeout exception.
msg82376 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2009-02-17 22:03
10060 is a Winsock error, and there are many, MANY more of them. Read
the Winsock documentation for details. It's both impossible and
pointless to document them, since some never occur, others aren't
documented by Microsoft well enough in the first place. Many aspects of
the Microsoft TCP stack are also fairly obscure, and can also change
across Windows releases.
msg82387 - Author: anatoly techtonik (techtonik) Date: 2009-02-17 23:07
Isn't it the job of a cross-platform programming language to abstract away
low-level platform details?

The scope of this bug is not about handling all possible Winsock errors.
It is about properly handling the one timeout error from the list at
http://www.winsock-error.com/ to make the socket.connect() interface
consistent across Windows and Linux.

In addition, I believe that the new socket.create_connection() function is
vulnerable to the same issue, and it's only a matter of time before somebody
reports that its additional "timeout" argument must be less than the
mysterious system network timeout value. That's why some sort of generalized
socket.connection_timeout exception is still needed.

BTW, I have tested the behaviour on Linux: the system timeout on the socket
does occur, but with a different error code.

socket.error: (110, 'Connection timed out')

Note that the error message is different too. That means that to
properly wait for a service to appear (or retry the connection later if the
server is not available), you need to handle three error cases.
msg82390 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2009-02-17 23:25
> Isn't it the job of a cross-platform programming language to abstract away
> low-level platform details?

It's certainly not Python's job. It's an explicit design goal, and a
long tradition, to expose system interfaces *as is*, with the very
same parameters, and the very same error codes. This is useful for
developers who need to understand what problem they encounter - they
can trust that Python doesn't second-guess the operating system.

It might be useful to put a layer on top of the system interfaces,
but such a layer needs to use different names.

> The scope of this bug is not about handling all possible Winsock errors.
> It is about properly handling the one timeout error from the list at
> http://www.winsock-error.com/ to make the socket.connect() interface
> consistent across Windows and Linux.

That's nearly impossible, with respect to specific error conditions.
The TCP stacks are too different.

You can easily define a common (but useless) error handling scheme
yourself: catch Exception, and interpret it as "it didn't work".

> BTW, I have tested the behaviour on Linux: the system timeout on the socket
> does occur, but with a different error code.
> 
> socket.error: (110, 'Connection timed out')
> 
> Note that the error message is different too. That means that to
> properly wait for a service to appear (or retry the connection later if the
> server is not available), you need to handle three error cases.

That's correct. However, you shouldn't look at the error message when
handling the error on Linux. Instead, you should check whether the
error code is errno.ETIMEDOUT. The error message is only meant for
a human reader. Also notice that other possible errors returned from
connect are EACCES, EPERM, EADDRINUSE, EAFNOSUPPORT, EAGAIN,
EALREADY, EBADF, ECONNREFUSED, EFAULT, EINPROGRESS, EINTR, EISCONN,
ENETUNREACH, ENOTSOCK.
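
Following the advice to compare error codes rather than messages, a minimal Python 3 sketch of a cross-platform "did connect() time out?" check. The name is_connect_timeout is hypothetical, and the code set only covers the two OS timeout codes reported in this thread: errno.ETIMEDOUT (110 on Linux) and Winsock's 10060. The value 10060 is hard-coded because errno.WSAETIMEDOUT exists only on Windows builds of Python.

```python
import errno
import socket

# Winsock's WSAETIMEDOUT. Hard-coded (an assumption) because
# errno.WSAETIMEDOUT is only defined on Windows builds of Python.
WSAETIMEDOUT = 10060
TIMEOUT_ERRNOS = {errno.ETIMEDOUT, WSAETIMEDOUT}

def is_connect_timeout(exc):
    """Return True if exc means connect() timed out, on Linux or Windows."""
    if isinstance(exc, socket.timeout):
        # Python-level timeout from settimeout()/timeout=; since Python 3.10
        # this class is TimeoutError, which also covers OS-level ETIMEDOUT.
        return True
    # Compare the numeric code, never the human-readable message.
    return isinstance(exc, OSError) and exc.errno in TIMEOUT_ERRNOS
```

Errors for which this returns False, such as ECONNREFUSED, still need their own handling; a refused connection usually means the host is up but nothing is listening on the port yet.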
msg82409 - Author: Gregory P. Smith (gregory.p.smith) (Python committer) Date: 2009-02-18 05:48
Yes, it is annoying to have to deal with the different OS-specific error
numbers when handling socket.error, OSError, IOError or EnvironmentError
subclasses in general, but that is life.  Python does not attempt to
figure out all possible behaviors and errors and coerce them
into some common representation, because often there is no common
representation.  Fortunately, while there are many network stack behaviors,
there are really only two APIs (POSIX and Windows), so there are only two
sets of error numbers to check for.  The burden is not that great in
cross-platform code.

The issue that prompted this bug report: calling socket.connect() once
is not, and has never been, a way to wait for a server on the network to
come up.  If the server isn't there when connect() is called, it will
return an error once the OS has decided that it has no way to connect.
The absence of a timeout does not imply that the underlying system call
will be retried for you; it merely means that Python won't bail out
early.

I have updated the socket module documentation to clarify this a bit in
r69731.
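
Since one connect() call never retries, waiting for a server means looping yourself. A minimal Python 3 sketch under these assumptions: wait_for_service and its parameters are hypothetical names (not part of the socket module), each attempt is capped by the time left before an overall deadline, and any OSError (which includes socket.timeout) is treated as "not up yet". Name resolution is not covered by the per-attempt timeout.

```python
import socket
import time

def wait_for_service(host, port, total=60.0, attempt_timeout=5.0, pause=1.0):
    """Retry connecting to host:port until it accepts or `total` seconds pass.

    Returns True once a connection succeeds, False if the deadline expires.
    Each attempt is capped by min(attempt_timeout, time left), so whichever
    error ends an attempt (Python timeout or OS error) the overall wait
    stays close to `total`.
    """
    deadline = time.monotonic() + total
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            conn = socket.create_connection(
                (host, port), timeout=min(attempt_timeout, remaining))
        except OSError:  # timeout, refused, unreachable...: not up yet
            # Sleep before retrying, but never past the deadline.
            time.sleep(min(pause, max(0.0, deadline - time.monotonic())))
            continue
        conn.close()
        return True
```

For example, wait_for_service("192.168.1.2", 139, total=120) would keep probing for up to about two minutes instead of giving up after the first OS-level timeout.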
msg82548 - Author: anatoly techtonik (techtonik) Date: 2009-02-20 21:35
Thanks for pointing me to the list of possible network errors. This
information is invaluable. Too bad it is easily lost among other
details. I've seen similar errors in other modules that use the socket
module, and it's no wonder now that people can't handle them correctly. I
still feel that the information about error handling should be given
at the very beginning, before the references to wizard books.

In the meantime, I made a script that probes a remote service with proper
timeout checks and could be included in the examples chapter. It requires
the time module to calculate a timeout shared between the two exceptions.
http://code.activestate.com/recipes/576655/
History
Date                 User             Action  Args
2022-04-11 14:56:45  admin            set     github: 49543
2009-02-20 21:35:13  techtonik        set     messages: + msg82548
2009-02-18 05:49:00  gregory.p.smith  set     nosy: + georg.brandl
                                              resolution: not a bug -> fixed
                                              messages: + msg82409
                                              components: + Documentation
                                              assignee: georg.brandl
2009-02-17 23:25:16  loewis           set     messages: + msg82390
2009-02-17 23:07:27  techtonik        set     messages: + msg82387
2009-02-17 22:03:25  loewis           set     messages: + msg82376
2009-02-17 21:30:08  techtonik        set     messages: + msg82370
2009-02-17 20:51:37  loewis           set     messages: + msg82367
2009-02-17 19:18:44  gregory.p.smith  set     status: open -> closed
                                              nosy: + gregory.p.smith
                                              resolution: not a bug
                                              messages: + msg82356
2009-02-17 19:14:24  techtonik        set     messages: + msg82355
2009-02-17 18:24:33  loewis           set     nosy: + loewis
                                              messages: + msg82353
2009-02-17 10:06:33  techtonik        create