classification
Title: socket.getpeername() failure on broken TCP/IP connection
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: GeorgeY, martin.panter, r.david.murray
Priority: normal Keywords:

Created on 2016-10-15 01:13 by GeorgeY, last changed 2016-10-24 15:18 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
socket错误不识别.png ("socket error not recognized") GeorgeY, 2016-10-15 01:13 error caught
Messages (19)
msg278679 - Author: Georgey (GeorgeY) * Date: 2016-10-15 01:13
I need to know the IP address on the other side of a broken TCP/IP connection.

socket.getpeername() sometimes fails to do the job because the connection has been closed: Windows error 10038 says the object is no longer a socket, so getpeername() cannot be used on it.

Here is the code in the main thread:
-----------
mailbox = queue.Queue()

read_sockets, write_sockets, error_sockets = select.select(active_socks, [], [], TIMEOUT)
for sock in read_sockets:
    try:
        ...  # receive and process data from sock (elided)
    except:
        mailbox.put((("sock_err", sock), 'localhost'))
=========

The sub-thread gets this message from the mailbox and tries to analyze the broken socket. To simplify, I have put the code and its output together:

-------------
>>> print(sock)
<socket.socket [closed] fd=-1, family=AF_INET, type=SOCK_STREAM, proto=0>
>>> sock.getpeername()
OSError: [WinError 10038] An operation was attempted on something that is not a socket
=======

Surprisingly, this error only happens occasionally; sometimes the socket object is normal and getpeername() works fine.

So once a connection is broken, is there no way to tell which address it was connected to?
msg278680 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-15 01:31
The getpeername() method is just a wrapper around the OS function, so it is not going to work if the socket file descriptor is closed or invalid (-1).

You haven’t provided enough code or information for someone else to reproduce the problem. But it sounds like you may be closing the socket in one thread, and trying to use it in another thread. This is going to be unreliable and racy, depending on which thread acts on the socket first. Perhaps you should save the peer address in the same thread that closes it, so you can guarantee when it is open and when it is closed. Or use something else to synchronize the two threads and ensure the socket is always closed after getpeername() is called.
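A minimal sketch of the first suggestion, assuming the server records each client's address at accept() time and hands that address to the worker thread together with the socket, so the worker never needs getpeername(). The names socks2addr, mailbox, accept_client() and report_broken() are illustrative only, not taken from the poster's program.

```python
import queue
import socket

# Hypothetical names for illustration: a registry filled at accept() time and
# a queue that carries (tag, socket, address) instead of just the socket.
socks2addr = {}
mailbox = queue.Queue()


def accept_client(server_sock):
    """Accept a client and record its address while the connection is alive."""
    clisock, addr = server_sock.accept()
    socks2addr[clisock] = addr
    return clisock


def report_broken(sock):
    """Run in the thread that saw the error: send the recorded address along."""
    # Sockets hash and compare by identity, so this lookup still works even
    # after sock.close() has been called.
    mailbox.put(("sock_err", sock, socks2addr.get(sock)))
```

The worker can then unpack `kind, sock, addr = mailbox.get()` and use addr directly, whether or not the socket is still open.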

BTW, it looks like I have to remove George’s username from the nosy list because it contains a comma!
msg278684 - Author: Georgey (GeorgeY) * Date: 2016-10-15 01:58
I have changed my username, thanks Martin.

"But it sounds like you may be closing the socket in one thread, and trying to use it in another thread"
I do not "close" it in the main thread. The main thread only detects the connection failure and reports the socket object to the sub-thread. The sub-thread tries to identify the socket object (retrieve the IP address) before closing it.

The question is: once the TCP connection is broken (e.g. the client program crashes), how can I find out the original address of that connection?

It seems that once someone (the socket) dies, I am not allowed to know their name (the address)!
msg278691 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-15 03:46
This indicated to me that the socket object has indeed been closed _before_ you call getpeername():

-------------
>>> print(sock)
<socket.socket [closed] fd=-1, family=AF_INET, type=SOCK_STREAM, proto=0>
>>> sock.getpeername()
OSError: [WinError 10038] An operation was attempted on something that is not a socket
=======

In this case, I think “[closed] fd=-1” means that both the Python-level socket object, and all objects returned by socket.makefile(), have been closed, so the OS-level socket has probably been closed. In any case, getpeername() is probably trying the invalid file descriptor -1. If there are no copies of the OS-level socket open (e.g. in other processes), then the TCP connection is probably also shut down, but I suspect the problem is the socket object, not the TCP connection.
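The following sketch (an illustration, not code from this thread) reproduces what "[closed] fd=-1" means on a loopback connection: after close() the object's repr contains "[closed]", fileno() returns -1, and getpeername() raises OSError because it is handed the invalid descriptor (EBADF on Unix; Windows reports WinError 10038 instead). closed_socket_state() is a hypothetical helper name.

```python
import socket


def closed_socket_state():
    """Open a loopback connection, close our end, then inspect the object."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()

    before = conn.getpeername()        # works while the descriptor is open
    conn.close()                       # object marked closed, fileno() -> -1
    try:
        after = conn.getpeername()     # OS call on an invalid descriptor
    except OSError as exc:
        after = exc
    state = ("[closed]" in repr(conn), conn.fileno(), before, after)
    client.close()
    server.close()
    return state
```

Note that the failure here comes purely from the Python-level close(); the TCP connection itself never broke in this example.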

Without code or something demonstrating the bug, I’m pretty sure it is a bug in your program, not in Python.
msg278693 - Author: Georgey (GeorgeY) * Date: 2016-10-15 04:41
"Without code or something demonstrating the bug, I’m pretty sure it is a bug in your program"

Here is the main thread:
-----------------------

mailbox = queue.Queue()


while True:
    #print(addr_groups)


    unknown_clients=[]
    for key in yellow_page.keys():
        if yellow_page[key][0] ==None:
            unknown_clients.append(key)

    print("\n", name_groups)
    if len(unknown_clients) >0:
        print("unknown from:"+str(unknown_clients))
    print(time.strftime(ISOTIMEFORMAT, time.localtime(time.time())) + '\n')

    # Get the list sockets which are ready to be read through select
    read_sockets, write_sockets, error_sockets = select.select(active_socks,[],[],TIMEOUT)

    for sock in read_sockets:
        #New connection
        if sock ==server_sock:
            # New Client coming in
            clisock, addr = server_sock.accept()  
            ip = addr[0]
            if ip in IPALLOWED:
                active_socks.append(clisock)                
                yellow_page[addr] = [None,None,clisock] 
            else:
                clisock.close()
         
        #Some incoming message from a client
        else:
            # Data received from client, process it
            try:
                data = sock.recv(BUFSIZ)
                if data:
                    fromwhere = sock.getpeername()
                    mail_s = data.split(SEG_) 
                    del mail_s[0]
                    for mail_ in mail_s:

                        mail = mail_.decode()                        
                       
            except:
                mailbox.put( (("sock_err",sock), 'localhost') )
                continue
=====================

So the sub-thread's job is to analyze the errors put into "mailbox".

Here is the run() method of the sub-thread:
-----------------------------------
    def run(self):
        
        while True:
            msg, addr = mailbox.get()  
            if msg[0] =="sock_err":
                print("sock_err @ ", msg[1])  # <<< this prints the socket object
                handle_sock_err(msg[1])
                continue ##jump off
            else: ......
==========

Let us see what handle_sock_err() does to the broken socket:

---------------
def handle_sock_err(sock): # sock is the broken network connection; unregister it and report the error
    global active_socks, yellow_page, addr_groups, name_groups 
    addr_del = sock.getpeername()  #<<<ERROR 10038
    
    name_del, job_del = yellow_page[addr_del][ 0:2] 
    yellow_page.pop(addr_del)
    
    tag = 0
    try:

        addr_groups[job_del].remove(addr_del);   tag =1
        name_groups[job_del].remove(name_del);   tag =2
        
        active_socks.remove(sock) 
        tag =3

        print(name_del+" offline!")

    except:
        if tag <3:
            active_socks.remove(sock)
        else:
            pass

=============

I do believe that the broken socket can tell me the address it was connected to, so there is not even a "try" around getpeername().

Why do I need the address of the broken socket found by select() in the main thread?
Simple: the server recognizes the user name once the connection has sent correct login information. When the connection goes down, the user should be removed automatically from the online user list "yellow_page" and from all the other dynamic books such as "addr_groups" and "name_groups".

This is a very common and reasonable practice in online systems. I am not particularly interested in why getpeername() cannot get the address of a stopped connection, but in how I can get the address of that stopped connection.

I do not know why Python can only tell me that a line has broken, but not where it was leading. I believe this is a big issue in building an effective server; do you agree?
msg278735 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-15 21:41
I still think something is closing your socket object. I cannot see what it is from the code you posted though. If you update the print() call, I expect you will see that it is closed, and the file descriptor is set to -1:

print("sock_err @ ", msg[1], msg[1]._closed, msg[1].fileno())  # Expect True, -1
msg278795 - Author: Georgey (GeorgeY) * Date: 2016-10-17 03:31
Yes, that is definitely a closed socket. But it is strange that in a single-threaded server without the select module, the socket is never closed until I explicitly call its close() method.

------------
except:
  print(sock)  #<- here it looks normal
  time.sleep(3)
  print(sock)  #<- here it still looks normal 
  sock.close()
  print(sock)  #<- finally the [closed] tag appears and all the details lost
============

So I guess the "socket automatically closing" effect is associated with the select module? When I ran the single-threaded server in IDLE and called time.sleep(), it was already being treated as multi-threaded.
msg278796 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-17 04:20
So is your “automatic closing” due to your program, or a bug in Python? You will have to give more information if you want anyone else to look at this. When I run the code you posted (with various modules imported) all I get is

NameError: name 'yellow_page' is not defined
msg278968 - Author: Georgey (GeorgeY) * Date: 2016-10-19 08:29
As you requested, here is a simplified server:
----------------------------------------------------------
import socket
import select, time
import queue, threading

ISOTIMEFORMAT = '%Y-%m-%d %X'
BUFSIZ = 2048
TIMEOUT = 10
ADDR = ('', 15625)

SEG = "◎◎"
SEG_ = SEG.encode()

active_socks = []
socks2addr = {}


server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) 
server_sock.bind(ADDR)
server_sock.listen(10)
active_socks.append(server_sock)

mailbox = queue.Queue()

#<helper functions>

def send(mail):   
    mail_ = SEG_+ mail.encode()
    ## The leading SEG_ lets the recipient separate messages when the network is busy
    
    for sock in active_socks[1:]:
        try:
            sock.send(mail_)
        except:
            handle_sock_err(sock)

def handle_sock_err(sock): 
    try:
        addr_del = sock.getpeername() 
    except:
        addr_del = socks2addr[sock]


    active_socks.remove(sock) 
    socks2addr.pop(sock) 
    sock.close()
    
    send("OFFLIN"+str(addr_del) )

#<sub Thread>
class Sender(threading.Thread):
    #process 'mails' - save and send
    def __init__(self, mailbox):
        super().__init__()
        self.queue = mailbox

    def analyze(self, mail, fromwhere):
        send( ' : '.join((fromwhere, mail)) )

    def run(self):
        
        while True:
            msg, addr = mailbox.get()  ###
              
            if msg[0] =="sock_err":
                print("sock_err @ ", msg[1]) 
                # alternative> print("sock_err @ " + repr( msg[1] ) )
                # the alternative command greatly reduces socket closing

                handle_sock_err(msg[1])
                continue 
                
            self.analyze(msg, addr)

sender = Sender(mailbox)
sender.daemon = True
sender.start()

#<main Thread>
while True:
    onlines = list(socks2addr.values()) 
    print( '\n'+time.strftime(ISOTIMEFORMAT, time.localtime(time.time())) )
    print( 'online: '+str(onlines))

    read_sockets, write_sockets, error_sockets = select.select(active_socks,[],[],TIMEOUT)

    for sock in read_sockets:
        #New connection
        if sock ==server_sock:
            # New Client coming in
            clisock, addr = server_sock.accept() 
            ip = addr[0]

            active_socks.append(clisock)                
            socks2addr[clisock] = addr
         
        #Some incoming message from a client
        else:
            # Data received from client, process it
            try:
                data = sock.recv(BUFSIZ)
                if data:
                    fromwhere = sock.getpeername()
                    mail_s = data.split(SEG_)   ## separate messages
                    del mail_s[0]
                    for mail_ in mail_s:
                        mail = mail_.decode()                        
                        print("recv>"+ mail)
                       
            except:
                mailbox.put( (("sock_err",sock), 'Server') )
                continue
 
server_sock.close()
  

==========================================================

The client side can be anything that connects to the server.
The original server has a bulletin function that echoes every message from any client to all clients, but you can ignore this function and have the client just connect to this server and do nothing before closing.

I got the error again:
----------------------
sock_err @ <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0>

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:/Users/user/Desktop/SelectWinServer.py", line 39, in handle_sock_err
    addr_del = sock.getpeername()
OSError: [WinError 10038] 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python34\lib\threading.py", line 911, in _bootstrap_inner
    self.run()
  File "C:/Users/user/Desktop/SelectWinServer.py", line 67, in run
    handle_sock_err(msg[1])
  File "C:/Users/user/Desktop/SelectWinServer.py", line 41, in handle_sock_err
    addr_del = socks2addr[sock]
KeyError: <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0>
=================

It seems that "socks2addr" is of little help: when the socket is closed and getpeername() fails, the lookup in socks2addr fails too.

However, I do find that changing

print("sock_err @ ", msg[1])
to
print("sock_err @ " + repr( msg[1] ) )

makes the socket closing happen less often. I don't understand why, or how important this is.

BTW, this is on Windows 7 or Windows 10.
msg278970 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-19 10:01
I haven’t tried running your program, but I don’t see anything stopping multiple references to the same socket appearing in the “mailbox” queue. Once the first reference has been processed, the socket will be closed, so subsequent getpeername() calls will be invalid.
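If the same socket can be queued more than once, the cleanup has to tolerate running again on a socket it has already closed. A minimal sketch, assuming a hypothetical rewrite of the handle_sock_err() from msg278968 that drops the getpeername() attempt and the OFFLIN broadcast; the lock is an added assumption and only helps if the main thread also takes it when it modifies these structures.

```python
import socket
import threading

active_socks = []               # sockets passed to select()
socks2addr = {}                 # socket -> address recorded at accept() time
_books_lock = threading.Lock()  # hypothetical: guards both structures


def handle_sock_err(sock):
    """Unregister and close a broken socket; harmless if called twice.

    Returns the recorded address on the first call and None afterwards.
    """
    with _books_lock:
        addr = socks2addr.pop(sock, None)  # None: an earlier call handled it
        if addr is None:
            return None
        if sock in active_socks:
            active_socks.remove(sock)
    sock.close()
    return addr
```

A second call for the same socket then returns None instead of raising KeyError, which is the failure seen in msg278968.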
msg278971 - Author: Georgey (GeorgeY) * Date: 2016-10-19 10:57
So when do you think the broken socket gets closed?
msg279001 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-19 23:30
When I run your program on Linux (natively, and I also tried Wine), the worst behaviour I get is a busy loop as soon as a client shuts down the connection and recv() returns an empty string. I would have to force an exception in the top level code to trigger the rest of the code.

Anyway, my theory is your socket is closed in a previous handle_sock_err() call. Your KeyError from socks2addr is further evidence of this. I suggest to look at why handle_sock_err() is being called, what exceptions are being handled, where they were raised, what the contents and size of “mailbox” is, etc.
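A sketch of a single-threaded alternative, assuming the select() loop itself handles disconnects: an empty recv() result (an orderly shutdown, the case behind the busy loop described above) and a recv() exception such as ConnectionResetError are both treated as "client gone", and the address comes from the record made at accept() time. poll_once() is a hypothetical helper, not part of the original program.

```python
import select
import socket


def poll_once(server_sock, socks2addr, timeout=1.0):
    """Run one select() round; return the addresses of clients that left.

    The disconnect is handled in the same thread that owns the sockets, and
    the address comes from the accept() record, not from getpeername().
    """
    gone = []
    watched = [server_sock] + list(socks2addr)
    readable, _, _ = select.select(watched, [], [], timeout)
    for sock in readable:
        if sock is server_sock:
            clisock, addr = server_sock.accept()
            socks2addr[clisock] = addr
            continue
        try:
            data = sock.recv(2048)
        except OSError:      # e.g. ConnectionResetError (WinError 10054)
            data = b""
        if not data:         # b"": the peer shut down or reset the connection
            gone.append(socks2addr.pop(sock))
            sock.close()
            continue
        # ... process `data` here ...
    return gone
```

Because nothing else touches the sockets, there is no window in which another thread can close one before its address is read.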

I suggest you go elsewhere for general help with Python programming (e.g. the python-list mailing list), unless it actually looks like a bug in Python.
msg279004 - Author: Georgey (GeorgeY) * Date: 2016-10-20 00:58
I have changed the code to report any error that occurs while receiving a message, and it reports: [WinError 10054] An existing connection was forcibly closed by the remote host

Well, this is exactly the error we need to handle, right? A server needs to deal with clients going offline abruptly. Yes, the remote host has dropped and the connection has been broken, but that does not mean we should be unable to recall its address.

If this is not a bug, I don't know what counts as a bug in the socket module.

----------------------------------------------------------
import socket
import select, time
import queue, threading

ISOTIMEFORMAT = '%Y-%m-%d %X'
BUFSIZ = 2048
TIMEOUT = 10
ADDR = ('', 15625)

SEG = "◎◎"
SEG_ = SEG.encode()

active_socks = []
socks2addr = {}


server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) 
server_sock.bind(ADDR)
server_sock.listen(10)
active_socks.append(server_sock)

mailbox = queue.Queue()

#<helper functions>

def send(mail):   
    mail_ = SEG_+ mail.encode()
    ## The leading SEG_ lets the recipient separate messages when the network is busy
    
    for sock in active_socks[1:]:
        try:
            sock.send(mail_)
        except:
            handle_sock_err(sock)

def handle_sock_err(sock): 
    try:
        addr_del = sock.getpeername() 
    except:
        addr_del = socks2addr[sock]


    active_socks.remove(sock) 
    socks2addr.pop(sock) 
    sock.close()
    
    send("OFFLIN"+str(addr_del) )

#<sub Thread>
class Sender(threading.Thread):
    #process 'mails' - save and send
    def __init__(self, mailbox):
        super().__init__()
        self.queue = mailbox

    def analyze(self, mail, fromwhere):
        send( ' : '.join((fromwhere, mail)) )

    def run(self):
        
        while True:
            msg, addr = mailbox.get()  ###
              
            if msg[0] =="sock_err":
                print("sock_err @ ", msg[1]) 
                # alternative> print("sock_err @ " + repr( msg[1] ) )
                # the alternative command greatly reduces socket closing

                handle_sock_err(msg[1])
                continue 
                
            self.analyze(msg, addr)

sender = Sender(mailbox)
sender.daemon = True
sender.start()

#<main Thread>
while True:
    onlines = list(socks2addr.values()) 
    print( '\n'+time.strftime(ISOTIMEFORMAT, time.localtime(time.time())) )
    print( 'online: '+str(onlines))

    read_sockets, write_sockets, error_sockets = select.select(active_socks,[],[],TIMEOUT)

    for sock in read_sockets:
        #New connection
        if sock ==server_sock:
            # New Client coming in
            clisock, addr = server_sock.accept() 
            ip = addr[0]

            active_socks.append(clisock)                
            socks2addr[clisock] = addr
         
        #Some incoming message from a client
        else:
            # Data received from client, process it
            try:
                data = sock.recv(BUFSIZ)
                if data:
                    fromwhere = sock.getpeername()
                    mail_s = data.split(SEG_)   ## separate messages
                    del mail_s[0]
                    for mail_ in mail_s:
                        mail = mail_.decode()                        
                        print("recv>"+ mail)
                       
            except Exception as err:
                print( "SOCKET ERROR: "+str(err) )
                mailbox.put( (("sock_err",sock), 'Server') )
                continue
 
server_sock.close()
  

==========================================================
msg279012 - Author: Georgey (GeorgeY) * Date: 2016-10-20 04:21
The socket closing is not caused by the queue or by calling handle_sock_err() at all; it happens right after the error in the select loop.

After changing the exception handling in the main thread:
------------------------
            except Exception as err:
                print("error:"+str(err))
                print(sock.getpeername())
                mailbox.put( (("sock_err",sock), 'Server') )
                continue
 
server_sock.close()
========================

I also get the same type of error:

------------------------
Traceback (most recent call last):
  File "C:\Users\user\Desktop\SelectWinServer.py", line 112, in <module>
    data = sock.recv(BUFSIZ)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\user\Desktop\SelectWinServer.py", line 123, in <module>
    print(sock.getpeername())
OSError: [WinError 10038] not a socket
========================
msg279036 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-10-20 14:57
Unless I'm missing something, this indicates that the problem is that once the far end closes, Windows will no longer return the peer name.  

And, unless I'm misreading, the behavior will be the same on Unix.  The man page for getpeername says that ENOTCONN is returned if the socket is not connected.

This isn't a bug in Python.  Or Windows, though the error message is a bit counter-intuitive to a unix programmer.
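To check this on a Unix-like system, the following sketch (assuming Linux, where struct linger is two C ints) makes a loopback client abort with RST, one way to provoke the "forcibly closed" case. On Linux, recv() on the server side should then raise ConnectionResetError, and getpeername() typically raises OSError (ENOTCONN), while fileno() is still a valid descriptor: the Python socket object itself has not been closed. The address returned by accept() is the only copy that survives. peer_reset_demo() is a hypothetical helper.

```python
import socket
import struct


def peer_reset_demo():
    """Let the client abort with RST and inspect the server-side socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, addr = server.accept()   # addr is the durable copy of the peer address
    conn.settimeout(5)             # avoid hanging forever if no RST arrives

    # SO_LINGER on with a zero timeout makes close() send RST instead of FIN
    # (Linux struct linger layout assumed: two ints).
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.close()

    try:
        conn.recv(2048)
        recv_error = None
    except ConnectionResetError as exc:   # ECONNRESET; WinError 10054 on Windows
        recv_error = exc
    try:
        peer = conn.getpeername()
    except OSError as exc:                # the OS no longer reports a peer
        peer = exc
    fd_after_reset = conn.fileno()        # still a real descriptor
    conn.close()
    server.close()
    return addr, recv_error, peer, fd_after_reset
```

The "[closed] fd=-1" seen earlier in the thread therefore points to an explicit close() somewhere in the program, not to the reset itself.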
msg279099 - Author: Georgey (GeorgeY) * Date: 2016-10-21 02:07
Hello David,

Yes, I had the same thought as you: the socket's information is lost at the operating system level.

However, I hope this kind of information will not be lost at the Python level.

Once the socket has been created for an incoming connection, its 'laddr' and 'raddr' address information is known, and print(socket) shows them. That information need not be lost when the connection is broken. Any static approach, such as storing the address as an attribute of the socket, would help.

At the very least, Python should not automatically destroy the socket object simply because the connection has been closed by Windows. Otherwise any attempt to record the socket's address information will fail once the object is destroyed.

The error in msg278968 clearly shows that the socket object cannot even function as a dictionary key, because it has already been destroyed.

----------------------
sock_err @ <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0>

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:/Users/user/Desktop/SelectWinServer.py", line 39, in handle_sock_err
    addr_del = sock.getpeername()
OSError: [WinError 10038] 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python34\lib\threading.py", line 911, in _bootstrap_inner
    self.run()
  File "C:/Users/user/Desktop/SelectWinServer.py", line 67, in run
    handle_sock_err(msg[1])
  File "C:/Users/user/Desktop/SelectWinServer.py", line 41, in handle_sock_err
    addr_del = socks2addr[sock]
KeyError: <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0>
=================
msg279127 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-10-21 13:44
The socket module is a relatively thin wrapper around the C socket library.  'getpeername' is inspecting the *current* peer of the socket, and if there is no current peer, there is no current peer name.  Retaining information the socket library does not is out of scope for the python socket library.  It could be done via a higher level wrapper library, but that would be out of scope for the stdlib unless someone develops something that is widely popular and used by many many people.
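A minimal sketch of such a higher-level wrapper, assuming it is enough to capture the peer address once when the connection is set up. PeerSocket is a hypothetical class, not a stdlib or third-party API. A wrapper (or a dict keyed by socket) is needed because in CPython socket.socket declares __slots__, so assigning a new attribute such as sock.peer_addr raises AttributeError.

```python
import socket


class PeerSocket:
    """Hypothetical thin wrapper that remembers the peer address up front.

    Method calls such as recv(), send(), fileno() and close() are forwarded to
    the wrapped socket, so the wrapper can also be passed to select.select().
    """

    def __init__(self, sock, peer_addr=None):
        self.sock = sock
        # Capture the address while the connection is known to be up.
        self.peer_addr = sock.getpeername() if peer_addr is None else peer_addr

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        return getattr(self.sock, name)

    def __repr__(self):
        return "<PeerSocket peer=%r %r>" % (self.peer_addr, self.sock)
```

For example, `conn = PeerSocket(*server_sock.accept())` keeps `conn.peer_addr` available after `conn.close()`, when getpeername() on the underlying socket would fail.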
msg279303 - Author: Georgey (GeorgeY) * Date: 2016-10-24 12:02
Not only does the getpeername() method not work, but the socket instance itself has been destroyed as garbage by Python.
I understand the former, but cannot accept the latter.
msg279312 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-10-24 15:18
Your example does not show a destroyed socket object, so to what are you referring?  Python won't recycle an object as garbage until there are no remaining references to it.

If you think that there is information the socket object "knows" that it is throwing away when the socket is closed, you might be correct (I haven't checked the code), but that would be *correct* behavior at this API level and design: since the socket is no longer connected, that information is no longer valid.

Please leave the issue closed until you convince us there's a bug :)  If you want to propose some sort of enhancement, the correct forum for this level of enhancement would be the python-ideas mailing list.
History
Date User Action Args
2016-10-24 15:18:23 r.david.murray set status: pending -> closed
resolution: wont fix -> not a bug
messages: + msg279312
2016-10-24 12:02:57 GeorgeY set status: closed -> pending
resolution: not a bug -> wont fix
messages: + msg279303
2016-10-21 13:44:40 r.david.murray set status: open -> closed
resolution: remind -> not a bug
messages: + msg279127
2016-10-21 02:07:31 GeorgeY set status: closed -> open
resolution: not a bug -> remind
messages: + msg279099
2016-10-20 14:57:01 r.david.murray set status: open -> closed
nosy: + r.david.murray
messages: + msg279036
resolution: wont fix -> not a bug
stage: test needed -> resolved
2016-10-20 04:21:35 GeorgeY set messages: + msg279012
2016-10-20 00:58:46 GeorgeY set status: closed -> open
resolution: not a bug -> wont fix
messages: + msg279004
2016-10-19 23:30:03 martin.panter set status: open -> closed
type: crash -> behavior
resolution: remind -> not a bug
messages: + msg279001
2016-10-19 10:57:40 GeorgeY set messages: + msg278971
2016-10-19 10:01:55 martin.panter set messages: + msg278970
2016-10-19 08:29:38 GeorgeY set messages: + msg278968
2016-10-17 04:20:25 martin.panter set messages: + msg278796
2016-10-17 03:31:02 GeorgeY set messages: + msg278795
2016-10-15 21:41:29 martin.panter set messages: + msg278735
2016-10-15 04:41:51 GeorgeY set status: closed -> open
resolution: not a bug -> remind
messages: + msg278693
2016-10-15 03:46:10 martin.panter set status: open -> closed
resolution: remind -> not a bug
messages: + msg278691
stage: test needed
2016-10-15 01:58:49 GeorgeY set resolution: not a bug -> remind
messages: + msg278684
nosy: + GeorgeY
2016-10-15 01:31:39 martin.panter set nosy: + martin.panter, - GeorgeY
resolution: not a bug
messages: + msg278680
2016-10-15 01:13:17 GeorgeY create