classification
Title: Socket.recv hangs
Type: crash Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Barney Stratford, jstasiak
Priority: normal Keywords: patch

Created on 2020-11-30 11:55 by Barney Stratford, last changed 2021-01-07 16:53 by Barney Stratford. This issue is now closed.

Pull Requests
URL Status Linked
PR 23567 closed Barney Stratford, 2020-11-30 14:01
Messages (6)
msg382145 - Author: Barney Stratford (Barney Stratford) * Date: 2020-11-30 11:55
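import socket

# Excerpt from a class method; host and port are set elsewhere.
# The timeout argument of create_connection() is in seconds.
self.__socket = socket.create_connection((host, port), timeout=10000)
value = self.__socket.recv(4096)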

This code sometimes hangs despite having a non-None timeout specified. GDB says:

(gdb) bt
#0  0x76d33c94 in __GI___poll (fds=0x7ea55148, nfds=1, timeout=10000)
    at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x001e8014 in poll (__timeout=<optimized out>, __nfds=<optimized out>, 
    __fds=<optimized out>, __fds=<optimized out>, __nfds=<optimized out>, 
    __timeout=<optimized out>)
    at /usr/include/arm-linux-gnueabihf/bits/poll2.h:46
#2  internal_select (writing=writing@entry=0, interval=<optimized out>, 
    connect=0, connect@entry=4156908, s=<optimized out>)
    at ../Modules/socketmodule.c:745
#3  0x001ec588 in sock_call_ex (s=s@entry=0x76979d68, writing=writing@entry=0, 
    sock_func=sock_func@entry=0x1e736c <sock_recv_impl>, data=0x7ea551b0, 
    data@entry=0x7ea551a8, connect=connect@entry=0, err=err@entry=0x0, 
    timeout=10000000000) at ../Modules/socketmodule.c:840
#4  0x001ed394 in sock_call (data=0x7ea551a8, func=0x1e736c <sock_recv_impl>, 
    writing=0, s=0x76979d68) at ../Modules/socketmodule.c:3287
#5  sock_recv_guts (s=s@entry=0x76979d68, cbuf=<optimized out>, 
    len=<optimized out>, flags=<optimized out>)
    at ../Modules/socketmodule.c:3287
#6  0x001ed51c in sock_recv (s=0x76979d68, args=<optimized out>)
    at ../Modules/socketmodule.c:3318
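
For comparison, with a finite timeout a blocked recv() is expected to raise socket.timeout rather than wait forever; as the trace shows, CPython implements the wait by calling poll() with the remaining timeout. The sketch below demonstrates the expected behavior on a socket.socketpair() on which no data is ever sent. The helper name recv_with_timeout is hypothetical and not part of the original report.

```python
import socket


def recv_with_timeout(timeout: float) -> str:
    """Hypothetical helper: show what a socket timeout does when no data arrives."""
    # socketpair() returns two connected sockets; nothing is ever sent on b.
    a, b = socket.socketpair()
    try:
        a.settimeout(timeout)  # seconds, the same unit as create_connection()'s timeout
        try:
            a.recv(4096)
        except socket.timeout:
            return "timed out"
        return "received data"
    finally:
        a.close()
        b.close()


print(recv_with_timeout(0.2))  # prints "timed out" after roughly 0.2 s
```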

Googling for this problem turned up this:

https://stackoverflow.com/questions/56038224/poll-waits-indefinitely-although-timeout-is-specified

If we look at socketmodule.c line 756 (Python 3.7.9), we see that it indeed does not check pollfd.revents, and therefore misses socket errors.
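
The check described above can be illustrated at the Python level: after poll() returns, inspect revents for POLLERR, POLLHUP and POLLNVAL as well as POLLIN. This is a sketch using Python's select.poll (not available on Windows), not CPython's C code in socketmodule.c, and the helper name wait_readable is hypothetical.

```python
import select
import socket


def wait_readable(sock: socket.socket, timeout_ms: int) -> str:
    """Hypothetical helper: poll one socket and classify the result via revents."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)  # list of (fd, revents) pairs; empty on timeout
    if not events:
        return "timeout"
    _fd, revents = events[0]
    # POLLERR, POLLHUP and POLLNVAL may be reported even though only POLLIN was requested.
    if revents & (select.POLLERR | select.POLLNVAL):
        return "error"
    if revents & select.POLLIN:
        return "readable"  # data is waiting, or EOF (recv() would return b"")
    if revents & select.POLLHUP:
        return "hangup"
    return "other"


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"x")
    print(wait_readable(a, 100))  # prints "readable"
```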

PR coming up in a few days.
msg383851 - Author: Barney Stratford (Barney Stratford) * Date: 2020-12-27 13:56
Still waiting for the instrumented code to hang. It sometimes runs for a month or two before freezing.
msg384586 - Author: Barney Stratford (Barney Stratford) * Date: 2021-01-07 13:38
The instrumented code froze today, so I'm finally able to probe the bug. Despite what that Stack Overflow thread said, it's looking like checking the result of the poll syscall isn't the problem here. I'm probably going to close this bug report as not a bug, but want to check fully before I do so.
msg384593 - Author: Barney Stratford (Barney Stratford) * Date: 2021-01-07 16:08
Instrumented code shows that the poll is in fact working completely correctly, and I've run into a documented edge-case in an unrelated area. Hence, closing the bug report and cancelling the pull request.
msg384594 - Author: Jakub Stasiak (jstasiak) * Date: 2021-01-07 16:13
If the edge case is vaguely socket- or file-descriptor-related and not application-specific or otherwise secret, do you mind sharing what it is? (I'm just curious.)
msg384596 - Author: Barney Stratford (Barney Stratford) * Date: 2021-01-07 16:53
Sure. So, I'm using STOMP to connect to a messaging server. STOMP uses heartbeats to detect and close failed connections. The problem was that if the connection fails before the protocol has set up its heartbeats, then there's nothing to stop the whole thing from hanging. I had a timeout on the socket.create_connection call, thinking this would protect against this edge case, but it wasn't sufficient.
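
One way to close that gap, sketched here under assumptions, is to give the STOMP CONNECT/CONNECTED exchange its own overall deadline, so that a connection that dies before heartbeats are negotiated raises socket.timeout instead of hanging. The helper stomp_connect and its parameters are hypothetical; this is not the author's code and not stomp.py's API. The frame layout follows STOMP 1.2 (command line, headers, blank line, NUL terminator).

```python
import socket
import time


def stomp_connect(sock, vhost, handshake_timeout=10.0, heart_beat=(10000, 10000)):
    """Hypothetical helper: send CONNECT and wait at most handshake_timeout for CONNECTED."""
    frame = (
        "CONNECT\n"
        "accept-version:1.2\n"
        f"host:{vhost}\n"
        f"heart-beat:{heart_beat[0]},{heart_beat[1]}\n"
        "\n"
    ).encode("utf-8") + b"\x00"
    deadline = time.monotonic() + handshake_timeout
    old_timeout = sock.gettimeout()
    try:
        sock.settimeout(handshake_timeout)
        sock.sendall(frame)
        buf = b""
        # Until heartbeats are agreed nothing else detects a dead peer, so bound the
        # whole handshake, not just each individual recv() call.
        while b"\x00" not in buf:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise socket.timeout("STOMP handshake timed out")
            sock.settimeout(remaining)
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("connection closed during STOMP handshake")
            buf += chunk
    finally:
        sock.settimeout(old_timeout)  # restore the caller's timeout setting
    # Skip heartbeat EOLs before the frame; bytes after the NUL are ignored in this sketch.
    reply = buf.split(b"\x00", 1)[0].lstrip(b"\r\n")
    command = reply.split(b"\n", 1)[0].rstrip(b"\r").decode("utf-8")
    if command != "CONNECTED":
        raise ConnectionError(f"unexpected STOMP reply: {command!r}")
    return reply.decode("utf-8")
```

Bounding the handshake separately lets the long-lived connection keep a generous timeout (or rely on heartbeats) once the session is up.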

STOMP is such a simple protocol that it's often worth writing your own code to handle it. Indeed, this is actively encouraged. My own STOMP code is about 250 lines in a single source file, compared to nearly 3500 lines in 13 files for stomp.py. I very much prefer to simplify things as much as possible, and smaller is almost always better in my view. Simplify!
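
To give a sense of how small the protocol is, here is a sketch of STOMP 1.2 frame encoding and decoding: a command line, key:value header lines, a blank line, the body, then a NUL byte. It is not the author's 250-line implementation; the function names are hypothetical, and header escaping, CRLF line endings and content-length handling are deliberately left out.

```python
from typing import Dict, Tuple


def encode_frame(command: str, headers: Dict[str, str], body: bytes = b"") -> bytes:
    """Hypothetical helper: build a STOMP frame (no header escaping)."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    head = "\n".join(lines) + "\n\n"  # blank line separates headers from body
    return head.encode("utf-8") + body + b"\x00"


def decode_frame(data: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Hypothetical helper: parse one frame, passed without its trailing NUL (LF endings only)."""
    head, _, body = data.partition(b"\n\n")
    lines = head.decode("utf-8").split("\n")
    command = lines[0]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        headers.setdefault(key, value)  # STOMP 1.2: the first of any repeated header is used
    return command, headers, body
```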

So, I kept seeing this very occasional hang in my code, and probing it with gdb showed that execution was always stuck inside the poll. Of course it was, as that's where it sits waiting for something to happen. Then I was led astray by finding that website, so I saddled up to save the world, as one does in such situations.
History
Date                 User              Action  Args
2021-01-07 16:53:45  Barney Stratford  set     messages: + msg384596
2021-01-07 16:13:41  jstasiak          set     messages: + msg384594
2021-01-07 16:08:42  Barney Stratford  set     status: open -> closed
                                               resolution: not a bug
                                               messages: + msg384593
                                               stage: patch review -> resolved
2021-01-07 13:38:33  Barney Stratford  set     messages: + msg384586
2020-12-27 23:51:33  jstasiak          set     nosy: + jstasiak
2020-12-27 13:56:47  Barney Stratford  set     messages: + msg383851
2020-11-30 14:01:57  Barney Stratford  set     keywords: + patch
                                               stage: patch review
                                               pull_requests: + pull_request22448
2020-11-30 11:55:49  Barney Stratford  create