Issue44805
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2021-08-02 07:45 by ntc2, last changed 2022-04-11 14:59 by admin.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
server.py | ntc2, 2021-08-02 07:45 | Example server code | ||
client.py | ntc2, 2021-08-02 07:45 | Example client code |
Messages (3) | |||
---|---|---|---|
msg398731 - (view) | Author: Nathan Collins (ntc2) * | Date: 2021-08-02 07:45 | |
Problem ======= When using asyncio streams via (r,w) = asyncio.open_connection(sock=socket) with a already connected socket `socket`, if you call `socket.close()` but not `w.close()` when you're done, then when the OS later reuses the file descriptor of `socket` for a new socket, and that new socket is used with (r,w) = asyncio.open_connection(sock=socket) again, the `r.read(...)` for the new `r` can hang indefinitely, even when data is available on the underlying socket. When the hang happens, closing the socket on the writer side doesn't help, and the socket gets stuck forever in the CLOSE_WAIT state on the reader side. Using `strace` shows that the reader side is stuck in `epoll_wait(...)`. Client and server programs that reproduce the bug ================================================= Run the server in one shell and then run the client in the other shell. They each take one argument, that controls how they close their sockets/streams when they're done. Usage: python3 client.py CLOSE_MODE Usage: python3 server.py CLOSE_MODE Where CLOSE_MODE can be * "": don't close the socket in any way * "S": close `open_connection` socket using `socket.socket.close()` * "A": close `open_connection` socket using `asyncio.StreamWriter.close()` * "SA": close `open_connection` socket both ways These are also attached, but here's the source. The `client.py`: ``` python import asyncio, socket, sys async def client(src_ip, close_mode): s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) src_ip_port = (src_ip, 0) s.bind(src_ip_port) dst_ip_port = ('127.0.0.2', 12345) s.connect(dst_ip_port) print(f'Connected from {src_ip}') print(s) try: (r,w) = await asyncio.open_connection(sock=s) print('<- ', end='', flush=True) in_line = await r.read(100) print(in_line) out_line = b'client' print('-> ', end='', flush=True) w.write(out_line) await w.drain() print(out_line) finally: if 'S' in close_mode: s.close() if 'A' in close_mode: w.close() await w.wait_closed() print('Closed socket') print() async def main(close_mode): await client('127.0.0.3', close_mode) await client('127.0.0.4', close_mode) close_mode = sys.argv[1] asyncio.run(main(close_mode)) ``` The `server.py`: ``` python import asyncio, socket, sys async def server(close_mode): s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) ip = '127.0.0.2' port = 12345 print(f'Listening on {ip}:{port}') print(s) try: s.bind((ip, port)) s.listen() while True: (a, (a_ip, a_port)) = s.accept() print(f'Client connected from {a_ip}:{a_port}') print(a) try: (r,w) = await asyncio.open_connection(sock=a) print('-> ', end='', flush=True) out_line = b'server' w.write(out_line) await w.drain() print(out_line) print('<- ', end='', flush=True) in_line = await r.read(100) print(in_line) finally: if 'S' in close_mode: a.close() if 'A' in close_mode: w.close() await w.wait_closed() print('Closed client socket') print() finally: s.close() print('Closed server socket') close_mode = sys.argv[1] asyncio.run(server(close_mode)) ``` Example session: `server.py S` and `client.py A` ================================================ Note that file descriptor 7 is reused on the server side, before the server hangs on `r.read`. Run the server in one shell: $ python3 server.py S Listening on 127.0.0.2:12345 <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 0)> Client connected from 127.0.0.3:34135 <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.3', 34135)> -> b'server' <- b'client' Closed client socket Client connected from 127.0.0.4:46639 <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.4', 46639)> -> b'server' <- The server is hanging on `r.read` above. Run the client in another shell, after starting the server: $ python3 client.py A Connected from 127.0.0.3 <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.3', 34135), raddr=('127.0.0.2', 12345)> <- b'server' -> b'client' Closed socket Connected from 127.0.0.4 <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.4', 46639), raddr=('127.0.0.2', 12345)> <- b'server' -> b'client' Closed socket $ lsof -ni @127.0.0.2 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME python3 26692 conathan 6u IPv4 9992763 0t0 TCP 127.0.0.2:12345 (LISTEN) python3 26692 conathan 7u IPv4 9991149 0t0 TCP 127.0.0.2:12345->127.0.0.4:46639 (CLOSE_WAIT) Example session: `server.py ''` and `client.py A` ================================================ Note that file descriptors are not reused on the server side now, and nothing hangs. Server: $ python3 server.py '' Listening on 127.0.0.2:12345 <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 0)> Client connected from 127.0.0.3:37833 <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.3', 37833)> -> b'server' <- b'client' Closed client socket Client connected from 127.0.0.4:39463 <socket.socket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.4', 39463)> -> b'server' <- b'client' Closed client socket Client: $ python3 client.py A Connected from 127.0.0.3 <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.3', 37833), raddr=('127.0.0.2', 12345)> <- b'server' -> b'client' Closed socket Connected from 127.0.0.4 <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.4', 39463), raddr=('127.0.0.2', 12345)> <- b'server' -> b'client' Closed socket Behavior for different combinations of closure modes ==================================================== Perhaps this is overkill, but here's what happens for all 15 possible combinations of how the client and server close their connections. For example, "Client=S, Server=S" below means we run `$ python3 client.py S` and `$ python3 server.py S`. Sometimes multiple combinations have the same behavior, so they're grouped together below. Client=S, Server=''; Client=S, Server=S; Client=S, Server=A; Client=S, Server=SA ------------------- Client hangs on `r.read` on second connection, and killing the server on the other end has no effect, with the socket stuck in CLOSE_WAIT on the client side forever. Client='', Server=S; Client=A, Server=S ------------------ Server hangs on `r.read` on second connection, and client exits normally, with the socket stuck in CLOSE_WAIT on the server side forever. Client=SA, Server=S ------------------- Everything works the first time, but if you run the client in a loop, e.g. with $ while true; do python3 client.py SA; done then the server will eventually hang on `r.read` after ~3 client sessions. Client='', Server=''; Client='', Server=A; Client=A, Server=''; Client=SA, Server='' ------------------- Everything works! But here we see that the client and/or server (whichever side is using '' for mode) is not reusing the socket file descriptors right away (have to wait for GC). This is evidence that the problem is due to stale state in asyncio tied to the reused file descriptors. Client=A, Server=A; Client=A, Server=SA; Client=SA, Server=A; Client=SA, Server=SA -------------------- Everything works, but this is not surprising because both sides closed the `StreamWriter` with `w.close()`. Possibly related bugs ===================== https://bugs.python.org/issue43253: Windows only, calling socket.socket.close() on a socket used with asyncio.open_connection(sock=socket). https://bugs.python.org/issue41317: reused file descriptors across different connections. https://bugs.python.org/issue34795, https://bugs.python.org/issue30064: closing a socket used with asyncio. https://bugs.python.org/issue35065: reading from a closed stream can hang. Could be related if the problem is due to aliased streams from reused file descriptors. https://bugs.python.org/issue43183: sockets used with asyncio getting stuck in WAIT_CLOSED. System info =========== $ python3 --version Python 3.9.6 $ lsb_release -a LSB Version: core-9.20170808ubuntu1-noarch:printing-9.20170808ubuntu1-noarch:security-9.20170808ubuntu1-noarch Distributor ID: Ubuntu Description: Ubuntu 18.04.5 LTS Release: 18.04 Codename: bionic |
|||
msg398734 - (view) | Author: Nathan Collins (ntc2) * | Date: 2021-08-02 08:50 | |
Oh, and I can't count: there are 16 = 4x4 possible combinations of socket closure modes for the client and server. The one I missed was Client='', Server=SA, where everything works because the client doesn't reuse file descriptors and the server closes its sockets correctly. |
|||
msg404916 - (view) | Author: Nathan Collins (ntc2) * | Date: 2021-10-24 08:09 | |
Just wanted to clarify: my previous "where everything works" comment is not saying this bug doesn't exist, I just mean I missed one case in my analysis of the bug. The bug is very much there, and easy to reproduce using the example programs I attached. Bump! |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:48 | admin | set | github: 88968 |
2021-10-24 08:09:32 | ntc2 | set | messages: + msg404916 |
2021-08-02 08:50:52 | ntc2 | set | messages: + msg398734 |
2021-08-02 07:45:52 | ntc2 | set | files: + client.py |
2021-08-02 07:45:23 | ntc2 | create |