classification
Title: _read_ready and _write_ready should respect _conn_lost
Type: behavior Stage: resolved
Components: asyncio Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lukasz.langa Nosy List: asvetlov, gvanrossum, lukasz.langa, python-dev, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2016-06-05 01:20 by lukasz.langa, last changed 2016-06-11 15:43 by yselivanov. This issue is now closed.

Files
File name Uploaded Description Edit
issue27223.diff lukasz.langa, 2016-06-05 01:21 review
cli.py lukasz.langa, 2016-06-05 01:21
srv.py lukasz.langa, 2016-06-05 01:21
Messages (3)
msg267360 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2016-06-05 01:20
Currently _read_ready and _write_ready callbacks do not respect _conn_lost, which makes them vulnerable to the following race condition:
- the client protocol connects to a server
- the client protocol sends some data
- the server sends some data back
- in the mean time, something causes the connection to be lost and that information gets handled by the client protocol first
- the client transport receives _read_ready but now the connection is already marked as lost and cleared on the transport

This causes an ugly exception to be raised:

	Exception in callback _SelectorSocketTransport._read_ready()
	handle: <Handle _SelectorSocketTransport._read_ready()>
	Traceback (most recent call last):
	File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/selector_events.py", line 664, in _read_ready
		data = self._sock.recv(self.max_size)
	AttributeError: 'NoneType' object has no attribute 'recv'

	During handling of the above exception, another exception occurred:

	Traceback (most recent call last):
	File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/events.py", line 125, in _run
		self._callback(*self._args)
	File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/selector_events.py", line 668, in _read_ready
		self._fatal_error(exc, 'Fatal read error on socket transport')
	File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/selector_events.py", line 587, in _fatal_error
		self._loop.call_exception_handler({
	AttributeError: 'NoneType' object has no attribute 'call_exception_handler'

This can be reproduced by running srv.py and cli.py attached here. Since this is an event loop timing issue, it might take a few tries of cli.py to the issue to surface. I get it to repro more consistently with PYTHONASYNCIODEBUG=1.

In production I get this issue most often when there's a torrent of simultaneous connections being created while existing protocol connections are already being handled.

To fix this, I propose to add guards for _conn_lost on both _ready_ready and _write_ready. See attached patch. This doesn't break any existing assumptions because we can't really respond to I/O events on a connection that's reset.

The patch also changes 3 existing unit tests, which I believe are currently invalid. test_write_ready_exception_close simulates filling the buffer and calling _write_ready after the transport has been closed. This means _call_connection_lost has been called and now _sock is None and _loop is None. We'd have AttributeError raised (like in the traceback above) and not OSError like the test believes. That would cause the same except clause to be invoked but then (like in the traceback above) _fatal_error fails due to _loop being None. I renamed the test test_transport_close_remove_writer and removed the _write_ready assumptions in it.

test_write_ready_send_closing required a change so that the buffer is non-empty BEFORE transport.close() is called. In this case close() doesn't remove the writer and schedule connection_lost. Instead, this is done by _write_ready after the write has been sent out. I also removed the following assert: `self.assertFalse(self.loop.writers)` because it's true during the entire execution of the test (in other words, checks nothing).

test_write_ready_send_closing_empty_buffer checks for _call_connection_lost to be called but in fact _call_connection_lost is scheduled to be called_soon in transport.close(). I changed the test to reflect this.
msg268217 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-06-11 15:20
New changeset 9512cfd53903 by Yury Selivanov in branch '3.5':
Issue #27223: aio: Fix _read_ready and _write_ready to respect _conn_lost.
https://hg.python.org/cpython/rev/9512cfd53903

New changeset d56b3e5ebfe2 by Yury Selivanov in branch 'default':
Merge 3.5 (issue #27223)
https://hg.python.org/cpython/rev/d56b3e5ebfe2
msg268218 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-06-11 15:22
Thanks, Łukasz!
History
Date User Action Args
2016-06-11 15:43:02yselivanovsetstatus: open -> closed
type: behavior
resolution: fixed
stage: patch review -> resolved
2016-06-11 15:22:11yselivanovsetmessages: + msg268218
2016-06-11 15:21:00python-devsetnosy: + python-dev
messages: + msg268217
2016-06-10 23:56:22lukasz.langasettitle: _ready_ready and _write_ready should respect _conn_lost -> _read_ready and _write_ready should respect _conn_lost
stage: patch review
2016-06-05 01:21:50lukasz.langasetfiles: + srv.py
2016-06-05 01:21:40lukasz.langasetfiles: + cli.py
2016-06-05 01:21:31lukasz.langasetfiles: + issue27223.diff
keywords: + patch
2016-06-05 01:20:33lukasz.langacreate