Message 267360 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	lukasz.langa
Recipients	asvetlov, gvanrossum, lukasz.langa, vstinner, yselivanov
Date	2016-06-05.01:20:30
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1465089633.03.0.133113775178.issue27223@psf.upfronthosting.co.za>
In-reply-to

Content
Currently _read_ready and _write_ready callbacks do not respect _conn_lost, which makes them vulnerable to the following race condition: - the client protocol connects to a server - the client protocol sends some data - the server sends some data back - in the mean time, something causes the connection to be lost and that information gets handled by the client protocol first - the client transport receives _read_ready but now the connection is already marked as lost and cleared on the transport This causes an ugly exception to be raised: Exception in callback _SelectorSocketTransport._read_ready() handle: <Handle _SelectorSocketTransport._read_ready()> Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/selector_events.py", line 664, in _read_ready data = self._sock.recv(self.max_size) AttributeError: 'NoneType' object has no attribute 'recv' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/events.py", line 125, in _run self._callback(*self._args) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/selector_events.py", line 668, in _read_ready self._fatal_error(exc, 'Fatal read error on socket transport') File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/selector_events.py", line 587, in _fatal_error self._loop.call_exception_handler({ AttributeError: 'NoneType' object has no attribute 'call_exception_handler' This can be reproduced by running srv.py and cli.py attached here. Since this is an event loop timing issue, it might take a few tries of cli.py to the issue to surface. I get it to repro more consistently with PYTHONASYNCIODEBUG=1. In production I get this issue most often when there's a torrent of simultaneous connections being created while existing protocol connections are already being handled. To fix this, I propose to add guards for _conn_lost on both _ready_ready and _write_ready. See attached patch. This doesn't break any existing assumptions because we can't really respond to I/O events on a connection that's reset. The patch also changes 3 existing unit tests, which I believe are currently invalid. test_write_ready_exception_close simulates filling the buffer and calling _write_ready after the transport has been closed. This means _call_connection_lost has been called and now _sock is None and _loop is None. We'd have AttributeError raised (like in the traceback above) and not OSError like the test believes. That would cause the same except clause to be invoked but then (like in the traceback above) _fatal_error fails due to _loop being None. I renamed the test test_transport_close_remove_writer and removed the _write_ready assumptions in it. test_write_ready_send_closing required a change so that the buffer is non-empty BEFORE transport.close() is called. In this case close() doesn't remove the writer and schedule connection_lost. Instead, this is done by _write_ready after the write has been sent out. I also removed the following assert: `self.assertFalse(self.loop.writers)` because it's true during the entire execution of the test (in other words, checks nothing). test_write_ready_send_closing_empty_buffer checks for _call_connection_lost to be called but in fact _call_connection_lost is scheduled to be called_soon in transport.close(). I changed the test to reflect this.

Currently _read_ready and _write_ready callbacks do not respect _conn_lost, which makes them vulnerable to the following race condition:
- the client protocol connects to a server
- the client protocol sends some data
- the server sends some data back
- in the mean time, something causes the connection to be lost and that information gets handled by the client protocol first
- the client transport receives _read_ready but now the connection is already marked as lost and cleared on the transport

This causes an ugly exception to be raised:

Exception in callback _SelectorSocketTransport._read_ready()
handle: <Handle _SelectorSocketTransport._read_ready()>
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/selector_events.py", line 664, in _read_ready
data = self._sock.recv(self.max_size)
AttributeError: 'NoneType' object has no attribute 'recv'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/events.py", line 125, in _run
self._callback(*self._args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/selector_events.py", line 668, in _read_ready
self._fatal_error(exc, 'Fatal read error on socket transport')
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/selector_events.py", line 587, in _fatal_error
self._loop.call_exception_handler({
AttributeError: 'NoneType' object has no attribute 'call_exception_handler'

This can be reproduced by running srv.py and cli.py attached here. Since this is an event loop timing issue, it might take a few tries of cli.py to the issue to surface. I get it to repro more consistently with PYTHONASYNCIODEBUG=1.

In production I get this issue most often when there's a torrent of simultaneous connections being created while existing protocol connections are already being handled.

To fix this, I propose to add guards for _conn_lost on both _ready_ready and _write_ready. See attached patch. This doesn't break any existing assumptions because we can't really respond to I/O events on a connection that's reset.

The patch also changes 3 existing unit tests, which I believe are currently invalid. test_write_ready_exception_close simulates filling the buffer and calling _write_ready after the transport has been closed. This means _call_connection_lost has been called and now _sock is None and _loop is None. We'd have AttributeError raised (like in the traceback above) and not OSError like the test believes. That would cause the same except clause to be invoked but then (like in the traceback above) _fatal_error fails due to _loop being None. I renamed the test test_transport_close_remove_writer and removed the _write_ready assumptions in it.

test_write_ready_send_closing required a change so that the buffer is non-empty BEFORE transport.close() is called. In this case close() doesn't remove the writer and schedule connection_lost. Instead, this is done by _write_ready after the write has been sent out. I also removed the following assert: `self.assertFalse(self.loop.writers)` because it's true during the entire execution of the test (in other words, checks nothing).

test_write_ready_send_closing_empty_buffer checks for _call_connection_lost to be called but in fact _call_connection_lost is scheduled to be called_soon in transport.close(). I changed the test to reflect this.

History
Date	User	Action	Args
2016-06-05 01:20:33	lukasz.langa	set	recipients: + lukasz.langa, gvanrossum, vstinner, asvetlov, yselivanov
2016-06-05 01:20:33	lukasz.langa	set	messageid: <1465089633.03.0.133113775178.issue27223@psf.upfronthosting.co.za>
2016-06-05 01:20:32	lukasz.langa	link	issue27223 messages
2016-06-05 01:20:30	lukasz.langa	create