classification
Title: Intermittent asyncio.open_connection / futures.InvalidStateError
Type: behavior Stage:
Components: asyncio, IO Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, giampaolo.rodola, gvanrossum, pitrou, python-dev, ryder.lewis, vstinner, yselivanov
Priority: normal Keywords:

Created on 2014-05-06 18:28 by ryder.lewis, last changed 2014-07-05 13:31 by python-dev. This issue is now closed.

Messages (8)
msg217999 - (view) Author: Ryder Lewis (ryder.lewis) * Date: 2014-05-06 18:28
Intermittently, when using asyncio.wait_for() with asyncio.open_connection() to cancel the open_connection() task after a timeout period, an asyncio.futures.InvalidStateError is raised. It seems to be a race condition, if the open_connection() call succeeds roughly at the same time as the timeout.

In order to recreate this issue, I used a slightly-modified version of the example at https://docs.python.org/3.4/library/asyncio-stream.html?highlight=open_connection#example

==== example.py ====
import asyncio
import urllib.parse
import sys

@asyncio.coroutine
def print_http_headers(url, timeout):
    url = urllib.parse.urlsplit(url)
    try:
        reader, writer = yield from asyncio.wait_for(asyncio.open_connection(url.hostname, 80), timeout)
    except asyncio.TimeoutError:
        print('Timed out with {}-second timeout'.format(timeout))
        return

    query = ('HEAD {url.path} HTTP/1.0\r\n'
             'Host: {url.hostname}\r\n'
             '\r\n').format(url=url)
    writer.write(query.encode('latin-1'))
    while True:
        line = yield from reader.readline()
        if not line:
            break
        line = line.decode('latin1').rstrip()
        if line:
            print('HTTP header> %s' % line)

    print('Success with {}-second timeout'.format(timeout))

url = sys.argv[1]
loop = asyncio.get_event_loop()

for timeout in range(5, 0, -1):
    task = asyncio.async(print_http_headers(url, timeout/100))
    loop.run_until_complete(task)

loop.close()


==== Output of "./example.py http://www.yahoo.com/" after running multiple times ====
HTTP header> HTTP/1.0 301 Redirect
HTTP header> Date: Tue, 06 May 2014 18:07:43 GMT
HTTP header> Connection: close
HTTP header> Via: http/1.1 ir12.fp.ne1.yahoo.com (ApacheTrafficServer/4.0.2)
HTTP header> Server: ATS
HTTP header> Cache-Control: no-store
HTTP header> Content-Type: text/html; charset=utf-8
HTTP header> Content-Language: en
HTTP header> Location: https://www.yahoo.com/
HTTP header> Content-Length: 214
Success with 0.04-second timeout
Timed out with 0.03-second timeout
Timed out with 0.02-second timeout
Exception in callback <bound method Future.set_result of Future<CANCELLED>>(None,)
handle: Handle(<bound method Future.set_result of Future<CANCELLED>>, (None,))
Traceback (most recent call last):
  File "/opt/python3/lib/python3.4/asyncio/events.py", line 39, in _run
    self._callback(*self._args)
  File "/opt/python3/lib/python3.4/asyncio/futures.py", line 298, in set_result
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
asyncio.futures.InvalidStateError: CANCELLED: Future<CANCELLED>
Timed out with 0.01-second timeout
msg218003 - (view) Author: Ryder Lewis (ryder.lewis) * Date: 2014-05-06 19:00
Another run raised a different exception,
again running ./example.py http://www.yahoo.com/


Timed out with 0.05-second timeout
HTTP header> HTTP/1.0 301 Redirect
HTTP header> Date: Tue, 06 May 2014 18:58:53 GMT
HTTP header> Connection: close
HTTP header> Via: http/1.1 ir38.fp.bf1.yahoo.com (ApacheTrafficServer/4.0.2)
HTTP header> Server: ATS
HTTP header> Cache-Control: no-store
HTTP header> Content-Type: text/html; charset=utf-8
HTTP header> Content-Language: en
HTTP header> Location: https://www.yahoo.com/
HTTP header> Content-Length: 214
Success with 0.04-second timeout
HTTP header> HTTP/1.0 301 Redirect
HTTP header> Date: Tue, 06 May 2014 18:58:53 GMT
HTTP header> Connection: close
HTTP header> Via: http/1.1 ir45.fp.bf1.yahoo.com (ApacheTrafficServer/4.0.2)
HTTP header> Server: ATS
HTTP header> Cache-Control: no-store
HTTP header> Content-Type: text/html; charset=utf-8
HTTP header> Content-Language: en
HTTP header> Location: https://www.yahoo.com/
HTTP header> Content-Length: 214
Success with 0.03-second timeout
Timed out with 0.02-second timeout
Timed out with 0.01-second timeout
exception calling callback for <Future at 0x7f24e43aa710 state=finished returned list>
Traceback (most recent call last):
  File "/opt/python3/lib/python3.4/concurrent/futures/_base.py", line 297, in _invoke_callbacks
    callback(self)
  File "/opt/python3/lib/python3.4/asyncio/futures.py", line 370, in <lambda>
    new_future._copy_state, fut))
  File "/opt/python3/lib/python3.4/asyncio/base_events.py", line 285, in call_soon_threadsafe
    self._write_to_self()
  File "/opt/python3/lib/python3.4/asyncio/selector_events.py", line 91, in _write_to_self
    self._csock.send(b'x')
AttributeError: 'NoneType' object has no attribute 'send'
msg218018 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-05-06 20:53
The second error is easy to explain and fix: it's a race condition between the OS thread used to call getaddrinfo() and the main thread. The method _write_to_self() in selector_events.py is hopelessly naive. It should probably become something like this:

    def _write_to_self(self):
        csock = self._csock
        if csock is not None:
            try:
                self._csock.send(b'x')
            except OSError:
		pass

It is possible that the main thread closes csock at any time, and calling send() on a closed socket will raise OSError with errno=9 (EBADF). Fortunately this is because close() sets the fd to -1; so there is no worry about reuse of the fd.

I will investigate the first traceback next.
msg218025 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-05-06 21:47
New changeset afe8c4bd3230 by Guido van Rossum in branch '3.4':
asyncio: Fix the second half of issue #21447: race in _write_to_self().
http://hg.python.org/cpython/rev/afe8c4bd3230

New changeset c0538334f4df by Guido van Rossum in branch 'default':
Merge 3.4->default: asyncio: Fix the second half of issue #21447: race in _write_to_self().
http://hg.python.org/cpython/rev/c0538334f4df
msg218027 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-05-06 22:13
I'm a little closer to understanding the first (more common) traceback. I can repro it by running your demo program in a loop -- it may take a while but occasionally I do get the same InvalidStateError.

This appears to be an example of the problem speculated about in message #5 of https://code.google.com/p/tulip/issues/detail?id=58.

In order to find out which occurrence of call_later(future.set_result, None) is causing this, I replaced all of them (about half a dozen) with an equivalent lambda -- the lambda shows up in the traceback, unlike the location where call_soon() is called.

In this particular example, it is the last line of _SelectorSocketTransport.__init__() in selector_events.py (line 446).

I suspect there are other places where a similar problem can occur.  I still have to think more about how to fix this (without changing set_result() to always ignore a cancelled Future -- such a change would mask certain errors that I find important to catch).
msg220999 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-19 16:00
I'm unable to reproduce the issue with Python 3.5 (development version).
msg221962 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-30 14:09
This issue contains two sub-issues:
- race condition in_write_to_self() => already fixed
- race condition with scheduled call to future.set_result(), InvalidStateError => I just opened the issue #21886 to discuss it

@Ryder: If you are able to reproduce the second issue (InvalidStateError), please use set the environment variable PYTHONASYNCIODEBUG=1 to see the traceback where the call to set_result() was scheduled. It requires the latest development version of Tulip, Python 3.4 or Python 3.5 to get the traceback.

I close this issue because I prefer to discuss the InvalidStateError in the issue #21886.

Thanks for the report Ryder. Thanks for the fix for the race condition in _write_to_self() Guido.
msg222358 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-07-05 13:31
New changeset d7e4efd5e279 by Victor Stinner in branch '3.4':
Closes #21886, #21447: Fix a race condition in asyncio when setting the result
http://hg.python.org/cpython/rev/d7e4efd5e279

New changeset 50c995bdc00a by Victor Stinner in branch 'default':
(Merge 3.4) Closes #21886, #21447: Fix a race condition in asyncio when setting
http://hg.python.org/cpython/rev/50c995bdc00a
History
Date User Action Args
2014-07-05 13:31:25python-devsetmessages: + msg222358
2014-06-30 14:09:02vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg221962
2014-06-19 16:00:03vstinnersetmessages: + msg220999
2014-06-06 11:42:47vstinnersetcomponents: + asyncio
2014-05-07 13:40:24asvetlovsetnosy: + asvetlov
2014-05-06 22:13:27gvanrossumsetmessages: + msg218027
2014-05-06 21:47:23python-devsetnosy: + python-dev
messages: + msg218025
2014-05-06 20:53:53gvanrossumsetmessages: + msg218018
2014-05-06 19:51:02ned.deilysetnosy: + gvanrossum, pitrou, vstinner, giampaolo.rodola, yselivanov
2014-05-06 19:00:33ryder.lewissetmessages: + msg218003
2014-05-06 18:28:19ryder.lewiscreate