classification
Title: Closing socket raises AttributeError: 'collections.deque' object has no attribute '_decref_socketios'
Type: crash Stage: resolved
Components: IO Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: reidfaiv, rhettinger, serhiy.storchaka, terry.reedy
Priority: normal Keywords:

Created on 2017-09-26 11:04 by reidfaiv, last changed 2017-10-06 17:54 by reidfaiv. This issue is now closed.

Messages (4)
msg303030 - Author: (reidfaiv) Date: 2017-09-26 11:04
We have one application misbehaving in our production environment under load: it occasionally segfaults and raises an exception that does not seem to make sense. We have tested on both 3.4 and 3.5.

For background: we have taken a somewhat unusual path. The server is IO-heavy and mostly waits on external services, but it sometimes needs a lot of CPU to process the next batch, so we use many threads. We suspect the issues are related to multi-threading or to running out of some system-level resource. We create connections (sockets) in each thread and do not share them across threads; because of this we have a large number of sockets in use.
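
The per-thread connection pattern described above can be sketched roughly as follows. This is only an illustration, not the application's code: `get_thread_socket`, `worker`, and `run_workers` are hypothetical names, and the sockets are created but never connected.

```python
import socket
import threading

# Each thread sees its own independent attributes on this object.
_local = threading.local()


def get_thread_socket():
    """Return a socket owned by the calling thread, creating it on first use."""
    sock = getattr(_local, "sock", None)
    if sock is None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        _local.sock = sock
    return sock


def worker(seen, index):
    # Two calls in the same thread should hand back the same socket object.
    first = get_thread_socket()
    second = get_thread_socket()
    seen[index] = (first, first is second)


def run_workers(count=4):
    """Start `count` threads and collect (socket, reused_flag) for each."""
    seen = [None] * count
    threads = [
        threading.Thread(target=worker, args=(seen, i)) for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen
```

With this layout the number of open sockets grows with the number of threads, which matches the large socket count mentioned above.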

We have seen segfaults such as these:

#0  0x00000000004998a6 in sock_dealloc.50395 (s=0x7fafc2573800) at ../Modules/socketmodule.c:3864
#1  0x00000000005356c3 in subtype_dealloc.15489 (self=<unknown at remote 0x7fafc2573800>) at ../Objects/typeobject.c:1201
#2  0x000000000048b0a2 in PyEval_EvalFrameEx (f=f@entry=Frame 0x2281a08, for file /usr/lib/python3.4/ssl.py, line 669, in getpeercert (self=<unknown at remote 0x7fafc2573800>, binary_form=False), throwflag=throwflag@entry=0)
    at ../Python/ceval.c:2421
#3  0x000000000048e45b in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=0x7fb0232080b0, kwcount=0, defs=0x7fb075447178, defcount=1,
    kwdefs=0x0, closure=0x0) at ../Python/ceval.c:3588
#4  0x000000000048a673 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fb00f4ee500, func=<function at remote 0x7fb075443bf8>) at ../Python/ceval.c:4344
#5  call_function (oparg=<optimized out>, pp_stack=0x7fb00f4ee500) at ../Python/ceval.c:4262
#6  PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7fb023207f08, for file /usr/local/lib/python3.4/dist-packages/boto/https_connection.py, line 132, in connect (self=<CertValidatingHTTPSConnection(_tunnel_headers={}, response_class=<ABCMeta(_abc_cache=<WeakSet(_remove=<function at remote 0x7fb074e850d0>, _pending_removals=[], _iterating=set(), data=set()) at remote 0x7fb074e77b00>, _abc_registry=<WeakSet(_remove=<function at remote 0x7fb074e85048>, _pending_removals=[], _iterating=set(), data=set()) at remote 0x7fb074e77860>, __init__=<function at remote 0x7fb074e7cea0>, __doc__=None, _abc_negative_cache=<WeakSet(_remove=<function at remote 0x7fb074e85158>, _pending_removals=[], _iterating=set(), data=set()) at remote 0x7fb074e77da0>, __abstractmethods__=frozenset(), __module__='boto.connection', read=<function at remote 0x7fb074e7cf28>, _abc_negative_cache_version=28) at remote 0x1a48bb8>, ca_certs='/usr/local/lib/python3.4/dist-packages/boto/cacerts/cacerts.txt', _tunnel_port=None, port=443, host='dynamodb.eu-west-1.amazonaws.com', source_address=No...(truncated), throwflag=throwflag@entry=0) at ../Python/ceval.c:2838


and

#0  PyObject_SetAttr (v=v@entry=<unknown at remote 0x7f8eadb00938>, name='keyfile', value=value@entry=None) at ../Objects/object.c:913
#1  0x0000000000486e0f in PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7f8ef2246e88, for file /usr/lib/python3.4/ssl.py, line 546, in __init__ (self=<unknown at remote 0x7f8eadb00938>, sock=<socket at remote 0x7f8eef12e0a8>, keyfile=None, certfile=None, server_side=False, cert_reqs=2, ssl_version=2, ca_certs='/usr/local/lib/python3.4/dist-packages/boto/cacerts/cacerts.txt', do_handshake_on_connect=True, family=<AddressFamily(_value_=2, _name_='AF_INET', __objclass__=<EnumMeta(_value2member_map_={0: <AddressFamily(_value_=0, _name_='AF_UNSPEC', __objclass__=<...>) at remote 0x7f8f4fc51748>, 1: <AddressFamily(_value_=1, _name_='AF_UNIX', __objclass__=<...>) at remote 0x7f8f4fc52108>, 2: <...>, 3: <AddressFamily(_value_=3, _name_='AF_AX25', __objclass__=<...>) at remote 0x7f8f4fc516c8>, 4: <AddressFamily(_value_=4, _name_='AF_IPX', __objclass__=<...>) at remote 0x7f8f4fc519c8>, 5: <AddressFamily(_value_=5, _name_='AF_APPLETALK', __objclass__=<...>) at remote 0x7f8f4fc52188>, 6: <AddressFamily(_value_=6, _name_='AF_NETROM', __objclass__=<...>) at remote 0x7f8f4fc51bc8>, 7: <...(truncated), throwflag=throwflag@entry=0) at ../Python/ceval.c:2123
#2  0x000000000048f2df in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=<optimized out>, defs=0x7f8f4edc5060, kwcount=<optimized out>, kws=0x7f8ed4376a20, argcount=1, args=<optimized out>, locals=0x0, globals=<optimized out>,
    _co=<optimized out>) at ../Python/ceval.c:3588
#3  function_call.78485 (func=<optimized out>, arg=<optimized out>, kw=<optimized out>) at ../Objects/funcobject.c:632
#4  0x000000000053493d in PyObject_Call (
    kw={'ciphers': None, 'suppress_ragged_eofs': True, 'sock': <socket at remote 0x7f8eef12e0a8>, 'server_side': False, 'ca_certs': '/usr/local/lib/python3.4/dist-packages/boto/cacerts/cacerts.txt', 'do_handshake_on_connect': True, 'cert_reqs': 2, 'ssl_version': 2, 'certfile': None, 'keyfile': None}, arg=(<unknown at remote 0x7f8eadb00938>,), func=<function at remote 0x7f8f4edd27b8>) at ../Objects/abstract.c:2040


There are also Python-level exceptions:

AttributeError: 'collections.deque' object has no attribute '_decref_socketios'

  .. application stack snipped ...

  File "application/readers/s3_state_reader.py", line 50, in read_state
    bucket = s3_connection.get_bucket(conf.appconf[_version.version]['s3.bucket_name'])
  File "boto/s3/connection.py", line 506, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "boto/s3/connection.py", line 526, in head_bucket
    body = response.read()
  File "boto/connection.py", line 410, in read
    self._cached_response = http_client.HTTPResponse.read(self)
  File "http/client.py", line 442, in read
    self._close_conn()
  File "http/client.py", line 403, in _close_conn
    fp.close()
  File "python3.5/socket.py", line 645, in close
    self._sock._decref_socketios()

I have looked at the Python source code, both the library and the C code, and I am unable to figure out how _sock ends up being a deque object.
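
For what it is worth, the message itself can be reproduced in isolation. The sketch below assumes the pure-Python `socket.SocketIO` class, whose constructor does not check the type of `sock`. It only shows what the error means (the `_sock` attribute holds a deque when `close()` runs), not how a deque gets there in the application.

```python
import collections
import socket

# Deliberately hand SocketIO something that is not a socket.
raw = socket.SocketIO(collections.deque(), "rb")

message = None
try:
    # close() calls self._sock._decref_socketios(), which a deque lacks.
    raw.close()
except AttributeError as exc:
    message = str(exc)

print(message)
```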

I hope that one of the developers can look at these traces and help move the debugging forward. I am happy to spend time on this and to try to reproduce it in isolation or in a debug environment.
msg303110 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-09-27 07:34
It looks like somewhere in your application a deque object is being passed where a socket was expected.  The easiest way to find the culprit is to edit the class SocketIO in Lib/socket.py:

    def __init__(self, sock, mode):
        if mode not in ("r", "w", "rw", "rb", "wb", "rwb"):
            raise ValueError("invalid mode: %r" % mode)
        io.RawIOBase.__init__(self)
        self._sock = sock
+       from collections import deque  # socket.py does not import deque itself
+       assert not isinstance(sock, deque)
msg303375 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-09-29 23:48
reidfaiv: bpo issues are for patching CPython, including the stdlib and docs.  Debugging help should be requested on other forums, such as python-list or Stack Overflow.

On the other hand, segfaults in pure Python code that uses *current* Python and does not use ctypes or third-party extension modules are of concern to us.  Because of this concern, crash bugs are fixed in every new version and in some maintenance releases.  Upgrading your application from 3.4 to 3.6.3 (or soon, 3.7.0) might let it benefit from this cumulative work and fix your problem.
msg303840 - Author: (reidfaiv) Date: 2017-10-06 17:54
I will withdraw this bug report. I am unable to isolate the issue, so I cannot confirm whether this is a pure Python crash or one caused by some extension. It looks like memory corruption to me, as the segfault moves around and produces different stack traces; the network code is probably just a victim.
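
One way to get more information out of a crash that moves around is the standard `faulthandler` module (available since Python 3.3), which dumps the Python tracebacks of all threads when the process receives a fatal signal such as SIGSEGV. A minimal sketch follows; `enable_crash_tracebacks` and the log path passed to it are hypothetical.

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Dump Python tracebacks of all threads to log_path on a fatal signal.

    The returned file object must stay open for as long as the handler
    is enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Called once at startup, this would not prevent the crash, but it may show which Python code each thread was running at the time, which could help tell an extension-module problem apart from one in pure Python code.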
History
Date                 User              Action  Args
2017-10-06 17:54:19  reidfaiv          set     status: open -> closed
                                               resolution: rejected
                                               messages: + msg303840
                                               stage: resolved
2017-09-29 23:48:56  terry.reedy       set     nosy: + terry.reedy
                                               messages: + msg303375
2017-09-27 07:50:20  serhiy.storchaka  set     nosy: + serhiy.storchaka
2017-09-27 07:34:05  rhettinger        set     nosy: + rhettinger
                                               messages: + msg303110
2017-09-26 11:04:28  reidfaiv          create