We have an application that misbehaves in production under load: it occasionally segfaults and throws exceptions that do not seem to make sense. We have tested on both Python 3.4 and 3.5.
For background: we have taken a somewhat unusual path. The server is heavy on I/O and mostly waits for external services, but it sometimes needs a lot of CPU to process the next batch, so we use many threads. We suspect the issues are related to multi-threading or to running out of some system-level resource. We create connections (sockets) in each thread and do not share them across threads; because of this, we have a large number of sockets in use.
We have seen segfaults such as these:
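To make the threading model described above concrete, here is a minimal sketch of the pattern, not our actual code. The names `worker`, `run_workers` and `open_file_limits` are hypothetical, and `socket.socketpair()` stands in for the real connections to external services. Each thread opens its own sockets and closes them itself; nothing is handed to another thread. The last helper reads the per-process file descriptor limit (Unix only), since that is the system-level resource we would be most likely to exhaust with this many sockets.

```python
import resource  # Unix only
import socket
import threading


def worker(results, index):
    # Each thread creates its own sockets and never passes them to another thread.
    left, right = socket.socketpair()
    try:
        payload = ("batch-%d" % index).encode("ascii")
        left.sendall(payload)
        results[index] = right.recv(1024)
    finally:
        left.close()
        right.close()


def run_workers(n_threads=8):
    # One result slot per thread; each thread writes only to its own slot.
    results = [None] * n_threads
    threads = [
        threading.Thread(target=worker, args=(results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def open_file_limits():
    # Returns the (soft, hard) limit on open file descriptors for this process.
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

With two sockets per thread in this sketch, the thread count times the sockets per thread has to stay well below the soft limit returned by `open_file_limits()`.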
#0 0x00000000004998a6 in sock_dealloc.50395 (s=0x7fafc2573800) at ../Modules/socketmodule.c:3864
#1 0x00000000005356c3 in subtype_dealloc.15489 (self=<unknown at remote 0x7fafc2573800>) at ../Objects/typeobject.c:1201
#2 0x000000000048b0a2 in PyEval_EvalFrameEx (f=f@entry=Frame 0x2281a08, for file /usr/lib/python3.4/ssl.py, line 669, in getpeercert (self=<unknown at remote 0x7fafc2573800>, binary_form=False), throwflag=throwflag@entry=0)
at ../Python/ceval.c:2421
#3 0x000000000048e45b in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=0x7fb0232080b0, kwcount=0, defs=0x7fb075447178, defcount=1,
kwdefs=0x0, closure=0x0) at ../Python/ceval.c:3588
#4 0x000000000048a673 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fb00f4ee500, func=<function at remote 0x7fb075443bf8>) at ../Python/ceval.c:4344
#5 call_function (oparg=<optimized out>, pp_stack=0x7fb00f4ee500) at ../Python/ceval.c:4262
#6 PyEval_EvalFrameEx (
f=f@entry=Frame 0x7fb023207f08, for file /usr/local/lib/python3.4/dist-packages/boto/https_connection.py, line 132, in connect (self=<CertValidatingHTTPSConnection(_tunnel_headers={}, response_class=<ABCMeta(_abc_cache=<WeakSet(_remove=<function at remote 0x7fb074e850d0>, _pending_removals=[], _iterating=set(), data=set()) at remote 0x7fb074e77b00>, _abc_registry=<WeakSet(_remove=<function at remote 0x7fb074e85048>, _pending_removals=[], _iterating=set(), data=set()) at remote 0x7fb074e77860>, __init__=<function at remote 0x7fb074e7cea0>, __doc__=None, _abc_negative_cache=<WeakSet(_remove=<function at remote 0x7fb074e85158>, _pending_removals=[], _iterating=set(), data=set()) at remote 0x7fb074e77da0>, __abstractmethods__=frozenset(), __module__='boto.connection', read=<function at remote 0x7fb074e7cf28>, _abc_negative_cache_version=28) at remote 0x1a48bb8>, ca_certs='/usr/local/lib/python3.4/dist-packages/boto/cacerts/cacerts.txt', _tunnel_port=None, port=443, host='dynamodb.eu-west-1.amazonaws.com', source_address=No...(truncated), throwflag=throwflag@entry=0) at ../Python/ceval.c:2838
and
#0 PyObject_SetAttr (v=v@entry=<unknown at remote 0x7f8eadb00938>, name='keyfile', value=value@entry=None) at ../Objects/object.c:913
#1 0x0000000000486e0f in PyEval_EvalFrameEx (
f=f@entry=Frame 0x7f8ef2246e88, for file /usr/lib/python3.4/ssl.py, line 546, in __init__ (self=<unknown at remote 0x7f8eadb00938>, sock=<socket at remote 0x7f8eef12e0a8>, keyfile=None, certfile=None, server_side=False, cert_reqs=2, ssl_version=2, ca_certs='/usr/local/lib/python3.4/dist-packages/boto/cacerts/cacerts.txt', do_handshake_on_connect=True, family=<AddressFamily(_value_=2, _name_='AF_INET', __objclass__=<EnumMeta(_value2member_map_={0: <AddressFamily(_value_=0, _name_='AF_UNSPEC', __objclass__=<...>) at remote 0x7f8f4fc51748>, 1: <AddressFamily(_value_=1, _name_='AF_UNIX', __objclass__=<...>) at remote 0x7f8f4fc52108>, 2: <...>, 3: <AddressFamily(_value_=3, _name_='AF_AX25', __objclass__=<...>) at remote 0x7f8f4fc516c8>, 4: <AddressFamily(_value_=4, _name_='AF_IPX', __objclass__=<...>) at remote 0x7f8f4fc519c8>, 5: <AddressFamily(_value_=5, _name_='AF_APPLETALK', __objclass__=<...>) at remote 0x7f8f4fc52188>, 6: <AddressFamily(_value_=6, _name_='AF_NETROM', __objclass__=<...>) at remote 0x7f8f4fc51bc8>, 7: <...(truncated), throwflag=throwflag@entry=0) at ../Python/ceval.c:2123
#2 0x000000000048f2df in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=<optimized out>, defs=0x7f8f4edc5060, kwcount=<optimized out>, kws=0x7f8ed4376a20, argcount=1, args=<optimized out>, locals=0x0, globals=<optimized out>,
_co=<optimized out>) at ../Python/ceval.c:3588
#3 function_call.78485 (func=<optimized out>, arg=<optimized out>, kw=<optimized out>) at ../Objects/funcobject.c:632
#4 0x000000000053493d in PyObject_Call (
kw={'ciphers': None, 'suppress_ragged_eofs': True, 'sock': <socket at remote 0x7f8eef12e0a8>, 'server_side': False, 'ca_certs': '/usr/local/lib/python3.4/dist-packages/boto/cacerts/cacerts.txt', 'do_handshake_on_connect': True, 'cert_reqs': 2, 'ssl_version': 2, 'certfile': None, 'keyfile': None}, arg=(<unknown at remote 0x7f8eadb00938>,), func=<function at remote 0x7f8f4edd27b8>) at ../Objects/abstract.c:2040
There are also Python-level exceptions:
AttributeError: 'collections.deque' object has no attribute '_decref_socketios'
.. application stack snipped ...
File "application/readers/s3_state_reader.py", line 50, in read_state
bucket = s3_connection.get_bucket(conf.appconf[_version.version]['s3.bucket_name'])
File "boto/s3/connection.py", line 506, in get_bucket
return self.head_bucket(bucket_name, headers=headers)
File "boto/s3/connection.py", line 526, in head_bucket
body = response.read()
File "boto/connection.py", line 410, in read
self._cached_response = http_client.HTTPResponse.read(self)
File "http/client.py", line 442, in read
self._close_conn()
File "http/client.py", line 403, in _close_conn
fp.close()
File "python3.5/socket.py", line 645, in close
self._sock._decref_socketios()
I have looked at the Python source code, both the library and the C code, and I am unable to figure out how _sock ends up being a deque object.
I hope that one of the developers can look at these traces and help move the debugging forward. I am happy to spend time on this and to try to reproduce it in isolation or in a debug environment.
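As one possible step for the debug environment, the standard library `faulthandler` module (available since Python 3.3) can dump the Python-level traceback of every thread when the process receives SIGSEGV. The following is only a sketch of how we might switch it on at startup; the helper name `enable_crash_tracebacks` is hypothetical and is not part of our application.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    # The stream must be a real file with a file descriptor, and it must
    # stay open for as long as the handler is enabled.
    if stream is None:
        stream = sys.stderr
    # all_threads=True dumps every thread, not only the one that crashed.
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

Calling `enable_crash_tracebacks()` once, early in the main thread, should be enough; the same effect is available without code changes by running the interpreter with `-X faulthandler`.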