Author reidfaiv
Recipients reidfaiv
Date 2017-09-26.11:04:27
Message-id <1506423868.44.0.968850601385.issue31591@psf.upfronthosting.co.za>
Content
We have an application that misbehaves in production under load: it occasionally segfaults and raises exceptions that do not seem to make sense. We have tested on both Python 3.4 and 3.5.

Some background: we have taken a somewhat unusual approach. The server is IO-heavy and spends much of its time waiting on external services, but it sometimes needs a lot of CPU to process the next batch, so we use many threads. We suspect the issues are related to multi-threading or to running out of some system-level resource. Each thread creates its own connections (sockets), and connections are never shared across threads; as a result, we have a large number of sockets in use.
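A minimal sketch of the per-thread connection pattern described above, assuming `threading.local` is used to hold one connection per thread. `get_connection` and its `factory` argument are hypothetical stand-ins for however the application actually creates its boto connections.

```python
import threading

# Each thread sees its own attributes on this object.
_local = threading.local()

def get_connection(factory):
    """Return this thread's connection, creating it on first use.

    `factory` is a hypothetical zero-argument callable that opens a new
    connection. The returned object is never handed to another thread.
    """
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = factory()
        _local.conn = conn
    return conn
```

With many threads this means many simultaneously open sockets, one set per thread.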

We have seen segfaults such as these:

#0  0x00000000004998a6 in sock_dealloc.50395 (s=0x7fafc2573800) at ../Modules/socketmodule.c:3864
#1  0x00000000005356c3 in subtype_dealloc.15489 (self=<unknown at remote 0x7fafc2573800>) at ../Objects/typeobject.c:1201
#2  0x000000000048b0a2 in PyEval_EvalFrameEx (f=f@entry=Frame 0x2281a08, for file /usr/lib/python3.4/ssl.py, line 669, in getpeercert (self=<unknown at remote 0x7fafc2573800>, binary_form=False), throwflag=throwflag@entry=0)
    at ../Python/ceval.c:2421
#3  0x000000000048e45b in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=0x7fb0232080b0, kwcount=0, defs=0x7fb075447178, defcount=1,
    kwdefs=0x0, closure=0x0) at ../Python/ceval.c:3588
#4  0x000000000048a673 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fb00f4ee500, func=<function at remote 0x7fb075443bf8>) at ../Python/ceval.c:4344
#5  call_function (oparg=<optimized out>, pp_stack=0x7fb00f4ee500) at ../Python/ceval.c:4262
#6  PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7fb023207f08, for file /usr/local/lib/python3.4/dist-packages/boto/https_connection.py, line 132, in connect (self=<CertValidatingHTTPSConnection(_tunnel_headers={}, response_class=<ABCMeta(_abc_cache=<WeakSet(_remove=<function at remote 0x7fb074e850d0>, _pending_removals=[], _iterating=set(), data=set()) at remote 0x7fb074e77b00>, _abc_registry=<WeakSet(_remove=<function at remote 0x7fb074e85048>, _pending_removals=[], _iterating=set(), data=set()) at remote 0x7fb074e77860>, __init__=<function at remote 0x7fb074e7cea0>, __doc__=None, _abc_negative_cache=<WeakSet(_remove=<function at remote 0x7fb074e85158>, _pending_removals=[], _iterating=set(), data=set()) at remote 0x7fb074e77da0>, __abstractmethods__=frozenset(), __module__='boto.connection', read=<function at remote 0x7fb074e7cf28>, _abc_negative_cache_version=28) at remote 0x1a48bb8>, ca_certs='/usr/local/lib/python3.4/dist-packages/boto/cacerts/cacerts.txt', _tunnel_port=None, port=443, host='dynamodb.eu-west-1.amazonaws.com', source_address=No...(truncated), throwflag=throwflag@entry=0) at ../Python/ceval.c:2838


and

#0  PyObject_SetAttr (v=v@entry=<unknown at remote 0x7f8eadb00938>, name='keyfile', value=value@entry=None) at ../Objects/object.c:913
#1  0x0000000000486e0f in PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7f8ef2246e88, for file /usr/lib/python3.4/ssl.py, line 546, in __init__ (self=<unknown at remote 0x7f8eadb00938>, sock=<socket at remote 0x7f8eef12e0a8>, keyfile=None, certfile=None, server_side=False, cert_reqs=2, ssl_version=2, ca_certs='/usr/local/lib/python3.4/dist-packages/boto/cacerts/cacerts.txt', do_handshake_on_connect=True, family=<AddressFamily(_value_=2, _name_='AF_INET', __objclass__=<EnumMeta(_value2member_map_={0: <AddressFamily(_value_=0, _name_='AF_UNSPEC', __objclass__=<...>) at remote 0x7f8f4fc51748>, 1: <AddressFamily(_value_=1, _name_='AF_UNIX', __objclass__=<...>) at remote 0x7f8f4fc52108>, 2: <...>, 3: <AddressFamily(_value_=3, _name_='AF_AX25', __objclass__=<...>) at remote 0x7f8f4fc516c8>, 4: <AddressFamily(_value_=4, _name_='AF_IPX', __objclass__=<...>) at remote 0x7f8f4fc519c8>, 5: <AddressFamily(_value_=5, _name_='AF_APPLETALK', __objclass__=<...>) at remote 0x7f8f4fc52188>, 6: <AddressFamily(_value_=6, _name_='AF_NETROM', __objclass__=<...>) at remote 0x7f8f4fc51bc8>, 7: <...(truncated), throwflag=throwflag@entry=0) at ../Python/ceval.c:2123
#2  0x000000000048f2df in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=<optimized out>, defs=0x7f8f4edc5060, kwcount=<optimized out>, kws=0x7f8ed4376a20, argcount=1, args=<optimized out>, locals=0x0, globals=<optimized out>,
    _co=<optimized out>) at ../Python/ceval.c:3588
#3  function_call.78485 (func=<optimized out>, arg=<optimized out>, kw=<optimized out>) at ../Objects/funcobject.c:632
#4  0x000000000053493d in PyObject_Call (
    kw={'ciphers': None, 'suppress_ragged_eofs': True, 'sock': <socket at remote 0x7f8eef12e0a8>, 'server_side': False, 'ca_certs': '/usr/local/lib/python3.4/dist-packages/boto/cacerts/cacerts.txt', 'do_handshake_on_connect': True, 'cert_reqs': 2, 'ssl_version': 2, 'certfile': None, 'keyfile': None}, arg=(<unknown at remote 0x7f8eadb00938>,), func=<function at remote 0x7f8f4edd27b8>) at ../Objects/abstract.c:2040
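In addition to the gdb backtraces, the standard-library `faulthandler` module (Python 3.3+) can dump the Python-level stack of every thread when the process crashes, which shows what the other threads were doing at that moment. A minimal sketch; `enable_crash_dumps` and `log_path` are illustrative names, not part of the application:

```python
import faulthandler
import sys

def enable_crash_dumps(log_path=None):
    """Dump Python tracebacks of all threads on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL."""
    if log_path is None:
        out = sys.stderr
    else:
        # faulthandler writes to this file's descriptor at crash time, so the
        # file object must stay open until faulthandler.disable() is called.
        out = open(log_path, "w")
    faulthandler.enable(file=out, all_threads=True)
    return out
```

The same behaviour (writing to stderr) can be enabled without code changes with `python -X faulthandler` or the `PYTHONFAULTHANDLER` environment variable.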


There are also Python-level exceptions, for example:

AttributeError: 'collections.deque' object has no attribute '_decref_socketios'

  .. application stack snipped ...

  File "application/readers/s3_state_reader.py", line 50, in read_state
    bucket = s3_connection.get_bucket(conf.appconf[_version.version]['s3.bucket_name'])
  File "boto/s3/connection.py", line 506, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "boto/s3/connection.py", line 526, in head_bucket
    body = response.read()
  File "boto/connection.py", line 410, in read
    self._cached_response = http_client.HTTPResponse.read(self)
  File "http/client.py", line 442, in read
    self._close_conn()
  File "http/client.py", line 403, in _close_conn
    fp.close()
  File "python3.5/socket.py", line 645, in close
    self._sock._decref_socketios()

I have looked through the Python source code, both the library and the C code, and I am unable to figure out how `_sock` ends up being a deque object.
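For reference, this is the mechanism the traceback goes through, sketched with a local `socket.socketpair()` instead of boto and S3. `http.client` reads the response through a file object obtained from `sock.makefile("rb")`; closing that object calls the private method `_decref_socketios()` on the socket it wraps (a CPython implementation detail of `socket.py`). The file descriptor is only released once the socket and every `makefile()` object are closed. The AttributeError above means that at this point the wrapped `_sock` reference was a `collections.deque` rather than a socket. `demo_makefile_refcount` is a hypothetical helper name.

```python
import socket

def demo_makefile_refcount():
    """Show how a makefile() object keeps its socket's fd alive until it is closed."""
    a, b = socket.socketpair()
    f = a.makefile("rb")          # wraps `a` in a SocketIO and bumps a private ref count on `a`
    a.close()                     # marks `a` closed, but the fd stays open while `f` exists
    still_open = a.fileno() != -1
    b.sendall(b"ping")
    data = f.read(4)              # reading through the file object still works
    f.close()                     # SocketIO.close() calls a._decref_socketios(), which really closes `a`
    really_closed = a.fileno() == -1
    b.close()
    return still_open, data, really_closed

print(demo_makefile_refcount())   # → (True, b'ping', True)
```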

I hope one of the developers can look at these traces and help move the debugging forward. I am happy to spend time on this and to try to reproduce the problem in isolation or in a debug environment.
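A possible starting point for an isolated reproduction, under the unverified assumption that the problem can be triggered without `ssl` or boto: many threads, each owning its own sockets, repeatedly going through the same `makefile()`/close sequence as `http.client`. `worker` and `stress` are hypothetical names, and a clean run would not rule anything out.

```python
import socket
import threading

def worker(iterations, errors):
    """Repeat the http.client-style makefile()/close() sequence on this thread's own sockets."""
    for _ in range(iterations):
        a, b = socket.socketpair()   # sockets owned by this thread only
        fp = a.makefile("rb")
        try:
            a.close()                # fd stays open while fp holds a reference
            b.sendall(b"x" * 16)
            if fp.read(16) != b"x" * 16:
                errors.append("short read")
            fp.close()               # SocketIO.close() -> a._decref_socketios()
        except Exception as exc:     # e.g. the AttributeError from the report
            errors.append(exc)
        finally:
            fp.close()               # closing an already-closed file is a no-op
            a.close()                # socket.close() is idempotent
            b.close()

def stress(threads=50, iterations=200):
    """Run `threads` workers concurrently and return every error they recorded."""
    errors = []                      # list.append is atomic in CPython
    pool = [threading.Thread(target=worker, args=(iterations, errors))
            for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return errors
```

For example, `stress(threads=200, iterations=1000)` returns an empty list when nothing goes wrong; raising the thread count brings it closer to the large number of sockets in use in production.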
History
Date                 User      Action  Args
2017-09-26 11:04:29  reidfaiv  set     recipients: + reidfaiv
2017-09-26 11:04:28  reidfaiv  set     messageid: <1506423868.44.0.968850601385.issue31591@psf.upfronthosting.co.za>
2017-09-26 11:04:28  reidfaiv  link    issue31591 messages
2017-09-26 11:04:27  reidfaiv  create