classification
Title: Multiple test failures in GCC and Clang optional builds on Travis CI
Type: behavior Stage:
Components: Tests Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: pablogsal, vstinner, xtreak
Priority: normal Keywords:

Created on 2019-03-24 07:35 by xtreak, last changed 2019-12-10 07:50 by xdegaye.

Messages (10)
msg338725 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-03-24 07:35
I am not able to reproduce the errors with a GCC-built CPython binary when running the tests in a virtualenv (no coverage). The dangling thread errors seem to take up the whole 50-minute time limit. Since the GCC build is not maintained or tracked, is it worth stopping it on Travis, given that it wastes a lot of build minutes? The optional Clang build on Mac never starts running the tests either.

Reference build failures : 

https://travis-ci.org/python/cpython/jobs/510447289
https://travis-ci.org/python/cpython/jobs/510447290
msg338737 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-03-24 15:24
Possibly the first occurrence of this error: https://travis-ci.org/python/cpython/jobs/506783665, after which it is more or less consistent. Almost all the builds I checked before this one did not have this failure. The commit for the build seems to be unrelated, but just in case: https://github.com/python/cpython/commit/86082c22d23285995a32aabb491527c9f5629556
msg338876 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-26 12:37
> https://travis-ci.org/python/cpython/jobs/510447289

This failure is on the master branch.

./python.exe  ./Tools/scripts/run_tests.py -j 1 -u all -W --slowest --fail-env-changed --timeout=1200 -j4 -uall,-cpu
ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
  File "/Users/travis/build/python/cpython/Lib/hashlib.py", line 244, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/Users/travis/build/python/cpython/Lib/hashlib.py", line 113, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type md5
(...)

The Travis CI config has been changed to use a more recent Ubuntu version, which may explain the failure.

commit 74ae50e53e59bbe39d6287b902757f0cd01327dc
Author: CAM Gerlach <CAM.Gerlach@Gerlach.CAM>
Date:   Mon Mar 18 05:44:58 2019 -0500

    bpo-36307: Travis: upgrade to Xenial environment (GH-12356)
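
The traceback above goes straight from hashlib's module-level loop into __get_builtin_constructor, which suggests that neither the OpenSSL-backed module nor the built-in MD5 module could be imported in that build. A minimal sketch for checking this on an affected interpreter, assuming the usual CPython extension module names _hashlib and _md5; the helper names probe_hash_backends and md5_available are made up for illustration:

```python
import hashlib
import importlib


def probe_hash_backends(names=("_hashlib", "_md5")):
    """Map each extension module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            status[name] = False
        else:
            status[name] = True
    return status


def md5_available():
    """Return True if hashlib can construct an MD5 object in this build."""
    try:
        hashlib.new("md5", b"")
    except ValueError:
        # hashlib raises ValueError for unsupported hash types.
        return False
    return True


print(probe_hash_backends())
print(md5_available())
```

If both modules report False, the problem is in how the extension modules were built or linked (for example against the OpenSSL in OPENSSL_DIR), not in the test suite.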
msg338877 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-26 12:40
> https://travis-ci.org/python/cpython/jobs/510447290

That's a run on the master branch ("CRON").

xvfb-run ./venv/bin/python -m coverage run --pylib -m test --fail-env-changed -uall,-cpu -x test_multiprocessing_fork -x test_multiprocessing_forkserver -x test_multiprocessing_spawn -x test_concurrent_futures
== CPython 3.8.0a2+ (heads/master:a7987e7, Mar 23 2019, 23:53:10) [GCC 5.4.0 20160609]
== Linux-4.15.0-1028-gcp-x86_64-with-glibc2.17 little-endian
== cwd: /home/travis/build/python/cpython/build/test_python_26699
== CPU count: 2
== encodings: locale=UTF-8, FS=utf-8
Run tests sequentially
0:00:00 load avg: 1.49 [  1/416] test_grammar
0:00:03 load avg: 1.45 [  2/416] test_opcodes
0:00:03 load avg: 1.45 [  3/416] test_dict
0:00:12 load avg: 1.41 [  4/416] test_builtin
0:00:18 load avg: 1.35 [  5/416] test_exceptions
Exception ignored in: <function ExceptionTests.test_unraisable.<locals>.BrokenDel.__del__ at 0x7f9e77198dc0>
Traceback (most recent call last):
  File "/home/travis/build/python/cpython/Lib/test/test_exceptions.py", line 1182, in __del__
    raise exc
ValueError: del is broken
Exception ignored in: <function ExceptionTests.test_unraisable.<locals>.BrokenExceptionDel.__del__ at 0x7f9e771988c0>
Traceback (most recent call last):
  File "/home/travis/build/python/cpython/Lib/test/test_exceptions.py", line 1188, in __del__
    raise exc
test.test_exceptions.BrokenStrException: <exception str() failed>
test test_exceptions failed -- multiple errors occurred; run in verbose mode for details
0:00:22 load avg: 1.35 [  6/416/1] test_types -- test_exceptions failed
test test_types failed -- Traceback (most recent call last):
  File "/home/travis/build/python/cpython/Lib/test/test_types.py", line 1433, in test_duck_gen
    self.assertIsInstance(gen, collections.abc.Generator)
AssertionError: <MagicMock spec='GenLike' id='140318583278608'> is not an instance of <class 'collections.abc.Generator'>

0:00:25 load avg: 1.32 [  7/416/2] test_unittest -- test_types failed
test test_unittest failed -- multiple errors occurred; run in verbose mode for details
0:01:33 load avg: 1.11 [  8/416/3] test_doctest -- test_unittest failed in 1 min 7 sec
0:01:50 load avg: 1.08 [  9/416/3] test_doctest2
0:01:50 load avg: 1.08 [ 10/416/3] test_support
0:02:11 load avg: 1.05 [ 11/416/3] test___all__
0:02:31 load avg: 1.04 [ 12/416/3] test___future__
0:02:32 load avg: 1.04 [ 13/416/3] test__locale
0:02:32 load avg: 1.04 [ 14/416/3] test__opcode
0:02:34 load avg: 1.03 [ 15/416/3] test__osx_support
0:02:34 load avg: 1.03 [ 16/416/3] test__xxsubinterpreters
Warning -- threading._dangling was modified by test__xxsubinterpreters
  Before: <_weakrefset.WeakSet object at 0x7f9e7751a160>
  After:  <_weakrefset.WeakSet object at 0x7f9e752abb20> 
0:02:50 load avg: 1.03 [ 17/416/4] test_abc -- test__xxsubinterpreters failed (env changed)
0:02:51 load avg: 1.03 [ 18/416/4] test_abstract_numbers
0:02:52 load avg: 1.03 [ 19/416/4] test_aifc
0:02:55 load avg: 1.02 [ 20/416/4] test_argparse
0:05:14 load avg: 1.02 [ 21/416/4] test_array -- test_argparse passed in 2 min 19 sec
0:05:38 load avg: 1.01 [ 22/416/4] test_asdl_parser
0:05:39 load avg: 1.01 [ 23/416/4] test_ast
0:05:51 load avg: 1.01 [ 24/416/4] test_asyncgen


0:05:54 load avg: 0.93 [ 25/416/4] test_asynchat
Warning -- threading_cleanup() failed to cleanup 0 threads (count: 0, dangling: 2)
Dangling thread: <echo_server(Thread-11, stopped 140318458308352)>
(...)
Dangling thread: <echo_server(Thread-43, stopped 140318458308352)>
Dangling thread: <_MainThread(MainThread, started 140318701401856)>
Dangling thread: <echo_server(Thread-19, stopped 140318458308352)>
Warning -- threading._dangling was modified by test_asynchat
  Before: <_weakrefset.WeakSet object at 0x7f9e70f061c0>
  After:  <_weakrefset.WeakSet object at 0x7f9e6fff7af0> 


0:15:10 load avg: 0.96 [ 26/416/5] test_asyncio -- test_asynchat failed (env changed) in 9 min 15 sec
Warning -- threading_cleanup() failed to cleanup 0 threads (count: 0, dangling: 2)
Dangling thread: <_MainThread(MainThread, started 140318701401856)>
Dangling thread: <Thread(ThreadPoolExecutor-0_0, stopped daemon 140318458308352)>
Warning -- threading_cleanup() failed to cleanup 0 threads (count: 0, dangling: 6)
Dangling thread: <Thread(Thread-46, stopped 140318458308352)>
Dangling thread: <Thread(Thread-47, stopped 140318458308352)>
Dangling thread: <Thread(Thread-48, stopped 140318458308352)>
(...)
Dangling thread: <Thread(ThreadPoolExecutor-8_6, stopped daemon 140318402094848)>
Dangling thread: <Thread(QueueManagerThread, stopped daemon 140318427272960)>
Dangling thread: <Thread(ThreadPoolExecutor-8_7, stopped daemon 140317923735296)>
Dangling thread: <Thread(ThreadPoolExecutor-16_0, stopped daemon 140318427272960)>
Dangling thread: <_MainThread(MainThread, started 140318701401856)>


The job exceeded the maximum time limit for jobs, and has been terminated.
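
For readers unfamiliar with the "Dangling thread" warnings in the log above: they are reported when threads created during a test are still tracked after the test finishes. The sketch below shows the general idea using only the public threading.enumerate() API. It is not the regrtest implementation (which relies on threading._dangling and threading_cleanup()), and find_leaked_threads and leaky_test are hypothetical names:

```python
import threading


def find_leaked_threads(func):
    """Run func() and return the threads that are alive now but were not before."""
    before = set(threading.enumerate())
    func()
    return [t for t in threading.enumerate() if t not in before]


# Event used to keep the worker alive until we explicitly release it.
stop = threading.Event()


def leaky_test():
    # Hypothetical "test" that starts a worker and forgets to join it.
    threading.Thread(target=stop.wait, name="leaky-worker").start()


leaked = find_leaked_threads(leaky_test)
print([t.name for t in leaked])

# Proper cleanup: release the worker and join it, after which nothing is left over.
stop.set()
for t in leaked:
    t.join()
print(find_leaked_threads(lambda: None))
```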
msg338879 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-26 12:45
> Possibly first occurrence of this error : https://travis-ci.org/python/cpython/jobs/506783665 after which it's more or less consistent.

That's the first build including my change:

commit 86082c22d23285995a32aabb491527c9f5629556
Author: Victor Stinner <vstinner@redhat.com>
Date:   Fri Mar 15 14:57:52 2019 +0100

    bpo-36235: Fix CFLAGS in distutils customize_compiler() (GH-12236)
    
    Fix CFLAGS in customize_compiler() of distutils.sysconfig: when the
    CFLAGS environment variable is defined, don't override CFLAGS variable with
    the OPT variable anymore.
    
    Initial patch written by David Malcolm.
    
    Co-Authored-By: David Malcolm <dmalcolm@redhat.com>

The build starts with:

Setting environment variables from .travis.yml
$ export OPENSSL=1.1.0i
$ export OPENSSL_DIR="$HOME/multissl/openssl/${OPENSSL}"
$ export PATH="${OPENSSL_DIR}/bin:$PATH"
$ export CFLAGS="-I${OPENSSL_DIR}/include -O3"
$ export LDFLAGS="-L${OPENSSL_DIR}/lib"
$ export LD_RUN_PATH="${OPENSSL_DIR}/lib"
$ export OPTIONAL=true

Extract of .travis.yml:

env:
  global:
    - OPENSSL=1.1.0i
    - OPENSSL_DIR="$HOME/multissl/openssl/${OPENSSL}"
    - PATH="${OPENSSL_DIR}/bin:$PATH"
    # Use -O3 because we don't use debugger on Travis-CI
    - CFLAGS="-I${OPENSSL_DIR}/include -O3"
    - LDFLAGS="-L${OPENSSL_DIR}/lib"
    # Set rpath with env var instead of -Wl,-rpath linker flag
    # OpenSSL ignores LDFLAGS when linking bin/openssl
    - LD_RUN_PATH="${OPENSSL_DIR}/lib"

Maybe it's a bad idea to set CFLAGS globally, and they should only be set when building Python itself, not when building C extensions?
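
One way to picture "only set when building Python itself" is to pass CFLAGS in the environment of a single command instead of exporting it for the whole job. This is only a sketch of the scoping idea, not a proposed change to .travis.yml; run_with_scoped_env is a hypothetical helper and the child command merely prints what it sees:

```python
import os
import subprocess
import sys


def run_with_scoped_env(cmd, extra_env):
    """Run cmd with extra_env added to a copy of the environment.

    The parent process environment is left untouched, so later commands
    (for example building C extensions in the tests) do not inherit the
    extra variables.
    """
    env = dict(os.environ)
    env.update(extra_env)
    return subprocess.run(cmd, env=env, check=True,
                          capture_output=True, text=True)


# Stand-in for the build step: a child process that prints the CFLAGS it sees.
child = [sys.executable, "-c", "import os; print(os.environ.get('CFLAGS'))"]
result = run_with_scoped_env(child, {"CFLAGS": "-O3"})
print(result.stdout.strip())
```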

To be honest, I don't really understand the relationship between CFLAGS and the new "Dangling thread: ..." errors. Maybe they are just unrelated.

Another question: why is Travis CI just fine on PRs, but failing on "CRON" jobs?
msg338881 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-03-26 12:52
https://bugs.python.org/issue36414#msg338876

> Travis CI config has been changed to use a more recent Ubuntu version, it can explain the failure.

I am confused, since that commit changes the Linux build to use Xenial, but the failure is on Mac OS X and it occurred even before the change to Xenial, which was committed on March 18, 2019.

commit 74ae50e53e59bbe39d6287b902757f0cd01327dc
Author: CAM Gerlach <CAM.Gerlach@Gerlach.CAM>
Date:   Mon Mar 18 05:44:58 2019 -0500

Sample failure before the change : https://travis-ci.org/python/cpython/jobs/506168147 (March 14, 2019)
msg339641 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-04-08 13:57
https://github.com/python/cpython/pull/12708 seems to fix a similar issue (issue36544) for Ubuntu, and it also helps make the Mac OS build green again.

Successful build : https://travis-ci.org/python/cpython/jobs/516821454
msg339789 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2019-04-09 17:22
FWIW PR 12708 has been merged.
msg342352 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-05-13 16:00
The builds are now running, since issue36684 changed the build process by splitting the coverage run. There are now three test failures, in test_gc, test_descr and test_typing (issue36905), unrelated to the original report:

https://travis-ci.org/python/cpython/jobs/531845094#L1873

test test_gc failed -- Traceback (most recent call last):
  File "/home/travis/build/python/cpython/Lib/test/test_gc.py", line 817, in test_get_objects_arguments
    self.assertEqual(len(gc.get_objects()),
AssertionError: 103063 != 103064

https://travis-ci.org/python/cpython/jobs/531845094#L1816

test test_descr failed -- Traceback (most recent call last):
  File "/home/travis/build/python/cpython/Lib/test/test_descr.py", line 1272, in test_slots
    self.assertEqual(orig_objects, new_objects)
AssertionError: 94174 != 94180


This happens in the C coverage test suite:

https://travis-ci.org/python/cpython/jobs/531845095#L2486

======================================================================
ERROR: test_build_ext (distutils.tests.test_build_ext.BuildExtTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/python/cpython/Lib/distutils/tests/test_build_ext.py", line 91, in test_build_ext
    import xx
ImportError: /tmp/tmpamh6bkg7/xx.cpython-38-x86_64-linux-gnu.so: undefined symbol: __gcov_merge_add
----------------------------------------------------------------------

I am not sure whether to keep this open for the three test failures above or to have separate issues. I opened one for test_typing.
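
On the test_build_ext error: __gcov_merge_add is a symbol from GCC's gcov runtime, so an undefined-symbol error like this usually means an object was compiled with coverage instrumentation but linked without it. Whether that is the cause here is not established in this report. A small diagnostic sketch that lists which coverage-related flags appear in the interpreter's recorded build variables; coverage_flags_in and report are hypothetical helpers, and the flag list is an assumption about what the coverage build passes:

```python
import sysconfig

# GCC options that enable gcov instrumentation (assumed set, not exhaustive).
COVERAGE_FLAGS = ("--coverage", "-fprofile-arcs", "-ftest-coverage")


def coverage_flags_in(value):
    """Return the coverage flags found in a whitespace-separated flag string."""
    tokens = (value or "").split()
    return [flag for flag in COVERAGE_FLAGS if flag in tokens]


def report(variables=("CFLAGS", "LDFLAGS", "LDSHARED")):
    """Map each build variable to the coverage flags it contains."""
    return {var: coverage_flags_in(sysconfig.get_config_var(var))
            for var in variables}


print(report())
```

If CFLAGS lists coverage flags while LDFLAGS and LDSHARED do not, extensions built by distutils would be compiled with instrumentation but linked without the gcov runtime.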
msg343712 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-27 23:49
I looked at a recent PR. It's getting better.


"Test code coverage (C)" fails with:

======================================================================
ERROR: test_build_ext (distutils.tests.test_build_ext.BuildExtTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/python/cpython/Lib/distutils/tests/test_build_ext.py", line 91, in test_build_ext
    import xx
ImportError: /tmp/tmpyufwrt3r/xx.cpython-38-x86_64-linux-gnu.so: undefined symbol: __gcov_merge_add

Maybe this test has been failing for a long time. I don't know.

"Test code coverage (Python)":


> 4 tests failed: test_asyncio test_descr test_gc test_typing

test test_descr failed -- Traceback (most recent call last):
  File "/home/travis/build/python/cpython/Lib/test/test_descr.py", line 1272, in test_slots
    self.assertEqual(orig_objects, new_objects)
AssertionError: 95538 != 95544

Warning -- sys.gettrace was modified by test_audit
  Before: <coverage.CTracer object at 0x7faa2f7bfe70>
  After:  None
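
The sys.gettrace warning above shows the trace function going from the coverage tracer to None during test_audit. A generic way for code that touches the trace function to avoid this kind of "env changed" report is to save and restore it. A minimal sketch, assuming nothing about how test_audit is actually written; preserved_trace is a hypothetical helper:

```python
import contextlib
import sys


@contextlib.contextmanager
def preserved_trace():
    """Restore the current thread's trace function when the block exits."""
    saved = sys.gettrace()
    try:
        yield
    finally:
        sys.settrace(saved)


# Usage: whatever happens inside the block, the previous tracer comes back.
with preserved_trace():
    sys.settrace(None)
```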
History
Date                 User      Action  Args
2019-12-10 07:50:10  xdegaye   set     nosy: - xdegaye
2019-05-27 23:49:11  vstinner  set     messages: + msg343712
2019-05-13 16:00:39  xtreak    set     messages: + msg342352
2019-04-09 17:22:59  xdegaye   set     messages: + msg339789
2019-04-08 13:57:56  xtreak    set     nosy: + xdegaye; messages: + msg339641
2019-03-26 12:52:41  xtreak    set     messages: + msg338881
2019-03-26 12:45:00  vstinner  set     messages: + msg338879
2019-03-26 12:40:32  vstinner  set     messages: + msg338877
2019-03-26 12:37:52  vstinner  set     messages: + msg338876
2019-03-24 15:24:14  xtreak    set     messages: + msg338737
2019-03-24 07:35:20  xtreak    create