Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_gdb fails in s390x SLES buildbots #78188

Closed
pablogsal opened this issue Jun 30, 2018 · 14 comments
Closed

test_gdb fails in s390x SLES buildbots #78188

pablogsal opened this issue Jun 30, 2018 · 14 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@pablogsal
Copy link
Member

BPO 34007
Nosy @vstinner, @pablogsal, @miss-islington, @tirkarthi
PRs
  • bpo-34007: Skip traceback tests if the Program Counter is not available #9018
  • [3.6] bpo-34007: Skip traceback tests if the Program Counter is not available. (GH-9018) #9019
  • [2.7] bpo-34007: Skip traceback tests if the Program Counter is not available. (GH-9018) #9020
  • [3.7] bpo-34007: Skip traceback tests if the Program Counter is not available. (GH-9018) #9021
  • [3.6] bpo-34007: Skip traceback tests if the Program Counter is not available (GH-9018) #9022
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-08-31.22:23:30.435>
    created_at = <Date 2018-06-30.00:24:59.579>
    labels = ['3.7', '3.8', 'type-bug', 'tests']
    title = 'test_gdb fails in s390x SLES buildbots'
    updated_at = <Date 2018-09-07.22:10:31.836>
    user = 'https://github.com/pablogsal'

    bugs.python.org fields:

    activity = <Date 2018-09-07.22:10:31.836>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-08-31.22:23:30.435>
    closer = 'pablogsal'
    components = ['Tests']
    creation = <Date 2018-06-30.00:24:59.579>
    creator = 'pablogsal'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34007
    keywords = ['patch']
    message_count = 14.0
    messages = ['320747', '320766', '321158', '322805', '323808', '324194', '324200', '324204', '324205', '324350', '324446', '324450', '324451', '324807']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'pablogsal', 'miss-islington', 'xtreak']
    pr_nums = ['9018', '9019', '9020', '9021', '9022']
    priority = 'normal'
    resolution = None
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue34007'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @pablogsal
    Copy link
    Member Author

    test_gdb is currently failing on the following buildbots:

    http://buildbot.python.org/all/#/builders/54/builds/465
    http://buildbot.python.org/all/#/builders/66/builds/178
    http://buildbot.python.org/all/#/builders/16/builds/1279
    http://buildbot.python.org/all/#/builders/122/builds/481

    Sample error lines:

    ======================================================================
    FAIL: test_threads (test.test_gdb.PyBtTests)
    Verify that "py-bt" indicates threads that are waiting for the GIL
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/dje/cpython-buildarea/3.7.edelsohn-sles-z/build/Lib/test/test_gdb.py", line 776, in test_threads
        cmds_after_breakpoint=['thread apply all py-bt'])
      File "/home/dje/cpython-buildarea/3.7.edelsohn-sles-z/build/Lib/test/test_gdb.py", line 213, in get_stack_trace
        self.assertEqual(unexpected_errlines, [])
    AssertionError: Lists differ: ["Python Exception <class 'gdb.error'> PC [58 chars]ved'] != []
    First list contains 2 additional elements.
    First extra element 0:
    "Python Exception <class 'gdb.error'> PC not saved: "
    + []
    - ["Python Exception <class 'gdb.error'> PC not saved: ",
    -  'Error occurred in Python command: PC not saved']

    Ran 46 tests in 19.888s
    FAILED (failures=1)
    1 test failed again:
    test_gdb

    @pablogsal pablogsal added 3.7 (EOL) end of life 3.8 only security fixes tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error labels Jun 30, 2018
    @vstinner
    Copy link
    Member

    In gdb source code, frame_unwind_pc(), I see:

    else if (this_frame->prev_pc.status == CC_NOT_SAVED)
    throw_error (OPTIMIZED_OUT_ERROR, _("PC not saved"));

    @tirkarthi
    Copy link
    Member

    All the builds fail consistently after http://buildbot.python.org/all/#/builders/54/builds/465 and things were passing before 465. Maybe is this something to do with the commit that happened there that we could revert and test it in the machine by doing a git-bisect? But I couldn't see anything on GDB related code being changed in Python on #465. My initial hunch was that it had something to do with GDB 8.0+ but Ubuntu 18.04 LTS also runs on 8.0+ fine.

    Tests pass : 00e0524
    Tests fail : d6a283b

    ➜ cpython git:(master) git log 00e0524~1..d6a283b

    commit d6a283b
    Author: Zackery Spytz <zspytz@gmail.com>
    Date: Fri Jun 29 13:30:07 2018 -0600

    [3.6] bpo-25862: Fix assertion failures in io.TextIOWrapper.tell(). (GH-3918). (GH-8012)
    
    (cherry picked from commit 23db935bcf258657682e66464bf8512def8af830)
    
    Co-authored-by: Zackery Spytz <zspytz@gmail.com>
    

    commit a730279
    Author: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
    Date: Fri Jun 29 03:17:43 2018 -0700

    Fix the indentation in the documentation of bin() and hash() (GH-7998)
    
    (cherry picked from commit bda9c3eae3ad16e43145599ac6359bfdcaf1fd4a)
    
    Co-authored-by: Andrés Delfino <adelfino@gmail.com>
    

    commit 00e0524
    Author: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
    Date: Thu Jun 28 22:32:16 2018 -0700

    bpo-14117: Make minor tweaks to turtledemo (GH-8002)
    
    
    The 'wikipedia' example is now 'rosette', describing what it draws.
    The 'penrose' print output is reduced.  The 'tree' '1024'
    output is eliminated.
    (cherry picked from commit 891a1f86d415779cf67ca23e626a868e586feb05)
    
    Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
    

    @pablogsal
    Copy link
    Member Author

    New failure on s390x SLES 3.7:

    https://buildbot.python.org/all/#/builders/122/builds/540

    @pablogsal
    Copy link
    Member Author

    This failure seems related to the issue:

    https://buildbot.python.org/all/#/builders/139/builds/74

    =====================================================================
    FAIL: test_threads (test.test_gdb.PyBtTests)
    Verify that "py-bt" indicates threads that are waiting for the GIL
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/srv/buildbot/buildarea/2.7.bolen-ubuntu/build/Lib/test/test_gdb.py", line 808, in test_threads
        self.assertIn('Waiting for the GIL', gdb_output)
    test test_gdb failed -- Traceback (most recent call last):
      File "/srv/buildbot/buildarea/2.7.bolen-ubuntu/build/Lib/test/test_gdb.py", line 808, in test_threads
        self.assertIn('Waiting for the GIL', gdb_output)
    AssertionError: 'Waiting for the GIL' not found in 'Breakpoint 1 (PyObject_Print) pending.\n.....

    @pablogsal
    Copy link
    Member Author

    I requested a SLES VM in https://linuxone20.cloud.marist.edu as indicated by @david Edelsohn (the builtbot maintainer) and I can reproduce the issue on master, 3.7, 3.6 and 3.5. So it seems that a change in gdb itself caused this. Interestingly, in Python 3.5 the error is different:

    ======================================================================
    FAIL: test_threads (test.test_gdb.PyBtTests)
    Verify that "py-bt" indicates threads that are waiting for the GIL
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/linux1/cpython/Lib/test/test_gdb.py", line 771, in test_threads
        self.assertIn('Waiting for the GIL', gdb_output)
    AssertionError: 'Waiting for the GIL' not found in 'Breakpoint 1 at 0x11751d8: file Python/bltinmodule.c, line 1093.\n[Thread debugging using libthread_db enabled]\nUsing host libthread_db library "/lib64/libthread_db.so.1".\n[New Thread 0x3fffd2ff910 (LWP 33928)]\n[New Thread 0x3fffcaff910 (LWP 33929)]\n'

    Ran 45 tests in 19.522s

    @pablogsal
    Copy link
    Member Author

    I think I understand what is happening. The SLES buildbots do not have debugging symbols for glibc:

    (gdb) set verbose on
    (gdb) break builtin_id
    Reading in symbols for Python/bltinmodule.c...done.
    Breakpoint 1 at 0x139403c: file Python/bltinmodule.c, line 1182.
    (gdb) r
    Starting program: /home/linux1/cpython/python tester.py
    Reading symbols from /lib/ld64.so.1...(no debugging symbols found)...done.
    Reading symbols from system-supplied DSO at 0x3fffdffe000...(no debugging symbols found)...done.
    Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-62.13.2.s390x
    Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
    Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
    Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
    Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.

    And when requesting the stacktrace of a thread it cannot go into libc because the program counter is not available:

    (gdb) thread apply 2 where

    #19 0x0000000001324f02 in method_call (method=<method at remote 0x3fffdeee258>, args=(), kwargs=0x0) at Objects/classobject.c:306
    #20 0x0000000001047396 in PyObject_Call (callable=<method at remote 0x3fffdeee258>, args=(), kwargs=0x0) at Objects/call.c:245
    #21 0x00000000012b91be in t_bootstrap (boot_raw=0x3fffd9fb118) at ./Modules/_threadmodule.c:992
    #22 0x000003fffde88b22 in start_thread () from /lib64/libpthread.so.0
    #23 0x000003fffdb7164a in thread_start () from /lib64/libc.so.6
    PC not saved

    This get triggered when printing the python stack trace with py-bt and the exception is getting printed on stderr and this fails the test.

    @pablogsal
    Copy link
    Member Author

    I need to do some more experimenting because those symbols are not available on other systems with the test passing. Something is corrupting/not saving the program counter but I do not know what.

    @pablogsal
    Copy link
    Member Author

    Walking the stack up one by one only triggers the "PC not saved" when the stack goes into libc:

    (gdb) break builtin_id
    Breakpoint 1 at 0x139403c: file Python/bltinmodule.c, line 1182.
    (gdb) r
    Starting program: /home/linux1/cpython/python tester.py
    Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-62.13.2.s390x
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    [New Thread 0x3fffd2ff910 (LWP 9951)]
    [New Thread 0x3fffcaff910 (LWP 9952)]
    [New Thread 0x3fff7fff910 (LWP 9953)]
    [New Thread 0x3fff77ff910 (LWP 9954)]

    Thread 1 "python" hit Breakpoint 1, builtin_id (self=0x3fffdf5c0c8, v=0x14f7690 <small_ints+2256>) at Python/bltinmodule.c:1182
    1182 return PyLong_FromVoidPtr(v);
    (gdb) thread 2
    [Switching to thread 2 (Thread 0x3fffd2ff910 (LWP 9951))]
    #0 0x000003fffde8da0a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    (gdb) up
    #1 0x0000000001186a2e in PyCOND_TIMEDWAIT (cond=0x15139b0 <_PyRuntime+1192>, mut=0x15139e0 <_PyRuntime+1240>, us=5000) at Python/condvar.h:90
    90 r = pthread_cond_timedwait((cond), (mut), &ts);
    (gdb)
    #2 0x0000000001186f3a in take_gil (tstate=0x1538620) at Python/ceval_gil.h:208
    208 COND_TIMED_WAIT(_PyRuntime.ceval.gil.cond, _PyRuntime.ceval.gil.mutex,
    (gdb)
    #3 0x00000000011884a6 in _PyEval_EvalFrameDefault (f=0x3fffd844d00, throwflag=0) at Python/ceval.c:980
    980 take_gil(tstate);
    (gdb)
    #4 0x0000000001187db8 in PyEval_EvalFrameEx (f=0x3fffd844d00, throwflag=0) at Python/ceval.c:536
    536 return interp->eval_frame(f, throwflag);
    (gdb)
    #5 0x0000000001047596 in function_code_fastcall (co=0x3fffdd76640, args=0x3fff8000a98, nargs=1, globals=0x3fffddcff30) at Objects/call.c:283
    283 result = PyEval_EvalFrameEx(f,0);
    (gdb)
    #6 0x0000000001048144 in _PyFunction_FastCallKeywords (func=0x3fffd7d8680, stack=0x3fff8000a90, nargs=1, kwnames=0x0) at Objects/call.c:408
    408 return function_code_fastcall(co, stack, nargs, globals);
    (gdb)
    #7 0x00000000011a48b6 in call_function (pp_stack=0x3fffd2fb5f8, oparg=1, kwnames=0x0) at Python/ceval.c:4623
    4623 x = _PyFunction_FastCallKeywords(func, stack, nargs, kwnames);
    (gdb)
    #8 0x000000000119c088 in _PyEval_EvalFrameDefault (f=0x3fff80008f0, throwflag=0) at Python/ceval.c:3186
    3186 res = call_function(&sp, oparg + 1, NULL);
    (gdb)
    #9 0x0000000001187db8 in PyEval_EvalFrameEx (f=0x3fff80008f0, throwflag=0) at Python/ceval.c:536
    536 return interp->eval_frame(f, throwflag);
    (gdb)
    #10 0x0000000001047596 in function_code_fastcall (co=0x3fffda6e580, args=0x3fffd87e980, nargs=1, globals=0x3fffdd717c0) at Objects/call.c:283
    283 result = PyEval_EvalFrameEx(f,0);
    (gdb)
    #11 0x0000000001048144 in _PyFunction_FastCallKeywords (func=0x3fffd7d61b0, stack=0x3fffd87e978, nargs=1, kwnames=0x0) at Objects/call.c:408
    408 return function_code_fastcall(co, stack, nargs, globals);
    (gdb)
    #12 0x00000000011a48b6 in call_function (pp_stack=0x3fffd2fd1b8, oparg=1, kwnames=0x0) at Python/ceval.c:4623
    4623 x = _PyFunction_FastCallKeywords(func, stack, nargs, kwnames);
    (gdb)
    #13 0x000000000119c088 in _PyEval_EvalFrameDefault (f=0x3fffd87e7f0, throwflag=0) at Python/ceval.c:3186
    3186 res = call_function(&sp, oparg + 1, NULL);
    (gdb)
    #14 0x0000000001187db8 in PyEval_EvalFrameEx (f=0x3fffd87e7f0, throwflag=0) at Python/ceval.c:536
    536 return interp->eval_frame(f, throwflag);
    (gdb)
    #15 0x0000000001047596 in function_code_fastcall (co=0x3fffda6e340, args=0x3fffd2fed28, nargs=1, globals=0x3fffdd717c0) at Objects/call.c:283
    283 result = PyEval_EvalFrameEx(f,0);
    (gdb)
    #16 0x0000000001047a2e in _PyFunction_FastCallDict (func=0x3fffd7d3f70, args=0x3fffd2fed20, nargs=1, kwargs=0x0) at Objects/call.c:322
    322 return function_code_fastcall(co, args, nargs, globals);
    (gdb)
    #17 0x00000000010467d6 in _PyObject_FastCallDict (callable=0x3fffd7d3f70, args=0x3fffd2fed20, nargs=1, kwargs=0x0) at Objects/call.c:98
    98 return _PyFunction_FastCallDict(callable, args, nargs, kwargs);
    (gdb)
    #18 0x0000000001049c5c in _PyObject_Call_Prepend (callable=0x3fffd7d3f70, obj=0x3fffdc70650, args=0x3fffdfb9050, kwargs=0x0)
    at Objects/call.c:904
    904 result = _PyObject_FastCallDict(callable,
    (gdb)
    #19 0x0000000001324f02 in method_call (method=0x3fffdeee258, args=0x3fffdfb9050, kwargs=0x0) at Objects/classobject.c:306
    306 return _PyObject_Call_Prepend(func, self, args, kwargs);
    (gdb)
    #20 0x0000000001047396 in PyObject_Call (callable=0x3fffdeee258, args=0x3fffdfb9050, kwargs=0x0) at Objects/call.c:245
    245 result = (*call)(callable, args, kwargs);
    (gdb)
    #21 0x00000000012b91be in t_bootstrap (boot_raw=0x3fffd9fb118) at ./Modules/_threadmodule.c:992
    992 res = PyObject_Call(boot->func, boot->args, boot->keyw);
    (gdb)
    #22 0x000003fffde88b22 in start_thread () from /lib64/libpthread.so.0
    (gdb)
    #23 0x000003fffdb7164a in thread_start () from /lib64/libc.so.6
    (gdb)
    PC not saved

    @pablogsal
    Copy link
    Member Author

    Interestingly, everything points to something special in the glibc version of the buildbot. Usually the end of the stack for threads is:

    (gdb)
    #22 0x00007ffff7f82a9d in start_thread () from /usr/lib/libpthread.so.0
    (gdb)
    #23 0x00007ffff7d23a43 in clone () from /usr/lib/libc.so.6

    but in this buildbot we get:

    #24 0x000003fffde88b22 in start_thread () from /lib64/libpthread.so.0
    (gdb)
    #25 0x000003fffdb7164a in thread_start () from /lib64/libc.so.6
    (gdb)
    PC not saved

    I think that the combination of glibc + gdb is producing this problem probably because the optimization level of glibc.

    @pablogsal
    Copy link
    Member Author

    New changeset f2ef51f by Pablo Galindo in branch 'master':
    bpo-34007: Skip traceback tests if the Program Counter is not available. (GH-9018)
    f2ef51f

    @miss-islington
    Copy link
    Contributor

    New changeset 5d594f3 by Miss Islington (bot) in branch '2.7':
    bpo-34007: Skip traceback tests if the Program Counter is not available. (GH-9018)
    5d594f3

    @miss-islington
    Copy link
    Contributor

    New changeset 4da7181 by Miss Islington (bot) in branch '3.7':
    bpo-34007: Skip traceback tests if the Program Counter is not available. (GH-9018)
    4da7181

    @pablogsal
    Copy link
    Member Author

    New changeset 963bcc8 by Pablo Galindo in branch '3.6':
    [3.6] bpo-34007: Skip traceback tests if the Program Counter is not available. (GH-9022)
    963bcc8

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants