classification
Title: test_gdb fails on Python 3.6 when built with LTO+PGO
Type: Stage: resolved
Components: Build, Tests Versions: Python 3.8, Python 3.7, Python 3.6, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Dormouse759, cstratak, inada.naoki, jkloth, koobs, mcepl, mi, miss-islington, petr.viktorin, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2017-05-11 16:16 by vstinner, last changed 2019-06-28 16:19 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
build.log cstratak, 2017-05-11 16:26
root.log cstratak, 2017-05-11 16:27
pgo-lto-gdb-errors-build cstratak, 2017-05-11 20:35
Pull Requests
URL Status Linked Edit
PR 1549 merged vstinner, 2017-05-11 16:27
PR 7709 merged vstinner, 2018-06-15 16:19
PR 7824 merged miss-islington, 2018-06-20 13:22
PR 7825 merged vstinner, 2018-06-20 13:42
PR 7826 merged vstinner, 2018-06-20 13:51
PR 14423 merged inada.naoki, 2019-06-27 12:46
PR 14427 merged miss-islington, 2019-06-27 17:05
Messages (29)
msg293503 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-11 16:16
cstratak reported the following test failure on Fedora 24 when building Python 3.6 with LTO + PGO:

======================================================================
FAIL: test_threads (test.test_gdb.PyBtTests)
Verify that "py-bt" indicates threads that are waiting for the GIL
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/Python-3.6.1/Lib/test/test_gdb.py", line 783, in test_threads
    cmds_after_breakpoint=['thread apply all py-bt'])
  File "/builddir/build/BUILD/Python-3.6.1/Lib/test/test_gdb.py", line 218, in get_stack_trace
    self.assertEqual(unexpected_errlines, [])
AssertionError: Lists differ: ["Python Exception <class 'ValueError'> Va[95 chars]nd."] != []
First list contains 2 additional elements.
First extra element 0:
"Python Exception <class 'ValueError'> Variable 'func_obj' not found.: "
+ []
- ["Python Exception <class 'ValueError'> Variable 'func_obj' not found.: ",
-  "Error occurred in Python command: Variable 'func_obj' not found."]
----------------------------------------------------------------------
msg293505 - (view) Author: Charalampos Stratakis (cstratak) * Date: 2017-05-11 16:26
Full build log
msg293506 - (view) Author: Charalampos Stratakis (cstratak) * Date: 2017-05-11 16:27
All the dependencies dragged.

gdb is of version 7.11. The failures do not happen with gdb 7.12 (which exists in later Fedora releases).
msg293507 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-11 16:30
I created https://github.com/python/cpython/pull/1549 to update test_gdb.py and python-gdb.py in Python 3.6. I don't know if it will fix the issus, but it shouldn't hurt :-)
msg293508 - (view) Author: Charalampos Stratakis (cstratak) * Date: 2017-05-11 16:33
Note: test_gdb is skipped on later Fedora's actually (possibly due to gdb package no being dragged at the minimal buildroot) so the issue might still be there, so the gdb version might have no effect on that. Will investigate further.
msg293516 - (view) Author: Charalampos Stratakis (cstratak) * Date: 2017-05-11 20:35
So the issue wasn't restricted to a specific gdb version or distro release, as due to some issues dependency issues the gdb binary wasn't pulled in the buildroot which makes test_gdb to get skipped.

So I was able to reproduce it on my system by installing gdb (version 7.12.1), compiling python 3.6 from sources with
--enable-optimizations and --with-lto flags enabled and running 'make test'

Also applied the relevant PR however it didn't fix the issue.

Attaching the full log of 'make test'
msg293582 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-12 22:21
New changeset d05f7fdf6cf77724bd3064fb5a0846ef5cfe0c88 by Victor Stinner in branch '3.6':
[3.6] bpo-30345: Update test_gdb.py and python-gdb.py from master (#1549)
https://github.com/python/cpython/commit/d05f7fdf6cf77724bd3064fb5a0846ef5cfe0c88
msg295299 - (view) Author: Mikhail (mi) Date: 2017-06-06 20:19
I rebuilt my python-3.6.1 with the path (https://github.com/python/cpython/commit/d05f7fdf6cf77724bd3064fb5a0846ef5cfe0c88) and tried debugging again -- same problem:

% gdb7121 /opt/bin/python3.6
...
(gdb) r SCRIPT.py
Thread 1 received signal SIGSEGV, Segmentation fault.
...
(gdb) py-bt
Python Exception <type 'exceptions.ValueError'> Variable 'func_obj' not found.: 
Error occurred in Python command: Variable 'func_obj' not found.
msg295339 - (view) Author: Mikhail (mi) Date: 2017-06-07 14:11
The actual stack, which I'm trying to debug, begins like this:

#0  0xbfbfd34e in ?? ()
#1  0x2a9ec81e in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
#2  0x2acf0efe in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
#3  0x2acd8b74 in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
#4  0x2acd5d60 in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
#5  0x2acd87ae in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
#6  0x2a9fe2e3 in QWebFrameAdapter::load(QNetworkRequest const&, QNetworkAccessManager::Operation, QByteArray const&) () from /opt/lib/qt5/libQt5WebKit.so.5
#7  0x2d7a18dd in QWebFrame::setUrl(QUrl const&) () from /opt/lib/qt5/libQt5WebKitWidgets.so.5
#8  0x2d7ad5eb in QWebView::setUrl(QUrl const&) () from /opt/lib/qt5/libQt5WebKitWidgets.so.5
#9  0x2d75efd4 in meth_QWebView_setUrl(_object*, _object*) ()
   from /opt/lib/python3.6/site-packages/PyQt5/QtWebKitWidgets.so
#10 0x28125151 in _PyCFunction_FastCallDict () from /opt/lib/libpython3.6m.so.1.0
#11 0x28125326 in _PyCFunction_FastCallKeywords () from /opt/lib/libpython3.6m.so.1.0
#12 0x2819a458 in ?? () from /opt/lib/libpython3.6m.so.1.0
#13 0x28193ab2 in _PyEval_EvalFrameDefault () from /opt/lib/libpython3.6m.so.1.0
#14 0x2819b790 in ?? () from /opt/lib/libpython3.6m.so.1.0
#15 0x2819a425 in ?? () from /opt/lib/libpython3.6m.so.1.0
#16 0x28193ab2 in _PyEval_EvalFrameDefault () from /opt/lib/libpython3.6m.so.1.0
[...]

Maybe, it is "too deep" into the native (not Python) code for the feature to work?
msg295341 - (view) Author: Jeremy Kloth (jkloth) * Date: 2017-06-07 14:53
It seems that commit (https://github.com/python/cpython/commit/c52572319cbd50adff85050a54122c25239a516d) changed the parameter name in the definition of _PyCFunction_FastCallDict().  I believe that changing 'func_obj' to just 'func' should fix it (in Tools/gdb/libpython.py).
msg295348 - (view) Author: Mikhail (mi) Date: 2017-06-07 15:41
So, I tried the modified patch (see http://aldan.algebra.com/~mi/tmp/patch-issue30345) -- and now I simply get a different variable name in the error-message:

(gdb) py-bt
Python Exception <type 'exceptions.ValueError'> Variable 'func' not found.: 
Error occurred in Python command: Variable 'func' not found.

However, the older version of the patch only referenced "func_obj" in test_gdb.py -- not in libpython.py -- so I may have misunderstood Jeremy's suggestion entirely...
msg319614 - (view) Author: Marcel Plch (Dormouse759) * Date: 2018-06-15 13:12
LTO may break the debug symbols and make GDB unusable.
There is an option, that fixes the issue: to use a -g switch in link flags.
Note that this slows loading of the debug symbols significantly.

I suggest these options as possible approaches:

1) make the configure script include -g in LDFLAGS when --enable-optimizations and --with-lto are used

2) same as 1), but only when --with-pydebug is also used.

3) document this problem and make the user aware that this possible fix (-g in link flags) exists
msg319616 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-15 14:27
> - ["Python Exception <class 'ValueError'> Variable 'func_obj' not found.: ",

bpo-32962: My commit 019d33b7a447e78057842332fb5d3bad01922122 "python-gdb catchs ValueError on read_var()" (PR 7692) catches this ValueError.
msg319618 - (view) Author: Marcel Plch (Dormouse759) * Date: 2018-06-15 14:40
Yes, but that is not a fix really in this case.
While it makes the test pass because it 'correctly' prints out unknown objects, it makes no real difference when actually debugging. The -g switch at link time makes the debug symbols readable and user is able to debug just as usual.
msg319621 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-15 14:52
I tested on the current master:

git clean -fdx
./configure --with-lto --enable-optimizations
sed -i -e 's/^PROFILE_TASK=.*/PROFILE_TASK=-c pass/' Makefile
make 2>&1|tee log

Python is compiled twice:

* (1) gcc -DNDEBUG -g -O3 -flto -fprofile-generate (...)
* (2) gcc -DNDEBUG -g -O3 -flto -fprofile-use (...)

I see -g in both compilation steps.

It seems like debug symbols are still here:

vstinner@apu$ file ./python
./python: ELF 64-bit LSB executable, x86-64, (...), with debug_info, not stripped

But I confirm that test_gdb fails when using LTO+PGO.

gdb seems to be to read any C function argument:

$ gdb -args  ./python Lib/test/gdb_sample.py
(gdb) b builtin_id
(gdb) run
Breakpoint 1, 0x0000000000518da0 in builtin_id ()
(gdb) py-bt
Traceback (most recent call first):
  (unable to read python frame information)
  (unable to read python frame information)
  (unable to read python frame information)
  (unable to read python frame information)
msg319623 - (view) Author: Marcel Plch (Dormouse759) * Date: 2018-06-15 15:25
Those -g switches you see there are during compile-time.
For this to work, you need to enable it also during link/time:
./configure --enable-optimizations --with-lto LDFLAGS="-g"

Except for py-bt, you should also try bt. With this link flag enabled, I can observe significant slowdown on my machine during the backtrace when using bt command (At least when I let the PGO do all the profiling, when compiled with the sed edit you posted here, I observe none).
msg319625 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-15 16:21
> Except for py-bt, you should also try bt.

Oh. Using PGO+LTO but without LDFLAGS=-g, bt only shows me function names: all arguments are missing.

I tested with LDFLAGS=-g: py-bt and bt work as expected, and test_gdb pass.

I created PR 7709 to always compile Python with LDFLAGS=-g.
msg319979 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-19 15:24
Very interesting article about PGO and LTO changes in GCC:
https://hubicka.blogspot.com/2018/06/gcc-8-link-time-and-interprocedural.html

See "Early debug info" paragraph.
msg319984 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-19 16:25
New changeset 06fe77a84bd29d51506ab2ff703ae585a6121af2 by Victor Stinner in branch 'master':
bpo-30345: Add -g to LDFLAGS for LTO (GH-7709)
https://github.com/python/cpython/commit/06fe77a84bd29d51506ab2ff703ae585a6121af2
msg319985 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-19 16:25
Ok, I pushed a change to the master branch. Now the question is if Python 2.7, 3.6 and 3.7 should be fixed as well?

I will wait at least one day to see if buildbots are happy.
msg320076 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-20 13:43
I created backports to 2.7, 3.6 and 3.7 branches: do you see any reason to not fix python-gdb.py in these branches? (Any reason to not add -g to $LTOFLAGS?)
msg320213 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-22 07:10
New changeset 1bb9dd337ed5aa9eafc8e2ce017ceedf044145e3 by Victor Stinner (Miss Islington (bot)) in branch '3.7':
bpo-30345: Add -g to LDFLAGS for LTO (GH-7709) (GH-7824)
https://github.com/python/cpython/commit/1bb9dd337ed5aa9eafc8e2ce017ceedf044145e3
msg320214 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-22 07:10
New changeset 78392885c9b08021c89649728053d31503d8a509 by Victor Stinner in branch '3.6':
bpo-30345: Add -g to LDFLAGS for LTO (GH-7709) (GH-7826)
https://github.com/python/cpython/commit/78392885c9b08021c89649728053d31503d8a509
msg320215 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-22 07:11
New changeset 319cfb5f30c2f5da0cdcd556574727ca347cb7de by Victor Stinner in branch '2.7':
bpo-30345: Add -g to LDFLAGS for LTO (GH-7709) (GH-7825)
https://github.com/python/cpython/commit/319cfb5f30c2f5da0cdcd556574727ca347cb7de
msg320216 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-22 07:14
> I created backports to 2.7, 3.6 and 3.7 branches: do you see any reason to not fix python-gdb.py in these branches? (Any reason to not add -g to $LTOFLAGS?)

Honestly, the risk is very low: only "./configure --with-lto" is impacted and the addition of -g is protected by $ac_cv_prog_cc_g in configure. The -g flag just asks to copy debug symbols, it should not impact compiler performances.
0
Anyway, if something goes wrong, obviously we can revert the change and see for a different fix.

Thanks cstratak for the bug report, and thanks Marcel Plch for the proposed fix (it works well)!
msg346759 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-06-27 17:05
New changeset 21cfae107e410bf4b0ab3c142ca4449bc33290f5 by Inada Naoki in branch 'master':
bpo-30345: travis: use -Og with --with-pydebug (GH-14423)
https://github.com/python/cpython/commit/21cfae107e410bf4b0ab3c142ca4449bc33290f5
msg346760 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-06-27 17:07
I think test_gdb is useful for 3.8 branch because some PEP 590 relating changes will be backported to it.
But I don't know how test_gdb is useful for older branches.
msg346762 - (view) Author: miss-islington (miss-islington) Date: 2019-06-27 17:24
New changeset 60f24b23bf6ac485d195bb904635bdc3fe646b07 by Miss Islington (bot) in branch '3.8':
bpo-30345: travis: use -Og with --with-pydebug (GH-14423)
https://github.com/python/cpython/commit/60f24b23bf6ac485d195bb904635bdc3fe646b07
msg346831 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-28 16:19
I'm fine with relying on buildbots for test_gdb on Python 3.7 and older. Thanks for the fixes.
History
Date User Action Args
2019-06-28 16:19:24vstinnersetmessages: + msg346831
2019-06-27 17:24:09miss-islingtonsetnosy: + miss-islington
messages: + msg346762
2019-06-27 17:07:13inada.naokisetmessages: + msg346760
2019-06-27 17:05:58miss-islingtonsetpull_requests: + pull_request14243
2019-06-27 17:05:40inada.naokisetmessages: + msg346759
2019-06-27 12:46:39inada.naokisetpull_requests: + pull_request14239
2018-06-22 07:14:38vstinnersetstatus: open -> closed
versions: + Python 2.7, Python 3.7, Python 3.8
messages: + msg320216

resolution: fixed
stage: patch review -> resolved
2018-06-22 07:11:35vstinnersetmessages: + msg320215
2018-06-22 07:10:45vstinnersetmessages: + msg320214
2018-06-22 07:10:22vstinnersetmessages: + msg320213
2018-06-20 13:51:12vstinnersetpull_requests: + pull_request7433
2018-06-20 13:43:53vstinnersetnosy: + pitrou, inada.naoki
messages: + msg320076
2018-06-20 13:42:12vstinnersetpull_requests: + pull_request7432
2018-06-20 13:22:16miss-islingtonsetpull_requests: + pull_request7431
2018-06-19 16:25:50vstinnersetmessages: + msg319985
2018-06-19 16:25:22vstinnersetmessages: + msg319984
2018-06-19 15:24:11vstinnersetmessages: + msg319979
2018-06-15 16:21:16vstinnersetmessages: + msg319625
2018-06-15 16:19:22vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request7323
2018-06-15 15:26:30Dormouse759setnosy: + petr.viktorin
2018-06-15 15:25:31Dormouse759setmessages: + msg319623
2018-06-15 14:52:00vstinnersetmessages: + msg319621
2018-06-15 14:40:01Dormouse759setmessages: + msg319618
2018-06-15 14:27:50vstinnersetmessages: + msg319616
2018-06-15 13:12:37Dormouse759setnosy: + Dormouse759
messages: + msg319614
2018-06-05 10:26:19mceplsetnosy: + mcepl
2017-06-16 05:19:35koobssetnosy: + koobs
2017-06-07 15:41:32misetmessages: + msg295348
2017-06-07 14:53:32jklothsetnosy: + jkloth
messages: + msg295341
2017-06-07 14:11:55misetmessages: + msg295339
2017-06-06 20:19:47misetnosy: + mi
messages: + msg295299
2017-05-12 22:21:53vstinnersetmessages: + msg293582
2017-05-11 20:35:23cstrataksetfiles: + pgo-lto-gdb-errors-build

messages: + msg293516
2017-05-11 16:33:16cstrataksetmessages: + msg293508
2017-05-11 16:30:28vstinnersetmessages: + msg293507
2017-05-11 16:27:55vstinnersetpull_requests: + pull_request1647
2017-05-11 16:27:13cstrataksetfiles: + root.log

messages: + msg293506
2017-05-11 16:26:03cstrataksetfiles: + build.log
nosy: + cstratak
messages: + msg293505

2017-05-11 16:16:32vstinnercreate