Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python: Modules/gcmodule.c:379: visit_decref: Assertion `((gc)->gc.gc_refs >> (1)) != 0' failed. #64725

Closed
vstinner opened this issue Feb 6, 2014 · 30 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@vstinner
Copy link
Member

vstinner commented Feb 6, 2014

BPO 20526
Nosy @pitrou, @vstinner, @larryhastings, @ned-deily, @serhiy-storchaka
PRs
  • bpo-20526: Fix PyThreadState_Clear(): don't decref frame #19120
  • [3.8] bpo-20526: Fix PyThreadState_Clear(): don't decref frame (GH-19120) #19136
  • [3.7] bpo-20526: Fix PyThreadState_Clear(): don't decref frame (GH-19120) (GH-19136) #19137
  • Files
  • asyncio_gc.py
  • threading_shutdown_interrupted.py
  • revert_19466.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-03-24.16:58:23.930>
    created_at = <Date 2014-02-06.09:14:02.239>
    labels = ['interpreter-core', '3.7', '3.8', '3.9', 'type-crash']
    title = "python: Modules/gcmodule.c:379: visit_decref: Assertion `((gc)->gc.gc_refs >> (1)) != 0' failed."
    updated_at = <Date 2020-03-24.17:01:40.129>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-03-24.17:01:40.129>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-03-24.16:58:23.930>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2014-02-06.09:14:02.239>
    creator = 'vstinner'
    dependencies = []
    files = ['33939', '34053', '34056']
    hgrepos = []
    issue_num = 20526
    keywords = ['patch']
    message_count = 30.0
    messages = ['210368', '210982', '210999', '211007', '211013', '211046', '211047', '211049', '211050', '211070', '211146', '211150', '211153', '213793', '214858', '214859', '214863', '214866', '214867', '214887', '214890', '214894', '364851', '364878', '364879', '364935', '364938', '364943', '364944', '364945']
    nosy_count = 9.0
    nosy_names = ['pitrou', 'vstinner', 'larry', 'ned.deily', 'Arfrever', 'neologix', 'python-dev', 'serhiy.storchaka', 'sebastien.renard']
    pr_nums = ['19120', '19136', '19137']
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue20526'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 6, 2014

    While trying to reproduce a race condition in asyncio, I got an assertion error from the Python garbage collector at exit. It's not easy to reproduce the issue: run attached script on Python 3.4 compiled in debug mode (to get assertions) and press CTRL+c twice. It looks like the bug occurs more often after at least 400 runs.

    The Future class of the asyncio may store an exception in its private Future._exception attribute. An exception stores a traceback. I don't know if it's related.

    Output:
    ---

    ...
    run #347
    ^CTraceback (most recent call last):
      File "x.py", line 73, in <module>
        main()
      File "x.py", line 70, in main
        el.join()
      File "/home/haypo/prog/python/default/Lib/threading.py", line 1061, in join
        self._wait_for_tstate_lock()
      File "/home/haypo/prog/python/default/Lib/threading.py", line 1077, in _wait_for_tstate_lock
        elif lock.acquire(block, timeout):
    run #348
    KeyboardInterrupt
    ...
    run #364
    run #365
    run #366
    run #367
    -cancelled-
    run #368
    ^CException ignored in: <module 'threading' from '/home/haypo/prog/python/default/Lib/threading.py'>
    Traceback (most recent call last):
      File "/home/haypo/prog/python/default/Lib/threading.py", line 1295, in _shutdown
        t.join()
      File "/home/haypo/prog/python/default/Lib/threading.py", line 1061, in join
        self._wait_for_tstate_lock()
      File "/home/haypo/prog/python/default/Lib/threading.py", line 1077, in _wait_for_tstate_lock
        elif lock.acquire(block, timeout):
    KeyboardInterrupt
    -cancelled-
    python: Modules/gcmodule.c:379: visit_decref: Assertion `((gc)->gc.gc_refs >> (1)) != 0' failed.
    Abandon (core dumped)
    $ ls *core
    python-6564.core
    
    $ gdb ./python -c python-6564.core

    Core was generated by `./python x.py'.
    Program terminated with signal SIGABRT, Aborted.
    (gdb) where
    #0 0x0000003f3a835c59 in raise () from /lib64/libc.so.6
    #1 0x0000003f3a837368 in abort () from /lib64/libc.so.6
    #2 0x0000003f3a82ebb6 in __assert_fail_base () from /lib64/libc.so.6
    #3 0x0000003f3a82ec62 in __assert_fail () from /lib64/libc.so.6
    #4 0x000000000043ac66 in visit_decref (
    op=Frame 0x7f013c001398, for file x.py, line 43, in task (self=<WorkerThread(_kwargs={}, loop=<_UnixSelectorEventLoop(_scheduled=[], _default_executor=None, _running=True, _ready=<collections.deque at remote 0x7f014bbed6e0>, _selector=<EpollSelector(_map=<_SelectorMapping(_selector=<...>) at remote 0x7f014c3bff60>, _epoll=<select.epoll at remote 0x7f014d5bce08>, _fd_to_key={5: <SelectorKey at remote 0x7f014bbecaf8>}) at remote 0x7f014c3bfae8>, _granularity=<float at remote 0x7f014e8c76b8>, _ssock=<socket at remote 0x7f014c3c5a78>, _internal_fds=1, _csock=<socket at remote 0x7f014bbeb5f8>, _signal_handlers={}) at remote 0x7f014c437400>, _is_stopped=False, _target=None, _daemonic=False, executor=<ThreadPoolExecutor(_max_workers=5, _shutdown=False, _work_queue=<Queue(not_empty=<Condition(release=<built-in method release of _thread.lock object at remote 0x7f014c3bb1a0>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f014c3bb1a0>, _lock=<_thread.lock at remote 0x7f014c3bb1a0>, _waiters=<collecti...(truncated), data=0x0) at Modules/gcmodule.c:379
    #5 0x00000000004336bd in tb_traverse (tb=0x7f01493a66e8, visit=0x43abb4 <visit_decref>, arg=0x0) at Python/traceback.c:64
    #6 0x000000000043acdc in subtract_refs (containers=0x8f1a20 <generations+64>) at Modules/gcmodule.c:398
    #7 0x000000000043bdee in collect (generation=2, n_collected=0x7fff499317c0, n_uncollectable=0x7fff499317b8, nofail=0)
    at Modules/gcmodule.c:957
    #8 0x000000000043c455 in collect_with_callback (generation=2) at Modules/gcmodule.c:1128
    #9 0x000000000043d1d5 in PyGC_Collect () at Modules/gcmodule.c:1604
    #10 0x000000000041e959 in Py_Finalize () at Python/pythonrun.c:605
    #11 0x000000000043a890 in Py_Main (argc=2, argv=0x22bb020) at Modules/main.c:771
    #12 0x000000000041aba9 in main (argc=2, argv=0x7fff49931ab8) at ./Modules/python.c:69

    (gdb) frame 4
    #4 0x000000000043ac66 in visit_decref (

    (gdb) print op
    $3 = Frame 0x7f013c001398, for file x.py, line 43, in task (self=<WorkerThread(_kwargs={}, loop=<_UnixSelectorEventLoop(_scheduled=[], _default_executor=None, _running=True, _ready=<collections.deque at remote 0x7f014bbed6e0>, _selector=<EpollSelector(_map=<_SelectorMapping(_selector=<...>) at remote 0x7f014c3bff60>, _epoll=<select.epoll at remote 0x7f014d5bce08>, _fd_to_key={5: <SelectorKey at remote 0x7f014bbecaf8>}) at remote 0x7f014c3bfae8>, _granularity=<float at remote 0x7f014e8c76b8>, _ssock=<socket at remote 0x7f014c3c5a78>, _internal_fds=1, _csock=<socket at remote 0x7f014bbeb5f8>, _signal_handlers={}) at remote 0x7f014c437400>, _is_stopped=False, _target=None, _daemonic=False, executor=<ThreadPoolExecutor(_max_workers=5, _shutdown=False, _work_queue=<Queue(not_empty=<Condition(release=<built-in method release of _thread.lock object at remote 0x7f014c3bb1a0>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f014c3bb1a0>, _lock=<_thread.lock at remote 0x7f014c3bb1a0>, _waiters=<collecti...(truncated)
    ---

    @vstinner vstinner added the type-crash A hard crash of the interpreter, possibly with a core dump label Feb 6, 2014
    @vstinner
    Copy link
    Member Author

    At the first CTRL+c, the main thread exit and enters Py_Finalize() which calls wait_for_thread_shutdown() (in Python: threading._shutdown()).

    The problem is that the second call interrupted threading._shutdown() (still in the main thread).

    #9 0x000000000043d1d5 in PyGC_Collect () at Modules/gcmodule.c:1604
    #10 0x000000000041e959 in Py_Finalize () at Python/pythonrun.c:605

    At this point, we can expect that only one Python thread is running because wait_for_thread_shutdown() has been called, but there are between 2 and 9 running Python threads.

    If wait_for_thread_shutdown() is interrupted, maybe Python should kill all other threads with pthread_kill() or something like that.

    Or wait_for_thread_shutdown() should maybe block SIGINT and SIGTERM signals using pthread_sigmark()?

    @pitrou
    Copy link
    Member

    pitrou commented Feb 11, 2014

    If wait_for_thread_shutdown() is interrupted, maybe Python should kill > all other threads with pthread_kill() or something like that.

    Or wait_for_thread_shutdown() should maybe block SIGINT and SIGTERM
    signals using pthread_sigmark()?

    This sounds a bit desperate.
    Also, it doesn't explain why collecting the frame crashes. Does the refcount become inconsistent?

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Feb 11, 2014

    Random thoughts:

    • executor-created threads are daemon (unfortunately)
    • see issue bpo-19466: I think that the change to clear the frame of daemon
      threads was a mistake

    @vstinner
    Copy link
    Member Author

    Random thoughts:

    • executor-created threads are daemon (unfortunately)

    Are you sure? I didn't see the daemon flag in ThreadPoolExecutor.

    @vstinner
    Copy link
    Member Author

    Are you sure? I didn't see the daemon flag in ThreadPoolExecutor.

    Oh you're right, ThreadPoolExecutor does create daemon threads:

    class ThreadPoolExecutor(_base.Executor):
        ...
    
        def _adjust_thread_count(self):
            # When the executor gets lost, the weakref callback will wake up
            # the worker threads.
            def weakref_cb(_, q=self._work_queue):
                q.put(None)
            # TODO(bquinlan): Should avoid creating new threads if there are more
            # idle threads than items in the work queue.
            if len(self._threads) < self._max_workers:
                t = threading.Thread(target=_worker,
                                     args=(weakref.ref(self, weakref_cb),
                                           self._work_queue))
                t.daemon = True   <======= HERE
                t.start()
                self._threads.add(t)
                _threads_queues[t] = self._work_queue

    @vstinner
    Copy link
    Member Author

    threading_shutdown_interrupted.py: simpler and reliable script to reproduce the bug.

    @vstinner
    Copy link
    Member Author

    Charles-François wrote:

    see issue bpo-19466: I think that the change to clear the frame of daemon
    threads was a mistake

    Good catch. If I reverted changes of bpo-19466, threading_shutdown_interrupted.py doesn't crash anymore.

    It doesn't explain how the GC header of a frame object becomes inconsistent.

    IMO fixing bpo-19466 was a good idea but because these changes, some bugs became more likley.

    Maybe we can revert bpo-19466, then try to understand and fix this GC inconstency, and later fix again bpo-19466?

    --

    Oh by the way, threading_shutdown_interrupted.py shows also a deadlock. The main threads waits in threading._shutdown(), but this function is interrupted. Then it calls flush_std_files() which waits for a lock on stdout. The problem is that a thread was writing into stdout, but this thread is no more running and holds the lock.

    @vstinner
    Copy link
    Member Author

    I set the priority to release blocker because it's a crash, it's a regression since Python 3.3 and it can be easily fixed (revert changes done in bpo-19466).

    @vstinner
    Copy link
    Member Author

    revert_19466.patch: Patch to revert changes of issue bpo-19466. It restores how Python cleared threads in Python 3.3.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Feb 13, 2014

    Please revert.

    To debug, since I guess it's due to a memory corruption because objects are
    deallocated while they're still in use, you could try to use valgrind.
    Unfortunately, since it's due to a race condition, the overhead will
    probably make it really hard to reproduce :-(

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 13, 2014

    New changeset 1166b3321012 by Victor Stinner in branch 'default':
    Issue bpo-20526, bpo-19466: Revert changes of issue bpo-19466 which introduces a
    http://hg.python.org/cpython/rev/1166b3321012

    @vstinner
    Copy link
    Member Author

    To debug, since I guess it's due to a memory corruption because
    objects are deallocated while they're still in use, you could try
    to use valgrind.
    Unfortunately, since it's due to a race condition, the overhead
    will probably make it really hard to reproduce :-(

    I added assert(_Py_Finalizing == NULL || _Py_Finalizing == PyThreadState_GET()) almost everywhere, but the assertion didn't fail. It looks like only the main thread is running between the interrupted wait_for_thread_shutdown() and the final "python: Modules/gcmodule.c:379: visit_decref: ... failed" assertion in PyGC_Collect().

    I tried to add directly assert(_PyGCHead_REFS(gc) != 0) in _PyGCHead_SET_REFS(), but the assertion didn't fail.

    I also ran Python in Valgrind: no error neither.

    Ok...

    I close this issue as won't fix because it was occur with bpo-19466 applied, but I just reverted this patch.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 17, 2014

    New changeset 9ce58a73b6b5 by Victor Stinner in branch '3.4':
    Issue bpo-20526, bpo-19466: Revert changes of issue bpo-19466 which introduces a
    http://hg.python.org/cpython/rev/9ce58a73b6b5

    @sebastienrenard
    Copy link
    Mannequin

    sebastienrenard mannequin commented Mar 25, 2014

    Hello,

    I encounter a quite similar issue with python 3.4.0 and cx_Oracle. It segfault from time to time (hard to reproduce) on visit_decref at Modules/gcmodule.c:373.
    There were no issue with python 2.7. I did not test with 3.3.

    With gdb i got the following stacktrace :
    (gdb) info threads
    Id Target Id Frame

    #31 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at Python/ceval.c:2829
    #32 0x00000000004c5e85 in fast_function (nk=<optimized out>, na=2, n=2, pp_stack=0x7fffffffcc50, func=0x7ffff1635d90) at Python/ceval.c:4324
    #33 call_function (oparg=<optimized out>, pp_stack=0x7fffffffcc50) at Python/ceval.c:4252
    #34 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at Python/ceval.c:2829
    #35 0x00000000004c5e85 in fast_function (nk=<optimized out>, na=2, n=2, pp_stack=0x7fffffffce30, func=0x7ffff2513730) at Python/ceval.c:4324
    #36 call_function (oparg=<optimized out>, pp_stack=0x7fffffffce30) at Python/ceval.c:4252
    #37 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at Python/ceval.c:2829
    #38 0x00000000004c5e85 in fast_function (nk=<optimized out>, na=2, n=2, pp_stack=0x7fffffffd010, func=0x7ffff1633378) at Python/ceval.c:4324
    #39 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd010) at Python/ceval.c:4252
    #40 PyEval_EvalFrameEx (f=f@entry=0xab9478, throwflag=throwflag@entry=0) at Python/ceval.c:2829
    #41 0x00000000004be47b in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=1, kws=0xab9380, kwcount=0, defs=0x7ffff2504b88, defcount=1, kwdefs=0x0, 
        closure=0x0) at Python/ceval.c:3578
    #42 0x00000000004c3d91 in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack=0x7fffffffd2a0, func=0x7ffff2513400) at Python/ceval.c:4334
    #43 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd2a0) at Python/ceval.c:4252
    #44 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at Python/ceval.c:2829
    #45 0x00000000004c5e85 in fast_function (nk=<optimized out>, na=1, n=1, pp_stack=0x7fffffffd480, func=0x7ffff16331e0) at Python/ceval.c:4324
    #46 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd480) at Python/ceval.c:4252
    #47 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at Python/ceval.c:2829
    #48 0x00000000004c5e85 in fast_function (nk=<optimized out>, na=0, n=0, pp_stack=0x7fffffffd660, func=0x7ffff6ed4d08) at Python/ceval.c:4324
    #49 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd660) at Python/ceval.c:4252
    #50 PyEval_EvalFrameEx (f=f@entry=0x7ffff7eceaf8, throwflag=throwflag@entry=0) at Python/ceval.c:2829
    #51 0x00000000004be47b in PyEval_EvalCodeEx (_co=_co@entry=0x7ffff6ed14b0, globals=globals@entry=0x7ffff7ef1388, locals=locals@entry=0x7ffff7ef1388, args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=kwcount@entry=0, 
        defs=defs@entry=0x0, defcount=defcount@entry=0, kwdefs=kwdefs@entry=0x0, closure=closure@entry=0x0) at Python/ceval.c:3578
    #52 0x00000000004be54b in PyEval_EvalCode (co=co@entry=0x7ffff6ed14b0, globals=globals@entry=0x7ffff7ef1388, locals=locals@entry=0x7ffff7ef1388) at Python/ceval.c:773
    #53 0x00000000004ed844 in run_mod (arena=0x8fea30, flags=0x7fffffffd910, locals=0x7ffff7ef1388, globals=0x7ffff7ef1388, filename=0x7ffff7ea7a78, mod=0x97c090) at Python/pythonrun.c:2175
    #54 PyRun_InteractiveOneObject (fp=fp@entry=0x7ffff74b6340 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff7ea7a78, flags=flags@entry=0x7fffffffd910) at Python/pythonrun.c:1441
    #55 0x00000000004edb7e in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff74b6340 <_IO_2_1_stdin_>, filename_str=filename_str@entry=0x580d63 "<stdin>", flags=flags@entry=0x7fffffffd910) at Python/pythonrun.c:1320
    #56 0x00000000004edcd6 in PyRun_AnyFileExFlags (fp=fp@entry=0x7ffff74b6340 <_IO_2_1_stdin_>, filename=<optimized out>, closeit=0, flags=flags@entry=0x7fffffffd910) at Python/pythonrun.c:1282
    #57 0x000000000041e243 in run_file (p_cf=0x7fffffffd910, filename=0x0, fp=0x7ffff74b6340 <_IO_2_1_stdin_>) at Modules/main.c:319
    #58 Py_Main (argc=argc@entry=1, argv=argv@entry=0x872010) at Modules/main.c:751
    #59 0x000000000041aacc in main (argc=1, argv=<optimized out>) at ./Modules/python.c:69
    (gdb) 

    Regards

    @pitrou
    Copy link
    Member

    pitrou commented Mar 25, 2014

    Hi Sebastien,
    Those symptoms are actually quite generic. If you want do diagnose your issue, I would suggest you compile a debug build of Python (./configure --with-pydebug), it will enable many additional checks (and of course be quite a bit slower too...).

    @sebastienrenard
    Copy link
    Mannequin

    sebastienrenard mannequin commented Mar 25, 2014

    Hi Antoine,
    Thanks for your quick answer. I compiled with debug and compile cx_oracle again. Here the stack trace with gdb. Hope it will help.

    Program received signal SIGSEGV, Segmentation fault.
    0x000000000043ab98 in visit_decref (op=0xb00000000, data=0x0) at Modules/gcmodule.c:373
    373 if (PyObject_IS_GC(op)) {
    (gdb) backtrace
    #0 0x000000000043ab98 in visit_decref (op=0xb00000000, data=0x0) at Modules/gcmodule.c:373
    #1 0x000000000048193a in BaseException_traverse (self=0x7ffff0f645f8, visit=0x43ab64 <visit_decref>, arg=0x0) at Objects/exceptions.c:97
    #2 0x00000000004dc4cc in subtype_traverse (self=0x7ffff0f645f8, visit=0x43ab64 <visit_decref>, arg=0x0) at Objects/typeobject.c:972
    #3 0x000000000043ac8c in subtract_refs (containers=0x8f2ec0 <generations>) at Modules/gcmodule.c:398
    #4 0x000000000043bda0 in collect (generation=0, n_collected=0x7ffffffe6860, n_uncollectable=0x7ffffffe6858, nofail=0) at Modules/gcmodule.c:957
    #5 0x000000000043c409 in collect_with_callback (generation=0) at Modules/gcmodule.c:1128
    #6 0x000000000043c4a0 in collect_generations () at Modules/gcmodule.c:1151
    #7 0x000000000043d77c in _PyObject_GC_Malloc (basicsize=48) at Modules/gcmodule.c:1726
    #8 0x000000000043d7b4 in _PyObject_GC_New (tp=0x90f980 <PyListIter_Type>) at Modules/gcmodule.c:1736
    #9 0x000000000049bec5 in list_iter (seq=0x7ffff0f807c8) at Objects/listobject.c:2738
    #10 0x00000000004620d4 in PyObject_GetIter (o=0x7ffff0f807c8) at Objects/abstract.c:2656
    #11 0x00000000005a4923 in PyEval_EvalFrameEx (f=0xbf4878, throwflag=0) at Python/ceval.c:2640
    #12 0x00000000005aab83 in PyEval_EvalCodeEx (_co=0x7ffff672fb80, globals=0x7ffff6721838, locals=0x0, args=0x7ffff13c79b0, argcount=2, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:3578
    #13 0x000000000065b49a in function_call (func=0x7ffff66f19b0, arg=0x7ffff13c7988, kw=0x0) at Objects/funcobject.c:632
    #14 0x0000000000460069 in PyObject_Call (func=0x7ffff66f19b0, arg=0x7ffff13c7988, kw=0x0) at Objects/abstract.c:2067
    #15 0x000000000047e933 in method_call (func=0x7ffff66f19b0, arg=0x7ffff13c7988, kw=0x0) at Objects/classobject.c:347
    #16 0x0000000000460069 in PyObject_Call (func=0x7ffff7ebd3d8, arg=0x7ffff6e30468, kw=0x0) at Objects/abstract.c:2067
    #17 0x00000000004dd338 in call_method (o=0x7ffff0f7cdc0, nameid=0x91a860 <id.11627>, format=0x67cd45 "(O)") at Objects/typeobject.c:1394
    #18 0x00000000004eb850 in slot_mp_subscript (self=0x7ffff0f7cdc0, arg1=0x7ffff0f86040) at Objects/typeobject.c:5585
    #19 0x000000000045aa4b in PyObject_GetItem (o=0x7ffff0f7cdc0, key=0x7ffff0f86040) at Objects/abstract.c:145
    #20 0x00000000005994e0 in PyEval_EvalFrameEx (f=0xbf42f8, throwflag=0) at Python/ceval.c:1571
    #21 0x00000000005aab83 in PyEval_EvalCodeEx (_co=0x7ffff2317d00, globals=0x7ffff25c6d08, locals=0x0, args=0xbf41b8, argcount=3, kws=0xbf41d0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x7ffff1c797c8, closure=0x0) at Python/ceval.c:3578
    #22 0x00000000005adb71 in fast_function (func=0x7ffff1c7ba68, pp_stack=0x7ffffffea1b8, n=3, na=3, nk=0) at Python/ceval.c:4334
    #23 0x00000000005ad64c in call_function (pp_stack=0x7ffffffea1b8, oparg=2) at Python/ceval.c:4252
    #24 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xbf4018, throwflag=0) at Python/ceval.c:2829
    #25 0x00000000005ada33 in fast_function (func=0x7ffff1c823f0, pp_stack=0x7ffffffebb08, n=2, na=2, nk=0) at Python/ceval.c:4324
    #26 0x00000000005ad64c in call_function (pp_stack=0x7ffffffebb08, oparg=1) at Python/ceval.c:4252
    #27 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xd21018, throwflag=0) at Python/ceval.c:2829
    #28 0x00000000005aab83 in PyEval_EvalCodeEx (_co=0x7ffff58f3280, globals=0x7ffff25db1a8, locals=0x0, args=0xbdcc00, argcount=2, kws=0xbdcc10, kwcount=3, defs=0x7ffff1ca6440, defcount=3, kwdefs=0x0, closure=0x0) at Python/ceval.c:3578
    #29 0x00000000005adb71 in fast_function (func=0x7ffff1c646d0, pp_stack=0x7ffffffed648, n=8, na=2, nk=3) at Python/ceval.c:4334
    #30 0x00000000005ad64c in call_function (pp_stack=0x7ffffffed648, oparg=770) at Python/ceval.c:4252
    #31 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xbdca38, throwflag=0) at Python/ceval.c:2829
    #32 0x00000000005ada33 in fast_function (func=0x7ffff13d36d0, pp_stack=0x7ffffffeef98, n=2, na=2, nk=0) at Python/ceval.c:4324
    ---Type <return> to continue, or q <return> to quit---
    #33 0x00000000005ad64c in call_function (pp_stack=0x7ffffffeef98, oparg=1) at Python/ceval.c:4252
    #34 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xd1d118, throwflag=0) at Python/ceval.c:2829
    #35 0x00000000005ada33 in fast_function (func=0x7ffff22e3eb8, pp_stack=0x7fffffff08e8, n=2, na=2, nk=0) at Python/ceval.c:4324
    #36 0x00000000005ad64c in call_function (pp_stack=0x7fffffff08e8, oparg=2) at Python/ceval.c:4252
    #37 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xd1c278, throwflag=0) at Python/ceval.c:2829
    #38 0x00000000005ada33 in fast_function (func=0x7ffff13d23f0, pp_stack=0x7fffffff2238, n=2, na=2, nk=0) at Python/ceval.c:4324
    #39 0x00000000005ad64c in call_function (pp_stack=0x7fffffff2238, oparg=1) at Python/ceval.c:4252
    #40 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xcda7d8, throwflag=0) at Python/ceval.c:2829
    #41 0x00000000005ada33 in fast_function (func=0x7ffff13d5bd8, pp_stack=0x7fffffff3b88, n=2, na=2, nk=0) at Python/ceval.c:4324
    #42 0x00000000005ad64c in call_function (pp_stack=0x7fffffff3b88, oparg=1) at Python/ceval.c:4252
    #43 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xbf5558, throwflag=0) at Python/ceval.c:2829
    #44 0x00000000005ada33 in fast_function (func=0x7ffff22e3eb8, pp_stack=0x7fffffff54d8, n=2, na=2, nk=0) at Python/ceval.c:4324
    #45 0x00000000005ad64c in call_function (pp_stack=0x7fffffff54d8, oparg=2) at Python/ceval.c:4252
    #46 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xb945b8, throwflag=0) at Python/ceval.c:2829
    #47 0x00000000005ada33 in fast_function (func=0x7ffff13d23f0, pp_stack=0x7fffffff6e28, n=2, na=2, nk=0) at Python/ceval.c:4324
    #48 0x00000000005ad64c in call_function (pp_stack=0x7fffffff6e28, oparg=1) at Python/ceval.c:4252
    #49 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xbf37a8, throwflag=0) at Python/ceval.c:2829
    #50 0x00000000005aab83 in PyEval_EvalCodeEx (_co=0x7ffff234f700, globals=0x7ffff234dfa8, locals=0x0, args=0xbf3668, argcount=1, kws=0xbf3670, kwcount=0, defs=0x7ffff2352c48, defcount=1, kwdefs=0x0, closure=0x0) at Python/ceval.c:3578
    #51 0x00000000005adb71 in fast_function (func=0x7ffff22e3a68, pp_stack=0x7fffffff8968, n=1, na=1, nk=0) at Python/ceval.c:4334
    #52 0x00000000005ad64c in call_function (pp_stack=0x7fffffff8968, oparg=0) at Python/ceval.c:4252
    #53 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xbf34c8, throwflag=0) at Python/ceval.c:2829
    #54 0x00000000005ada33 in fast_function (func=0x7ffff13d21c8, pp_stack=0x7fffffffa2b8, n=1, na=1, nk=0) at Python/ceval.c:4324
    #55 0x00000000005ad64c in call_function (pp_stack=0x7fffffffa2b8, oparg=0) at Python/ceval.c:4252
    #56 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xab0208, throwflag=0) at Python/ceval.c:2829
    #57 0x00000000005ada33 in fast_function (func=0x7ffff6e106d0, pp_stack=0x7fffffffbc08, n=0, na=0, nk=0) at Python/ceval.c:4324
    #58 0x00000000005ad64c in call_function (pp_stack=0x7fffffffbc08, oparg=0) at Python/ceval.c:4252
    #59 0x00000000005a5b37 in PyEval_EvalFrameEx (f=0xa12bc8, throwflag=0) at Python/ceval.c:2829
    #60 0x00000000005aab83 in PyEval_EvalCodeEx (_co=0x7ffff6e2e7c0, globals=0x7ffff7eb9ad8, locals=0x7ffff7eb9ad8, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:3578
    #61 0x0000000000596405 in PyEval_EvalCode (co=0x7ffff6e2e7c0, globals=0x7ffff7eb9ad8, locals=0x7ffff7eb9ad8) at Python/ceval.c:773
    #62 0x00000000004239ae in run_mod (mod=0xa9a180, filename=0x7ffff7e18380, globals=0x7ffff7eb9ad8, locals=0x7ffff7eb9ad8, flags=0x7fffffffd910, arena=0xa98800) at Python/pythonrun.c:2175
    #63 0x0000000000420df9 in PyRun_InteractiveOneObject (fp=0x7ffff74b6340 <IO_2_1_stdin>, filename=0x7ffff7e18380, flags=0x7fffffffd910) at Python/pythonrun.c:1441
    #64 0x00000000004206fd in PyRun_InteractiveLoopFlags (fp=0x7ffff74b6340 <IO_2_1_stdin>, filename_str=0x661936 "<stdin>", flags=0x7fffffffd910) at Python/pythonrun.c:1320
    #65 0x00000000004204f3 in PyRun_AnyFileExFlags (fp=0x7ffff74b6340 <IO_2_1_stdin>, filename=0x661936 "<stdin>", closeit=0, flags=0x7fffffffd910) at Python/pythonrun.c:1282
    ---Type <return> to continue, or q <return> to quit---
    #66 0x0000000000439b4f in run_file (fp=0x7ffff74b6340 <IO_2_1_stdin>, filename=0x0, p_cf=0x7fffffffd910) at Modules/main.c:319
    #67 0x000000000043a791 in Py_Main (argc=1, argv=0x975020) at Modules/main.c:751
    #68 0x000000000041aa19 in main (argc=1, argv=0x7fffffffdb08) at ./Modules/python.c:69
    (gdb)

    @sebastienrenard
    Copy link
    Mannequin

    sebastienrenard mannequin commented Mar 25, 2014

    Same issue with a fresh python 3.3.5:

    361 if (PyObject_IS_GC(op)) {
    (gdb) backtrace
    #0 visit_decref (op=0xb00000000, data=data@entry=0x0) at Modules/gcmodule.c:361
    #1 0x000000000052d9da in BaseException_traverse (self=0x7ffff156d328, visit=0x4c0800 <visit_decref>, arg=0x0) at Objects/exceptions.c:100

    Is it a python bug, or could it be an issue with the cx_oracle extension ?

    @pitrou
    Copy link
    Member

    pitrou commented Mar 25, 2014

    Well, 0xb00000000 is an unlikely pointer value, especially since other dynamically-allocated pointers seem to lie in other memory areas. So it would look like there's some memory corruption here.

    As for whether it's a Python issue, try reproducing without cx_Oracle? If you can't, it's likely to be a bug in cx_Oracle.

    @vstinner
    Copy link
    Member Author

    I encounter a quite similar issue with python 3.4.0 and cx_Oracle.

    Your issue is different, compare the top frames:

    My trace:

    #0  0x0000003f3a835c59 in raise () from /lib64/libc.so.6
    #1  0x0000003f3a837368 in abort () from /lib64/libc.so.6
    #2  0x0000003f3a82ebb6 in __assert_fail_base () from /lib64/libc.so.6
    #3  0x0000003f3a82ec62 in __assert_fail () from /lib64/libc.so.6
    #4  0x000000000043ac66 in visit_decref (
        op=Frame 0x7f013c001398, for file x.py, line 43 (...) at Modules/gcmodule.c:379
    #5  0x00000000004336bd in tb_traverse (tb=0x7f01493a66e8, visit=0x43abb4 <visit_decref>, arg=0x0) at Python/traceback.c:64
    #6  0x000000000043acdc in subtract_refs (containers=0x8f1a20 <generations+64>) at Modules/gcmodule.c:398

    cx_Oracle trace:

    #0 0x000000000043ab98 in visit_decref (op=0xb00000000, data=0x0) at Modules/gcmodule.c:373
    #1 0x000000000048193a in BaseException_traverse (self=0x7ffff0f645f8, visit=0x43ab64 <visit_decref>, arg=0x0) at Objects/exceptions.c:97
    #2 0x00000000004dc4cc in subtype_traverse (self=0x7ffff0f645f8, visit=0x43ab64 <visit_decref>, arg=0x0) at Objects/typeobject.c:972

    In my trace, visit_decref() is called on a frame and fail with an assertion error.

    In cx_Oracle trace, visit_decref() is called on a NULL pointer which comes from an Exception.

    In my experience, it's a bug in cx_Oracle. If you think that I'm wrong and that it's a bug in Python, please open a *new* issue since the trace is different.

    --

    For your cx_Oracle, issue:

    #1 0x000000000048193a in BaseException_traverse (self=0x7ffff0f645f8, visit=0x43ab64 <visit_decref>, arg=0x0) at Objects/exceptions.c:97

    Can you go this frame (in gdb, type "frame 1") and dump the Exception object:

    (gdb) print _PyObject_Dump(0x000000000048193a)

    Note for myself: I should write a documentation explaining how to debug Python in gdb.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 26, 2014

    In cx_Oracle trace, visit_decref() is called on a NULL pointer which comes from an Exception.

    Unless C conventions changed, 0xb00000000 is not a NULL pointer :-)

    @vstinner
    Copy link
    Member Author

    Unless C conventions changed, 0xb00000000 is not a NULL pointer :-)

    Ooops, I missed the B :-)

    By the way, my gdb example is wrong: you should pass self :-)

    #1 0x000000000048193a in BaseException_traverse (self=0x7ffff0f645f8, visit=0x43ab64 <visit_decref>, arg=0x0) at Objects/exceptions.c:97

    => (gdb) print _PyObject_Dump(0x7ffff0f645f8)

    @vstinner
    Copy link
    Member Author

    Right now, threading_shutdown_interrupted.py does crash on the master branch (commit 05e4a29). I reintroduced the bug with:

    commit 9ad58ac
    Author: Victor Stinner <vstinner@python.org>
    Date: Mon Mar 9 23:37:49 2020 +0100

    bpo-19466: Py_Finalize() clears daemon threads earlier (GH-18848)
    
    Clear the frames of daemon threads earlier during the Python shutdown to
    call objects destructors. So "unclosed file" resource warnings are now
    emitted for daemon threads in a more reliable way.
    

    I investigated this bug with Pablo Galindo Salgado. When the bug triggers, there are 2 strong references to a Frame object:

    • Traceback object
    • Generator object

    The bug occurs in "except ValueError as exc:" of EvilThread.coroutine().

    The exception contains a traceback object which has a strong reference to the frame.

    The frame object contains "exc" variable (the exception) which contains a reference to the tracecback (exc.__traceback__) which contains a refererence to the frame (tb.tb_frame): there is a reference cycle. (It's not really an issue by itself).

    There are 2 strong references to the Frame object.

    Py_FinalizeEx() calls _PyThreadState_DeleteExcept() which doesn't clear the reference cycle. But PyThreadState_Clear() calls Py_CLEAR(tstate->frame) which reduces the Frame reference counter from 2 to 1, even if there are still 2 strong references to it (traceback and generator).

    tstate->frame is only set at 3 places:

    • new_threadstate(): when a Python thread state is created (it's initialized to NULL)
    • _PyEval_EvalFrameDefault() entry: set to frame
    • _PyEval_EvalFrameDefault() exit: set to frame->f_back (which can be NULL

    By the way, _PyEval_EvalFrameDefault() has a bug: tstate->frame is not reset if _PyCode_InitOpcache() fails: bpo-40048.

    Py_FinalizeEx() calls _PyRuntimeState_SetFinalizing(runtime, tstate) and so threads which tries to take the GIL must exit immediately. Extract of take_gil():
    ---

    static inline int
    tstate_must_exit(PyThreadState *tstate)
    {
        PyThreadState *finalizing = _PyRuntimeState_GetFinalizing(&_PyRuntime);
        return (finalizing != NULL && finalizing != tstate);
    }
    
    static void
    take_gil(PyThreadState *tstate)
    {
        ...
    
        if (tstate_must_exit(tstate)) {
            PyThread_exit_thread();
        }
        ...
    }

    The thread is exited in the middle of _PyEval_EvalFrameDefault() by take_gil() which calls PyThread_exit_thread() because the thread must exit (because the main thread destroyed its Python thread state and so accessing it would crash the thread).

    _PyThreadState_DeleteExcept() calls PyThreadState_Clear() which calls Py_CLEAR(tstate->frame): this case only occurs because the thread exited in the middle of _PyEval_EvalFrameDefault(). In the normal case, _PyEval_EvalFrameDefault() always resets tstate->frame to its previous value.

    Py_CLEAR(tstate->frame) is wrong: it's a *borrowed* reference.

    Pablo and me agree with PyThreadState_Clear() must not call Py_CLEAR(tstate->frame): it's borrowed reference, not a strong reference.

    I'm working on a PR to fix the issue.

    @vstinner vstinner reopened this Mar 23, 2020
    @ned-deily
    Copy link
    Member

    (This issue previously had "release blocker" priority before it was closed. Now that it is re-opened does it still need that? I'm assuming not.)

    @ned-deily ned-deily added 3.9 only security fixes and removed release-blocker labels Mar 23, 2020
    @vstinner
    Copy link
    Member Author

    (This issue previously had "release blocker" priority before it was closed. Now that it is re-opened does it still need that? I'm assuming not.)

    Techicanilly, it's a 3.9 regression but I don't think that it should be hold 3.9 release. So no, it's not a blocker issue.

    @vstinner
    Copy link
    Member Author

    New changeset 5804f87 by Victor Stinner in branch 'master':
    bpo-20526: Fix PyThreadState_Clear(): don't decref frame (GH-19120)
    5804f87

    @vstinner
    Copy link
    Member Author

    New changeset e97c8b0 by Victor Stinner in branch '3.8':
    bpo-20526: Fix PyThreadState_Clear(): don't decref frame (GH-19120) (GH-19136)
    e97c8b0

    @vstinner
    Copy link
    Member Author

    New changeset d1c0989 by Victor Stinner in branch '3.7':
    bpo-20526: Fix PyThreadState_Clear(): don't decref frame (GH-19120) (GH-19136) (GH-19137)
    d1c0989

    @vstinner
    Copy link
    Member Author

    I ran threading_shutdown_interrupted.py 10 times on the master branch (at commit 9b8e74c): it does no longer crash.

    I consider that the root issue is now fixed, so I close again the bug.

    Thanks Pablo for the helpful debugging session!

    @vstinner vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.7 (EOL) end of life 3.8 only security fixes labels Mar 24, 2020
    @vstinner
    Copy link
    Member Author

    Just in case, I also ran asyncio_gc.py 10x times on the master branch. I just replaced asyncio.async with asyncio.ensure_future. In short, I consider that the bug is now fixed.

    I interrupted the test two times with CTRL+c. 9 runs were stopped correctly. 1 run failed with:

    Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stdout>'> at interpreter shutdown, possibly due to daemon threads

    But this is a different issue, unrelated to "Modules/gcmodule.c:379: visit_decref: Assertion `((gc)->gc.gc_refs >> (1)) != 0' failed". It's just that the test uses print() in threads, whereas using print() during Python finalization is not reliable. Using os.write() or avoiding print() would avoid the issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants