Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contextvars: hamt_alloc() must initialize h_root and h_count fields #77984

Closed
vstinner opened this issue Jun 7, 2018 · 12 comments
Closed

contextvars: hamt_alloc() must initialize h_root and h_count fields #77984

vstinner opened this issue Jun 7, 2018 · 12 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@vstinner
Copy link
Member

vstinner commented Jun 7, 2018

BPO 33803
Nosy @vstinner, @ned-deily, @1st1
PRs
  • bpo-33803: Fix a crash in hamt.c #7504
  • [3.7] bpo-33803: Fix a crash in hamt.c (GH-7504) #7505
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-08.03:44:16.205>
    created_at = <Date 2018-06-07.23:44:54.469>
    labels = ['interpreter-core', '3.7', '3.8', 'type-crash']
    title = 'contextvars: hamt_alloc() must initialize h_root and h_count fields'
    updated_at = <Date 2019-10-08.12:47:13.364>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-10-08.12:47:13.364>
    actor = 'yselivanov'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-06-08.03:44:16.205>
    closer = 'yselivanov'
    components = ['Interpreter Core']
    creation = <Date 2018-06-07.23:44:54.469>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33803
    keywords = ['patch']
    message_count = 12.0
    messages = ['318990', '318991', '318992', '318994', '318997', '319001', '319013', '319026', '319032', '319033', '354172', '354199']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'ned.deily', 'yselivanov']
    pr_nums = ['7504', '7505']
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue33803'
    versions = ['Python 3.7', 'Python 3.8']

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 7, 2018

    test_asyncio started to crash in #7487

    I debugged the crash with Yury: _PyHAMT_New() triggers indirectly a GC collection at "o->h_root = hamt_node_bitmap_new(0);". Problem: the object that is being created is already tracked by the GC, whereas its h_root field is not set. Then the GC does crash because the field is a random pointer.

    hamt_alloc() must initialize h_root and h_count before tracking the object in the GC.

    Thinking about the GC when writing an object constructor is hard :-(

    Maybe we should add a debug option to trigger the GC more often to make such random bug more likely. contextvars has been implemented a few months ago, and we only spotted the bug a few days before Python 3.7.0 final release...

    @vstinner vstinner added release-blocker type-crash A hard crash of the interpreter, possibly with a core dump 3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 7, 2018
    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 8, 2018

    One solution to trigger so crash more frequently is to reduce the threshold of the generation 0 of the garbage collector.

    Here is a patch to do that when using -X dev: change the default threshold from 700 to 5 for the generation 0.

    With this patch, "PYTHONDEVMODE=1 python -m test test_context" does also crash as expected.

    diff --git a/Python/pylifecycle.c b/Python/pylifecycle.c
    index 958219b744..81218fb964 100644
    --- a/Python/pylifecycle.c
    +++ b/Python/pylifecycle.c
    @@ -622,6 +622,10 @@ _Py_InitializeCore(const _PyCoreConfig *core_config)
             return _Py_INIT_ERR("runtime core already initialized");
         }
     
    +    if (core_config->dev_mode) {
    +        _PyRuntime.gc.generations[0].threshold = 5;
    +    }
    +
         /* Py_Finalize leaves _Py_Finalizing set in order to help daemon
          * threads behave a little more gracefully at interpreter shutdown.
          * We clobber it here so the new interpreter can start with a clean

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 8, 2018

    Don't forget to merge #7487 once this bug is fixed. I would like to see https://bugs.python.org/issue33792 in Python 3.7 *if possible* (the feature now seems "required" for the new asyncio.loop() function).

    @1st1
    Copy link
    Member

    1st1 commented Jun 8, 2018

    Thanks Victor for debugging this. I made a PR (which is now trivial) and double checked all other calls to GCTrack in context.c & hamt.c.

    @1st1
    Copy link
    Member

    1st1 commented Jun 8, 2018

    New changeset 378c53c by Yury Selivanov in branch 'master':
    bpo-33803: Fix a crash in hamt.c (bpo-7504)
    378c53c

    @1st1
    Copy link
    Member

    1st1 commented Jun 8, 2018

    New changeset a971a6f by Yury Selivanov (Miss Islington (bot)) in branch '3.7':
    bpo-33803: Fix a crash in hamt.c (GH-7504) (GH-7505)
    a971a6f

    @1st1
    Copy link
    Member

    1st1 commented Jun 8, 2018

    • if (core_config->dev_mode) {
    •    \_PyRuntime.gc.generations[0].threshold = 5;
      
    • }

    I'd love to have a flag to turn this on, or maybe we should enable it for -X dev.

    @1st1 1st1 closed this as completed Jun 8, 2018
    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 8, 2018

    I'd love to have a flag to turn this on, or maybe we should enable it for -X dev.

    Well, there is already a public API to do it manually: gc.set_threshold(5).

    I'm not sure about a threshold of 5 for -X dev: that's very very low and so kill performances. -X dev can be slower, not but 10x slower for example. I didn't measure performance.

    Such bug is rare, so I'm not sure about putting gc.set_threshold(5) in -X dev.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 8, 2018

    FYI the bug was also seen 8 hours ago on a different asyncio PR:
    #423 (comment)

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 8, 2018

    FYI the bug was also seen 8 hours ago on a different asyncio PR:

    Oops, my message is misleading: the crash is not a regression. I just wanted to notice that a different PR also triggered the crash before the crash has been fixed. I'm just surprised that the bug decided to wake up 4 months after contextvars has been merged. Why yesterday and not previously?

    @ernestruiz ernestruiz mannequin added type-feature A feature request or enhancement and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Aug 30, 2018
    @berkerpeksag berkerpeksag added type-crash A hard crash of the interpreter, possibly with a core dump and removed type-feature A feature request or enhancement labels Aug 30, 2018
    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 8, 2019

    Notes for myself.

    In bpo-38392, I modified PyObject_GC_Track() in debug mode to detect this bug.

    For example, this bug can be reproduced with this change:
    ---

    diff --git a/Python/hamt.c b/Python/hamt.c
    index 28b4e1ef6c..d7dd555d15 100644
    --- a/Python/hamt.c
    +++ b/Python/hamt.c
    @@ -2478,8 +2478,6 @@ hamt_alloc(void)
         if (o == NULL) {
             return NULL;
         }
    -    o->h_count = 0;
    -    o->h_root = NULL;
         o->h_weakreflist = NULL;
         PyObject_GC_Track(o);
         return o;

    Python now detects the bug in debug mode:
    ---

    $ ./python -m test -v test_context
    (...)
    test_context_copy_1 (test.test_context.ContextTest) ...
    Modules/gcmodule.c:1931: visit_validate: Assertion failed: PyObject_GC_Track() object is not valid
    Enable tracemalloc to get the memory block allocation traceback

    object address : 0x7f2b17408d70
    object refcount : 1
    object type : 0x76bc20
    object type name: hamt
    object repr : <hamt object at 0x7f2b17408d70>

    Fatal Python error: _PyObject_AssertFailed
    Python runtime state: initialized

    Current thread 0x00007f2b25590740 (most recent call first):
    File "/home/vstinner/python/master/Lib/test/test_context.py", line 322 in test_context_copy_1
    File "/home/vstinner/python/master/Lib/unittest/case.py", line 616 in _callTestMethod
    File "/home/vstinner/python/master/Lib/unittest/case.py", line 659 in run
    (...)
    File "/home/vstinner/python/master/Lib/runpy.py", line 192 in _run_module_as_main
    Aborted (core dumped)
    ---

    @1st1
    Copy link
    Member

    1st1 commented Oct 8, 2019

    In bpo-38392, I modified PyObject_GC_Track() in debug mode to detect this bug.

    Awesome!!!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants