Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tp_version_tag is not unique when test runs with -R : #89077

Closed
Fidget-Spinner opened this issue Aug 14, 2021 · 11 comments
Closed

tp_version_tag is not unique when test runs with -R : #89077

Fidget-Spinner opened this issue Aug 14, 2021 · 11 comments
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@Fidget-Spinner
Copy link
Member

BPO 44914
Nosy @vstinner, @markshannon, @pablogsal, @Fidget-Spinner
PRs
  • bpo-44914: Maintain invariants of type version tags. #27773
  • bpo-44914: Add tests for some invariants of tp_version_tag #27774
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-08-17.09:37:10.047>
    created_at = <Date 2021-08-14.15:17:09.715>
    labels = ['interpreter-core', '3.11']
    title = 'tp_version_tag is not unique when test runs with -R :'
    updated_at = <Date 2021-08-19.10:51:17.049>
    user = 'https://github.com/Fidget-Spinner'

    bugs.python.org fields:

    activity = <Date 2021-08-19.10:51:17.049>
    actor = 'kj'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-08-17.09:37:10.047>
    closer = 'kj'
    components = ['Interpreter Core']
    creation = <Date 2021-08-14.15:17:09.715>
    creator = 'kj'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 44914
    keywords = ['patch']
    message_count = 11.0
    messages = ['399594', '399616', '399618', '399623', '399651', '399671', '399723', '399726', '399727', '399735', '399906']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'Mark.Shannon', 'pablogsal', 'kj']
    pr_nums = ['27773', '27774']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue44914'
    versions = ['Python 3.11']

    @Fidget-Spinner
    Copy link
    Member Author

    tp_version_tag is supposed to be unique for different class objects. Under normal circumstances, everything works properly:

    def good():
        class C:
            def __init__(self):  # Just to force `tp_version_tag` to update
                pass
        cls_id = hex(id(C))
        tp_version_tag_before = C.v   # v is tp_version_tag of C, exposed to Python
        x = C()                       # tp_new requires a _PyType_Lookup for `__init__`, updating `tp_version_tag`
        tp_version_tag_after = C.v
        print(f'C ID: {cls_id}, before: {tp_version_tag_before} after: {tp_version_tag_after}')
    
    for _ in range(100):
        good()

    Result:
    C ID: 0x2920c2d58d0, before: 0 after: 115
    C ID: 0x2920c2d6170, before: 0 after: 116
    C ID: 0x2920c2d65c0, before: 0 after: 117
    C ID: 0x2920c8f2800, before: 0 after: 118
    C ID: 0x2920c8f7150, before: 0 after: 119
    C ID: 0x2920c8f6010, before: 0 after: 120
    C ID: 0x2920c8f6460, before: 0 after: 121
    C ID: 0x2920c8f3d90, before: 0 after: 122
    C ID: 0x2920c8f0e20, before: 0 after: 123
    C ID: 0x2920c8f41e0, before: 0 after: 124
    C ID: 0x2920c8f4a80, before: 0 after: 125
    C ID: 0x2920c8f1270, before: 0 after: 126
    C ID: 0x2920c8f16c0, before: 0 after: 127
    C ID: 0x2920c8f34f0, before: 0 after: 128
    C ID: 0x2920c8f5770, before: 0 after: 129
    C ID: 0x2920c8f30a0, before: 0 after: 130
    ...

    However, wrapping in a unittest and run under -R : suddenly changes things:

    class BadTest(unittest.TestCase):
        def test_bad(self):
            class C:
                def __init__(self):
                    pass
    
            cls_id = hex(id(C))
            tp_version_tag_before = C.v
            x = C()
            tp_version_tag_after = C.v
            print(f'C ID: {cls_id}, before: {tp_version_tag_before} after: {tp_version_tag_after}')

    Result:
    "python_d.exe" -m test test_bad -R 10:10
    C ID: 0x1c4c59354b0, before: 0 after: 78
    .C ID: 0x1c4c59372e0, before: 0 after: 82
    .C ID: 0x1c4c5934370, before: 0 after: 82
    .C ID: 0x1c4c5934370, before: 0 after: 82
    .C ID: 0x1c4c5933680, before: 0 after: 82
    .C ID: 0x1c4c5938cc0, before: 0 after: 82
    .C ID: 0x1c4c59354b0, before: 0 after: 82
    .C ID: 0x1c4c5935900, before: 0 after: 82
    .C ID: 0x1c4c5933680, before: 0 after: 82
    .C ID: 0x1c4c59354b0, before: 0 after: 82
    .C ID: 0x1c4c59354b0, before: 0 after: 82
    .C ID: 0x1c4c59361a0, before: 0 after: 82
    .C ID: 0x1c4c5933680, before: 0 after: 82
    .C ID: 0x1c4c5931400, before: 0 after: 82
    .C ID: 0x1c4c5938cc0, before: 0 after: 82
    .C ID: 0x1c4c5938cc0, before: 0 after: 82
    .C ID: 0x1c4c5933680, before: 0 after: 82
    .C ID: 0x1c4c5936a40, before: 0 after: 82
    .C ID: 0x1c4c5931400, before: 0 after: 82
    .C ID: 0x1c4c5935900, before: 0 after: 82

    Somehow the class is occasionally occupying the same address, and tp_version_tag didn't update properly. tp_version_tag being unique is an important invariant required for LOAD_ATTR and LOAD_METHOD specialization. I bumped into this problem after LOAD_METHOD specialization kept failing magically in test_descr.

    I think this is related to bpo-43636 and bpo-43452, but I ran out of time to bisect after spending a day chasing this down. I'll try to bisect soon.

    @Fidget-Spinner Fidget-Spinner added 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Aug 14, 2021
    @markshannon
    Copy link
    Member

    Thanks for putting in the effort to find this.

    I think the first step to fixing this is to formalize the semantics of tp_version_tag. Initially it was designed just for the method cache, but we have started using it as a unique identifier for the state of a class.
    The two different uses have different requirements when the global version counter overflows. Possibly in other scenarios as well.

    @Fidget-Spinner
    Copy link
    Member Author

    @mark, agreed about properly defining the semantics.

    For the current bug: I've narrowed it down to a corner case in how the -R tests work:

    1. On each run cleanup, libregrtest will clear the type cache

      sys._clear_type_cache()

    2. This calls PyType_ClearCache, which sets the global next_version_tag counter to 0

      next_version_tag = 0;
      .

    3. On next run, since runs and type cache is deterministic, we end up with exact same tp_version_tag for the type.

    4. This repeats every run until crash.

    One possible fix: we can tell sys._clear_type_cache to not reset next_version_tag to 0 (it can use another function, since PyType_ClearCache is part of public C API and shouldn't be modified).

    This seems like a bug in libregrtest, the type method cache and tp_version_tag aren't at fault for this specific scenario. I'm submitting a PR soon.

    @markshannon
    Copy link
    Member

    PyType_ClearCache() is documented as:

    Clear the internal lookup cache. Return the current version tag.
    

    Modifying it to do what it is documented to do fixes this bug :)

    @markshannon
    Copy link
    Member

    New changeset 1a511dc by Mark Shannon in branch 'main':
    bpo-44914: Maintain invariants of type version tags. (GH-27773)
    1a511dc

    @markshannon
    Copy link
    Member

    New changeset d84d2c4 by Ken Jin in branch 'main':
    bpo-44914: Add tests for some invariants of tp_version_tag (GH-27774)
    d84d2c4

    @Fidget-Spinner
    Copy link
    Member Author

    Thanks Mark and Victor for the patch and reviews!

    This issue also affects 3.10, but IMO we don't need to backport this as we don't have other code relying on this invariant in that version, so it won't affect anything there. If you feel otherwise, please let me know.

    The fix and tests have landed, I'm closing this issue.

    @pablogsal
    Copy link
    Member

    The opcode cache for load attribute in 3.10 uses the tp_version_tag so unless I am missing something we can fall into the same problem, no?

    @Fidget-Spinner
    Copy link
    Member Author

    @pablo yup the 3.10 opcache used tp_version_tag. But it also does identity (pointer) comparison of type/class object https://github.com/python/cpython/blob/3.10/Python/ceval.c#L3432. Which is why it doesn't fail.

    I created this issue because we don't do type pointer comparisons anymore in 3.11's new specialization infrastructure, and rely only on tp_version_tag and dict keys version.

    @pablogsal
    Copy link
    Member

    But it also does identity (pointer) comparison of type/class object

    Good point! Even if I wrote that code I forgot about all the guards that were needed for correctness. I now remember all the weird cases we discovered there :)

    @Fidget-Spinner
    Copy link
    Member Author

    I just realised I'm slightly wrong about identity checks -- there is a very very small chance where if the type object occupies the same address and the attribute is in a dynamically allocated __dict__ (and not some static slot), we can trick the checks and a segfault may occur for LOAD_ATTR using the old opcache in 3.10.

    I've tried running tests for 30 repetitions, but I cannot get the segfault to appear. @pablo I'll leave it up to you if you feel it's backport worthy. I don't think people use sys._clear_type_cache or PyType_ClearCache normally so the chances of this causing any crash is astronomically small.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants