This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: tp_version_tag is not unique when test runs with -R :
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, kj, pablogsal, vstinner
Priority: normal Keywords: patch

Created on 2021-08-14 15:17 by kj, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 27773 merged Mark.Shannon, 2021-08-15 16:01
PR 27774 merged kj, 2021-08-16 12:33
Messages (11)
msg399594 - (view) Author: Ken Jin (kj) * (Python committer) Date: 2021-08-14 15:17
tp_version_tag is supposed to be unique for different class objects. Under normal circumstances, everything works properly:

def good():
    class C:
        def __init__(self):  # Just to force `tp_version_tag` to update
            pass
    cls_id = hex(id(C))
    tp_version_tag_before = C.v   # v is tp_version_tag of C, exposed to Python
    x = C()                       # tp_new requires a _PyType_Lookup for `__init__`, updating `tp_version_tag`
    tp_version_tag_after = C.v
    print(f'C ID: {cls_id}, before: {tp_version_tag_before} after: {tp_version_tag_after}')

for _ in range(100):
    good()

Result:
C ID: 0x2920c2d58d0, before: 0 after: 115
C ID: 0x2920c2d6170, before: 0 after: 116
C ID: 0x2920c2d65c0, before: 0 after: 117
C ID: 0x2920c8f2800, before: 0 after: 118
C ID: 0x2920c8f7150, before: 0 after: 119
C ID: 0x2920c8f6010, before: 0 after: 120
C ID: 0x2920c8f6460, before: 0 after: 121
C ID: 0x2920c8f3d90, before: 0 after: 122
C ID: 0x2920c8f0e20, before: 0 after: 123
C ID: 0x2920c8f41e0, before: 0 after: 124
C ID: 0x2920c8f4a80, before: 0 after: 125
C ID: 0x2920c8f1270, before: 0 after: 126
C ID: 0x2920c8f16c0, before: 0 after: 127
C ID: 0x2920c8f34f0, before: 0 after: 128
C ID: 0x2920c8f5770, before: 0 after: 129
C ID: 0x2920c8f30a0, before: 0 after: 130
...

However, wrapping in a unittest and run under -R : suddenly changes things:

class BadTest(unittest.TestCase):
    def test_bad(self):
        class C:
            def __init__(self):
                pass

        cls_id = hex(id(C))
        tp_version_tag_before = C.v
        x = C()
        tp_version_tag_after = C.v
        print(f'C ID: {cls_id}, before: {tp_version_tag_before} after: {tp_version_tag_after}')

Result:
"python_d.exe" -m test test_bad -R 10:10
C ID: 0x1c4c59354b0, before: 0 after: 78
.C ID: 0x1c4c59372e0, before: 0 after: 82
.C ID: 0x1c4c5934370, before: 0 after: 82
.C ID: 0x1c4c5934370, before: 0 after: 82
.C ID: 0x1c4c5933680, before: 0 after: 82
.C ID: 0x1c4c5938cc0, before: 0 after: 82
.C ID: 0x1c4c59354b0, before: 0 after: 82
.C ID: 0x1c4c5935900, before: 0 after: 82
.C ID: 0x1c4c5933680, before: 0 after: 82
.C ID: 0x1c4c59354b0, before: 0 after: 82
.C ID: 0x1c4c59354b0, before: 0 after: 82
.C ID: 0x1c4c59361a0, before: 0 after: 82
.C ID: 0x1c4c5933680, before: 0 after: 82
.C ID: 0x1c4c5931400, before: 0 after: 82
.C ID: 0x1c4c5938cc0, before: 0 after: 82
.C ID: 0x1c4c5938cc0, before: 0 after: 82
.C ID: 0x1c4c5933680, before: 0 after: 82
.C ID: 0x1c4c5936a40, before: 0 after: 82
.C ID: 0x1c4c5931400, before: 0 after: 82
.C ID: 0x1c4c5935900, before: 0 after: 82

Somehow the class is occasionally occupying the same address, and tp_version_tag didn't update properly. tp_version_tag being unique is an important invariant required for LOAD_ATTR and LOAD_METHOD specialization. I bumped into this problem after LOAD_METHOD specialization kept failing magically in test_descr.

I think this is related to issue43636 and issue43452, but I ran out of time to bisect after spending a day chasing this down. I'll try to bisect soon.
msg399616 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-08-15 12:30
Thanks for putting in the effort to find this.

I think the first step to fixing this is to formalize the semantics of `tp_version_tag`. Initially it was designed just for the method cache, but we have started using it as a unique identifier for the state of a class.
The two different uses have different requirements when the global version counter overflows. Possibly in other scenarios as well.
msg399618 - (view) Author: Ken Jin (kj) * (Python committer) Date: 2021-08-15 14:26
@Mark, agreed about properly defining the semantics.

For the current bug: I've narrowed it down to a corner case in how the -R tests work:

1. On each run cleanup, libregrtest will clear the type cache https://github.com/python/cpython/blob/6f84656dc188191225c8d5cdfd2a00de663988d0/Lib/test/libregrtest/refleak.py#L170

2. This calls PyType_ClearCache, which sets the global next_version_tag counter to 0 https://github.com/python/cpython/blob/cee67fa66129b5d1db5c8aa3884338f82f0da3de/Objects/typeobject.c#L292.

3. On next run, since runs and type cache is deterministic, we end up with exact same tp_version_tag for the type.

4. This repeats every run until crash.

One possible fix: we can tell sys._clear_type_cache to not reset next_version_tag to 0 (it can use another function, since PyType_ClearCache is part of public C API and shouldn't be modified).

This seems like a bug in libregrtest, the type method cache and tp_version_tag aren't at fault for this specific scenario. I'm submitting a PR soon.
msg399623 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-08-15 16:04
PyType_ClearCache() is documented as:

    Clear the internal lookup cache. Return the current version tag.

Modifying it to do what it is documented to do fixes this bug :)
msg399651 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-08-16 11:21
New changeset 1a511dc92dd10ee8fc2e5da9f52f795924bdc89a by Mark Shannon in branch 'main':
bpo-44914: Maintain invariants of type version tags. (GH-27773)
https://github.com/python/cpython/commit/1a511dc92dd10ee8fc2e5da9f52f795924bdc89a
msg399671 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-08-16 19:19
New changeset d84d2c4985457733602d46dc4ee77290da4e9539 by Ken Jin in branch 'main':
bpo-44914: Add tests for some invariants of tp_version_tag (GH-27774)
https://github.com/python/cpython/commit/d84d2c4985457733602d46dc4ee77290da4e9539
msg399723 - (view) Author: Ken Jin (kj) * (Python committer) Date: 2021-08-17 09:37
Thanks Mark and Victor for the patch and reviews!

This issue also affects 3.10, but IMO we don't need to backport this as we don't have other code relying on this invariant in that version, so it won't affect anything there. If you feel otherwise, please let me know.

The fix and tests have landed, I'm closing this issue.
msg399726 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-17 09:42
The opcode cache for load attribute in 3.10 uses the tp_version_tag so unless I am missing something we can fall into the same problem, no?
msg399727 - (view) Author: Ken Jin (kj) * (Python committer) Date: 2021-08-17 09:48
@Pablo yup the 3.10 opcache used tp_version_tag. But it also does identity (pointer) comparison of type/class object https://github.com/python/cpython/blob/3.10/Python/ceval.c#L3432. Which is why it doesn't fail.

I created this issue because we don't do type pointer comparisons anymore in 3.11's new specialization infrastructure, and rely only on tp_version_tag and dict keys version.
msg399735 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-17 10:15
> But it also does identity (pointer) comparison of type/class object

Good point! Even if I wrote that code I forgot about all the guards that were needed for correctness. I now remember all the weird cases we discovered there :)
msg399906 - (view) Author: Ken Jin (kj) * (Python committer) Date: 2021-08-19 10:51
I just realised I'm slightly wrong about identity checks -- there is a very very small chance where if the type object occupies the same address and the attribute is in a dynamically allocated __dict__ (and not some static slot), we can trick the checks and a segfault may occur for LOAD_ATTR using the old opcache in 3.10.

I've tried running tests for 30 repetitions, but I cannot get the segfault to appear. @Pablo I'll leave it up to you if you feel it's backport worthy. I don't think people use sys._clear_type_cache or PyType_ClearCache normally so the chances of this causing any crash is astronomically small.
History
Date User Action Args
2022-04-11 14:59:48adminsetgithub: 89077
2021-08-19 10:51:17kjsetmessages: + msg399906
2021-08-17 10:15:00pablogsalsetmessages: + msg399735
2021-08-17 09:48:57kjsetmessages: + msg399727
2021-08-17 09:42:30pablogsalsetmessages: + msg399726
2021-08-17 09:37:10kjsetstatus: open -> closed
resolution: fixed
messages: + msg399723

stage: patch review -> resolved
2021-08-16 19:19:05Mark.Shannonsetmessages: + msg399671
2021-08-16 12:33:17kjsetpull_requests: + pull_request26251
2021-08-16 11:21:54Mark.Shannonsetmessages: + msg399651
2021-08-16 11:02:20Mark.Shannonlinkissue44240 superseder
2021-08-15 16:04:47Mark.Shannonsetmessages: + msg399623
2021-08-15 16:01:08Mark.Shannonsetkeywords: + patch
stage: patch review
pull_requests: + pull_request26247
2021-08-15 14:26:05kjsetmessages: + msg399618
2021-08-15 12:30:05Mark.Shannonsetmessages: + msg399616
2021-08-14 15:17:09kjcreate