EnterNonRecursiveMutex on win32 can hang for 49.7 days: use GetTickCount64() rather than GetTickCount() #84028

Closed
bobince mannequin opened this issue Mar 4, 2020 · 7 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@bobince
Mannequin

bobince mannequin commented Mar 4, 2020

BPO 39847
Nosy @pfmoore, @vstinner, @tjguk, @bobince, @zware, @zooba, @miss-islington, @sparrowt
PRs
  • bpo-39847: win32: don't over-wait for mutex after tickcount overflow #18780
  • [3.8] bpo-39847: EnterNonRecursiveMutex() uses GetTickCount64() (GH-18780) #18945
  • [3.7] bpo-39847: EnterNonRecursiveMutex() uses GetTickCount64() (GH-18780) #18959
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-03-12.14:29:14.812>
    created_at = <Date 2020-03-04.13:20:26.946>
    labels = ['type-bug', '3.8', '3.9', '3.7', 'library', 'OS-windows']
    title = 'EnterNonRecursiveMutex on win32 can hang for 49.7 days: use GetTickCount64() rather than GetTickCount()'
    updated_at = <Date 2020-03-12.14:29:14.811>
    user = 'https://github.com/bobince'

    bugs.python.org fields:

    activity = <Date 2020-03-12.14:29:14.811>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-03-12.14:29:14.812>
    closer = 'vstinner'
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2020-03-04.13:20:26.946>
    creator = 'aclover'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39847
    keywords = ['patch']
    message_count = 7.0
    messages = ['363346', '363347', '363351', '363986', '363989', '364022', '364023']
    nosy_count = 9.0
    nosy_names = ['paul.moore', 'vstinner', 'tim.golden', 'aclover', 'zach.ware', 'steve.dower', 'miss-islington', 'Day Barr', 'sparrowt']
    pr_nums = ['18780', '18945', '18959']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39847'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @bobince
    Mannequin Author

    bobince mannequin commented Mar 4, 2020

    Since bpo-15038, waiting to acquire locks/events/etc from _thread/threading on Windows can fail to return until long after the requested timeout. Cause:

    https://github.com/python/cpython/blob/3.8/Python/thread_nt.h#L85

    using 32-bit GetTickCount/DWORD, which will overflow at around 49.7 days of uptime.
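
For reference, the 49.7-day figure follows directly from the width of the counter: a 32-bit millisecond count can hold 2**32 distinct values before it wraps. A quick arithmetic check (the constant names are only illustrative):

```python
# A DWORD tick count is 32 bits wide and counts milliseconds.
MS_PER_DAY = 24 * 60 * 60 * 1000   # 86,400,000 ms in a day
DWORD_RANGE = 2 ** 32              # number of distinct 32-bit values

# Uptime, in days, after which a 32-bit millisecond counter wraps to zero.
wrap_days = DWORD_RANGE / MS_PER_DAY
print(f"{wrap_days:.1f} days")     # 49.7 days
```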

    If the WaitForSingleObjectEx call in PyCOND_TIMEDWAIT returns later than the 'target' time, and the tick count overflows in that gap, 'milliseconds' will become very large (up to another 49.7 days) and the next PyCOND_TIMEDWAIT will be stuck for a long time.
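
The scenario described above can be modelled in a few lines. This is a sketch in Python rather than the C in thread_nt.h, and it assumes the loop compares 'target' against 'now' and otherwise waits for 'target - now', as the description implies. The helper name `remaining_ms_32` and the chosen tick values are hypothetical.

```python
DWORD_MASK = 0xFFFFFFFF            # GetTickCount() yields a 32-bit DWORD
MS_PER_DAY = 24 * 60 * 60 * 1000

def remaining_ms_32(target, now):
    """Model of the 32-bit check: done if target <= now, else wait target - now."""
    if target <= now:
        return 0
    return (target - now) & DWORD_MASK

# The wait starts 1 s before the counter wraps, with a 500 ms timeout.
start = DWORD_MASK - 1000
target = (start + 500) & DWORD_MASK   # still just below 2**32, no wrap yet

# The wait returns late (for example after hibernation), 2 s after it
# started, so the 32-bit counter has wrapped around to a small value.
now = (start + 2000) & DWORD_MASK

leftover = remaining_ms_32(target, now)
print(now, leftover, f"{leftover / MS_PER_DAY:.1f} days")
```

Although the deadline passed 1.5 s earlier, 'now' has wrapped to a small number, the 'target <= now' test fails, and the recomputed timeout is close to the full 49.7-day range.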

    We have seen it in the case where it is most likely to happen: when the machine is hibernated during the WaitForSingleObjectEx call. I believe the tick count continues to increase during hibernation, so there is a much bigger gap between 'target' and 'now' for the overflow to happen in.

    The simplest fix is probably to switch to GetTickCount64/ULONGLONG. We should be able to get away with using this now that we no longer support WinXP.
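
To illustrate why a 64-bit counter avoids the problem, here is the same late-wakeup scenario modelled with a 64-bit tick count. Again this is a Python sketch under the same assumptions, not the actual patch, and `remaining_ms_64` is a hypothetical helper.

```python
QWORD_MASK = 0xFFFFFFFFFFFFFFFF    # GetTickCount64() yields a 64-bit ULONGLONG
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000

def remaining_ms_64(target, now):
    """Same check as before, but on 64-bit tick counts."""
    if target <= now:
        return 0
    return (target - now) & QWORD_MASK

# Same timeline: start 1 s before the 32-bit boundary, 500 ms timeout,
# wake up 2 s after the start. Nothing wraps in 64 bits.
start = 2 ** 32 - 1 - 1000
target = start + 500
now = start + 2000

leftover = remaining_ms_64(target, now)
print(leftover)                    # 0: the deadline is correctly seen as passed

# A 64-bit millisecond counter only wraps after hundreds of millions of years.
wrap_years = 2 ** 64 / MS_PER_YEAR
print(f"{wrap_years:.3g} years")
```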

    @bobince bobince mannequin added 3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir OS-windows type-bug An unexpected behavior, bug, or error labels Mar 4, 2020
    @vstinner
    Member

    vstinner commented Mar 4, 2020

    time.monotonic() is now always implemented with GetTickCount64() on Windows. Previously, there was a fallback to GetTickCount() when GetTickCount64() was not available. The GetTickCount() call was removed when we dropped support for old Windows versions.

    Do you want to work on a fix? (PR)

    @vstinner vstinner changed the title EnterNonRecursiveMutex on win32 can hang for 49.7 days EnterNonRecursiveMutex on win32 can hang for 49.7 days: use GetTickCount64() rather than GetTickCount() Mar 4, 2020
    @bobince
    Mannequin Author

    bobince mannequin commented Mar 4, 2020

    Yep, should be straightforward to fix (though not to test, as fifty-day test cases tend to be frowned upon...)

    @vstinner
    Member

    New changeset 64838ce by bobince in branch 'master':
    bpo-39847: EnterNonRecursiveMutex() uses GetTickCount64() (GH-18780)
    64838ce

    @miss-islington
    Contributor

    New changeset 60b1b5a by Miss Islington (bot) in branch '3.8':
    bpo-39847: EnterNonRecursiveMutex() uses GetTickCount64() (GH-18780)
    60b1b5a

    @miss-islington
    Contributor

    New changeset feaf0c3 by bobince in branch '3.7':
    [3.7] bpo-39847: EnterNonRecursiveMutex() uses GetTickCount64() (GH-18780) (GH-18959)
    feaf0c3

    @vstinner
    Member

    Thanks, it's now fixed in 3.7, 3.8 and master branches.

    Python 3.5 and 3.6 don't get bugfixes anymore: https://devguide.python.org/#status-of-python-branches

    @vstinner vstinner added the 3.9 only security fixes label Mar 12, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022