classification
Title: EnterNonRecursiveMutex on win32 can hang for 49.7 days: use GetTickCount64() rather than GetTickCount()
Type: behavior
Stage: resolved
Components: Library (Lib), Windows
Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Day Barr, aclover, miss-islington, paul.moore, sparrowt, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords: patch

Created on 2020-03-04 13:20 by aclover, last changed 2020-03-12 14:29 by vstinner. This issue is now closed.

Pull Requests
URL        Status   Linked
PR 18780   merged   aclover, 2020-03-04 16:40
PR 18945   merged   miss-islington, 2020-03-11 23:39
PR 18959   merged   aclover, 2020-03-12 14:08
Messages (7)
msg363346 - Author: And Clover (aclover) * Date: 2020-03-04 13:20
Since bpo-15038, waiting to acquire locks/events/etc from _thread/threading on Windows can fail to return long past the requested timeout. Cause:

https://github.com/python/cpython/blob/3.8/Python/thread_nt.h#L85

using 32-bit GetTickCount/DWORD, which will overflow at around 49.7 days of uptime.
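
The 49.7-day figure follows directly from the width of the counter. As a quick check (plain arithmetic, not code from CPython; the variable names are only illustrative):

```python
# Wrap period of a 32-bit millisecond counter such as a DWORD tick count.
MS_PER_DAY = 24 * 60 * 60 * 1000

wrap_ms_32 = 2 ** 32                    # number of distinct 32-bit values
wrap_days_32 = wrap_ms_32 / MS_PER_DAY  # milliseconds converted to days

print(f"{wrap_days_32:.1f} days")       # prints "49.7 days"
```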

If the WaitForSingleObjectEx call in PyCOND_TIMEDWAIT returns later than the 'target' time, and the tick count overflows in that gap, 'milliseconds' will become very large (up to another 49.7 days) and the next PyCOND_TIMEDWAIT will be stuck for a long time.
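
A minimal sketch of that failure mode, modelling DWORD arithmetic in Python. This is a simplified model of the "compute remaining time from a target tick" step, not the CPython source; `remaining_ms_32` and the tick values are hypothetical and chosen only to land the wrap inside the gap.

```python
MASK32 = 0xFFFFFFFF  # DWORD arithmetic is modulo 2**32

def remaining_ms_32(target, now):
    """Return the remaining wait in ms, or None once the target is reached.

    Both values are truncated to 32 bits first, as a DWORD tick count would be.
    """
    target &= MASK32
    now &= MASK32
    if target <= now:
        return None
    return (target - now) & MASK32

# Assumed scenario: the wait starts 5 s before the counter wraps,
# with a 1 s timeout, and the thread wakes 10 s after the wrap.
start = 2 ** 32 - 5000
timeout = 1000
target = (start + timeout) & MASK32
woke_at = 2 ** 32 + 10_000              # true uptime in ms; wraps to 10_000

remaining = remaining_ms_32(target, woke_at)
print(remaining)                         # 4294953296
print(remaining / 86_400_000)            # roughly 49.7 (days)
```

The deadline passed 14 seconds ago, yet the comparison sees a small wrapped 'now' below a large 'target' and computes almost a full wrap period as the next wait.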

We have seen this in the case where it is most likely to happen: when the machine is hibernated during the WaitForSingleObjectEx call. I believe the tick count continues to increase during hibernation, so there is a much bigger gap between 'target' and 'now' in which the overflow can occur.

The simplest fix is probably to switch to GetTickCount64/ULONGLONG. We should be able to get away with using this now that we no longer support WinXP.
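
To illustrate why a 64-bit counter is enough, the sketch below repeats an assumed late-wakeup scenario with values that are not truncated to 32 bits, and estimates the 64-bit wrap period. It is a model of the arithmetic only; `remaining_ms_64` and the tick values are hypothetical.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000

# A 64-bit millisecond counter wraps after hundreds of millions of years.
wrap_years_64 = 2 ** 64 / MS_PER_YEAR

def remaining_ms_64(target, now):
    """Return the remaining wait in ms, or None once the target is reached."""
    if target <= now:
        return None
    return target - now

# Assumed scenario: uptime crosses the old 32-bit boundary during the wait.
start = 2 ** 32 - 5000
target = start + 1000                    # no truncation with a 64-bit value
woke_at = 2 ** 32 + 10_000               # woke 14 s past the target

print(remaining_ms_64(target, woke_at))  # None: the timeout is detected
```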
msg363347 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-04 13:22
time.monotonic() is now always implemented with GetTickCount64() on Windows. Previously, there was a fallback to GetTickCount() when GetTickCount64() was not available. The GetTickCount() call was removed when we dropped support for old Windows versions.
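
At the Python level, the same deadline pattern can be written against time.monotonic(). The helper below is only an illustrative sketch (`wait_until` is a hypothetical name, not a standard-library function); time.get_clock_info() reports which underlying clock the running interpreter uses, and that varies by platform and Python version.

```python
import time

def wait_until(predicate, timeout, poll=0.01):
    """Poll predicate() until it is true or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while not predicate():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False                 # deadline reached
        time.sleep(min(poll, remaining))
    return True

# Inspect the clock backing time.monotonic() on this interpreter.
info = time.get_clock_info("monotonic")
print(info.implementation, info.resolution)
```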

Do you want to work on a fix? (PR)
msg363351 - Author: And Clover (aclover) * Date: 2020-03-04 14:23
Yep, should be straightforward to fix (though not to test, as fifty-day test cases tend to be frowned upon...)
msg363986 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-11 23:39
New changeset 64838ce7172c7a92183b39b22504b433a33a884d by bobince in branch 'master':
bpo-39847: EnterNonRecursiveMutex() uses GetTickCount64() (GH-18780)
https://github.com/python/cpython/commit/64838ce7172c7a92183b39b22504b433a33a884d
msg363989 - Author: miss-islington (miss-islington) Date: 2020-03-11 23:57
New changeset 60b1b5ac56fe6099a3d358dc9d6cd6ec72fce2d8 by Miss Islington (bot) in branch '3.8':
bpo-39847: EnterNonRecursiveMutex() uses GetTickCount64() (GH-18780)
https://github.com/python/cpython/commit/60b1b5ac56fe6099a3d358dc9d6cd6ec72fce2d8
msg364022 - Author: miss-islington (miss-islington) Date: 2020-03-12 14:28
New changeset feaf0c37891dfe8f0f3e643c3711af3af23bf805 by bobince in branch '3.7':
[3.7] bpo-39847: EnterNonRecursiveMutex() uses GetTickCount64() (GH-18780) (GH-18959)
https://github.com/python/cpython/commit/feaf0c37891dfe8f0f3e643c3711af3af23bf805
msg364023 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-12 14:29
Thanks, it's now fixed in 3.7, 3.8 and master branches.

Python 3.5 and 3.6 don't get bugfixes anymore: https://devguide.python.org/#status-of-python-branches
History
Date                 User            Action  Args
2020-03-12 14:29:14  vstinner        set     status: open -> closed
                                             versions: + Python 3.9, - Python 3.5, Python 3.6
                                             messages: + msg364023
                                             resolution: fixed
                                             stage: patch review -> resolved
2020-03-12 14:28:38  miss-islington  set     messages: + msg364022
2020-03-12 14:08:14  aclover         set     pull_requests: + pull_request18309
2020-03-11 23:57:21  miss-islington  set     messages: + msg363989
2020-03-11 23:39:52  miss-islington  set     nosy: + miss-islington
                                             pull_requests: + pull_request18299
2020-03-11 23:39:07  vstinner        set     messages: + msg363986
2020-03-04 16:40:29  aclover         set     keywords: + patch
                                             stage: patch review
                                             pull_requests: + pull_request18136
2020-03-04 14:23:57  aclover         set     messages: + msg363351
2020-03-04 14:01:31  sparrowt        set     nosy: + sparrowt
2020-03-04 13:38:51  Day Barr        set     nosy: + Day Barr
2020-03-04 13:23:12  vstinner        set     title: EnterNonRecursiveMutex on win32 can hang for 49.7 days -> EnterNonRecursiveMutex on win32 can hang for 49.7 days: use GetTickCount64() rather than GetTickCount()
2020-03-04 13:22:57  vstinner        set     nosy: + vstinner
                                             messages: + msg363347
2020-03-04 13:20:26  aclover         create