classification
Title: Windows hangs in call to CRT setlocale()
Type: Stage: resolved
Components: Interpreter Core, Windows Versions: Python 3.6, Python 3.3, Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Decorater, eryksun, jkloth, paul.moore, python-dev, steve.dower, terry.reedy, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2016-03-23 15:44 by jkloth, last changed 2016-08-07 21:03 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description Edit
dd_vs_community_20160702172149.log terry.reedy, 2016-07-02 22:06
Messages (34)
msg262276 - (view) Author: Jeremy Kloth (jkloth) * Date: 2016-03-23 15:44
My Windows BuildBot (http://buildbot.python.org/all/buildslaves/kloth-win64) is hanging in calls to the CRT function setlocale() as determined by attaching to the hung test process in Visual Studio.

This has been happening occasionally (every tenth+ build) for quite some time, but not to the frequency as of late (every fourth build or so).

I would debug further, however my Visual Studio debugging-fu is not up to this challenge it seems.

Pertinent details of the buildbot:
- Windows 7 SP1 64-bit on a quad-core processor
- Running buildbot as a service
- Tests run using '-j4'
- SSD used for storage

Order of the tests do not seem to matter as re-running the test suite using the random seed that hung will pass when run from a console on the buildbot.

Locale-using functions in the CRT have internally locking which is what I believe is the cause of the deadlock, but cannot determine exactly as this is where my debugging knowledge runs out.

Help to diagnose further is greatly appreciated!
msg262277 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-03-23 15:51
Have you actually seen this occurring back to 3.3? I'd expect something like this to either be pre-3.5 or post-3.5, but not both.
msg262278 - (view) Author: Jeremy Kloth (jkloth) * Date: 2016-03-23 16:00
Oh, yes.  It even occasionally happens on 2.7.  The oldest occurrence I  can dig up is May/June of 2013 (for 2.7).
msg262279 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-23 16:13
The bug occurs in test_strptime and test__locale according to Jeremy.

> - Tests run using '-j4'

With this option, each test file is run in a new fresh process, so it's more likely a race condition which depends on the system load (on exact timing) than the interaction with other tests running in parallel.
msg262285 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-23 16:41
Random links from Google:

* http://stackoverflow.com/questions/35572792/setlocale-stuck-on-windows "Sometimes it works but sometimes it never returns from it. I can not identify the reason. I use Visual Studio 2015 and Windows 7."

* https://bugs.chromium.org/p/webrtc/issues/detail?id=5507 " Summary: Dr Memory errors in MSVCP140D.dll when switching to VS2015"
msg262287 - (view) Author: Jeremy Kloth (jkloth) * Date: 2016-03-23 16:57
From the UCRT sources:

    // Deadlock Avoidance:  When a new thread is created in the process, we
    // create a new PTD for the thread.  The PTD initialization function is
    // called under the loader lock.  This initialization function will also
    // acquire the locale lock in order to acquire a reference to the current
    // global locale for the new thread.
    //
    // Some of the locale APIs are not available on all supported target OSes.
    // We dynamically obtain these libraries via LoadLibrary/GetProcAddress.
    // We must ensure that no call to LoadLibrary is made while we hold the
    // locale lock, lest we deadlock due to lock order inversion between the
    // loader lock and the locale lock.
msg262360 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-03-24 15:33
If the comment is in the sources, presumably it's also handled correctly. But it does seem like there's potential here for simultaneous thread creation and first-time calls into OS locale functions to cause a deadlock.

I'll email the CRT team and see what they know about this.
msg262361 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-24 15:34
> I'll email the CRT team and see what they know about this.

Thanks. It helps to have a Microsoft employee working on CPython :-)
msg262364 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-03-24 16:29
It's been fixed, just figuring out when the update will be available, or how to get it if it already is.

Jeremy - how up to date is your build machine? It's likely this fix will come through Windows Update for most users, but since you're running debug builds I think you'll need the latest Visual Studio updates to get it.
msg262381 - (view) Author: Jeremy Kloth (jkloth) * Date: 2016-03-24 22:32
The lastest set of updates were installed on 1/28.  Visual Studio does have Update 1.  I'm doing another round of updates now, although nothing related to VC 2015 or Visual Studio, however.
msg262462 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-25 23:28
The issue #25289 has marked as a duplicate of this issue.
msg262463 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-25 23:29
> It's been fixed, just figuring out when the update will be available, or how to get it if it already is.

Can you elaborate? What was the bug? How was it fixed? Is it a bug in Windows, in Visual Studio or in the C library?
msg262484 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-03-26 05:24
Locking inversion I think, I don't have the bug details handy.

The fix went out over Windows Update recently, but that's only for the release version of the runtime. The fixed debug version should have been installed with VS 2015 Update 1.

However, since it's still occurring on Jeremy's machine, it may not actually be the same bug. Checking the file version of ucrtbased.dll will help, but I can't verify the exact certain until Monday.
msg262603 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-29 11:06
Jeremy: Did you update your buildbot? It looks like the deadlock occurred again:
-------------
http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/7478/steps/test/logs/stdio

(...)
0:14:50 [398/400] test_concurrent_futures (101 sec) -- running: test_strptime (40 sec)
0:14:52 [399/400] test_ssl -- running: test_strptime (42 sec)

command timed out: 1200 seconds without output, attempting to kill
program finished with exit code 1
elapsedTime=2096.738000
-------------

By the way, I don't understand why the faulthandler watch dog thread is unable to kill the main process and dump a traceback. I recently added a "faulthandler.dump_traceback_later" call Lib/test/libregrtest/runtest_mp.py. It uses a timeout of 15 min, it looks like buildbot kills the process after a timeout of 20 minutes "without output".
msg262611 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-03-29 14:41
The deadlock is between a CRT thread initialization lock and Windows's process-wide module initialization lock. So if anything is being loaded on demand by the watchdog thread, it's probably locked too (and that could be within the CRT or the OS, so not easy to identify).

I'm still keen to get some version numbers off the build machine, mainly ucrtbased.dll. I'd add code to the build process to just dump them, but I can't get to that for at least a week right now, so if Jeremy can provide it we can rule in/out some possibilities.
msg262630 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-29 21:43
> The deadlock is between a CRT thread initialization lock and Windows's process-wide module initialization lock. So if anything is being loaded on demand by the watchdog thread, it's probably locked too (and that could be within the CRT or the OS, so not easy to identify).

Basically, on timeout, the faulthandler thread uses the write() syscall and reads memory in the Python process. It's designed to be super safe. The code only uses async-signal safe functions, since the same code is used in signal handlers.

https://hg.python.org/cpython/file/688c6b16228e/Modules/faulthandler.c#l540

But the watchdog is trigerred by PyThread_acquire_lock_timed() which indirectly calls WaitForSingleObjectEx() on Windows. In expect that the wait times out before the lock is released on this buildbot.

Maybe Windows provides a safer function to implement a watchdog? Basically, the required function is a wait that can be interrupted.
msg262665 - (view) Author: Jeremy Kloth (jkloth) * Date: 2016-03-30 13:48
I would have responded sooner, but add me to the list of those not receiving tracker notifications...

My buildbot is running Windows 7 with all updates (last checked 3/29) and VS2015 Update 1.

ucrtbase:      10.0.10586.9     (11/19/2015)
ucrtbased:     10.0.10240.16384 (07/09/2015)
vcruntime140d: 14.00.23506.0    (11/05/2015)

Interestingly enough, my Win10 development machine has:
ucrtbase:      10.0.14295.1000  (03/19/2016)
ucrtbased:     10.0.10586.15    (11/20/2015)
vcruntime140d: 14.0.23506.0     (11/05/2015)
msg262669 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-03-30 14:35
So you've still got the old ucrtbased.dll on the buildbot, which is why we're still seeing the issue. Because it's a different OS I'm not surprised to see a difference between ucrtbase.dll instances.

There should be an installer for the debug runtime somewhere inside your VS install directory (under vc/redist). If you take the one from your dev machine and run it on the buildbot we should be good.
msg262672 - (view) Author: Jeremy Kloth (jkloth) * Date: 2016-03-30 16:41
After attempting to find the installer for ucrtbased.dll on my dev machine and reading various blog posts on the Visual C++ Team Blog and hunting through several KB articles, I've come across what I think is needed for an up-to-date Universal CRT.

When installing (or in this case modifying) VS2015, the Windows 10 SDK *MUST* also be installed.  It is unhelpfully located in the setup menu under "Windows and Web Development" (good) -> "Universal Windows App Development Tools" (not so good) -> "Tools (1.2) and Windows 10 SDK (10.0.10586)".  As Python is not an UWP app, I thought nothing of this submenu when installing VS2015 initially.

With the SDK installed, my ucrtbased.dll is at version 10.0.10586.15.  Now we wait and see if the deadlocking still happens.
msg262674 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-03-30 17:15
Ergh... that's awkward (but it explains why some of my own machines are missing bits).

Victor, I'll get back to you about the watchdog once I'm at an actual PC. This week I'm at a conference.
msg263286 - (view) Author: Jeremy Kloth (jkloth) * Date: 2016-04-12 23:57
It seems that the updated UCRT debug runtime has indeed solved the issue.

I suggest that this issue remains open pending an update to the devguide for required settings for installing VS2015 with the updated runtime (see msg262672).  I have no idea if the VC Build Tools 2015 installs the needed files, so that would need to be checked on a clean machine (all of mine have VS2015 installed).
msg263290 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-13 00:36
Is it possible to log a compilation warning in Python if VS version
contains the bug? Or even make the compilation fails?
msg263293 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-04-13 01:38
That's a good idea. I don't know that it's trivial to do, but at build is going to be the easiest time. I'll take a look.
msg263299 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-13 03:12
New changeset 605fde022b15 by Steve Dower in branch '3.5':
Closes #26624: Adds validation of ucrtbase[d].dll version with warning for old versions.
https://hg.python.org/cpython/rev/605fde022b15

New changeset 6ff020df61b8 by Steve Dower in branch 'default':
Closes #26624: Adds validation of ucrtbase[d].dll version with warning for old versions.
https://hg.python.org/cpython/rev/6ff020df61b8
msg263300 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-13 03:15
New changeset a352f7e96f85 by Steve Dower in branch 'default':
Issue #26624: Adds recommendation to install updates of VS 2015.
https://hg.python.org/devguide/rev/a352f7e96f85
msg269741 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-02 22:06
I have recent gotten
  WARN: ucrtbased contains known issues. Please update Visual Studio or the Windows SDK.
  See: http://bugs.python.org/issue26624
F:\Python\dev\36\PCbuild\python.vcxproj(91,5): warning MSB3073: The command ""F:\Python\dev\36\PCBuild\win32\python_d.exe" "F:\Python\dev\36\PC\validate_ucrtbase.py" ucrtbased" exited with code 1.

but Python still ran for me.  Today, I recompiled, and python stopped running when I try to either run IDLE or patchcheck.  To eliminate the old .dll as a cause, I tried to 'update'. How to do so?

I started VS by double clicking pcbuild/python.vcxproj.  The first barrier was that Community 2015 is only good for 30 days (the compiler invoked by build.bat still ran) and I could not access the menu.  Once I got the license renewed, so VS ran, there is no Update entry on Help or anywhere else on the menu that I could find.  (I seem to remember that VC Express updated automatically.)

When I started VS2015 from the start menu instead, there was a splash screen with News and a link to a brand new Update 3.
https://blogs.msdn.microsoft.com/visualstudio/2016/06/27/visual-studio-2015-update-3-and-net-core-1-0-available-now/

I clicked the first link and downloaded vs2015.3.exe. ran it, and left the minimal selection of components alone.  After 3 GB downloaded and installed, there are error messages that *cannot* be copied and pasted.  Here is a summary:
1. VS update prereq: Installer failed. bad hash 0x80091007
(did this mess up everything?)
2. Team J. Serv CTP! SToryboarding Pack file not found 0x80070002
(This does not seem like something I requested.)
3. VC++IDE base resource package ditto as to error.
(Perhaps not needed for build.bat)
Log file attached.  I will restart and see what happens.
msg269742 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-02 22:25
After the 'update', the compiler ran and re-linked all the targets.  The warning about the old .dll remains.  Perhaps I should remove and re-install?

The crash also remain.  I'll backdate the repository to mid-June when I successfully compiled and ran and see if I can find a particular commit causing the problem.
msg269743 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-02 23:19
The problem was the last patch, for #26765.  In spite of the faulty .dll, the test suite ran OK.
msg270694 - (view) Author: Decorater (Decorater) * Date: 2016-07-18 04:48
Cant someone just copy these dlls from Windows 10 itself if they have it installed?

ucrtbase:      10.0.14295.1000  (03/19/2016)
ucrtbased:     10.0.10586.15    (11/20/2015)
vcruntime140d: 14.0.23506.0     (11/05/2015)

I have windows 10 on another partition btw so I could manually update mine to fix this stuff on my windows 7. thank god for 64 bit OSes.
msg270695 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-18 04:54
You should get these files by installing the latest Windows 10 SDK (regardless of your OS - the SDK version relates to the latest it can target, not the latest it can be used on).

For some reason, they decided it shouldn't be "automatic" to update the SDK when you're updating other parts of the compiler toolset. I'll bring it up next time I chat to those guys at work.
msg270699 - (view) Author: Decorater (Decorater) * Date: 2016-07-18 05:37
Also can you ask them why sometimes the IDE takes too long to respond AKA freezes / crashes randomly?
msg270701 - (view) Author: Decorater (Decorater) * Date: 2016-07-18 05:41
not to mention build processes seems to be taking a lot of CPU recently but older C/C++ compilers do not EG the ones from like 2008 or so running faster than the same stuff being compiled againt 2010~2015 ones.
msg270749 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-18 12:59
Lot of different people involved in those areas, so can't promise an answer, but I'll see.

Also, unless you think a closed issue needs to be reopened, please avoid using them for chat.
msg272133 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-08-07 21:03
I updated ucrtbased.dll and resolved the warning as described in #27705.
History
Date User Action Args
2016-08-07 21:03:51terry.reedysetmessages: + msg272133
2016-07-18 12:59:55steve.dowersetmessages: + msg270749
2016-07-18 05:41:51Decoratersetmessages: + msg270701
2016-07-18 05:37:10Decoratersetmessages: + msg270699
2016-07-18 04:54:08steve.dowersetmessages: + msg270695
2016-07-18 04:48:30Decoratersetnosy: + Decorater
messages: + msg270694
2016-07-02 23:19:15terry.reedysetmessages: + msg269743
2016-07-02 22:25:53terry.reedysetmessages: + msg269742
2016-07-02 22:06:53terry.reedysetfiles: + dd_vs_community_20160702172149.log
nosy: + terry.reedy
messages: + msg269741

2016-04-13 03:15:34python-devsetmessages: + msg263300
2016-04-13 03:12:48python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg263299

resolution: fixed
stage: resolved
2016-04-13 01:38:31steve.dowersetmessages: + msg263293
2016-04-13 00:36:05vstinnersetmessages: + msg263290
2016-04-12 23:57:01jklothsetmessages: + msg263286
2016-03-30 17:15:00steve.dowersetmessages: + msg262674
2016-03-30 16:41:51jklothsetmessages: + msg262672
2016-03-30 14:35:14steve.dowersetmessages: + msg262669
2016-03-30 13:48:17jklothsetmessages: + msg262665
2016-03-29 21:43:10vstinnersetmessages: + msg262630
2016-03-29 14:41:03steve.dowersetmessages: + msg262611
2016-03-29 11:06:25vstinnersetmessages: + msg262603
2016-03-26 05:24:42steve.dowersetmessages: + msg262484
2016-03-25 23:29:43vstinnersetmessages: + msg262463
2016-03-25 23:28:59vstinnersetmessages: + msg262462
2016-03-25 23:28:43vstinnerlinkissue25289 superseder
2016-03-24 22:32:05jklothsetmessages: + msg262381
2016-03-24 16:29:05steve.dowersetmessages: + msg262364
2016-03-24 15:34:35vstinnersetmessages: + msg262361
2016-03-24 15:33:19steve.dowersetmessages: + msg262360
2016-03-23 16:57:20jklothsetmessages: + msg262287
2016-03-23 16:41:45vstinnersetmessages: + msg262285
2016-03-23 16:13:39vstinnersetmessages: + msg262279
2016-03-23 16:02:01vstinnersetnosy: + vstinner
2016-03-23 16:00:57jklothsetmessages: + msg262278
2016-03-23 15:51:16steve.dowersetmessages: + msg262277
2016-03-23 15:44:31jklothcreate