Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows hangs in call to CRT setlocale() #70811

Closed
jkloth opened this issue Mar 23, 2016 · 34 comments
Closed

Windows hangs in call to CRT setlocale() #70811

jkloth opened this issue Mar 23, 2016 · 34 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows

Comments

@jkloth
Copy link
Contributor

jkloth commented Mar 23, 2016

BPO 26624
Nosy @terryjreedy, @pfmoore, @vstinner, @tjguk, @jkloth, @zware, @eryksun, @zooba, @AraHaan
Files
  • dd_vs_community_20160702172149.log
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-04-13.03:12:48.408>
    created_at = <Date 2016-03-23.15:44:31.063>
    labels = ['interpreter-core', 'OS-windows']
    title = 'Windows hangs in call to CRT setlocale()'
    updated_at = <Date 2016-08-07.21:03:51.265>
    user = 'https://github.com/jkloth'

    bugs.python.org fields:

    activity = <Date 2016-08-07.21:03:51.265>
    actor = 'terry.reedy'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-04-13.03:12:48.408>
    closer = 'python-dev'
    components = ['Interpreter Core', 'Windows']
    creation = <Date 2016-03-23.15:44:31.063>
    creator = 'jkloth'
    dependencies = []
    files = ['43615']
    hgrepos = []
    issue_num = 26624
    keywords = []
    message_count = 34.0
    messages = ['262276', '262277', '262278', '262279', '262285', '262287', '262360', '262361', '262364', '262381', '262462', '262463', '262484', '262603', '262611', '262630', '262665', '262669', '262672', '262674', '263286', '263290', '263293', '263299', '263300', '269741', '269742', '269743', '270694', '270695', '270699', '270701', '270749', '272133']
    nosy_count = 10.0
    nosy_names = ['terry.reedy', 'paul.moore', 'vstinner', 'tim.golden', 'jkloth', 'python-dev', 'zach.ware', 'eryksun', 'steve.dower', 'Decorater']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue26624'
    versions = ['Python 3.3', 'Python 3.4', 'Python 3.5', 'Python 3.6']

    @jkloth
    Copy link
    Contributor Author

    jkloth commented Mar 23, 2016

    My Windows BuildBot (http://buildbot.python.org/all/buildslaves/kloth-win64) is hanging in calls to the CRT function setlocale() as determined by attaching to the hung test process in Visual Studio.

    This has been happening occasionally (every tenth+ build) for quite some time, but not to the frequency as of late (every fourth build or so).

    I would debug further, however my Visual Studio debugging-fu is not up to this challenge it seems.

    Pertinent details of the buildbot:

    • Windows 7 SP1 64-bit on a quad-core processor
    • Running buildbot as a service
    • Tests run using '-j4'
    • SSD used for storage

    Order of the tests do not seem to matter as re-running the test suite using the random seed that hung will pass when run from a console on the buildbot.

    Locale-using functions in the CRT have internally locking which is what I believe is the cause of the deadlock, but cannot determine exactly as this is where my debugging knowledge runs out.

    Help to diagnose further is greatly appreciated!

    @jkloth jkloth added interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows labels Mar 23, 2016
    @zooba
    Copy link
    Member

    zooba commented Mar 23, 2016

    Have you actually seen this occurring back to 3.3? I'd expect something like this to either be pre-3.5 or post-3.5, but not both.

    @jkloth
    Copy link
    Contributor Author

    jkloth commented Mar 23, 2016

    Oh, yes. It even occasionally happens on 2.7. The oldest occurrence I can dig up is May/June of 2013 (for 2.7).

    @vstinner
    Copy link
    Member

    The bug occurs in test_strptime and test__locale according to Jeremy.

    • Tests run using '-j4'

    With this option, each test file is run in a new fresh process, so it's more likely a race condition which depends on the system load (on exact timing) than the interaction with other tests running in parallel.

    @vstinner
    Copy link
    Member

    Random links from Google:

    @jkloth
    Copy link
    Contributor Author

    jkloth commented Mar 23, 2016

    From the UCRT sources:

    // Deadlock Avoidance:  When a new thread is created in the process, we
    // create a new PTD for the thread.  The PTD initialization function is
    // called under the loader lock.  This initialization function will also
    // acquire the locale lock in order to acquire a reference to the current
    // global locale for the new thread.
    //
    // Some of the locale APIs are not available on all supported target OSes.
    // We dynamically obtain these libraries via LoadLibrary/GetProcAddress.
    // We must ensure that no call to LoadLibrary is made while we hold the
    // locale lock, lest we deadlock due to lock order inversion between the
    // loader lock and the locale lock.
    

    @zooba
    Copy link
    Member

    zooba commented Mar 24, 2016

    If the comment is in the sources, presumably it's also handled correctly. But it does seem like there's potential here for simultaneous thread creation and first-time calls into OS locale functions to cause a deadlock.

    I'll email the CRT team and see what they know about this.

    @vstinner
    Copy link
    Member

    I'll email the CRT team and see what they know about this.

    Thanks. It helps to have a Microsoft employee working on CPython :-)

    @zooba
    Copy link
    Member

    zooba commented Mar 24, 2016

    It's been fixed, just figuring out when the update will be available, or how to get it if it already is.

    Jeremy - how up to date is your build machine? It's likely this fix will come through Windows Update for most users, but since you're running debug builds I think you'll need the latest Visual Studio updates to get it.

    @jkloth
    Copy link
    Contributor Author

    jkloth commented Mar 24, 2016

    The lastest set of updates were installed on 1/28. Visual Studio does have Update 1. I'm doing another round of updates now, although nothing related to VC 2015 or Visual Studio, however.

    @vstinner
    Copy link
    Member

    The issue bpo-25289 has marked as a duplicate of this issue.

    @vstinner
    Copy link
    Member

    It's been fixed, just figuring out when the update will be available, or how to get it if it already is.

    Can you elaborate? What was the bug? How was it fixed? Is it a bug in Windows, in Visual Studio or in the C library?

    @zooba
    Copy link
    Member

    zooba commented Mar 26, 2016

    Locking inversion I think, I don't have the bug details handy.

    The fix went out over Windows Update recently, but that's only for the release version of the runtime. The fixed debug version should have been installed with VS 2015 Update 1.

    However, since it's still occurring on Jeremy's machine, it may not actually be the same bug. Checking the file version of ucrtbased.dll will help, but I can't verify the exact certain until Monday.

    @vstinner
    Copy link
    Member

    Jeremy: Did you update your buildbot? It looks like the deadlock occurred again:
    -------------
    http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/7478/steps/test/logs/stdio

    (...)
    0:14:50 [398/400] test_concurrent_futures (101 sec) -- running: test_strptime (40 sec)
    0:14:52 [399/400] test_ssl -- running: test_strptime (42 sec)

    command timed out: 1200 seconds without output, attempting to kill
    program finished with exit code 1
    elapsedTime=2096.738000
    -------------

    By the way, I don't understand why the faulthandler watch dog thread is unable to kill the main process and dump a traceback. I recently added a "faulthandler.dump_traceback_later" call Lib/test/libregrtest/runtest_mp.py. It uses a timeout of 15 min, it looks like buildbot kills the process after a timeout of 20 minutes "without output".

    @zooba
    Copy link
    Member

    zooba commented Mar 29, 2016

    The deadlock is between a CRT thread initialization lock and Windows's process-wide module initialization lock. So if anything is being loaded on demand by the watchdog thread, it's probably locked too (and that could be within the CRT or the OS, so not easy to identify).

    I'm still keen to get some version numbers off the build machine, mainly ucrtbased.dll. I'd add code to the build process to just dump them, but I can't get to that for at least a week right now, so if Jeremy can provide it we can rule in/out some possibilities.

    @vstinner
    Copy link
    Member

    The deadlock is between a CRT thread initialization lock and Windows's process-wide module initialization lock. So if anything is being loaded on demand by the watchdog thread, it's probably locked too (and that could be within the CRT or the OS, so not easy to identify).

    Basically, on timeout, the faulthandler thread uses the write() syscall and reads memory in the Python process. It's designed to be super safe. The code only uses async-signal safe functions, since the same code is used in signal handlers.

    https://hg.python.org/cpython/file/688c6b16228e/Modules/faulthandler.c#l540

    But the watchdog is trigerred by PyThread_acquire_lock_timed() which indirectly calls WaitForSingleObjectEx() on Windows. In expect that the wait times out before the lock is released on this buildbot.

    Maybe Windows provides a safer function to implement a watchdog? Basically, the required function is a wait that can be interrupted.

    @jkloth
    Copy link
    Contributor Author

    jkloth commented Mar 30, 2016

    I would have responded sooner, but add me to the list of those not receiving tracker notifications...

    My buildbot is running Windows 7 with all updates (last checked 3/29) and VS2015 Update 1.

    ucrtbase: 10.0.10586.9 (11/19/2015)
    ucrtbased: 10.0.10240.16384 (07/09/2015)
    vcruntime140d: 14.00.23506.0 (11/05/2015)

    Interestingly enough, my Win10 development machine has:
    ucrtbase: 10.0.14295.1000 (03/19/2016)
    ucrtbased: 10.0.10586.15 (11/20/2015)
    vcruntime140d: 14.0.23506.0 (11/05/2015)

    @zooba
    Copy link
    Member

    zooba commented Mar 30, 2016

    So you've still got the old ucrtbased.dll on the buildbot, which is why we're still seeing the issue. Because it's a different OS I'm not surprised to see a difference between ucrtbase.dll instances.

    There should be an installer for the debug runtime somewhere inside your VS install directory (under vc/redist). If you take the one from your dev machine and run it on the buildbot we should be good.

    @jkloth
    Copy link
    Contributor Author

    jkloth commented Mar 30, 2016

    After attempting to find the installer for ucrtbased.dll on my dev machine and reading various blog posts on the Visual C++ Team Blog and hunting through several KB articles, I've come across what I think is needed for an up-to-date Universal CRT.

    When installing (or in this case modifying) VS2015, the Windows 10 SDK *MUST* also be installed. It is unhelpfully located in the setup menu under "Windows and Web Development" (good) -> "Universal Windows App Development Tools" (not so good) -> "Tools (1.2) and Windows 10 SDK (10.0.10586)". As Python is not an UWP app, I thought nothing of this submenu when installing VS2015 initially.

    With the SDK installed, my ucrtbased.dll is at version 10.0.10586.15. Now we wait and see if the deadlocking still happens.

    @zooba
    Copy link
    Member

    zooba commented Mar 30, 2016

    Ergh... that's awkward (but it explains why some of my own machines are missing bits).

    Victor, I'll get back to you about the watchdog once I'm at an actual PC. This week I'm at a conference.

    @jkloth
    Copy link
    Contributor Author

    jkloth commented Apr 12, 2016

    It seems that the updated UCRT debug runtime has indeed solved the issue.

    I suggest that this issue remains open pending an update to the devguide for required settings for installing VS2015 with the updated runtime (see msg262672). I have no idea if the VC Build Tools 2015 installs the needed files, so that would need to be checked on a clean machine (all of mine have VS2015 installed).

    @vstinner
    Copy link
    Member

    Is it possible to log a compilation warning in Python if VS version
    contains the bug? Or even make the compilation fails?

    @zooba
    Copy link
    Member

    zooba commented Apr 13, 2016

    That's a good idea. I don't know that it's trivial to do, but at build is going to be the easiest time. I'll take a look.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 13, 2016

    New changeset 605fde022b15 by Steve Dower in branch '3.5':
    Closes bpo-26624: Adds validation of ucrtbase[d].dll version with warning for old versions.
    https://hg.python.org/cpython/rev/605fde022b15

    New changeset 6ff020df61b8 by Steve Dower in branch 'default':
    Closes bpo-26624: Adds validation of ucrtbase[d].dll version with warning for old versions.
    https://hg.python.org/cpython/rev/6ff020df61b8

    @python-dev python-dev mannequin closed this as completed Apr 13, 2016
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 13, 2016

    New changeset a352f7e96f85 by Steve Dower in branch 'default':
    Issue bpo-26624: Adds recommendation to install updates of VS 2015.
    https://hg.python.org/devguide/rev/a352f7e96f85

    @terryjreedy
    Copy link
    Member

    I have recent gotten
    WARN: ucrtbased contains known issues. Please update Visual Studio or the Windows SDK.
    See: http://bugs.python.org/issue26624
    F:\Python\dev\36\PCbuild\python.vcxproj(91,5): warning MSB3073: The command ""F:\Python\dev\36\PCBuild\win32\python_d.exe" "F:\Python\dev\36\PC\validate_ucrtbase.py" ucrtbased" exited with code 1.

    but Python still ran for me. Today, I recompiled, and python stopped running when I try to either run IDLE or patchcheck. To eliminate the old .dll as a cause, I tried to 'update'. How to do so?

    I started VS by double clicking pcbuild/python.vcxproj. The first barrier was that Community 2015 is only good for 30 days (the compiler invoked by build.bat still ran) and I could not access the menu. Once I got the license renewed, so VS ran, there is no Update entry on Help or anywhere else on the menu that I could find. (I seem to remember that VC Express updated automatically.)

    When I started VS2015 from the start menu instead, there was a splash screen with News and a link to a brand new Update 3.
    https://blogs.msdn.microsoft.com/visualstudio/2016/06/27/visual-studio-2015-update-3-and-net-core-1-0-available-now/

    I clicked the first link and downloaded vs2015.3.exe. ran it, and left the minimal selection of components alone. After 3 GB downloaded and installed, there are error messages that *cannot* be copied and pasted. Here is a summary:

    1. VS update prereq: Installer failed. bad hash 0x80091007
      (did this mess up everything?)
    2. Team J. Serv CTP! SToryboarding Pack file not found 0x80070002
      (This does not seem like something I requested.)
    3. VC++IDE base resource package ditto as to error.
      (Perhaps not needed for build.bat)
      Log file attached. I will restart and see what happens.

    @terryjreedy
    Copy link
    Member

    After the 'update', the compiler ran and re-linked all the targets. The warning about the old .dll remains. Perhaps I should remove and re-install?

    The crash also remain. I'll backdate the repository to mid-June when I successfully compiled and ran and see if I can find a particular commit causing the problem.

    @terryjreedy
    Copy link
    Member

    The problem was the last patch, for bpo-26765. In spite of the faulty .dll, the test suite ran OK.

    @AraHaan
    Copy link
    Mannequin

    AraHaan mannequin commented Jul 18, 2016

    Cant someone just copy these dlls from Windows 10 itself if they have it installed?

    ucrtbase: 10.0.14295.1000 (03/19/2016)
    ucrtbased: 10.0.10586.15 (11/20/2015)
    vcruntime140d: 14.0.23506.0 (11/05/2015)

    I have windows 10 on another partition btw so I could manually update mine to fix this stuff on my windows 7. thank god for 64 bit OSes.

    @zooba
    Copy link
    Member

    zooba commented Jul 18, 2016

    You should get these files by installing the latest Windows 10 SDK (regardless of your OS - the SDK version relates to the latest it can target, not the latest it can be used on).

    For some reason, they decided it shouldn't be "automatic" to update the SDK when you're updating other parts of the compiler toolset. I'll bring it up next time I chat to those guys at work.

    @AraHaan
    Copy link
    Mannequin

    AraHaan mannequin commented Jul 18, 2016

    Also can you ask them why sometimes the IDE takes too long to respond AKA freezes / crashes randomly?

    @AraHaan
    Copy link
    Mannequin

    AraHaan mannequin commented Jul 18, 2016

    not to mention build processes seems to be taking a lot of CPU recently but older C/C++ compilers do not EG the ones from like 2008 or so running faster than the same stuff being compiled againt 2010~2015 ones.

    @zooba
    Copy link
    Member

    zooba commented Jul 18, 2016

    Lot of different people involved in those areas, so can't promise an answer, but I'll see.

    Also, unless you think a closed issue needs to be reopened, please avoid using them for chat.

    @terryjreedy
    Copy link
    Member

    I updated ucrtbased.dll and resolved the warning as described in bpo-27705.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants