classification
Title: test_ctypes.test_load_dll_with_flags Windows failure
Type: behavior Stage: resolved
Components: ctypes, Tests, Windows Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: aeros, eryksun, miss-islington, paul.moore, steve.dower, tim.golden, xtreak, zach.ware
Priority: normal Keywords: patch

Created on 2020-04-07 03:27 by aeros, last changed 2020-05-22 22:24 by miss-islington. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 19404 merged zach.ware, 2020-04-07 06:07
PR 19413 closed zach.ware, 2020-04-07 16:30
PR 19449 merged miss-islington, 2020-04-09 12:13
PR 19500 closed steve.dower, 2020-04-13 11:39
PR 19652 merged steve.dower, 2020-04-22 10:43
PR 19656 merged miss-islington, 2020-04-22 16:05
Messages (27)
msg365887 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-07 03:27
In several recent PRs, test_ctypes.test_load_dll_with_flags is failing for the Azure Pipelines "Windows PR tests win32" and "Windows PR tests win64" with the following error message:

```
======================================================================
ERROR: test_load_dll_with_flags (ctypes.test.test_loading.LoaderTest) [WinDLL('_sqlite3.dll', winmode=0)]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "d:\a\1\s\lib\ctypes\test\test_loading.py", line 140, in should_pass
    subprocess.check_output(
  File "d:\a\1\s\lib\subprocess.py", line 419, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "d:\a\1\s\lib\subprocess.py", line 533, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['d:\\a\\1\\s\\PCbuild\\win32\\python.exe', '-c', "from ctypes import *; import nt;WinDLL('_sqlite3.dll', winmode=0)"]' returned non-zero exit status 1.

======================================================================
ERROR: test_load_dll_with_flags (ctypes.test.test_loading.LoaderTest) [WinDLL('_sqlite3.dll', winmode=0)]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "d:\a\1\s\lib\ctypes\test\test_loading.py", line 140, in should_pass
    subprocess.check_output(
  File "d:\a\1\s\lib\subprocess.py", line 419, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "d:\a\1\s\lib\subprocess.py", line 533, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['d:\\a\\1\\s\\PCbuild\\win32\\python.exe', '-c', "from ctypes import *; import nt;WinDLL('_sqlite3.dll', winmode=0)"]' returned non-zero exit status 1.
```

Since this only started occurring recently in several unrelated PRs, I suspect it was most likely an intermittent regression introduced in master. Here are the PRs I have seen the same exact error occur in so far:

https://github.com/python/cpython/pull/18239
https://github.com/python/cpython/pull/19403
https://github.com/python/cpython/pull/19402
https://github.com/python/cpython/pull/19399

I was unable to reproduce it locally on my secondary boot of Windows 10.
msg365890 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-04-07 06:09
My best guess at the moment is that something changed in the underlying Windows image, particularly given the comment above the failing line that "insecure load flags should succeed" (though that could easily be a red herring).  Until someone who actually does understand what's going on here (Steve? :)) can deal with it properly, I have a PR to disable the problematic part of the test that I'll merge shortly.
msg365892 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-04-07 06:40
New changeset f407e209c1e35b64835f73e7e7ca23e33817e9fe by Zachary Ware in branch 'master':
bpo-40214: Temporarily disable a ctypes test (GH-19404)
https://github.com/python/cpython/commit/f407e209c1e35b64835f73e7e7ca23e33817e9fe
msg365900 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-07 10:04
Zachary Ware wrote:
> My best guess at the moment is that something changed in the underlying Windows image, particularly given the comment above the failing line that "insecure load flags should succeed" (though that could easily be a red herring).

Also, now that I think about it, the the test failure has only specifically occurred on Azure Pipelines and not on the GitHub Actions Windows PR tests (as far as I have seen), which does further increase the likelihood that it's an issue with the Windows image being used in Azure for the PR tests.

Regardless of the cause though, thanks for applying a temporary fix/skip in the meantime, these types of issues can be troublesome for CPython's workflow when it occurs on required CI checks. Hopefully Steve Dower or someone similarly knowledgeable in the area can definitively figure out the cause.
msg365902 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-07 10:29
Maybe something else was installed that put an incompatible _sqlite3.dll on PATH? Thereby proving the inherent risk of using unsafe DLL load settings :)

If we're not already overriding PATH for the subprocess in this test (should only need "%SystemRoot%;%SystemRoot%\System32"), we may have to. Though if someone is installing an incompatible _sqlite3.dll into a system directory then we're in a bit of trouble.

Disabling this part of the test as inherently untestable because it's an inherently unsafe operation is fine by me :)
msg365910 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-04-07 15:42
> Maybe something else was installed that put an incompatible 
> _sqlite3.dll on PATH? Thereby proving the inherent risk of 
> using unsafe DLL load settings 

should_pass() sets the working directory of the test process to the `tmp` directory. The loader normally checks the working directory before PATH. But maybe the system itself is configured to disallow loading DLLs from the working directory. There's a registry setting for that, but it's little known and rarely used because it's disruptive in general.
msg365919 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-04-07 17:03
Assuming I implemented my checks correctly (see PR19413), I think I've just debunked both of our leading theories.

Results here: https://dev.azure.com/Python/cpython/_build/results?buildId=60764&view=logs&j=d554cd63-f8f4-5b2d-871b-33e4ea76e915&t=5a14d0eb-dbd4-5b80-f5d0-7909f950a1cc
msg365953 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-07 22:14
It's probably the sqlite3.dll dependency that's failing, not the _sqlite3.dll (originally _sqlite3.pyd) one.

The test is verifying that dependent DLLs are loaded correctly. So I assume in this case the _sqlite3.dll is being loaded, but it's finding the wrong sqlite3.dll.

Since I could, I updated Zach's PR to check for other sqlite3.dll files. The new build should be at https://dev.azure.com/Python/cpython/_build/results?buildId=60785&view=logs&j=c83831cd-3752-5cc7-2f01-8276919eb334 in a few minutes.
msg365954 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-07 22:16
It might also be a "sqlite3_d.dll". Updated build is https://dev.azure.com/Python/cpython/_build/results?buildId=60787&view=results
msg365955 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-07 22:34
Found it (and it's kind-of us):

Checking C:\Program Files\Amazon\AWSCLIV2\
*** Found at C:\Program Files\Amazon\AWSCLIV2\sqlite3.dll
*** Found at C:\Program Files\Amazon\AWSCLIV2\_sqlite3.pyd

But I'm not sure why that is getting loaded earlier than the current directory. Is that the behaviour we went for here?

(FWIW, we build and test a release build, not a debug build, which is why we're not looking for sqlite3_d.dll... but perhaps we should be using a debug build? That might be slower, but the extra validation is probably worthwhile...)
msg365970 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-08 04:05
Steve Dower wrote:
> (FWIW, we build and test a release build, not a debug build, which is why we're not looking for sqlite3_d.dll... but perhaps we should be using a debug build? That might be slower, but the extra validation is probably worthwhile...)

In general, I think a debug build instead of a release would make more sense for the PR tests. Do you have a general estimate or rough idea as to how much slower it would be in comparison?
msg365971 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-04-08 04:28
Is this going to be backported? It seems backports also use the same build and have this error. Sample 3.8 backport build that seems to be related to this issue : https://dev.azure.com/Python/cpython/_build/results?buildId=60753&view=logs&j=d554cd63-f8f4-5b2d-871b-33e4ea76e915&t=5a14d0eb-dbd4-5b80-f5d0-7909f950a1cc&l=1570
msg365993 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-04-08 16:14
Feel free to backport PR19404 as needed, but mark versions here appropriately to make sure the *real* fix makes it where it needs to go.
msg365999 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-08 18:31
> Do you have a general estimate or rough idea as to how much slower it would be in comparison?

It's one sample point, but compare https://buildbot.python.org/all/#/builders/129/builds/708 to https://github.com/python/cpython/runs/571497886

Compile time: 3:32 (release) -> 1:10 (debug)
Test time: 12:28 (release) -> 15:31 (debug)

Though the test timing vary wildly, as some tests cause more contention than others and they run in a random order.
msg366003 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-08 20:03
Steve Dower wrote:
> It's one sample point, but compare https://buildbot.python.org/all/#/builders/129/builds/708 to https://github.com/python/cpython/runs/571497886

FWIW, I'd be +1 in favor for using the debug build then. A few additional minutes would be well worth having more thorough PR tests IMO. If it becomes an issue, we can always revert it back, but it seems that there's little to no harm in trying it out.
msg366044 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-09 12:27
Python 3.8 is also affected. Example:
https://github.com/python/cpython/pull/19444#issuecomment-611499825

I backported the change to skip the test until it's fixed: PR 19449.
msg366045 - (view) Author: miss-islington (miss-islington) Date: 2020-04-09 12:31
New changeset c83f003ee5398e9c27a0c634329420003d607d46 by Miss Islington (bot) in branch '3.8':
bpo-40214: Temporarily disable a ctypes test (GH-19404)
https://github.com/python/cpython/commit/c83f003ee5398e9c27a0c634329420003d607d46
msg366307 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-04-13 12:02
> But I'm not sure why that is getting loaded earlier than the 
> current directory. Is that the behaviour we went for here?

I don't understand what's going on here if %PATH% is interfering. The current directory (%__CD__%) should be checked before %PATH% with the standard DLL search order for desktop applcations. That's what I've observed in practice and how it's documented [1]:

    SafeDllSearchMode enabled:

    1. %__APPDIR__%
    2. %SystemRoot%\System32
    3. %SystemRoot%\System
    4. %SystemRoot%
    5. %__CD__%
    6. %PATH%

    SafeDllSearchMode disabled:

    1. %__APPDIR__%
    2. %__CD__%
    3. %SystemRoot%\System32
    4. %SystemRoot%\System
    5. %SystemRoot%
    6. %PATH%

[1]: https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order#search-order-for-desktop-applications
msg366316 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-13 14:47
> I don't understand what's going on here if %PATH% is interfering.

Yeah, as shown by the fact my fix didn't fix it :)

I'm going to try adding some more diagnostics around this to find out exactly where it's failing. I suspect it's a dependency, rather than the direct file.
msg366319 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-13 17:41
So I collected more info, and it seems the search order we're getting for _sqlite3.dll (.pyd) is:

* app directory
* SQL server install directory (?)
* Windows/System32
* Windows
* %PATH% (which is truncated at 2190 chars)

And then we never look in CWD. I wonder if it's being excluded because it's under %TEMP%?

I can't reproduce it either with a long PATH or with it in TEMP, so there may be a policy setting involved. Trying again with the test not using TEMP (it shouldn't be using the real TEMP anyway...)
msg366320 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-13 18:07
Okay, I've wasted enough time on this. Let's just disable the test permanently and say that the winmode=0 behaviour is system-specific and is not tested.
msg366322 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-04-13 18:44
> it seems the search order we're getting for _sqlite3.dll (.pyd) is:
>
> * app directory
> * SQL server install directory (?)
> * Windows/System32
> * Windows
> * %PATH% (which is truncated at 2190 chars)

Thank you. It finally makes sense. The parent process is calling SetDllDirectoryW [1], which also replaces the current working directory in the DLL search path of child processes (inherited via the PEB ProcessParameters->DllPath) [2]: 

    Note: The standard search order of the process will also be 
    affected by calling the SetDllDirectory function in the parent
    process before start of the current process.

In that case, all we have to do is restore the original search path by calling SetDllDirectoryW(NULL). For example:

    import ctypes
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    if not kernel32.SetDllDirectoryW(None):
        raise ctypes.WinError(ctypes.get_last_error())


[1]: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-setdlldirectoryw
[2]: https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order#alternate-search-order-for-desktop-applications
msg366324 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-13 19:02
You mean the CI agent is doing it? Because there's nowhere in Python that should be doing it.

I'd rather not blanket change this option in the test suite (except to turn it on :) ), so that code snippet probably needs to shrink down into a one-liner that can go with the "-c" just for these tests.
msg366327 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-04-13 19:35
> You mean the CI agent is doing it? Because there's nowhere in 
> Python that should be doing it.

The ProcessParameters->DllPath value is inherited until a process in the tree reverts to the original search path via SetDllDirectoryW(NULL), so it can be any ancestor process, not just the immediate parent process.

> I'd rather not blanket change this option in the test suite (except to 
> turn it on :) ), so that code snippet probably needs to shrink down 
> into a one-liner that can go with the "-c" just for these tests.

The test could call GetDllDirectoryW [1] to save and restore the previous value. Unfortunately, GetDllDirectoryW returns 0 for both the default case (NULL) and when the working directory is disabled by setting an empty string (""). It would have to assume one or the other.

Otherwise if you just want a one-liner: "import ctypes; ctypes.windll.kernel32.SetDllDirectoryW(None)".

[1]: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-getdlldirectoryw
msg367013 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-22 16:04
New changeset 9b498939009f49b8c772c89e8fc80efbfd8afcb5 by Steve Dower in branch 'master':
bpo-40214: Fix ctypes WinDLL test with insecure flags (GH-19652)
https://github.com/python/cpython/commit/9b498939009f49b8c772c89e8fc80efbfd8afcb5
msg367014 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-22 16:05
Looks like that fix has done it. Thanks, Eryk!
msg369661 - (view) Author: miss-islington (miss-islington) Date: 2020-05-22 22:24
New changeset 0cc7becde0bfe896fd23b5cb14fedfb8f2066fca by Miss Islington (bot) in branch '3.8':
bpo-40214: Fix ctypes WinDLL test with insecure flags (GH-19652)
https://github.com/python/cpython/commit/0cc7becde0bfe896fd23b5cb14fedfb8f2066fca
History
Date User Action Args
2020-05-22 22:24:49miss-islingtonsetmessages: + msg369661
2020-04-22 16:05:54steve.dowersetstatus: open -> closed
resolution: fixed
messages: + msg367014

stage: patch review -> resolved
2020-04-22 16:05:00miss-islingtonsetpull_requests: + pull_request18982
2020-04-22 16:04:53steve.dowersetmessages: + msg367013
2020-04-22 10:43:38steve.dowersetpull_requests: + pull_request18978
2020-04-13 22:30:35vstinnersetnosy: - vstinner
2020-04-13 19:35:07eryksunsetmessages: + msg366327
2020-04-13 19:02:14steve.dowersetmessages: + msg366324
2020-04-13 18:44:55eryksunsetmessages: + msg366322
2020-04-13 18:07:12steve.dowersetmessages: + msg366320
2020-04-13 17:41:34steve.dowersetmessages: + msg366319
2020-04-13 14:47:03steve.dowersetmessages: + msg366316
2020-04-13 12:02:10eryksunsetmessages: + msg366307
2020-04-13 11:39:24steve.dowersetpull_requests: + pull_request18851
2020-04-09 12:31:56miss-islingtonsetmessages: + msg366045
2020-04-09 12:27:15vstinnersetnosy: + vstinner

messages: + msg366044
versions: + Python 3.8
2020-04-09 12:13:27miss-islingtonsetnosy: + miss-islington

pull_requests: + pull_request18804
stage: needs patch -> patch review
2020-04-08 20:03:27aerossetmessages: + msg366003
2020-04-08 18:31:23steve.dowersetmessages: + msg365999
2020-04-08 16:14:02zach.waresetmessages: + msg365993
2020-04-08 04:28:54xtreaksetnosy: + xtreak
messages: + msg365971
2020-04-08 04:05:49aerossetmessages: + msg365970
2020-04-07 22:34:32steve.dowersetmessages: + msg365955
2020-04-07 22:16:57steve.dowersetmessages: + msg365954
2020-04-07 22:14:47steve.dowersetmessages: + msg365953
2020-04-07 17:03:29zach.waresetmessages: + msg365919
stage: patch review -> needs patch
2020-04-07 16:30:03zach.waresetstage: needs patch -> patch review
pull_requests: + pull_request18773
2020-04-07 15:42:56eryksunsetnosy: + eryksun
messages: + msg365910
2020-04-07 10:29:24steve.dowersetmessages: + msg365902
2020-04-07 10:04:51aerossetmessages: + msg365900
2020-04-07 06:40:23zach.waresetstage: patch review -> needs patch
2020-04-07 06:40:04zach.waresetmessages: + msg365892
2020-04-07 06:09:53zach.waresettype: behavior
stage: patch review
2020-04-07 06:09:28zach.waresetnosy: + paul.moore, tim.golden, steve.dower
messages: + msg365890

components: + Tests, Windows, ctypes, - Library (Lib)
keywords: - 3.9regression
stage: patch review -> (no value)
2020-04-07 06:07:37zach.waresetkeywords: + patch
nosy: + zach.ware

pull_requests: + pull_request18768
stage: patch review
2020-04-07 03:27:46aeroscreate