classification
Title: Add a shared library mechanism for win32
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Ray Donnelly, barry, brett.cannon, eryksun, lemburg, njs, paul.moore, steve.dower, tim.golden, xoviat, zach.ware
Priority: normal Keywords:

Created on 2018-01-07 22:41 by xoviat, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (25)
msg309642 - Author: xoviat (xoviat) Date: 2018-01-07 22:41
On linux and macOS, a mechanism exists to relocate shared libraries inside of a wheel. Auditwheel creates a .libs folder and places the shared libraries inside of it. The problem is that on Windows, the rpath mechanism doesn't exist. We've attempted to ameliorate the situation with NumPy by modifying the DLL search path ourselves. I think this should be done in Python itself.

Here is what I propose: for each folder in site packages that matches the foldername created by auditwheel, specifically:

1. A folder directly inside site-packages that ends with '.libs'
2. A folder two levels under site-packages that is named 'libs'

Python should add these folders to the DLL search path at some point before the matching extensions are imported, so that DLLs located in these paths can be imported by a call to LoadLibrary.

The reason that this should be done in Python is that packages shouldn't be modifying the DLL search path, but that's currently what's required.

The alternative, current, recommendation is to place shared libraries in the same folder as the extension, but this approach fails when the shared library needs to be shared between more than one extension in different subpackages, but in the same distribution package.
msg309644 - Author: xoviat (xoviat) Date: 2018-01-07 22:43
Sorry, that should have read:

2. A folder two levels under site-packages that is named '.libs'

Please consult the auditwheel source to determine the specific pattern if there is doubt.
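A minimal sketch of the directory scan proposed above, assuming the corrected patterns (a top-level folder whose name ends with '.libs', or a folder named '.libs' one package level down). The function name find_libs_dirs is hypothetical; the authoritative pattern is whatever auditwheel actually produces.

```python
from pathlib import Path


def find_libs_dirs(site_packages):
    """Return candidate shared-library folders under site-packages.

    Assumed patterns (taken from the proposal, not from auditwheel):
      1. <site-packages>/<name>.libs
      2. <site-packages>/<package>/.libs
    """
    root = Path(site_packages)
    found = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        if entry.name.endswith(".libs"):
            # Pattern 1: folder directly inside site-packages.
            found.append(entry)
            continue
        nested = entry / ".libs"
        if nested.is_dir():
            # Pattern 2: '.libs' folder two levels under site-packages.
            found.append(nested)
    return found
```

The sketch only locates the folders; how they would be added to the DLL search path is the open question of this issue.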
msg309900 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-01-13 20:47
How is this an improvement over loading the DLL explicitly? Even if subpackages require the file, your top level __init__.py will run before they are loaded and it can import a pyd that's known to be next to the file or it can LoadLibrary the contents of .libs itself.

Modifying the search path is not entirely robust, as you know, but once a DLL is loaded it doesn't need to be on the path to be found.
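The explicit preloading described above could look roughly like the following, placed in a package's top-level __init__.py. This is a sketch under assumptions: preload_dlls and its loader parameter are hypothetical names, and the default loader (ctypes.WinDLL) is only available on Windows.

```python
import ctypes
import os


def preload_dlls(libs_dir, loader=None):
    """Load every DLL in libs_dir by full path and return the handles.

    Once a DLL is loaded into the process, later imports that depend on it
    resolve it by name without the folder being on the search path.
    """
    if loader is None:
        # Looked up lazily because ctypes.WinDLL exists only on Windows.
        loader = ctypes.WinDLL
    handles = []
    for name in sorted(os.listdir(libs_dir)):
        if name.lower().endswith(".dll"):
            handles.append(loader(os.path.join(libs_dir, name)))
    return handles
```

The sketch loads files in alphabetical order and ignores dependencies between the DLLs themselves; a real tool would have to load them in dependency order.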
msg309908 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-01-14 01:27
> on Windows, the rpath mechanism doesn't exist

It seems we can locate a dependent assembly up to two directories up from a DLL using a relative path. According to MSDN [1], this is supported in Windows 7+. 3.7 no longer supports Vista, so this can potentially be used for extension modules in 3.7. I tested that it works in Windows 10, at least.

[1]: https://msdn.microsoft.com/en-us/library/aa374182

Create a "<name>.2.config" file for the module (e.g. "myextension.pyd.2.config"), and include a "probing" path in this file. This can specify up to 9 relative directories that can be up to two levels above the module. For example, the following adds "..\.libs" to the DLL's private assembly search path:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <configuration>
      <windows>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
          <probing privatePath="..\.libs" />
        </assemblyBinding>
      </windows>
    </configuration>

Add dependent assemblies to the module's #2 embedded manifest. For example, here's a dependency on 64-bit "myassembly" version 1.0.000.1234:

      <dependency>
        <dependentAssembly>
          <assemblyIdentity name="myassembly"
                            version="1.0.000.1234"
                            type="win32"
                            processorArchitecture="amd64" />
        </dependentAssembly>
      </dependency>

In this case the assembly is a directory with the given assembly name that contains an "<assembly_name>.manifest" file (e.g. "myassembly.manifest"). This manifest lists the assembly DLLs that are in the directory. For example:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
        <assemblyIdentity name="myassembly"
                          version="1.0.000.1234"
                          type="win32"
                          processorArchitecture="amd64" />
        <file name="mylib1.dll" />
        <file name="mylib2.dll" />
    </assembly>

The loader will look for the assembly in WinSxS, the module's directory, a subdirectory named for the assembly, and then the private probing paths that were added by the module's config file.
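As an illustration, a packaging tool could generate the "<name>.2.config" file shown above next to each extension module. This is only a sketch: write_probing_config is a hypothetical helper, and joining several directories with semicolons is an assumption about the privatePath syntax that should be checked against the MSDN page cited above.

```python
from xml.sax.saxutils import quoteattr

# Same layout as the config example in this message.
CONFIG_TEMPLATE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<configuration>
  <windows>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <probing privatePath={private_path} />
    </assemblyBinding>
  </windows>
</configuration>
"""


def write_probing_config(module_path, private_paths):
    """Write '<module_path>.2.config' and return the path of the new file."""
    paths = list(private_paths)
    if len(paths) > 9:
        # Limit quoted in this message: up to 9 relative directories.
        raise ValueError("at most 9 probing directories are supported")
    config_path = module_path + ".2.config"
    # Assumption: multiple directories are separated by semicolons.
    attr = quoteattr(";".join(paths))
    with open(config_path, "w", encoding="utf-8") as f:
        f.write(CONFIG_TEMPLATE.format(private_path=attr))
    return config_path
```

The sketch does not check that each directory stays within two levels of the module, and it does not touch the embedded #2 manifest, which still has to declare the dependent assembly.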
msg309911 - Author: xoviat (xoviat) Date: 2018-01-14 02:42
So the idea here is actually to write a replacement for auditwheel that works on windows. In order to do that, I need to recursively analyze DLL dependencies, randomize the DLL filenames, rewrite the pe header table, and then copy them into the .libs folder.

At that point, I need some mechanism to either preload all of the DLLs or add them to the DLL search path so that extensions can find them (example: the extension will need to load umfpack-gghsa-cp36.dll). If assemblies can accomplish such a task, then I can use them. However the restriction on the number of parent directories is a real problem.

I've benchmarked preloading the DLLs and it's not really ideal.
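To make the filename step concrete, here is a sketch of how a unique name such as "umfpack-gghsa-cp36.dll" might be derived. It assumes a content hash as the source of the tag, which is one possible reading of "randomize"; mangled_name is a hypothetical helper, and the sketch does not rewrite the import tables of the PE files that reference the DLL.

```python
import hashlib
import os


def mangled_name(dll_path, tag_length=8):
    """Return a new file name with a short content-hash tag before the extension."""
    with open(dll_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    stem, ext = os.path.splitext(os.path.basename(dll_path))
    # Two different builds of the same library get different names,
    # so they can coexist in one process.
    return f"{stem}-{digest[:tag_length]}{ext}"
```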
msg309912 - Author: xoviat (xoviat) Date: 2018-01-14 02:46
My current plan is to patch the __init__ package files to add the '.libs' folder to the search path. However, I think it would be better for Python to do this so that there is a consistent mechanism for loading shared libraries.
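A sketch of what such an __init__.py patch might do. The helper name add_dll_search_dir and its environ parameter are hypothetical. Prepending to PATH matches the interpreters discussed in this thread; os.add_dll_directory was only added later, in Python 3.8 on Windows, so the sketch uses it when it is available.

```python
import os


def add_dll_search_dir(libs_dir, environ=os.environ):
    """Make libs_dir visible to the Windows DLL loader for this process."""
    libs_dir = os.path.abspath(libs_dir)
    if hasattr(os, "add_dll_directory"):
        # Python 3.8+ on Windows. Keep the returned handle alive:
        # closing it removes the directory from the search path again.
        return os.add_dll_directory(libs_dir)
    # Older interpreters: prepend to the process-wide PATH.
    environ["PATH"] = libs_dir + os.pathsep + environ.get("PATH", "")
    return None
```

In a package this would be called with something like os.path.join(os.path.dirname(__file__), ".libs") before any extension module is imported.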
msg309938 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-01-14 22:36
The more details you reveal, the less I want it to be an officially supported pattern :)

Perhaps you actually want an import hook? Brett can confirm, but I believe you can overwrite __loader__ in a package and change how submodules are loaded.
msg309968 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2018-01-15 12:02
From experience with doing something similar in egenix-pyopenssl, I recommend putting the DLLs into the same directory as the PYD file on Windows. If you want to be extra safe, you can explicitly load the DLL, but normally this is not needed.

On Linux and other OSes, it's best to dlopen() to explicitly load the lib, since rpath and OS search paths are not always reliable.
msg310011 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-01-15 19:34
To answer Steve's question, what you would want is a finder which recognized the directory of the package so as to return a special loader just for that package (basically __path__ is sent through the normal import mechanism and so you would want something on sys.path_hooks which knew how to get an appropriate finder which would return the loader you want).
msg310076 - Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-01-16 11:02
Putting .dll's next to their .pyd's isn't a general solution, because sometimes you want to be able to use the same .dll's from .pyd's that are in different directories. For example, scipy.spatial.qhull, scipy.optimize._lbfgsb, and scipy.linalg._flinalg are three different extensions in different directories, but they all link against BLAS. And you definitely don't want to include multiple copies of BLAS; the current scipy wheels on Linux are 50 MB, and 40 MB of that is *one* copy of OpenBLAS.

I don't think import hooks are terribly relevant here either. The big problem is how to arrange for the .dlls to be loaded before the .pyds. The obvious hack would be to somehow automatically rewrite the package __init__.py to either mutate PATH or pre-load the .dlls, which is admittedly pretty nasty. But... if you want to install an import hook, then that *also* requires rewriting the package __init__.py to inject the import hook installation code, and then the import hook would just be mutating PATH or pre-loading the .dlls. So adding an import hook to the mix doesn't seem to be buying much.

I guess I can see the argument for @xoviat's original proposal: if lots of packages are going to have weird code injected into their __init__.py just to add the same paths to the DLL search path, then maybe it's simpler all around if this becomes a standard feature of the Python importer. However, even if this were implemented we'd still need to write and maintain the __init__.py injection code for the next ~5 years, so probably the thing to do is to implement the __init__.py approach first and wait to see how it shakes out before considering interpreter changes.
msg310077 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2018-01-16 11:13
Probably better overall to go with a conda package which puts
the DLLs in a central location and manages the dependencies.

You can then load the DLL in the package before loading the PYD
and you're all set. Whether in an __init__.py or elsewhere is
really up to the package.
msg310083 - Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-01-16 11:51
Conda is cool but we're not currently planning to abandon wheels.
msg310110 - Author: xoviat (xoviat) Date: 2018-01-16 20:00
As Nathaniel noted, the "solution" of placing DLLs in the same directory as extension modules is not a solution. I also think that some people here misunderstand my proposal: I'm not necessarily proposing that these directories are added using an import hook: they could be added on startup through a scan of site-packages.

"However, even if this were implemented we'd still need to write and maintain the __init__.py injection code for the next ~5 years, so probably the thing to do is to implement the __init__.py approach first and wait to see how it shakes out before considering interpreter changes."

Yes, this approach is already implemented in NumPy and SciPy. I'm also implementing it for other packages as well. However, the principal reason that I'm opening this issue is that Ray complained that packages shouldn't be altering the DLL search path: the only other solution that I can see is to make this documented behavior, only on Windows, and only because the Windows developers (and I'm in no way anti-Windows, but I'm just frustrated with this particular issue) decided to place an arbitrary limit on probingPath.

As far as the complaints about rpath: this is a mechanism used by every single manylinux and actually most OSX wheels, and it works perfectly.
msg310115 - Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-01-16 20:46
> However, the principal reason that I'm opening this issue is that Ray complained that packages shouldn't be altering the DLL search path

If that's crucial context, can you link to it?
msg310116 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-01-16 20:53
I understood the proposal just fine, and I understand the problems involved and appreciate why the ideal isn't sufficient here.

The import hook was my proposal to let you only preload DLLs when the extension module is being loaded, rather than having to load all the DLLs on the first "import scipy" just in case one of its submodules gets imported later. A hook can trigger on a specific module.

Since there appears to be some uncertainty, package __init__.py always runs before its submodules are even resolved, so it's totally fine to modify import machinery or preload DLLs here. 

rpath is totally irrelevant here. The OS model is different and we've given you the available options on Windows (application directory, process-wide search path, explicit preloading, assembly probing path).
msg310118 - Author: xoviat (xoviat) Date: 2018-01-16 22:51
This is what ray said:

"Please do not do this. Importing a specific module should not modify the way that process loads subsequent DLLs."

(https://github.com/numpy/numpy/pull/10229#issuecomment-354846459)

What I'm proposing to do is write a tool, widely used like auditwheel, that will copy shared libraries into the .libs folder and then patch __init__.py in all packages in order to modify the DLL search path to add these folders.

If everyone's okay with that, we can close this issue. But if everyone's not okay with that, then we need to document that it's Python's responsibility to do this.
msg310123 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-01-17 01:00
I am very okay with it not being Python's responsibility to do this.
msg310126 - Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-01-17 01:12
Steve said:
> The import hook was my proposal to let you only preload DLLs when the extension module is being loaded, rather than having to load all the DLLs on the first "import scipy" just in case one of its submodules gets imported later. A hook can trigger on a specific module.

That's a good point: we could write an import hook that examines each .pyd before it's loaded, and then preloads just the .dlls that it's looking for. But... that's literally reimplementing what the normal DLL loader does. If we can get the normal DLL loader to work, it's probably going to be simpler. And so long as we're talking specifically about the case where it's a directory we control and that only contains .dlls with mangled names, then it seems fine to me. (And maybe Ray will have to hold his nose, but, well, you know. That's programming sometimes.)

xoviat said:
> This is what ray said:
> "Please do not do this. Importing a specific module should not modify the way that process loads subsequent DLLs."

I'm not sure what Ray's worried about exactly, but I don't see anything in his comment that makes me think moving the DLL path manipulation code into the interpreter will make him happier.

I think this can be closed.
msg310129 - Author: xoviat (xoviat) Date: 2018-01-17 01:52
For the record, moving the DLL path manipulation code into the interpreter would address the concern that importing a module should not manipulate the search path, because the behavior would move into Python itself.
msg310130 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-01-17 02:08
The import hook could theoretically modify the search path before and after loading the module, though that could make debugging a real pain.

I was also thinking of just having an explicit list of DLLs to load, rather than inspecting the binary. Per-module preloading is as fine grained as you can get though it still won't help if another import has loaded a different DLL by the same name. This is where you can't avoid recompilation or activation contexts.
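A sketch of per-module preloading driven by an explicit list, as described above. It uses a finder on sys.meta_path rather than the sys.path_hooks route mentioned earlier in the thread, because that keeps the example short. PreloadFinder, install and dll_map are hypothetical names, and the loader would typically be ctypes.WinDLL on Windows.

```python
import importlib.abc
import sys


class PreloadFinder(importlib.abc.MetaPathFinder):
    """Preload listed DLLs just before selected modules are imported."""

    def __init__(self, dll_map, loader):
        self.dll_map = dict(dll_map)  # module name -> list of DLL paths
        self.loader = loader          # callable that loads one DLL by path
        self.handles = []             # keep handles so DLLs stay loaded

    def find_spec(self, fullname, path=None, target=None):
        # pop() makes sure each module's DLLs are loaded only once.
        for dll_path in self.dll_map.pop(fullname, ()):
            self.handles.append(self.loader(dll_path))
        # Returning None hands the actual import to the regular finders.
        return None


def install(dll_map, loader):
    """Put the finder first on sys.meta_path and return it."""
    finder = PreloadFinder(dll_map, loader)
    sys.meta_path.insert(0, finder)
    return finder
```

As noted above, this still does not help when another import has already loaded a different DLL with the same name.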
msg310131 - Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-01-17 03:00
> it still won't help if another import has loaded a different DLL by the same name. This is where you can't avoid recompilation or activation contexts.

Ah, there's an important bit of context you're missing: there is actually a third option :-)

https://github.com/njsmith/machomachomangler#pe-features
msg310133 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-01-17 03:38
That looks like recompilation (or at least how recompilation would look if you'd been granted permission). Assuming you can recompile the binary, you could rename the dependency and regenerate the import library so that you can link directly against the new name.

As an official position, I don't support modifying other people's PE files or distributing the modified results.
msg310135 - Author: xoviat (xoviat) Date: 2018-01-17 03:48
Just to be clear, I'm not considering doing this with respect to the C/C++ runtimes, but all of the other shared libraries. And in case you weren't aware, this is exactly what auditwheel does (except that I believe it uses patchelf, whereas I will be using Nathaniel's tool).
msg310136 - Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-01-17 03:53
> That looks like recompilation (or at least how recompilation would look if you'd been granted permission). Assuming you can recompile the binary, you could rename the dependency and regenerate the import library so that you can link directly against the new name.

Perhaps *you* can do that, but I don't think you want to volunteer to maintain Windows wheels for every package on PyPI :-). Regenerating import libraries and tweaking build systems and all that works fine if you're a Windows specialist distributing artisanal hand-crafted builds, but you can't automate that knowledge and use it at scale. You *can* write an automatic tool that slurps in a bunch of PE files and rewrites them to create a self-contained DLL-hell-free redistributable. This is the core strategy that osx and manylinux wheels have been using for years now, and why it's possible for random devs to make wheels that work, even on platforms they don't use.
msg310143 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2018-01-17 08:35
On 17.01.2018 02:52, xoviat wrote:
> For the record, moving the DLL path manipulation code into the interpreter would address the concern that importing a module would not manipulate the search path because the behavior would move into Python itself.

Can't you simply place the DLLs into the PythonXX\DLLs\ directory ?

That's where Python itself keeps external DLLs (and several PYDs)
and it won't change after installation of Python.

Or create a special container package on PyPI into which you place
the DLLs and add dependencies to this in all other packages.

You can then load the DLL via win32 LoadLibrary either using the
Python win32 tools or ctypes:

https://docs.python.org/3.7/library/ctypes.html
http://timgolden.me.uk/pywin32-docs/win32api__LoadLibrary_meth.html
https://www.programcreek.com/python/example/51388/win32api.LoadLibrary

FWIW: I think this ticket has shown plenty of options for possible
solutions, including many which do not manipulate the path.
History
Date                 User          Action  Args
2022-04-11 14:58:56  admin         set     github: 76697
2018-01-17 08:35:43  lemburg       set     messages: + msg310143
2018-01-17 03:53:54  njs           set     messages: + msg310136
2018-01-17 03:48:14  xoviat        set     messages: + msg310135
2018-01-17 03:38:38  steve.dower   set     messages: + msg310133
2018-01-17 03:00:48  njs           set     messages: + msg310131
2018-01-17 02:08:03  steve.dower   set     messages: + msg310130
2018-01-17 01:52:30  xoviat        set     messages: + msg310129
2018-01-17 01:12:44  njs           set     messages: + msg310126
2018-01-17 01:00:52  steve.dower   set     status: open -> closed; resolution: rejected; messages: + msg310123; stage: resolved
2018-01-16 22:51:15  xoviat        set     messages: + msg310118
2018-01-16 20:53:22  steve.dower   set     messages: + msg310116
2018-01-16 20:46:17  njs           set     messages: + msg310115
2018-01-16 20:00:25  xoviat        set     messages: + msg310110
2018-01-16 11:51:20  njs           set     messages: + msg310083
2018-01-16 11:13:49  lemburg       set     messages: + msg310077
2018-01-16 11:02:56  njs           set     nosy: + njs; messages: + msg310076
2018-01-15 19:34:45  brett.cannon  set     messages: + msg310011
2018-01-15 12:02:37  lemburg       set     nosy: + lemburg; messages: + msg309968
2018-01-14 22:36:42  steve.dower   set     nosy: + brett.cannon; messages: + msg309938
2018-01-14 02:46:18  xoviat        set     messages: + msg309912
2018-01-14 02:42:54  xoviat        set     messages: + msg309911
2018-01-14 01:27:25  eryksun       set     nosy: + eryksun; messages: + msg309908
2018-01-13 20:47:09  steve.dower   set     messages: + msg309900
2018-01-13 20:10:47  pitrou        set     nosy: + Ray Donnelly
2018-01-13 20:10:28  pitrou        set     nosy: + paul.moore, tim.golden, zach.ware, steve.dower
2018-01-13 18:08:38  barry         set     nosy: + barry
2018-01-07 22:43:24  xoviat        set     messages: + msg309644
2018-01-07 22:41:46  xoviat        create