classification
Title: Py_GetProgramFullPath() odd behaviour in Windows
Type:
Stage: patch review
Components: Interpreter Core, Windows
Versions: Python 3.8, Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.snow, mariofutire, ncoghlan, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords: patch

Created on 2018-09-18 17:26 by mariofutire, last changed 2018-11-18 04:44 by steve.dower.

Files
File name Uploaded Description
poc.c mariofutire, 2018-09-18 17:26 Example of odd behaviour
Pull Requests
URL Status Linked
PR 9860 merged steve.dower, 2018-10-13 23:57
PR 9861 merged steve.dower, 2018-10-13 23:58
Messages (20)
msg325666 - Author: Mario (mariofutire) Date: 2018-09-18 17:26
According to the doc Py_GetProgramFullPath() should return the full path of the program name as set by Py_SetProgramName().

https://docs.python.org/3/c-api/init.html#c.Py_GetProgramFullPath

This works well in Linux, but in Windows it is always the name of the current executable (from GetModuleFileNameW).

This is because the 2 files Modules/getpath.c and PC/getpathp.c have completely different logic in calculate_program_full_path() vs get_program_full_path().

This difference is harmless when running in the normal interpreter (python.exe), but can be quite dramatic when embedding python into a C application.

The value returned by Py_GetProgramFullPath() is the same as sys.executable in python.

Why does this matter? For instance, on Linux virtual environments work out of the box for embedded applications, while on Windows they are completely ignored.

python -m venv abcd

and then if I run my app inside the (activated) abcd environment in Linux I can access the same modules as if I were executing python, while in Windows I still get the system module search path.

If you execute the attached program in Linux you get

EXECUTABLE /tmp/abcd/bin/python3
PATH ['/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/tmp/abcd/lib/python3.7/site-packages']

in Windows

EXECUTABLE c:\TEMP\vsprojects\ConsoleApplication1\x64\Release\ConsoleApplication1.exe
PATH ['C:\\TEMP\\venv\\abcd\\Scripts\\python37.zip', 'C:\\Python37\\Lib', 'C:\\Python37\\DLLs', 'c:\\TEMP\\vsprojects\\ConsoleApplication1\\x64\\Release', 'C:\\Python37', 'C:\\Python37\\lib\\site-packages']

with a mixture of paths from the venv, system and my app folder.
But more importantly site-packages comes from the system (bad!).

This is because site.py, at line 454, uses the path of the interpreter to locate the venv configuration file.
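The lookup described above can be sketched roughly as follows. This is a simplified illustration, not the actual code in site.py: the function name `find_venv_config` is hypothetical, and the sketch only assumes that `pyvenv.cfg` is searched for next to the executable and one directory above it.

```python
import os


def find_venv_config(executable):
    """Return the path of a pyvenv.cfg near `executable`, or None.

    Simplified sketch: the file is looked for in the directory holding
    the executable and in that directory's parent.
    """
    exe_dir = os.path.dirname(os.path.abspath(executable))
    candidates = [
        os.path.join(exe_dir, "pyvenv.cfg"),
        os.path.join(os.path.dirname(exe_dir), "pyvenv.cfg"),
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

With an executable path such as C:\TEMP\venv\abcd\Scripts\python.exe the second candidate is the venv's own pyvenv.cfg; with the path of ConsoleApplication1.exe neither candidate exists, so no venv is detected.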

So in the end, virtual environments work out of the box in Linux even for an embedded python, but not in Windows.
msg325668 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-09-18 18:24
That executable doesn't appear to be in a virtual environment - you should be running C:\TEMP\venv\abcd\Scripts\python.exe

Does that resolve your problem?
msg325669 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-09-18 18:26
(Also, the behavior of Py_GetProgramFullPath is intentional, but we do have another bug somewhere to be able to override it for embedding purposes. sys.executable should be None when it does not contain a suitable path for running the normal Python interpreter again. I haven't searched for that bug just now, but we should find it and track the issue there, rather than creating a different issue.)
msg325674 - Author: Mario (mariofutire) Date: 2018-09-18 19:42
On 18/09/2018 19:24, Steve Dower wrote:
> 
> Steve Dower <steve.dower@python.org> added the comment:
> 
> That executable doesn't appear to be in a virtual environment - you should be running C:\TEMP\venv\abcd\Scripts\python.exe
> 
> Does that resolve your problem?
> 

Nope,

I am *not* running python, I am running a C app which embeds the python interpreter.
I am running exactly

c:\TEMP\vsprojects\ConsoleApplication1\x64\Release\ConsoleApplication1.exe

In a later comment you say the behaviour of Py_GetProgramFullPath is intentional: which behaviour? 
Windows? Linux? or the fact that they behave differently?

I guess that if there were a way to force Py_GetProgramFullPath() it would solve my problem, because 
I could direct site.py towards the correct virtual environment.

If sys.executable becomes None for embedded python (without the ability to set it), then virtual 
environments won't work at all, which would be sad.
msg326035 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-09-21 20:44
I meant returning the full name of the process is intentional. But you're right that overriding it should actually override it.

I found the prior bug at issue33180, but I'm closing it in favour of this one. I don't have fully fleshed out semantics in my mind for all the cases to handle here, but I hope that we soon reach a point of drastically simplifying getpath and can align the platforms better at that point.

Meanwhile I'll leave this open in case anyone wants to work on a targeted fix.
msg326042 - Author: Mario (mariofutire) Date: 2018-09-21 21:02
On 21/09/2018 21:44, Steve Dower wrote:
> 
> Steve Dower <steve.dower@python.org> added the comment:
> 
> I meant returning the full name of the process is intentional. But you're right that overriding it should actually override it.
> 
> I found the prior bug at issue33180, but I'm closing it in favour of this one. I don't have fully fleshed out semantics in my mind for all the cases to handle here, but I hope that we soon reach a point of drastically simplifying getpath and can align the platforms better at that point.
> 
> Meanwhile I'll leave this open in case anyone wants to work on a targeted fix.
> 

So you are saying that the Windows behaviour (+ ability to overwrite) is intentional.
This seems to me to contradict what the doc says under 
https://docs.python.org/3/c-api/init.html#c.Py_GetProgramFullPath.

Moreover I am not sure what Py_SetProgramName() is meant to do then.

The problem in my opinion is that we are trying to fit 2 things in the same field: the real 
executable name and the root of the python installation (which could be a virtual environment as well).
In python.exe the 2 are the same (or linked), but for embedded applications they are not.

Remember that site.py uses the sys.executable as "root of the python installation" to derive the 
path and handle virtual environments.

I think that if these 2 concepts were separated, it would be much easier to explain the desired 
behaviour and find a valid implementation in Windows and Linux.

Let's say sys.executable is the full name of the process and sys.python_root is the folder from 
which to derive all the paths.
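For comparison, the sys module already exposes sys.prefix and sys.base_prefix alongside sys.executable; sys.python_root is only the name proposed here and does not exist. A minimal sketch of reading the existing values side by side (the helper name `describe_runtime` is hypothetical):

```python
import sys


def describe_runtime():
    """Collect the values that identify the process and the installation."""
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,  # path reported for the interpreter
        "prefix": sys.prefix,          # root the standard paths hang off
        "base_prefix": base_prefix,    # root of the underlying installation
        "in_venv": sys.prefix != base_prefix,
    }
```

Inside an activated virtual environment started through python itself, "in_venv" is expected to be true; the question raised in this issue is what an embedding application should see.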

It is probably too big of a change, but it might be useful to write down the ideal behaviour before 
thinking of a pragmatic solution.

Andrea
msg326970 - Author: Mario (mariofutire) Date: 2018-10-03 14:05
Is there any agreement on what is wrong with the current code?

The key in my opinion is the double purpose of sys.executable and that in Linux and Windows people have taken the two different points of view, so they are both right and wrong at the same time.
msg327000 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-03 18:26
I don't think anything has been agreed upon.

Currently, the launched program name is used for some things other than setting sys.executable, and I believe it should continue to be used for those. But there are also needs for overriding sys.executable to be something other than the current process (e.g. a launcher that simply loads Python into its own process, but needs a different process to be used for multiprocessing support).
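For the multiprocessing case specifically, the standard library already lets a host name the interpreter used for child processes, independently of sys.executable. A minimal sketch, assuming the caller knows the path of a real Python interpreter (the helper name `use_interpreter_for_children` is hypothetical):

```python
import multiprocessing
import multiprocessing.spawn
import os


def use_interpreter_for_children(python_path):
    """Tell multiprocessing which interpreter to launch for child processes.

    `python_path` is assumed to point at a real Python interpreter; an
    embedding host would pass it explicitly when sys.executable names
    the host application instead.
    """
    multiprocessing.set_executable(python_path)
    # get_executable() may return bytes on some versions, so normalise it.
    return os.fsdecode(multiprocessing.spawn.get_executable())
```

This only affects start methods that launch a fresh interpreter (such as "spawn"), and it does not change sys.executable itself, so it covers only part of the problem discussed here.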

Victor has been looking at the initialization process, so I'm not sure whether something has already changed here. I'd be keen to see the getpath part of initialization be written in (frozen or limited) Python code that can be easily overridden by embedders to initialize all of these members however they like. That way everyone can equally lie about argv0/GetModuleFileName and sys.prefix/sys.executable/etc.

Until we get there, we may just need a couple more configuration fields, and perhaps some that default to one of the others when unspecified.
msg327081 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-04 20:06
Reading the docs, I'm pretty sure we need a new Py_SetProgramFullPath() function. Py_SetProgramName explicitly is only providing a hint to figure out the file containing the executable, and I really want this to make my new launcher feasible: https://github.com/zooba/cpython/blob/msix/Programs/launch.c

Victor - I've tried for an hour now and I can't figure out where to put this value in all the new configuration stuff. I'm finding it *very* convoluted, with so much copying of config structs and then back-and-forth copying certain values around. Some guidance would be great.
msg327249 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-10-06 14:42
Directly addressing the topic of the bug:

Py_SetProgramName() should be a relative or absolute path that can be used to set sys.executable and other values appropriately. This is used in Programs/_testembed.c for example.

I didn't know it didn't work the same way on Windows as it does on other platforms, and I have no idea why it's different there. (The divergence between the Windows and *nix implementations of getpath predates my own involvement in startup sequence modifications, and I've never even read the Windows version of the code)

On the startup sequence refactoring in general:

Yeah, eventually being able to eliminate getpath.c in favour of a frozen _getpath.py module has been one of my long term hopes for the PEP 432 startup sequence refactoring. The underlying issue making that difficult is that it's always been murky as to exactly what Python code could safely execute at the point where that path information needs to be calculated, and the tests of path configuration are weak enough that it's easy to introduce regressions even with small changes, let alone a wholesale rewrite.


If a new setting is genuinely needed, then where to put things in the new config is still open for discussion - at the moment, it's pretty much just a straight transcription of the way CPython has historically done things, and is hence heavy on the use of low-level C data types (especially wchar_t* where paths are concerned).

This means that the CoreConfig struct currently still contains a lot of things that aren't actually needed if all you want is a running Python interpreter and can live without a fully populated sys module.

The *advantage* of that approach is that it means it still maps pretty easily to the existing Py_Initialize approach: the Py_Set* API writes to a global copy of the CoreConfig struct, and then Py_Initialize reads that in to the active runtime state.
msg327364 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-08 16:54
> Py_SetProgramName() should be a relative or absolute path that can be used to set sys.executable and other values appropriately.

Key point here is *can be*, but it doesn't have to be. Given it has fallbacks all the way to "python"/"python3", we can't realistically use it as sys.executable just because it has a value.

And right now, it's used to locate the current executable (which is unnecessary on Windows), which is then assumed to be correct for sys.executable. Most embedding cases require *this* assumption to be overridden, not the previous assumption.
msg327370 - Author: Mario (mariofutire) Date: 2018-10-08 20:19
On 08/10/2018 17:54, Steve Dower wrote:
> 
> Steve Dower <steve.dower@python.org> added the comment:
> 
>> Py_SetProgramName() should be a relative or absolute path that can be used to set sys.executable and other values appropriately.
> 
> Key point here is *can be*, but it doesn't have to be. Given it has fallbacks all the way to "python"/"python3", we can't realistically use it as sys.executable just because it has a value.
> 
> And right now, it's used to locate the current executable (which is unnecessary on Windows), which is then assumed to be correct for sys.executable. Most embedding cases require *this* assumption to be overridden, not the previous assumption.

I still would like my use case to be acknowledged.

site.py uses the value of sys.executable to set up a virtual environment, which is a very valuable 
thing even in embedded cases.

This constraint is strong enough to force it to point to python.exe or python3, as it would normally 
do in a scripted (non-embedded) case.

I still believe the 2 concepts should be decoupled to avoid them clashing and having supporters of 
one disagreeing with supporters of the other.

Andrea
msg327447 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-10 00:11
We'll need to bring in venv specialists to check whether using it outside of Py_Main() is valid. Or perhaps you could explain what you are actually trying to do?

I don't believe it is necessary when you are calling Py_SetPath yourself, and only the "launch normally with alternate args" case for scripts that use sys.executable is affected. But I'm happy to be set right here (with example scenarios, preferably).
msg327489 - Author: Mario (mariofutire) Date: 2018-10-10 19:28
On 10/10/2018 01:11, Steve Dower wrote:
> 
> Steve Dower <steve.dower@python.org> added the comment:
> 
> We'll need to bring in venv specialists to check whether using it outside of Py_Main() is valid. Or perhaps you could explain what you are actually trying to do?
> 

Sure

1) Create a virtual environment ("python -m venv")
2) Activate it
3) Pip install some modules
4) Try to use them from inside an embedded application (e.g. the one I attached)
5) Do it in Linux and Windows

Result

Works in Linux, fails in Windows.

Reason in site.py

https://github.com/python/cpython/blob/73870bfeb9cf350d84ee88bd25430c104b3c6191/Lib/site.py#L462

sys.executable is used to construct the correct search path.

Looking at the sys.path from inside an embedded application is very instructive: the first post 
shows why it fails on Windows.
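The inspection itself needs only a few lines of Python run inside the embedded interpreter. The sketch below prints the same two values as the output in the first post and additionally picks out the site-packages entries; the helper names are hypothetical and this is not the content of the attached poc.c.

```python
import os
import sys


def site_packages_dirs(paths):
    """Return the entries of `paths` whose last component is site-packages."""
    return [p for p in paths
            if os.path.basename(os.path.normpath(p)) == "site-packages"]


def report():
    """Print the executable, the search path and its site-packages entries."""
    print("EXECUTABLE", sys.executable)
    print("PATH", sys.path)
    print("SITE-PACKAGES", site_packages_dirs(sys.path))
```

If the site-packages entry lies under the system installation rather than under the virtual environment, the environment was not picked up.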

Andrea
msg327659 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-13 16:37
I meant why are you using an embedded application with a virtual environment? What sort of application do you have that requires users to configure a virtual environment, rather than providing its own set of libraries?

The embedding scenarios I'm aware of almost always want privacy/isolation from whatever a user has installed/configured, so that they can work reliably even when users modify other parts of their own system. I'm trying to understand what scenario (other than "I am an interactive Python shell") would want to automatically pick up the configuration rather than having its own configuration files/settings.
msg327701 - Author: Mario (mariofutire) Date: 2018-10-14 10:23
On 13/10/2018 17:37, Steve Dower wrote:
> 
> Steve Dower <steve.dower@python.org> added the comment:
> 
> I meant why are you using an embedded application with a virtual environment? What sort of application do you have that requires users to configure a virtual environment, rather than providing its own set of libraries?
> 
> The embedding scenarios I'm aware of almost always want privacy/isolation from whatever a user has installed/configured, so that they can work reliably even when users modify other parts of their own system. I'm trying to understand what scenario (other than "I am an interactive Python shell") would want to automatically pick up the configuration rather than having its own configuration files/settings.

Does it really matter who owns main(), whether it is in python.exe or in some other C app?

This is exactly what you described: users want to use a C application that calls into python, 
using some (user-defined) python modules to execute scriptable tasks.
They want to be able to do so in a confined environment where they can install the exact set of 
packages they require. It should also be possible to set up multiple environments where 
different versions are tested independently.

There is as well the totally independent scenario where the app ships exactly what it needs, but 
there are some ways in between where one can script an app and in doing so you might need packages 
that the app itself knew nothing about.

For another example have a look at JEP
https://github.com/ninia/jep/search?q=virtual&unscoped_q=virtual

This is a way to call python from Java: same problem as above, people might want to run it in a virtual 
environment and the only way to do this now is to manually set up PYTHONHOME, but it is pretty weak 
and does not replicate exactly what happens with virtual environments (e.g. inherit system's 
site-packages).
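The setting that PYTHONHOME cannot reproduce lives in the environment's pyvenv.cfg file, which venv writes as plain "key = value" lines (for example "home" and "include-system-site-packages"). A rough sketch of reading it follows; the function names are hypothetical, and the default used when the key is missing is my reading of site.py rather than a documented guarantee.

```python
def read_venv_config(text):
    """Parse pyvenv.cfg-style `key = value` lines into a dict.

    Keys are lower-cased and surrounding whitespace is stripped.
    """
    settings = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip().lower()] = value.strip()
    return settings


def inherits_system_site_packages(settings):
    """True when the venv also exposes the system site-packages.

    Assumption: a missing key is treated as "true".
    """
    value = settings.get("include-system-site-packages", "true")
    return value.lower() == "true"
```

A host that wants to honour a virtual environment by hand would need both pieces of information: where the base installation is ("home") and whether the system site-packages should remain visible.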

Again, in Linux, JEP works out of the box with no need to tell it about virtual environments: 
Py_Initialize() finds them (if they are indeed present) with absolutely no extra configuration (no 
need to change PYTHONPATH).

Andrea
msg328054 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-19 17:04
I requested Victor's review on my PR, but if anyone else is able to review it, please feel free.
msg330036 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-11-18 04:42
New changeset 177a41a07b7d13c70d068ea0962f07e625ae171e by Steve Dower in branch 'master':
bpo-34725: Adds _Py_SetProgramFullPath so embedders may override sys.executable (GH-9860)
https://github.com/python/cpython/commit/177a41a07b7d13c70d068ea0962f07e625ae171e
msg330037 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-11-18 04:42
New changeset e851049e0e045b5e0f9d5c6b8a64d7f6b8ecc9c7 by Steve Dower in branch '3.7':
bpo-34725: Adds _Py_SetProgramFullPath so embedders may override sys.executable (GH-9861)
https://github.com/python/cpython/commit/e851049e0e045b5e0f9d5c6b8a64d7f6b8ecc9c7
msg330038 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-11-18 04:44
The next releases of 3.7 and 3.8 will include a _Py_SetProgramFullPath() function for embedders to set the eventual value of sys.executable before calling Py_Initialize(). It's undocumented and not guaranteed stable (and indeed, it looks like Victor is already working on another patch that may see it removed before it's ever released), but it's in now as a workaround for the cases that need it.
History
Date User Action Args
2018-11-18 04:44:39 steve.dower set messages: + msg330038
2018-11-18 04:42:12 steve.dower set messages: + msg330037
2018-11-18 04:42:01 steve.dower set messages: + msg330036
2018-10-19 17:04:38 steve.dower set messages: + msg328054
2018-10-14 10:23:29 mariofutire set messages: + msg327701
2018-10-13 23:59:25 steve.dower set versions: + Python 3.8, - Python 3.6
2018-10-13 23:58:40 steve.dower set pull_requests: + pull_request9230
2018-10-13 23:57:11 steve.dower set keywords: + patch; stage: patch review; pull_requests: + pull_request9229
2018-10-13 16:37:16 steve.dower set messages: + msg327659
2018-10-10 19:28:14 mariofutire set messages: + msg327489
2018-10-10 00:11:29 steve.dower set messages: + msg327447
2018-10-08 20:19:22 mariofutire set messages: + msg327370
2018-10-08 16:54:26 steve.dower set messages: + msg327364
2018-10-06 14:42:17 ncoghlan set messages: + msg327249
2018-10-05 17:09:45 steve.dower set nosy: + ncoghlan, eric.snow
2018-10-04 20:06:57 steve.dower set messages: + msg327081
2018-10-03 18:26:29 steve.dower set nosy: + vstinner; messages: + msg327000
2018-10-03 14:05:04 mariofutire set messages: + msg326970
2018-09-21 21:02:13 mariofutire set messages: + msg326042
2018-09-21 20:44:31 steve.dower set messages: + msg326035
2018-09-18 19:42:11 mariofutire set messages: + msg325674
2018-09-18 18:26:18 steve.dower set messages: + msg325669
2018-09-18 18:24:38 steve.dower set messages: + msg325668
2018-09-18 17:26:47 mariofutire create