classification
Title: Embedded initialization ignores Py_SetProgramName()
Type: behavior Stage: resolved
Components: Windows Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: chrullrich, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2017-09-05 12:45 by chrullrich, last changed 2019-09-26 01:14 by vstinner. This issue is now closed.

Messages (8)
msg301306 - (view) Author: Christian Ullrich (chrullrich) * Date: 2017-09-05 12:45
I'm trying to do something that may be slightly tricky, and I think I just found a vintage 1997 bug. Please correct me if I'm wrong.

This is PC/getpathp.c from current master:

430: static void
431: get_progpath(void)
432: {
433:     extern wchar_t *Py_GetProgramName(void);
434:     wchar_t *path = _wgetenv(L"PATH");
435:     wchar_t *prog = Py_GetProgramName();
436: 
437: #ifdef Py_ENABLE_SHARED
438:     extern HANDLE PyWin_DLLhModule;
439:     /* static init of progpath ensures final char remains \0 */
440:     if (PyWin_DLLhModule)
441:         if (!GetModuleFileNameW(PyWin_DLLhModule, dllpath, MAXPATHLEN))
442:             dllpath[0] = 0;
443: #else
444:     dllpath[0] = 0;
445: #endif
446:     if (GetModuleFileNameW(NULL, progpath, MAXPATHLEN))
447:         return;
448:     if (prog == NULL || *prog == '\0')
449:         prog = L"python";

Lines 446-447 have been like this ever since Guido wrote them twenty years ago. I think the logic is the wrong way around. As it is now, the function always returns on line 447 because GetModuleFileNameW() *always* succeeds (returns nonzero). Hence, everything below line 447 is dead code.

GetModuleFileNameW(NULL, ...) fills the buffer with the path to the executable the current process was created from. It returns the number of characters put into the buffer, or zero on error. However, the only way *this* call could fail is if the code was being executed outside of any process, something that clearly cannot happen.

Prior to August 1997, the relevant bit of code looked like this:

156:    if (!GetModuleFileName(NULL, progpath, MAXPATHLEN))
157:        progpath[0] = '\0';	/* failure */


This bug, if it is one, would only manifest when initializing an embedded interpreter using a different argv[0] than that of the actual host process, because if the host process is a python.exe, it very likely runs inside a standard installation or venv anyway. What I am trying is to make an interpreter running in a third-party host process take site-packages from a venv.

Doing this (argv[0] different from actual host process) may not be entirely proper itself, but then again, why would Py_SetProgramName() even exist otherwise?

Suggested fix:

diff --git a/PC/getpathp.c b/PC/getpathp.c
index 8380e1bcfa..abb5e54c9f 100644
--- a/PC/getpathp.c
+++ b/PC/getpathp.c
@@ -443,8 +443,7 @@ get_progpath(void)
 #else
     dllpath[0] = 0;
 #endif
-    if (!GetModuleFileNameW(NULL, progpath, MAXPATHLEN))
-        return;
+    GetModuleFileNameW(NULL, progpath, MAXPATHLEN);
     if (prog == NULL || *prog == '\0')
         prog = L"python";

Since the call to GetModuleFileNameW() cannot possibly fail, there is no real point in checking its return value.
msg301307 - (view) Author: Christian Ullrich (chrullrich) * Date: 2017-09-05 12:54
That should have been 

diff --git a/PC/getpathp.c b/PC/getpathp.c
index e7be704a9a..abb5e54c9f 100644
--- a/PC/getpathp.c
+++ b/PC/getpathp.c
@@ -443,8 +443,7 @@ get_progpath(void)
 #else
     dllpath[0] = 0;
 #endif
-    if (GetModuleFileNameW(NULL, progpath, MAXPATHLEN))
-        return;
+    GetModuleFileNameW(NULL, progpath, MAXPATHLEN);
     if (prog == NULL || *prog == '\0')
         prog = L"python";

instead, of course, without the negation.
msg301345 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-09-05 17:56
You're right, though I disagree with the fix as that would cause the process filename to be ignored.

A correct fix would check whether the name has been set explicitly and if so, either use it as-is (if absolute) or get the actual name and use its directory (since there's no point searching PATH as we'll likely just find some other executable and hence not be able to initialize).

As this is a subtle change in behaviour, we should only do it for 3.7. For 3.6 and earlier I'd suggest naming your executable the name that you'd like it to be, as that will work fine in this case and anywhere else we assume that the process name is the name of the process. (The fact that the behaviour is wrong in earlier versions is overridden IMO by the fact that it has been wrong for a decade and nobody has complained, and there is a trivial workaround.)
msg301346 - (view) Author: Christian Ullrich (chrullrich) * Date: 2017-09-05 18:08
Not quite. Looking a bit further down get_progname()'s weird logic, we see that it works like this:

1. prog = Py_GetProgramName()
2. progpath = GetModuleFileNameW()
3. if (prog is empty):
       prog = "python"
4. if (slash in prog):      # Or backslash, of course
       progpath = prog

So it uses the host process name from step 2 (i.e. progpath) whenever Py_SetProgramName() has not been used and step 3 has set prog to a value not containing a directory separator (or if it *has* been used to set something with no such separator in it).

The logic makes sense, I think, but it is quite impenetrable. Any chance of PEP 432 ("Restructuring the CPython startup sequence") happening this century?
msg301371 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-09-05 20:43
People are working on PEP 432 this week at the sprints, so yeah, it's likely.
msg313342 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-03-06 17:32
I just hit this myself (and embarrassingly, I recently touched this code everyone and forgot to fix this).

Hopefully I get a chance to get to it, but patches are certainly welcome and I'll happily aim to get them into 3.6 onwards.
msg313343 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-03-06 17:33
(That said, I didn't try again 3.7, so it may already be fixed there. But since we're still fixing problems with 3.6, we should do that one too.)
msg353248 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-26 00:58
Modules/getpath.c and PC/getpathp.c have been deeply reworked with the implementation of the PEP 587 in Python 3.8. Parameters which are already set (ex: Py_SetProgramName() or using the new PyConfig API) are no longer overriden. I added some tests on the "path configuration".

The most reliable way to configure the path configuration is now to use the new initialization API using PyConfig.
https://docs.python.org/dev/c-api/init_config.html
History
Date User Action Args
2019-09-26 01:14:47vstinnersetstatus: open -> closed
resolution: fixed
stage: resolved
2019-09-26 00:58:43vstinnersetnosy: + vstinner

messages: + msg353248
versions: + Python 3.9, - Python 3.6, Python 3.7
2018-03-06 17:33:28steve.dowersetmessages: + msg313343
2018-03-06 17:32:50steve.dowersetmessages: + msg313342
versions: + Python 3.6, Python 3.8
2017-09-05 20:43:07steve.dowersetmessages: + msg301371
2017-09-05 18:08:06chrullrichsetmessages: + msg301346
2017-09-05 17:56:46steve.dowersetmessages: + msg301345
versions: - Python 3.6
2017-09-05 12:54:08chrullrichsetmessages: + msg301307
2017-09-05 12:45:00chrullrichcreate