Py_GetProgramFullPath() odd behaviour in Windows #78906

Open
mariofutire mannequin opened this issue Sep 18, 2018 · 23 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows

Comments

@mariofutire
Mannequin

mariofutire mannequin commented Sep 18, 2018

BPO 34725
Nosy @pfmoore, @ncoghlan, @vstinner, @tjguk, @ericsnowcurrently, @zware, @zooba
PRs
  • bpo-34725: Adds _Py_SetProgramFullPath so embedders may override sys.executable #9860
  • [3.7] bpo-34725: Adds _Py_SetProgramFullPath so embedders may override sys.executable (GH-9860) #9861
  • Files
  • poc.c: Example of odd behaviour
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-09-18.17:26:47.540>
    labels = ['interpreter-core', '3.7', '3.8', 'OS-windows']
    title = 'Py_GetProgramFullPath() odd behaviour in Windows'
    updated_at = <Date 2019-05-27.21:30:46.561>
    user = 'https://bugs.python.org/mariofutire'

    bugs.python.org fields:

    activity = <Date 2019-05-27.21:30:46.561>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core', 'Windows']
    creation = <Date 2018-09-18.17:26:47.540>
    creator = 'mariofutire'
    dependencies = []
    files = ['47814']
    hgrepos = []
    issue_num = 34725
    keywords = ['patch']
    message_count = 23.0
    messages = ['325666', '325668', '325669', '325674', '326035', '326042', '326970', '327000', '327081', '327249', '327364', '327370', '327447', '327489', '327659', '327701', '328054', '330036', '330037', '330038', '343644', '343681', '343691']
    nosy_count = 8.0
    nosy_names = ['paul.moore', 'ncoghlan', 'vstinner', 'tim.golden', 'eric.snow', 'zach.ware', 'steve.dower', 'mariofutire']
    pr_nums = ['9860', '9861']
    priority = 'normal'
    resolution = None
    stage = 'resolved'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue34725'
    versions = ['Python 3.7', 'Python 3.8']

    @mariofutire
    Mannequin Author

    mariofutire mannequin commented Sep 18, 2018

    According to the doc Py_GetProgramFullPath() should return the full path of the program name as set by Py_SetProgramName().

    https://docs.python.org/3/c-api/init.html#c.Py_GetProgramFullPath

    This works well in Linux, but in Windows it is always the name of the current executable (from GetModuleFileNameW).

    This is because the 2 files Modules/getpath.c and PC/getpathp.c have completely different logic in calculate_program_full_path() vs get_program_full_path().

    This difference is harmless when running in the normal interpreter (python.exe), but can be quite dramatic when embedding python into a C application.

    The value returned by Py_GetProgramFullPath() is the same as sys.executable in python.

    Why does this matter? For instance, in Linux virtual environments work out of the box for embedded applications, while in Windows they are completely ignored.

    python -m venv abcd

    and then if I run my app inside the (activated) abcd environment in Linux I can access the same modules as if I were executing python, while in Windows I still get the system module search path.

    If you execute the attached program in Linux you get

    EXECUTABLE /tmp/abcd/bin/python3
    PATH ['/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/tmp/abcd/lib/python3.7/site-packages']

    in Windows

    EXECUTABLE c:\TEMP\vsprojects\ConsoleApplication1\x64\Release\ConsoleApplication1.exe
    PATH ['C:\\TEMP\\venv\\abcd\\Scripts\\python37.zip', 'C:\\Python37\\Lib', 'C:\\Python37\\DLLs', 'c:\\TEMP\\vsprojects\\ConsoleApplication1\\x64\\Release', 'C:\\Python37', 'C:\\Python37\\lib\\site-packages']

    with a mixture of paths from the venv, system and my app folder.
    But more importantly site-packages comes from the system (bad!).

    This is because site.py at line 454 uses the path of the interpreter to locate the venv configuration file.

    So in the end, virtual environments work out of the box in Linux even for an embedded python, but not in Windows.
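To make the lookup described above concrete, here is a minimal sketch of the idea: start from the executable's path and look for a `pyvenv.cfg` file in the same directory or one level up. It is a simplification written for illustration, not the code from site.py; the function names `find_venv_config` and `demo` are hypothetical, and the directory layout is invented for the demo.

```python
import os
import tempfile


def find_venv_config(executable, conf_basename="pyvenv.cfg"):
    """Return the first pyvenv.cfg found next to `executable` or one
    directory above it, or None if neither location has one.

    Hypothetical helper: a simplified version of the lookup idea only.
    """
    exe_dir = os.path.dirname(os.path.abspath(executable))
    parent_dir = os.path.dirname(exe_dir)
    for directory in (exe_dir, parent_dir):
        candidate = os.path.join(directory, conf_basename)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Build a throwaway venv-like layout and compare two executables."""
    with tempfile.TemporaryDirectory() as root:
        venv = os.path.join(root, "abcd")
        scripts = os.path.join(venv, "Scripts")
        app_dir = os.path.join(root, "app")
        os.makedirs(scripts)
        os.makedirs(app_dir)
        with open(os.path.join(venv, "pyvenv.cfg"), "w") as f:
            f.write("home = C:\\Python37\n")
        # An interpreter inside the venv finds the config file...
        inside = find_venv_config(os.path.join(scripts, "python.exe"))
        # ...while an embedding application living elsewhere does not.
        outside = find_venv_config(os.path.join(app_dir, "ConsoleApplication1.exe"))
        return inside is not None, outside is not None


if __name__ == "__main__":
    print(demo())
```

Under this model, whether the venv is picked up depends only on which path ends up in sys.executable, which is why the Windows behaviour (the embedding application's own path) misses the venv.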

    @mariofutire mariofutire mannequin added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows labels Sep 18, 2018
    @zooba
    Member

    zooba commented Sep 18, 2018

    That executable doesn't appear to be in a virtual environment - you should be running C:\TEMP\venv\abcd\Scripts\python.exe

    Does that resolve your problem?

    @zooba
    Member

    zooba commented Sep 18, 2018

    (Also, the behavior of Py_GetProgramFullPath is intentional, but we do have another bug somewhere to be able to override it for embedding purposes. sys.executable should be None when it does not contain a suitable path for running the normal Python interpreter again. I haven't searched for that bug just now, but we should find it and track the issue there, rather than creating a different issue.)

    @mariofutire
    Mannequin Author

    mariofutire mannequin commented Sep 18, 2018

    On 18/09/2018 19:24, Steve Dower wrote:

    > Steve Dower <steve.dower@python.org> added the comment:
    >
    > That executable doesn't appear to be in a virtual environment - you should be running C:\TEMP\venv\abcd\Scripts\python.exe
    >
    > Does that resolve your problem?

    Nope,

    I am *not* running python, I am running a C app which embeds the python interpreter.
    I am running exactly

    c:\TEMP\vsprojects\ConsoleApplication1\x64\Release\ConsoleApplication1.exe

    In a later comment you say the behaviour of Py_GetProgramFullPath is intentional: which behaviour?
    Windows? Linux? or the fact that they behave differently?

    I guess that if there were a way to force Py_GetProgramFullPath() it would solve my problem, because
    I could direct site.py towards the correct virtual environment.

    If sys.executable becomes None for embedded python (without the ability to set it), then virtual
    environments won't work at all, which would be sad.

    @zooba
    Member

    zooba commented Sep 21, 2018

    I meant returning the full name of the process is intentional. But you're right that overriding it should actually override it.

    I found the prior bug at bpo-33180, but I'm closing it in favour of this one. I don't have fully fleshed out semantics in my mind for all the cases to handle here, but I hope that we soon reach a point of drastically simplifying getpath and can align the platforms better at that point.

    Meanwhile I'll leave this open in case anyone wants to work on a targeted fix.

    @mariofutire
    Mannequin Author

    mariofutire mannequin commented Sep 21, 2018

    On 21/09/2018 21:44, Steve Dower wrote:

    > Steve Dower <steve.dower@python.org> added the comment:
    >
    > I meant returning the full name of the process is intentional. But you're right that overriding it should actually override it.
    >
    > I found the prior bug at bpo-33180, but I'm closing it in favour of this one. I don't have fully fleshed out semantics in my mind for all the cases to handle here, but I hope that we soon reach a point of drastically simplifying getpath and can align the platforms better at that point.
    >
    > Meanwhile I'll leave this open in case anyone wants to work on a targeted fix.

    So you are saying that the Windows behaviour (+ ability to overwrite) is intentional.
    This seems to me to contradict what the doc says under
    https://docs.python.org/3/c-api/init.html#c.Py_GetProgramFullPath.

    Moreover I am not sure what Py_SetProgramName() is meant to do then.

    The problem in my opinion is that we are trying to fit 2 things in the same field: the real
    executable name and the root of the python installation (which could be a virtual environment as well).
    In python.exe the 2 are the same (or linked), but for embedded applications they are not.

    Remember that site.py uses the sys.executable as "root of the python installation" to derive the
    path and handle virtual environments.

    I think that if these 2 concepts were separated, it would be much easier to explain the desired
    behaviour and find a valid implementation in Windows and Linux.

    Let's say sys.executable is the full name of the process and sys.python_root is the folder from
    which to derive all the paths.

    It is probably too big of a change, but it might be useful to write down the ideal behaviour before
    thinking of a pragmatic solution.

    Andrea

    @mariofutire
    Mannequin Author

    mariofutire mannequin commented Oct 3, 2018

    Is there any agreement on what is wrong with the current code?

    The key in my opinion is the double purpose of sys.executable and that in Linux and Windows people have taken the two different points of view, so they are both right and wrong at the same time.

    @zooba
    Member

    zooba commented Oct 3, 2018

    I don't think anything has been agreed upon.

    Currently, the launched program name is used for some things other than setting sys.executable, and I believe it should continue to be used for those. But there are also needs for overriding sys.executable to be something other than the current process (e.g. a launcher that simply loads Python into its own process, but needs a different process to be used for multiprocessing support).

    Victor has been looking at the initialization process, so I'm not sure if something has already changed here yet. I'd be keen to see the getpath part of initialization be written in (frozen or limited) Python code that can be easily overridden by embedders to initialize all of these members however they like. That way everyone can equally lie about argv0/GetModuleFullPath and sys.prefix/sys.executable/etc.

    Until we get there, we may just need a couple more configuration fields, and perhaps some that default to one of the others when unspecified.
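For the multiprocessing case mentioned above, an application that already runs Python code can point child processes at a specific interpreter with `multiprocessing.set_executable()`, independently of what sys.executable holds. The sketch below is only an illustration under that assumption; the wrapper name `use_interpreter_for_children` is hypothetical, and which interpreter path to pass is left to the embedder.

```python
import multiprocessing
import multiprocessing.spawn
import os
import sys


def use_interpreter_for_children(python_path):
    """Tell multiprocessing which interpreter to launch for child processes.

    Hypothetical wrapper: validates the path, applies it, and returns the
    value multiprocessing will use (decoded to str for easy comparison).
    """
    if not os.path.isfile(python_path):
        raise FileNotFoundError(python_path)
    multiprocessing.set_executable(python_path)
    return os.fsdecode(multiprocessing.spawn.get_executable())


if __name__ == "__main__":
    # In a normal interpreter this is a no-op in effect; an embedder would
    # pass the path of a real python executable instead.
    print(use_interpreter_for_children(sys.executable))
```

This only covers multiprocessing; other code that reads sys.executable directly is not affected by it.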

    @zooba
    Member

    zooba commented Oct 4, 2018

    Reading the docs, I'm pretty sure we need a new Py_SetProgramFullPath() function. Py_SetProgramName explicitly is only providing a hint to figure out the file containing the executable, and I really want this to make my new launcher feasible: https://github.com/zooba/cpython/blob/msix/Programs/launch.c

    Victor - I've tried for an hour now and I can't figure out where to put this value in all the new configuration stuff. I'm finding it *very* convoluted, with so much copying of config structs and then back-and-forth copying certain values around. Some guidance would be great.

    @ncoghlan
    Contributor

    ncoghlan commented Oct 6, 2018

    Directly addressing the topic of the bug:

    Py_SetProgramName() should be a relative or absolute path that can be used to set sys.executable and other values appropriately. This is used in Programs/_testembed.c for example.

    I didn't know it didn't work the same way on Windows as it does on other platforms, and I have no idea why it's different there. (The divergence between the Windows and *nix implementations of getpath predates my own involvement in startup sequence modifications, and I've never even read the Windows version of the code)

    On the startup sequence refactoring in general:

    Yeah, eventually being able to eliminate getpath.c in favour of a frozen _getpath.py module has been one of my long term hopes for the PEP-432 startup sequence refactoring. The underlying issue making that difficult is that it's always been murky as to exactly what Python code could safely execute at the point where that path information needs to be calculated, and the tests of path configuration are weak enough that it's easy to introduce regressions even with small changes, let alone a wholesale rewrite.

    If a new setting is genuinely needed, then where to put things in the new config is still open for discussion - at the moment, it's pretty much just a straight transcription of the way CPython has historically done things, and is hence heavy on the use of low level C data types (especially wchar* where paths are concerned).

    This means that the CoreConfig struct currently still contains a lot of things that aren't actually needed if all you want is a running Python interpreter and can live without a fully populated sys module.

    The *advantage* of that approach is that it means it still maps pretty easily to the existing Py_Initialize approach: the Py_Set* API writes to a global copy of the CoreConfig struct, and then Py_Initialize reads that in to the active runtime state.

    @zooba
    Member

    zooba commented Oct 8, 2018

    Py_SetProgramName() should be a relative or absolute path that can be used to set sys.executable and other values appropriately.

    Key point here is *can be*, but it doesn't have to be. Given it has fallbacks all the way to "python"/"python3", we can't realistically use it as sys.executable just because it has a value.

    And right now, it's used to locate the current executable (which is unnecessary on Windows), which is then assumed to be correct for sys.executable. Most embedding cases require *this* assumption to be overridden, not the previous assumption.

    @mariofutire
    Mannequin Author

    mariofutire mannequin commented Oct 8, 2018

    On 08/10/2018 17:54, Steve Dower wrote:

    > Steve Dower <steve.dower@python.org> added the comment:
    >
    > > Py_SetProgramName() should be a relative or absolute path that can be used to set sys.executable and other values appropriately.
    >
    > Key point here is *can be*, but it doesn't have to be. Given it has fallbacks all the way to "python"/"python3", we can't realistically use it as sys.executable just because it has a value.
    >
    > And right now, it's used to locate the current executable (which is unnecessary on Windows), which is then assumed to be correct for sys.executable. Most embedding cases require *this* assumption to be overridden, not the previous assumption.

    I still would like my use case to be acknowledged.

    site.py uses the value of sys.executable to set up a virtual environment, which is a very valuable
    thing even in embedded cases.

    This constraint is strong enough to force it to point to python.exe or python3 as it would normally
    do in a scripted (non-embedded) case.

    I still believe the 2 concepts should be decoupled to avoid them clashing and having supporters of
    one disagreeing with supporters of the other.

    Andrea

    @zooba
    Member

    zooba commented Oct 10, 2018

    We'll need to bring in venv specialists to check whether using it outside of Py_Main() is valid. Or perhaps you could explain what you are actually trying to do?

    I don't believe it is necessary when you are calling Py_SetPath yourself, and only the "launch normally with alternate args" case for scripts that use sys.executable is affected. But I'm happy to be set right here (with example scenarios, preferably).

    @mariofutire
    Mannequin Author

    mariofutire mannequin commented Oct 10, 2018

    On 10/10/2018 01:11, Steve Dower wrote:

    > Steve Dower <steve.dower@python.org> added the comment:
    >
    > We'll need to bring in venv specialists to check whether using it outside of Py_Main() is valid. Or perhaps you could explain what you are actually trying to do?

    Sure

    1. Create a virtual environment ("python -m venv")
    2. Activate
    3. Pip install some modules
    4. Try to use them from inside an embedded application (e.g. the one I attached)
    5. Do it in Linux and Windows

    Result

    Works in Linux, fails in Windows.

    Reason in site.py

    executable = sys.executable

    sys.executable is used to construct the correct search path.

    Looking at the sys.path from inside an embedded application is very instructive, and you can see in
    the first post why it fails in Windows.

    Andrea
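One way to do the inspection suggested above is a small diagnostic that reports the values in question. This is a sketch only: the function name `describe_interpreter` is hypothetical, and it assumes the embedding application has some way to run a snippet of Python (for example by passing the source to the interpreter it hosts). The `sys.prefix != sys.base_prefix` comparison is the usual way to tell whether a virtual environment is active.

```python
import sys


def describe_interpreter():
    """Collect the values that decide whether a venv is in effect.

    Hypothetical helper for diagnostics; it only reads from sys.
    """
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": base_prefix,
        # A venv is active when prefix and base_prefix differ.
        "in_venv": sys.prefix != base_prefix,
        "path": list(sys.path),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("EXECUTABLE", info["executable"])
    print("PREFIX", info["prefix"])
    print("BASE_PREFIX", info["base_prefix"])
    print("IN_VENV", info["in_venv"])
    print("PATH", info["path"])
```

Running the same snippet from python inside the activated venv and from the embedding application makes the difference in sys.executable and site-packages directly comparable.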

    @zooba
    Member

    zooba commented Oct 13, 2018

    I meant why are you using an embedded application with a virtual environment? What sort of application do you have that requires users to configure a virtual environment, rather than providing its own set of libraries?

    The embedding scenarios I'm aware of almost always want privacy/isolation from whatever a user has installed/configured, so that they can work reliably even when users modify other parts of their own system. I'm trying to understand what scenario (other than "I am an interactive Python shell") would want to automatically pick up the configuration rather than having its own configuration files/settings.

    @zooba zooba added the 3.8 only security fixes label Oct 13, 2018
    @mariofutire
    Mannequin Author

    mariofutire mannequin commented Oct 14, 2018

    On 13/10/2018 17:37, Steve Dower wrote:

    > Steve Dower <steve.dower@python.org> added the comment:
    >
    > I meant why are you using an embedded application with a virtual environment? What sort of application do you have that requires users to configure a virtual environment, rather than providing its own set of libraries?
    >
    > The embedding scenarios I'm aware of almost always want privacy/isolation from whatever a user has installed/configured, so that they can work reliably even when users modify other parts of their own system. I'm trying to understand what scenario (other than "I am an interactive Python shell") would want to automatically pick up the configuration rather than having its own configuration files/settings.

    Does it really matter who owns main(), whether it is in python.exe or in some other C app?

    It is exactly as you described: users want to use some C application which will call into python
    using some (user defined) python modules to execute some tasks which are scriptable.
    And they want to be able to do so in a confined environment where they can install the exact set of
    packages they require. And it is possible at the same time to set up multiple environments where
    different versions are tested independently.

    There is as well the totally independent scenario where the app ships exactly what it needs, but
    there are some ways in between where one can script an app and in doing so you might need packages
    that the app itself knew nothing about.

    For another example have a look at JEP
    https://github.com/ninia/jep/search?q=virtual&unscoped_q=virtual

    This is a way to call python from Java: same problem as above, people might want to run it in a virtual
    environment and the only way to do this now is to manually set up PYTHONHOME, but it is pretty weak
    and does not replicate exactly what happens with virtual environments (e.g. inherit system's
    site-packages).

    Again, in Linux, JEP works out of the box with no need to tell it about virtual environments:
    Py_Initialize() finds them (if they are indeed present) with absolutely no extra configuration (no
    need to change PYTHONPATH).

    Andrea
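The "inherit system's site-packages" behaviour mentioned above is driven by the `include-system-site-packages` key in the venv's `pyvenv.cfg`, which is a plain `key = value` file. The sketch below shows one way to read it; the helper names are hypothetical, the sample contents are invented for illustration, and treating a missing key as "true" is an assumption about the default rather than something stated in this thread.

```python
def parse_pyvenv_cfg(text):
    """Parse `key = value` lines from pyvenv.cfg contents into a dict.

    Hypothetical helper: keys are lower-cased, lines without '=' are skipped.
    """
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        settings[key.strip().lower()] = value.strip()
    return settings


def inherits_system_site_packages(settings):
    """Return True if the venv should also see the system site-packages.

    Assumption: a missing key is treated as "true".
    """
    return settings.get("include-system-site-packages", "true").lower() == "true"


if __name__ == "__main__":
    sample = (
        "home = C:\\Python37\n"
        "include-system-site-packages = false\n"
        "version = 3.7.0\n"
    )
    cfg = parse_pyvenv_cfg(sample)
    print(cfg["home"], inherits_system_site_packages(cfg))
```

Setting PYTHONHOME alone carries none of this information, which is one reason it does not reproduce what a virtual environment does.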

    @zooba
    Member

    zooba commented Oct 19, 2018

    I requested Victor review on my PR, but if anyone else is able to please feel free.

    @zooba
    Member

    zooba commented Nov 18, 2018

    New changeset 177a41a by Steve Dower in branch 'master':
    bpo-34725: Adds _Py_SetProgramFullPath so embedders may override sys.executable (GH-9860)
    177a41a

    @zooba
    Member

    zooba commented Nov 18, 2018

    New changeset e851049 by Steve Dower in branch '3.7':
    bpo-34725: Adds _Py_SetProgramFullPath so embedders may override sys.executable (GH-9861)
    e851049

    @zooba
    Member

    zooba commented Nov 18, 2018

    The next releases of 3.7 and 3.8 will include _Py_SetProgramFullPath() functions for embedders to set the eventual value of sys.executable before calling Py_Initialize(). It's undocumented and not guaranteed stable (and indeed, it looks like Victor is already working on another patch that may see it removed before it's ever released), but it's in now as a workaround for the cases that need it.

    @vstinner
    Member

    I understand that this issue is now fixed in bpo-36763 by the implementation of PEP-587, which adds a new public API for the "Python Initialization Configuration". It provides a finer API to configure the "Path Configuration". For example, PyConfig.executable can be used to replace _Py_SetProgramFullPath().

    In Python 3.7, the private _Py_SetProgramFullPath() function can be used as a workaround.

    I close the issue. If I misunderstood the issue, please comment/reopen it ;-)

    @mariofutire
    Mannequin Author

    mariofutire mannequin commented May 27, 2019

    Unfortunately the underlying cause of this issue has not been addressed, nor discussed.

    There is now a way to work around the different behaviour in Windows and Linux, and it is possible to use the new call to make virtual environments work in Windows as they already do in Linux.

    The problem is that applications will have to be changed to actually implement the workaround.

    I still think this difference should be addressed directly.

    @vstinner
    Member

    I read again the issue. In short, the Path Configuration is a mess and unusable in some cases :-) I reopen the issue.

    venv support shouldn't live in the site module, which is optional; it should be handled earlier.

    I guess that venv support was added to site because it is way easier to write Python than to touch getpath.c, which is written in C.

    @vstinner vstinner reopened this May 27, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022