classification
Title: os.path.realpath() unexpected breaking change: resolves subst'd paths to real paths
Type: behavior
Stage:
Components: Library (Lib), Windows
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, paul.moore, sfmc, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2021-04-28 16:26 by sfmc, last changed 2021-05-02 20:10 by sfmc.

Messages (9)
msg392234 - Author: sfmc (sfmc) Date: 2021-04-28 16:26
For example, if I mount the directory
  C:\example\dir
to
  Z:\
then
  os.path.realpath('Z:\\')
returns the real directory.

Use the following commands on Windows to reproduce the issue:

  md C:\example\dir
  subst Z: C:\example\dir
  python.exe -c "import os; print(os.path.realpath('Z:\\'))"

Python 3.8 outputs:
  C:\example\dir
Python 3.7 outputs:
  Z:\

This is an unexpected behavior change, and it breaks our scripts in many places, because we use mounts on Windows ("subst" command) and Linux ("mount" command).

We had to add our own implementation of realpath to our scripts, but the change also affects other tools, such as the Python debugger (breakpoints stopped working) and IDEs (such as PyCharm).

Whether resolving mounts is the right behavior is debatable.

But this change breaks the ability to work with Python scripts located in mounts.
I hope it can be fixed in Python 3.8.10.


I propose fixing it in Python 3.8.10 by adding a new keyword-only parameter (named, for example, "resolve_mounts") to the function
    os.path.realpath(path)
like this:
    os.path.realpath(path, *, resolve_mounts=False)

If resolve_mounts is False, the function should not resolve mounts on Windows ("subst" command) or Linux ("mount" command).


Let me know if you would like a pull request with the proposed fix. I can try to implement it.
msg392278 - Author: Eryk Sun (eryksun) (Python triager) Date: 2021-04-29 05:17
A substitute drive is not a mount point, strictly speaking. It's a symlink in the object namespace that targets an arbitrary path on a device, or even on another mapped/substitute drive. A mapped drive is a similar case that targets an arbitrary path on a UNC share. Any number of these path components may be filesystem symlinks, which can in turn target other substitute drives and mapped drives.

The object manager reparses your example substitute drive "Z:" to its target path r"C:\example\dir" before the I/O manager and filesystem ever see the path. Thus, hypothetically, if r"Z:\symlink" targets r"\spam", it resolves to r"C:\spam", not r"Z:\spam", i.e. not r"C:\example\dir\spam". If r"Z:\symlink" targets r"..\spam", it resolves to r"C:\example\spam". 

Also, substitute drives are almost always defined for the current logon session only. If you think it's not an issue because you're the only user on the machine, there's still typically a linked logon to worry about in the case of a UAC split standard/administrator logon; plus there are service processes running as SYSTEM, NETWORK SERVICE, and LOCAL SERVICE; and maybe also non-interactive tasks running as the user account in a separate logon session. Given that Windows is inherently a multiple-logon system, a path that's only accessible by one logon session in the system is not the best candidate for a real path if it's possible to resolve it further to a path that's globally accessible.

---

A mount point, unlike a symlink, is a graft point that behaves like a regular subtree. Classical mount points in Windows are the root paths of volume devices (e.g. "C:\\", where "C:" is a volume device name) and redirected filesystems mounted as UNC shares (e.g. r"\\server\share"). Windows 2000 added support for bind mount points (i.e. junctions, e.g. r"C:\Mount\F"). Unfortunately, switching to junctions probably won't solve your problem. The current design of ntpath.realpath() doesn't always retain a bind mount point in the resolved path, not unless it's the canonical path of a local volume -- or unless it's in a UNC path since it's impossible to resolve a remote junction to an accessible target path.
msg392357 - Author: Steve Dower (steve.dower) (Python committer) Date: 2021-04-29 22:50
I think Eryk's point is that the behaviour is correct, and it's unfortunate that certain tools use realpath() when they don't actually mean it (I've been bitten by this too with pytest).

Presumably you want to replace realpath with abspath?
msg392390 - Author: sfmc (sfmc) Date: 2021-04-30 08:20
I see the point: the real path may not be accessible from the substitute drive:
 - if a symlink (or junction) is used that points to a path not visible from the substitute drive.
 - if a different security context is used (e.g. a different user or UAC).

But that is a discussion about the _correct_ behavior (on which opinions may differ if the exact behavior is not documented).

-----

Let's discuss how we can fix the issue caused by the behavior change.

I propose a simple fix: keep the old behavior if a special environment variable is set.

E.g. if you specify env. var. PYTHON_NTREALPATH_OLD_BEHAVIOR=1, it doesn't resolve symlinks and junctions.

Is this acceptable for Python 3.8.10?
msg392487 - Author: Eryk Sun (eryksun) (Python triager) Date: 2021-04-30 18:00
> E.g. if you specify env. var. PYTHON_NTREALPATH_OLD_BEHAVIOR=1, 
> it doesn't resolve symlinks and junctions.

I assumed you wanted to resolve symlinks but hadn't considered that substitute drives are implemented as object symlinks that target arbitrary paths, which can include other substitute/mapped drives and filesystem symlinks. They aren't handled as mount points, at least not during system path parsing.

If you don't need to resolve symlinks, just use os.path.abspath() in Windows and os.path.realpath() in POSIX. Don't worry about symlinks in the opened path in Windows. Unlike POSIX, the Windows API resolves a path like r"C:\example\symlink\..\dir" simply as r"C:\example\dir", before the kernel and filesystem ever see the path. There are reasons to need a real path in Windows, but this isn't one of them.
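A minimal sketch of this advice, assuming a hypothetical helper named portable_realpath (not part of the standard library) that scripts would call instead of os.path.realpath():

```python
import os
import sys


def portable_realpath(path):
    """Return an absolute path; resolve symlinks only on POSIX.

    Hypothetical helper, not a standard-library function.
    """
    if sys.platform == "win32":
        # abspath() normalizes the path without resolving links, so a
        # substitute or mapped drive such as "Z:" is kept as written.
        return os.path.abspath(path)
    # On POSIX, realpath() resolves symlinks in the path.
    return os.path.realpath(path)


if __name__ == "__main__":
    print(portable_realpath("."))
```

This only helps in code you control; tools that call os.path.realpath() directly are unaffected by it.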
msg392497 - Author: sfmc (sfmc) Date: 2021-04-30 18:39
> If you don't need to resolve symlinks, just use os.path.abspath() in Windows and os.path.realpath() in POSIX.

As I said, we already fixed it in our scripts.

The problem is with Python debugger and third-party tools - we can't make changes there.

The environment variable would be an easy fix for this issue.

It would have no impact on those who do not set the variable.

-----

In our organization we use substituted drives for development, and no other tools (including other programming languages and their IDEs), except for Python 3.8+, had any problems with this.

For example, in the latest versions of Perl and Ruby, realpath keeps the substitute drive.

(In Ruby, realpath is also smart enough to resolve symlinks while keeping the drive.)

-----

I consider this a serious issue with Python. I hope it gets the attention it deserves and that at least the environment-variable workaround gets implemented.
msg392510 - Author: Eryk Sun (eryksun) (Python triager) Date: 2021-04-30 20:13
> In our organization we use substituted drives for development, and 
> no other tools (including other programming languages and their IDEs),
> except for Python 3.8+, had any problems with this.

How about keeping a substitute or mapped drive from the input path if resolving the root path on the drive prefixes the overall real path? That would be pretty easy to implement.

Still, descendant relative symlinks will traverse through the supposed root to the actual real path. Given "Z:" is a substitute drive for r"C:\example\dir", if r"Z:\symlink" targets r"\spam", it will resolve to r"C:\spam", and if it targets r"..\spam", it will resolve to r"C:\example\spam". That violates the assumption that a relative symlink can be resolved against a real path via readlink(), join(), and normpath().
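The violated assumption can be illustrated with pure string operations. This is only a sketch: naive_resolve is a hypothetical helper, the link targets are passed in by hand rather than read with os.readlink(), and ntpath is imported directly so the snippet runs on any platform.

```python
import ntpath


def naive_resolve(link_path, target):
    """Join a symlink target to the link's parent directory and normalize.

    Hypothetical helper: string manipulation only, no filesystem access.
    """
    return ntpath.normpath(ntpath.join(ntpath.dirname(link_path), target))


# Resolving against the substitute-drive path never leaves "Z:".
print(naive_resolve(r"Z:\symlink", r"..\spam"))  # Z:\spam
print(naive_resolve(r"Z:\symlink", r"\spam"))    # Z:\spam

# Resolving against the drive's target path gives the results
# described above for the system's own resolution.
print(naive_resolve(r"C:\example\dir\symlink", r"..\spam"))  # C:\example\spam
print(naive_resolve(r"C:\example\dir\symlink", r"\spam"))    # C:\spam
```

The two pairs disagree, which is why a path that keeps "Z:" cannot be treated as a real path for resolving such links.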

The nearest equivalents to Linux bind mounts are mount-point reparse points (junctions). But the final-path design of realpath() doesn't work how I'd want in that regard. Given the limitations with junctions, if I had to do something like what you're doing, presuming UNC paths are allowed, I'd use a local-only share created via `net share sharename=path /grant:local,full`, accessed as "//localhost/sharename". A share is a bind mount in the namespace of the "UNC" DOS device.
msg392700 - Author: Steve Dower (steve.dower) (Python committer) Date: 2021-05-02 14:35
> The problem is with Python debugger and third-party tools - we can't make changes there.

You can report the issue to them, though. They may not realise that they're using realpath() in scenarios when their users do not want links to be resolved.

For your own needs, you could add a sitecustomize.py file that does "ntpath.realpath = ntpath.abspath". That will change the behaviour back to what it was in 3.7.
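As a sketch, such a sitecustomize.py could look like the following. It assumes the file is placed in a directory on sys.path (for example site-packages) so that the site module imports it at startup.

```python
# sitecustomize.py
# Imported automatically by the site module at interpreter startup
# when it is found on sys.path.
import ntpath

# Make realpath() an alias of abspath(), as it was on Windows in 3.7.
# On Windows, os.path is ntpath, so os.path.realpath() is affected too.
ntpath.realpath = ntpath.abspath
```

The patch is process-wide, so every library running in that interpreter stops resolving links through realpath(). One caveat: on Python 3.10 and later, callers that pass the strict keyword to realpath() would get a TypeError, because abspath() does not accept it.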

> Is this acceptable for Python 3.8.10?

No, it's not a security fix.
msg392718 - Author: sfmc (sfmc) Date: 2021-05-02 20:10
> How about keeping a substitute or mapped drive from the input path
> if resolving the root path on the drive prefixes the overall real
> path? That would be pretty easy to implement.

So if the resolved path is accessible from the original path's drive, then keep the drive?
This is what 'realpath' in Ruby does. Good idea, in my opinion.
Making this the default behavior (controlled by a keyword argument) would prevent the issues with mounted drives.
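A rough sketch of the "keep the drive" idea, not the actual ntpath implementation: keep_input_drive is a hypothetical helper, and the resolved root and resolved path are passed in as strings so that the logic can be shown without a substitute drive. It also assumes that ntpath.normcase() preserves string length, which holds for ASCII paths.

```python
import ntpath


def keep_input_drive(path, real_root, real_path):
    """Return real_path rewritten onto path's drive when possible.

    path      -- the input path, e.g. r"Z:\pkg\mod.py"
    real_root -- resolved root of the input drive, e.g. r"C:\example\dir"
    real_path -- fully resolved input path
    Hypothetical helper: string manipulation only.
    """
    drive, _ = ntpath.splitdrive(path)
    root = ntpath.normcase(real_root).rstrip("\\")
    full = ntpath.normcase(real_path)
    # Keep the drive only if the resolved root prefixes the resolved path
    # on a component boundary.
    if drive and (full == root or full.startswith(root + "\\")):
        tail = real_path[len(root):].lstrip("\\")
        return ntpath.join(drive + "\\", tail)
    # Otherwise the real path lies outside the drive's target.
    return real_path


print(keep_input_drive(r"Z:\pkg\mod.py", r"C:\example\dir",
                       r"C:\example\dir\pkg\mod.py"))  # Z:\pkg\mod.py
print(keep_input_drive(r"Z:\symlink", r"C:\example\dir",
                       r"C:\example\spam"))            # C:\example\spam
```

On Windows the two resolved arguments would presumably come from os.path.realpath() applied to the drive root and to the full path.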

> For your own needs, you could add a sitecustomize.py file that
> does "ntpath.realpath = ntpath.abspath". That will change the
> behaviour back to what it was in 3.7.

It works. Thank you very much.

-----

Since you provided a working solution for me, the fate of this bug report is at your discretion.
History
Date                 User         Action  Args
2021-05-02 20:10:57  sfmc         set     messages: + msg392718
2021-05-02 14:35:23  steve.dower  set     messages: + msg392700
2021-04-30 20:13:30  eryksun      set     messages: + msg392510
2021-04-30 18:39:13  sfmc         set     messages: + msg392497
2021-04-30 18:00:21  eryksun      set     messages: + msg392487
2021-04-30 08:20:04  sfmc         set     messages: + msg392390
2021-04-29 22:50:18  steve.dower  set     messages: + msg392357
2021-04-29 05:17:17  eryksun      set     nosy: + paul.moore, tim.golden, eryksun, zach.ware, steve.dower
                                          messages: + msg392278
                                          components: + Windows
2021-04-28 16:26:16  sfmc         create