classification
Title: os.path.realpath on Windows resolves mapped network drives
Type: behavior
Stage: resolved
Components: Library (Lib), Windows
Versions: Python 3.8

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: cgohlke, eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2019-08-31 08:04 by cgohlke, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (7)
msg350910 - Author: Christoph Gohlke (cgohlke) Date: 2019-08-31 08:04
Re https://bugs.python.org/issue9949:

Is it intended that Python-3.8.0b4 now also resolves mapped network drives and drives created with `subst`? 

I would not expect this from the documentation at https://docs.python.org/3.8/library/os.path.html#os.path.realpath. The documentation refers to symbolic links and junctions, which are different from mapped network and subst drives (AFAIU).

For example, mapping `\\SERVER\Programs` as `X:` drive:

```
Python 3.8.0b4 (tags/v3.8.0b4:d93605d, Aug 29 2019, 23:21:28) [MSC v.1916 64 bit (AMD64)] on win32
>>> import sys, os
>>> sys.executable
'X:\\Python38\\python.exe'
>>> os.path.realpath(sys.executable)
'\\\\SERVER\\Programs\\Python38\\python.exe'
```

```
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
>>> import sys, os
>>> sys.executable
'X:\\Python37\\python.exe'
>>> os.path.realpath(sys.executable)
'X:\\Python37\\python.exe'
```

It seems this change causes an error in pytest-5.1.2 during numpy-1.17.1 tests:

```
X:\Python38>python.exe -c"import numpy;numpy.test()"
NumPy version 1.17.1
NumPy relaxed strides checking option: True

============================================= ERRORS ==============================================
__________________________________ ERROR collecting test session __________________________________
lib\site-packages\_pytest\config\__init__.py:440: in _importconftest
    return self._conftestpath2mod[conftestpath]
E   KeyError: local('\\\\SERVER\\programs\\python38\\lib\\site-packages\\numpy\\conftest.py')

During handling of the above exception, another exception occurred:
lib\site-packages\_pytest\config\__init__.py:446: in _importconftest
    mod = conftestpath.pyimport()
lib\site-packages\py\_path\local.py:721: in pyimport
    raise self.ImportMismatchError(modname, modfile, self)
E   py._path.local.LocalPath.ImportMismatchError: ('numpy.conftest', 'X:\\Python38\\lib\\site-packages\\numpy\\conftest.py', local('\\\\SERVER\\programs\\python38\\lib\\site-packages\\numpy\\conftest.py'))

During handling of the above exception, another exception occurred:
lib\site-packages\_pytest\runner.py:220: in from_call
    result = func()
lib\site-packages\_pytest\runner.py:247: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
lib\site-packages\_pytest\main.py:485: in collect
    yield from self._collect(arg)
lib\site-packages\_pytest\main.py:512: in _collect
    col = self._collectfile(pkginit, handle_dupes=False)
lib\site-packages\_pytest\main.py:581: in _collectfile
    ihook = self.gethookproxy(path)
lib\site-packages\_pytest\main.py:424: in gethookproxy
    my_conftestmodules = pm._getconftestmodules(fspath)
lib\site-packages\_pytest\config\__init__.py:420: in _getconftestmodules
    mod = self._importconftest(conftestpath)
lib\site-packages\_pytest\config\__init__.py:454: in _importconftest
    raise ConftestImportFailure(conftestpath, sys.exc_info())
E   _pytest.config.ConftestImportFailure: (local('\\\\SERVER\\programs\\python38\\lib\\site-packages\\numpy\\conftest.py'), (<class 'py._path.local.LocalPath.ImportMismatchError'>, ImportMismatchError('numpy.conftest', 'X:\\Python38\\lib\\site-packages\\numpy\\conftest.py', local('\\\\SERVER\\programs\\python38\\lib\\site-packages\\numpy\\conftest.py')), <traceback object at 0x000000001B0F6B00>))
!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!
1 error in 16.39s
```
msg350916 - Author: Eryk Sun (eryksun) (Python triager) Date: 2019-08-31 12:54
>    >>> sys.executable
>    'X:\\Python38\\python.exe'
>    >>> os.path.realpath(sys.executable)
>    '\\\\SERVER\\Programs\\Python38\\python.exe'

Unix Python resolves the executable path with repeated _Py_wreadlink calls. Windows Python should do something similar to ensure the consistency of sys.executable with realpath(sys.executable).

> different from mapped network and subst drives (AFAIU).

Mapped and subst drives are implemented as object-manager symlinks to file-system directories. For example, a subst drive "S:" might target a local directory such as r"\??\C:\Temp\Subst", and a mapped drive "M:" for an SMB share might target a path such as r"\Device\LanmanRedirector\;M:<logon session>\Server\Share\Temp\Mapped". 

The root directory of these drives does not behave like a real root directory unless the drive targets the root of a volume or UNC share, such as "\\??\\C:\\" or r"\Device\LanmanRedirector\;M:<logon session>\Server\Share".

This means that in many cases it's possible to evaluate a relative symlink that traverses above the drive root via ".." components. Say we have a directory r"C:\Temp\Subst" that contains a relative symlink "foo_link" that targets r"..\foo". If we map "S:" to r"C:\Temp\Subst", then r"S:\foo_link" opens r"C:\Temp\foo". Similarly if we map r"\\localhost\C$\Temp\Subst" to "M:", then r"M:\foo_link" opens r"C:\Temp\foo".

In the above case, if we're using relpath() to compute the relative path to the "foo" target, I think we want relpath(realpath('C:/Temp/foo'), realpath('S:/')) to succeed as r"..\foo". I don't think we want it to fail as a cross-drive relative path.
msg350936 - Author: Steve Dower (steve.dower) (Python committer) Date: 2019-09-01 02:41
Is this an issue or a mismatched expectation?

Tests that assume realpath() on Windows is the equivalent of abspath() are of course going to fail when we fix realpath(), and that's kind of what this one looks like. Just because it doesn't have a direct Unix equivalent doesn't mean that any particular behavior is any better.
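
A minimal sketch of how test code could avoid depending on the spelling of a path: compare file identity with `os.path.samefile()` instead of comparing strings. The helper name `same_location` is hypothetical, and the string fallback for paths that cannot be opened is an assumption of this sketch, not something pytest or py.path does.

```python
import os


def same_location(a, b):
    """Compare two paths by file identity rather than by spelling."""
    try:
        # samefile() stats both paths, so two different spellings of the
        # same file (for example via a link) compare equal.
        return os.path.samefile(a, b)
    except OSError:
        # At least one path could not be opened; fall back to comparing
        # the normalized absolute strings.
        norm_a = os.path.normcase(os.path.abspath(a))
        norm_b = os.path.normcase(os.path.abspath(b))
        return norm_a == norm_b
```

With a check like this, `'X:\\Python38\\python.exe'` and its UNC spelling would be expected to compare equal as long as both paths are reachable, whichever form realpath() returns.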

> Unix Python resolves the executable path with repeated _Py_wreadlink calls. Windows Python should do something similar to ensure the consistency of sys.executable with realpath(sys.executable).

I don't think this necessarily follows. There's nowhere in the documentation that says that sys.executable is even a valid path, let alone the final path.

> I think we want relpath(realpath('C:/Temp/foo'), realpath('S:/')) to succeed as r"..\foo". I don't think we want it to fail as a cross-drive relative path.

Cross-drive relative paths are fine though - they are just absolute paths :)

>  The documentation refers to symbolic links and junctions, which are different from mapped network and subst drives (AFAIU).

If we can easily tell the difference between directory junctions and mapped drives, given that they are both identical types of reparse points, then we can make readlink() only read directory junctions. I need a specific algorithm for telling the difference though, not just lists of examples of things that "should" work (without any rationale for why they ought to work).
msg350937 - Author: Steve Dower (steve.dower) (Python committer) Date: 2019-09-01 02:42
And thanks for reporting this, Christoph. Issue37834 (and some of the issues linked from there) is where we had most of the discussion about this change.
msg350946 - Author: Eryk Sun (eryksun) (Python triager) Date: 2019-09-01 11:16
>> Unix Python resolves the executable path with repeated _Py_wreadlink 
>> calls. Windows Python should do something similar to ensure the 
>> consistency of sys.executable with realpath(sys.executable).
>
> I don't think this necessarily follows. There's nowhere in the 
> documentation that says that sys.executable is even a valid path, 
> let alone the final path.

The reason is cross-platform parity for users who aren't language lawyers -- as long as it's not expensive and doesn't compromise reliability or safety. 

That said, resolving the real executable path is more of a practical concern in Unix. In Windows it's not generally useful since the loader does not resolve the real path of an executable. 

Unix Python also calls _Py_wrealpath on the script path, which I think is more relevant in Windows than the sys.executable case because it's at a higher level that we control. This allows running a script symlink at the command line (e.g. linked in a common bin directory in PATH) even if the script depends on modules in the real directory.

>> I think we want relpath(realpath('C:/Temp/foo'), realpath('S:/')) to 
>> succeed as r"..\foo". I don't think we want it to fail as a cross-
>> drive relative path.
>
> Cross-drive relative paths are fine though - they are just absolute 
> paths :)

relpath() fails if the target and start directories aren't on the same drive. Code that's creating a symlink in Windows has to handle this case by using an absolute symlink instead of a relative symlink, if that's what you mean. That's probably for the better. So I change my mind. Forcing scripts to create absolute symlinks is not an issue, even if it's unnecessary because the target and start directory can be resolved to the same drive. The mount point should take precedence. But that's an argument against using the final path. Mapped drives and subst drives will be resolved in the final path. Reverse mapping to the original drive, if possible, would be extra work.
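
The cross-drive failure can be reproduced on any platform with `ntpath`, which implements the Windows rules lexically. This is only a sketch: the helper name `relative_or_absolute` is hypothetical, and it shows one way a script might fall back to an absolute link target.

```python
import ntpath  # Windows path semantics, importable on any platform


def relative_or_absolute(target, start):
    """Return a relative path if both paths share a drive, else the absolute target."""
    try:
        return ntpath.relpath(target, start)
    except ValueError:
        # ntpath.relpath() raises ValueError when the drives differ.
        return ntpath.abspath(target)


same_drive = relative_or_absolute(r"C:\Temp\foo", r"C:\Temp\Subst")
cross_drive = relative_or_absolute(r"C:\Temp\foo", "S:\\")
```

Here `same_drive` is the relative form r"..\foo", while `cross_drive` falls back to the absolute path because "S:" and "C:" are different drives as far as relpath() can tell.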

For example, say we start with "\\??\\S:\\". The object manager reparses the r"\??\S:" SymbolicLink as r"\??\C:\Temp\Subst". Next it reparses r"\??\C:" to a device object, with a resolved path such as r"\Device\HarddiskVolume2\Temp\Subst". The Device object type has a parse routine that's implemented by the I/O manager. This sends an IRP_MJ_CREATE request to the mounted file-system device (NTFS in this case) with the remaining path to be parsed, e.g. r"\Temp\Subst". Note that at this stage, information about the original drive "S:" is long gone.

If the file system in turn finds a reparse point, such as a file-system symlink or mount point, then it stops there and returns STATUS_REPARSE with the contents of the reparse buffer. The I/O Manager itself handles symlink and mount-point reparsing, for which it implements behavior that's as close as possible to Unix symlinks and mount points. After setting up the new path to open, the I/O manager's parse routine returns STATUS_REPARSE to the object manager. Up to 63 reparse attempts are allowed, including within the object namespace itself. The limit of 63 reparse attempts is a simple way to handle reparse loops.

Assuming no file-system reparse points, we have as the final path r"\Device\HarddiskVolume2\Temp\Subst". To map this back to a DOS path, GetFinalPathNameByHandleW queries the mount-point manager for the canonical DOS device name for r"\Device\HarddiskVolume2". The mount-point manager knows about "C:" in this case, but it doesn't have a registry of subst drives. GetFinalPathNameByHandleW also doesn't enumerate DOS devices and map a path back to a matching subst drive. It supports only the canonical path. 

Reverse mapping a UNC path to a mapped drive would be even more difficult. We would have to doubly resolve, for example, from "M:" -> r"\Device\<redirector name>\;M:<logon session>\server\share\path\to\directory" -> r"\Device\Mup\;<redirector name>\;M:<logon session>\server\share\path\to\directory". Once we confirm that "M:" targets the MUP device, we can compare r"\server\share\path\to\directory" to check whether the final path contains this path. If so, we can replace it with "M:". That's a lot of work to get a non-canonical path, and that's the simplest case. For example, we could have a subst drive for a mapped drive, and the shortest path would be the subst drive.

To avoid resolving drives altogether, realpath() would have to manually walk the path instead of relying on GetFinalPathNameByHandleW.
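
A rough sketch of what such a manual walk could look like, assuming only file-system symlinks that `os.path.islink()` and `os.readlink()` can see. The function name `resolve_links_only` is hypothetical and this is not how realpath() is implemented. It does not handle junctions or "\\?\" prefixes in link targets, and it clamps ".." at the drive root, so it cannot reproduce the subst-drive traversal described earlier. The default limit of 63 simply mirrors the reparse limit mentioned above.

```python
import os


def resolve_links_only(path, max_links=63):
    """Resolve file-system symlinks one component at a time.

    The drive (or root) of the input is never looked up or replaced, so a
    path that starts on "S:" stays on "S:" unless a link target names
    another drive explicitly.
    """
    path = os.path.abspath(path)
    drive, rest = os.path.splitdrive(path)
    resolved = drive + os.sep
    # Components still to be processed; the next one is at the end.
    pending = [part for part in rest.split(os.sep) if part][::-1]
    links_followed = 0
    while pending:
        name = pending.pop()
        if name == os.curdir:
            continue
        if name == os.pardir:
            # dirname() leaves a bare root unchanged, so ".." stops there.
            resolved = os.path.dirname(resolved)
            continue
        candidate = os.path.join(resolved, name)
        if not os.path.islink(candidate):
            resolved = candidate
            continue
        links_followed += 1
        if links_followed > max_links:
            raise OSError("too many levels of symbolic links: %r" % path)
        target = os.readlink(candidate)
        if os.altsep:
            target = target.replace(os.altsep, os.sep)
        if os.path.isabs(target):
            # An absolute target restarts the walk from its own root.
            target_drive, target = os.path.splitdrive(target)
            resolved = (target_drive or drive) + os.sep
        # Queue the target's components so they are processed next.
        pending.extend(part for part in reversed(target.split(os.sep)) if part)
    return resolved
```

Because the walk never asks the system for a final path, a drive letter in the input is carried through unchanged.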

> If we can easily tell the difference between directory junctions and 
> mapped drives, given that they are both identical types of reparse 
> points

Mapped drives and subst drives are not file-system reparse points. They're "DOS" devices, which are implemented as SymbolicLink objects in the "\\??\\" device-alias directory. A SymbolicLink object can be read via NtQuerySymbolicLinkObject, or via WINAPI QueryDosDeviceW if the SymbolicLink is in "\\??\\".
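
A hedged ctypes sketch of reading a DOS device's target with QueryDosDeviceW. The helper names `split_multi_sz` and `query_dos_device` and the 32768-character buffer size are assumptions made for illustration. The call itself only works on Windows, so the function raises OSError elsewhere.

```python
import ctypes
import sys


def split_multi_sz(raw):
    """Split a NUL-separated, double-NUL-terminated list of strings."""
    return [item for item in raw.split("\0") if item]


def query_dos_device(name, buffer_chars=32768):
    """Return the target path(s) of a DOS device such as "S:" (Windows only)."""
    if sys.platform != "win32":
        raise OSError("QueryDosDeviceW is only available on Windows")
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.QueryDosDeviceW.argtypes = (
        wintypes.LPCWSTR,  # lpDeviceName, without a trailing backslash
        wintypes.LPWSTR,   # lpTargetPath, receives the string list
        wintypes.DWORD,    # ucchMax, buffer size in characters
    )
    kernel32.QueryDosDeviceW.restype = wintypes.DWORD
    buf = ctypes.create_unicode_buffer(buffer_chars)
    count = kernel32.QueryDosDeviceW(name, buf, buffer_chars)
    if count == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    # The first entry is the current mapping; any further entries are
    # prior mappings that have not been deleted.
    return split_multi_sz(buf[:count])
```

For a subst drive "S:" created as in the earlier example, `query_dos_device("S:")` would be expected to return a list whose first item is a path such as r"\??\C:\Temp\Subst".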
msg351780 - Author: Steve Dower (steve.dower) (Python committer) Date: 2019-09-11 09:38
I'm closing this as not a bug.

It's a few steps deep, but DefineDosDeviceW() [1] specifies that it creates junctions, and while it's not necessarily obvious how to get from SUBST to that page, Wikipedia managed it [2]. And I don't think it's unreasonable to expect people to either think about this really shallowly ("realpath will find the real path") or really deeply ("let me research every aspect to find the true answer") and avoid over-specifying the behaviour in our own documentation.

"MS-DOS device names are stored as junctions in the object namespace. The code that converts an MS-DOS path into a corresponding path uses these junctions to map MS-DOS devices and drive letters. The DefineDosDevice function enables an application to modify the junctions used to implement the MS-DOS device namespace."

[1]: https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-definedosdevicew
[2]: https://en.wikipedia.org/wiki/SUBST
msg351879 - Author: Eryk Sun (eryksun) (Python triager) Date: 2019-09-11 13:44
> It's a few steps deep, but DefineDosDeviceW() [1] specifies that it 
> creates junctions, and while it's not necessarily obvious how to get
> from SUBST to that page, Wikipedia managed it [2]. 

Take care to not conflate device junctions with the file-system reparse point that we commonly call a junction (aka a "mount point"). The term "junction" is used generically in the DefineDosDeviceW docs as well as in "Defining an MS-DOS Device Name". It's not supporting your earlier statement: "If we can easily tell the difference between directory junctions and mapped drives, given that they are both identical types of reparse points". Junctions in the object namespace have nothing to do with file-system reparse points.

A junction is a joining between two things -- in this case between two names in the object namespace. We can explore the NT object namespace via Sysinternals WinObj. The Object Manager maintains this namespace as a nested tree of Directory objects that contain other named kernel objects (e.g. an I/O Manager "Device", a Configuration Manager "Key", a Memory Manager "Section", a Process Manager "Job"). It also implements a named SymbolicLink object type, which is the basis for device junctions. 

SymbolicLink objects get reparsed by the Object Manager. The target path is a new absolute path in the object namespace. The system calls for working with SymbolicLink objects are as follows:

    NtCreateSymbolicLinkObject - Create a SymbolicLink object
        with the provided name, attributes and target path,
        and return a handle for it.
    NtOpenSymbolicLinkObject - Return a handle for an existing
        SymbolicLink object.
    NtQuerySymbolicLinkObject - Get the target path of a 
        SymbolicLink object.
    NtMakeTemporaryObject - Make a SymbolicLink object 
        temporary, so that it's automatically unlinked from 
        its parent Directory when no longer referenced.

SymbolicLink objects can be created with any name in any object directory. For device junctions specifically, they are created in particular object directories (discussed below), often with DOS drive names "A:" - "Z:". Other names are also used, such as "CON" -> "\Device\ConDrv\Console", "NUL" -> "\Device\Null", "PIPE" -> "\Device\NamedPipe", and "PhysicalDrive0" -> "\Device\Harddisk0\DR0".

The target path isn't limited to just an object in the object namespace. It can include a remaining path that's parsed by the object. For example, the target could be "\Device\HarddiskVolume2\Windows\System32", where the object is "\Device\HarddiskVolume2" and the remaining path is "\Windows\System32". (It could just as well target a file-system file such as "kernel32.dll" in that directory.) The drives created by subst.exe take advantage of this capability to link directly to file-system directories. But it's noteworthy that this is a weird sort of drive that causes bugs in some API functions such as GetVolumePathNameW, which assumes a DOS drive is a junction to a volume device, not a file-system directory.

Each logon session has a local object Directory for its device junctions (AKA "DOS devices"). It makes sense for local devices to be associated with a logon session because credentials for mapped drives are associated with the user's logon session. The local Directory is located at "\Sessions\0\DosDevices\<logon session ID>". It's in desktop session 0 (non-interactive services) because logon sessions aren't necessarily limited to a single desktop session. The local directory shadows the system global directory, "\Global??". Name lookup first checks the local directory and then the global one. The SYSTEM logon uses "\Global??" as its local directory, so defining a device junction in a SYSTEM context always creates a global junction. A user's local directory is typically used just for mapped and subst drives.

The local device directory for the current user is accessible as "\??\", which the Object Manager reserves for this case. So native code doesn't need to look up the logon-session ID and create the "\Sessions\0\DosDevices\<logon session ID>" path. Neither does the Object Manager itself because the local and global directories are cached per process and per logon session. The local directory also contains a "Global" SymbolicLink to the global directory. 

The equivalent of NT "\??\" in the Windows API is either "\\?\" (non-normalized path) or "\\.\" (normalized path). For example, we can access r"\\?\Global\Z:", which may not be the same device as "\\?\Z:".

DefineDosDeviceW sends an LPC request to the desktop session server, csrss.exe, in order to define a device junction. This request is handled by basesrv!BaseSrvDefineDosDevice. As necessary, BaseSrvDefineDosDevice impersonates the caller to ensure it creates a local junction in the right directory. 

BaseSrvDefineDosDevice either redefines or creates a new SymbolicLink object. If the device junction already exists, it tries to redefine the target. First it queries the existing SymbolicLink to read its target. This allows a trick that takes advantage of NT counted strings. The buffer is made large enough for the new target path and the old path, separated by a NUL. The string's length includes only the new target path, but its maximum length includes all previous target paths. Thus we can "push" a new mapping for a junction (e.g. drive "Z:") and "pop" it off to restore the previous mapping when it's no longer needed.
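
The push/pop behaviour of the counted string can be illustrated with a toy model in pure Python. This is only a simulation of the idea described above, not the csrss.exe implementation. The class name `DeviceJunction` and its methods are hypothetical, and what happens when the last definition is removed is not modelled.

```python
class DeviceJunction:
    """Toy model of a SymbolicLink target stored as an NT counted string."""

    def __init__(self, target):
        self.buffer = target        # everything up to the maximum length
        self.length = len(target)   # covers only the active target

    @property
    def target(self):
        # A reader that honours the counted length sees only the newest target.
        return self.buffer[:self.length]

    def push(self, new_target):
        """Redefine the junction, keeping older targets after a NUL."""
        self.buffer = new_target + "\0" + self.buffer
        self.length = len(new_target)

    def pop(self):
        """Drop the newest target and restore the previous one."""
        _, separator, older = self.buffer.partition("\0")
        if not separator:
            raise LookupError("no previous mapping to restore")
        self.buffer = older
        self.length = len(older.partition("\0")[0])


drive_z = DeviceJunction(r"\Device\HarddiskVolume2")
drive_z.push(r"\??\C:\Temp\Subst")
pushed_target = drive_z.target
drive_z.pop()
restored_target = drive_z.target
```

After the push, `pushed_target` is the new mapping. After the pop, `restored_target` is the original one again.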
History
Date                 User         Action  Args
2022-04-11 14:59:19  admin        set     github: 82174
2019-09-11 13:44:44  eryksun      set     messages: + msg351879
2019-09-11 09:38:49  steve.dower  set     status: open -> closed; resolution: not a bug; messages: + msg351780; stage: resolved
2019-09-01 11:16:18  eryksun      set     messages: + msg350946
2019-09-01 02:42:40  steve.dower  set     messages: + msg350937
2019-09-01 02:41:45  steve.dower  set     messages: + msg350936
2019-08-31 12:54:25  eryksun      set     nosy: + eryksun; messages: + msg350916
2019-08-31 08:04:35  cgohlke      create