Author eryksun
Recipients eryksun, paul.moore, simon mackenzie, steve.dower, tim.golden, zach.ware
Date 2021-01-20.00:24:28
Content
os.readlink() was generalized to support mountpoints (junctions) as well as symlinks, and it's more common for mountpoints to lack the print name field in the reparse data buffer [1]. For example, PowerShell's New-Item creates junctions that have only a substitute path. This is allowed because the filesystem protocols require only that the substitute path in the reparse data buffer is valid; that's all the system actually uses when resolving the reparse point.

The substitute path in the reparse point is a \\?\ prefixed path (actually a \??\ NT path, but they're effectively the same for our purposes). This type of path is usually called an extended path -- or an extended device path, or verbatim path. It's a device path, like the \\.\ prefix, except that (1) it's verbatim (i.e. not normalized), (2) its length is never limited to MAX_PATH (260) characters, and (3) the Windows file API supports the \\?\ prefix more broadly than the \\.\ prefix.

You're right that some programs can't grok an extended path. Some can't even handle any type of UNC path. I agree that we need a simple way to remove the prefix. I just don't agree that removing it should be the default behavior in nt.readlink(), which I prefer to keep efficient and free of race conditions.

os.path.realpath() isn't necessarily what you want since it resolves the final path. The link may target a path that traverses any number of reparse points and mapped drives, so the final path may be completely different from the os.readlink() result. We simply need an option to remove the \\?\ or \\?\UNC\ prefix, either always or only when the path doesn't require it. It could be implemented by a high-level wrapper function in os.py.
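As a rough sketch of the "always remove" variant, the prefix handling is plain string manipulation. The name strip_extended_prefix is hypothetical and not an existing os or os.path function. This version assumes that only drive-letter and UNC targets have an equivalent plain form, and it leaves everything else (such as volume GUID paths) untouched:

```python
import re

# Hypothetical helper, not part of os or os.path.
# Matches a drive-letter root such as "C:\".
_DRIVE_RE = re.compile(r'[A-Za-z]:\\')

def strip_extended_prefix(path):
    """Remove a leading extended-path prefix when a plain form exists."""
    if path[:8].upper() == '\\\\?\\UNC\\':
        # \\?\UNC\server\share -> \\server\share
        return '\\\\' + path[8:]
    if path.startswith('\\\\?\\') and _DRIVE_RE.match(path, 4):
        # \\?\C:\dir -> C:\dir
        return path[4:]
    # Assumption: other forms (e.g. \\?\Volume{...}\) are returned as-is.
    return path
```

This does no filesystem access, so it adds no race conditions, but it also ignores the cases under "Reasons the prefix may be required" where removing the prefix changes what the path refers to.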

---
Reasons the prefix may be required

If the length of the target path exceeds MAX_PATH, then removing the prefix may render the path inaccessible if the current process doesn't support long paths without it (e.g. Windows 10 without long paths enabled at the system level, or any version prior to Windows 10).
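A minimal sketch of a length-aware variant follows. The function name and the long_paths_enabled parameter are hypothetical; a real implementation would have to find out whether the process supports long paths, which this sketch leaves to the caller. It handles only drive-letter targets and treats MAX_PATH as including the terminating null character:

```python
MAX_PATH = 260          # includes the terminating null character
PREFIX = '\\\\?\\'      # the four characters \\?\

def strip_prefix_if_short(path, long_paths_enabled=False):
    """Drop the extended prefix only if the result stays usable without it."""
    if not path.startswith(PREFIX):
        return path
    stripped = path[len(PREFIX):]
    # Assumption: only drive-letter targets (C:\...) are handled here.
    if stripped[1:3] != ':\\':
        return path
    # Keep the prefix when the plain path would be too long for this process.
    if not long_paths_enabled and len(stripped) >= MAX_PATH:
        return path
    return stripped
```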

Also, reserved DOS device names are only accessible using an extended path. Say I have the following "spam" junction:

    >>> print(os.readlink('spam'))
    \\?\C:\Temp\con

The junction allows accessing the target directory normally:

    >>> stat.S_ISDIR(os.stat('spam').st_mode)
    True

But look what happens when I try to access the target path without the prefix:

    >>> stat.S_ISDIR(os.stat(r'C:\Temp\con').st_mode)
    False
    >>> stat.S_ISCHR(os.stat(r'C:\Temp\con').st_mode)
    True

Instead of the directory that one might expect, it's actually a character device!? Let's see what Windows opens:

    >>> print(os.path.abspath(r'C:\Temp\con'))
    \\.\con

It opens the "CON" device, for console I/O. It turns out that a bunch of names are reserved, including NUL, CON, CONIN$, CONOUT$, AUX, PRN, COM<1-9>, and LPT<1-9>. They're reserved even with an extension introduced by a colon or dot, preceded by zero or more spaces. For example:

    >>> print(os.path.abspath(r'C:\Temp\con :whatever'))
    \\.\con
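The rule as described here can be approximated with a regular expression on the final path component. This is only a sketch of the description above: final_component_is_reserved is a hypothetical name, and the exact set of reserved names and how they are matched may differ between Windows versions.

```python
import re

# Reserved base name, then optional spaces, then an optional "extension"
# introduced by a dot or a colon.
_RESERVED_RE = re.compile(
    r'^(?:NUL|CON|CONIN\$|CONOUT\$|AUX|PRN|COM[1-9]|LPT[1-9])'
    r' *(?:[.:].*)?$',
    re.IGNORECASE,
)

def final_component_is_reserved(path):
    """Return True if the last component looks like a reserved DOS device name."""
    name = path.replace('/', '\\').rpartition('\\')[2]
    return _RESERVED_RE.match(name) is not None
```

A wrapper that removes the prefix only when it isn't required could use a check like this to decide to keep the prefix.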

Directly accessing such a name in the filesystem requires a verbatim path. For example:

    >>> stat.S_ISDIR(os.stat(r'\\?\C:\Temp\con').st_mode)
    True

Using reserved names is discouraged, but in the real world we have to be defensive. We can't simply remove the prefix and hope for the best.

---

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_reparse_data_buffer