
Author eryksun
Recipients eryksun, paul.moore, steve.dower, tim.golden, vidartf, zach.ware
Date 2019-08-07.03:45:45
Message-id <1565149546.5.0.248033078744.issue31226@roundup.psfhosted.org>
Content
> Junctions are sometimes used as links (e.g. mklink /j) and sometimes 
> as volume mount points (e.g. mountvol.exe).

That people sometimes use junctions as if they're symlinks doesn't mean that we should pretend it's true. The reparse tag is IO_REPARSE_TAG_MOUNT_POINT, and they behave like mountpoints (volume mounts and bind mounts). See Junctions vs. Symlinks, below.

It's potentially problematic to conflate junctions with symlinks. For example, a user who opts to use a junction instead of a symlink may be denied the symlink privilege, so code that copies a junction as if it's a symlink will fail (e.g. move if os.rename fails, or copyfile with follow_symlinks=False, or copytree with symlinks=True), unless we add magical fallback code in os.symlink() to create junctions when the link target is a directory. Even if creating a symlink succeeds, a symlink behaves differently from a junction, which could lead to problems later on.

That said, always traversing directory mountpoints as if they're just plain directories, like what Unix does, is not the norm in Windows. In some contexts, they're basically handled as symlinks -- in particular for a recursive delete. CMD's `rmdir /s`, PowerShell's `remove-item -recurse -force`, and Explorer's folder deletion all remove junctions without traversing them, regardless of whether the target is a regular DOS path or a volume GUID name. For example, if we step through the disassembled code of `rmdir /s` in CMD (i.e. cmd!RmDirSlashS), we observe that it looks for the name-surrogate bit in the reparse tag to determine whether it should call RemoveDirectoryW on a reparse-point directory instead of traversing it.

I would prefer to copy this behavior. It's safer, since standard users can create junctions to DOS paths and volume GUID names in Windows, unlike POSIX in which only the super user has the power to create mountpoints. While Windows mountvol.exe requires administrator access in order to update the mountpoint manager, CMD's `mklink /j` doesn't require elevated access, and neither does PowerShell's `new-item -itemtype junction`, even if the target is a volume GUID name.

Maybe for Windows we can have a name-surrogate category based on the reparse tag's name-surrogate bit (i.e. bit 29, "the file or directory represents another named entity in the system"), as identified by the WINAPI macro IsReparseTagNameSurrogate (winnt.h). The surrogate type would be a superset of the symlink type and would be allowed to be a directory. Nothing would change with regard to symlinks proper, however. It would remain the case that only IO_REPARSE_TAG_SYMLINK reparse points would be classified as symlinks by stat(), islink(), readlink(), etc. In POSIX systems, the only surrogate file type would be the symlink type, which is never a directory.
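
As a rough illustration, the bit test behind IsReparseTagNameSurrogate can be sketched in Python. The tag values are the winnt.h constants; the function name is_name_surrogate is made up for this example and is not an existing API.

```python
# Reparse tag values as defined in winnt.h.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junctions and volume mount points
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_APPEXECLINK = 0x8000001B  # an example of a non-surrogate tag

NAME_SURROGATE_BIT = 0x20000000  # bit 29 of the reparse tag


def is_name_surrogate(tag):
    """Mirror of the IsReparseTagNameSurrogate macro (hypothetical helper)."""
    return bool(tag & NAME_SURROGATE_BIT)
```

Both junctions and symlinks carry the bit, which is why the surrogate category would be a superset of the symlink type.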

A keyword-only option surrogates_as_links=False could be added to stat() and lstat(). In POSIX, surrogates_as_links would be ignored. Given both follow_symlinks=False and surrogates_as_links=True, stat() would be able to return the reparse tag for any name-surrogate reparse point. The tag value could be added to _Py_stat_struct as st_reparse_tag, and the stat result tuple would be similarly extended. This field would be non-zero when querying any name-surrogate reparse point that's not followed. 

os.lstat(path, surrogates_as_links=True) could be the basis for os.path.issurrogate(). Or maybe we could add a more targeted function that calls CreateFileW and GetFileInformationByHandleEx: FileAttributeTagInfo, or FindFirstFileW. The scandir DirEntry result could implement an is_surrogate() method based on the reparse tag that's returned by FindFirstFileW.
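
A minimal sketch of what such a check might look like. It assumes that os.lstat() does not follow the reparse point and that the stat result exposes the st_reparse_tag field proposed above; where that field is absent (e.g. in POSIX), getattr() falls back to 0 and only symlinks count. The name issurrogate is the hypothetical function from this message, not an existing API.

```python
import os
import stat

NAME_SURROGATE_BIT = 0x20000000  # bit 29 of the reparse tag


def issurrogate(path):
    """Hypothetical os.path.issurrogate(): is path itself a name surrogate?"""
    try:
        st = os.lstat(path)
    except (OSError, ValueError):
        return False
    # In POSIX the only surrogate type is the symlink.
    if stat.S_ISLNK(st.st_mode):
        return True
    # Assumption: st_reparse_tag is non-zero for an unfollowed reparse point.
    tag = getattr(st, "st_reparse_tag", 0)
    return bool(tag & NAME_SURROGATE_BIT)
```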

For _rmtree_unsafe, we could simply insert a test at the start to avoid listing surrogate directories. For example:

    if os.path.issurrogate(path):
        entries = []
    else:
        with os.scandir(path) as scandir_it:
            entries = list(scandir_it)
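
To show the idea end to end, here is a simplified, self-contained recursive delete built around the same test. It is only a sketch, not the actual shutil code: remove_tree and _is_surrogate are hypothetical names, the surrogate check assumes a stat result with st_reparse_tag (falling back to a plain symlink check), and it relies on os.unlink removing a junction or directory symlink without touching its target.

```python
import os
import stat

NAME_SURROGATE_BIT = 0x20000000  # bit 29 of the reparse tag


def _is_surrogate(path):
    """Hypothetical stand-in for os.path.issurrogate()."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return True
    return bool(getattr(st, "st_reparse_tag", 0) & NAME_SURROGATE_BIT)


def remove_tree(path):
    if _is_surrogate(path):
        # Remove the link or junction itself; never list its target.
        os.unlink(path)
        return
    with os.scandir(path) as scandir_it:
        entries = list(scandir_it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False) or _is_surrogate(entry.path):
            remove_tree(entry.path)
        else:
            os.unlink(entry.path)
    os.rmdir(path)
```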

We could also add an allow_directory_surrogates=False keyword-only option to os.remove, which would be ignored in POSIX just as the symlink() target_is_directory option is ignored in POSIX. By default calling os.remove on a non-symlink directory would fail, as one expects it should. 

Adding an option to remove a directory via os.remove isn't strictly consistent with POSIX, but os.remove was already modified in issue 18314 to always remove all junctions, so the behavior is already inconsistent. We'd be clearly specifying and documenting how it works, and hopefully the new requirement to pass the keyword option wouldn't be too disruptive for programs that have relied on the undocumented behavior.

---
Junctions vs. Symlinks

Junctions and symlinks have different constraints and behavior. Junctions can only target local devices, and when accessed remotely by a client they're evaluated remotely on the server (e.g. if a client accesses a junction to "C:\Temp" on a server, the target is the system drive on the server). 

Symlinks are always evaluated on the client side, i.e. the redirector sends the reparse request over the wire to the client. The evaluation of local and remote symlinks is set by policies on the client system. A local symlink may be allowed to target either a local device or a remote device. A remote symlink may be allowed to target either a remote device or a local device on the client (e.g. a symlink to "C:\Temp" on the server targets the system drive on the client). The policies that govern this are SymlinkLocalToLocalEvaluation (default enabled), SymlinkLocalToRemoteEvaluation (default enabled), SymlinkRemoteToLocalEvaluation (default disabled), and SymlinkRemoteToRemoteEvaluation (default disabled). You might see these abbreviated as L2L, L2R, R2L, and R2R.

Junction targets must be fully qualified, but symlinks can target relative paths. How a relative symlink resolves when it is reached through a junction versus through a directory symlink demonstrates that junctions are intentionally designed to behave as mountpoints.

For example, given "C:\test1\test2\foo_link" is a link to "..\foo", if we have a directory symlink "C:\symlink" that targets "C:\test1\test2", then "C:\symlink\foo_link" refers to "C:\test1\foo". In contrast, relative symlinks traverse a junction as a namespace grafting. So if we have a junction "C:\junction" that targets "C:\test1\test2" (the same target as the symlink), then "C:\junction\foo_link" refers to "C:\foo". 
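
The difference can be modelled with plain path arithmetic. This is only a string-level sketch using ntpath (so it runs on any platform and touches no filesystem); resolve_relative_link is a hypothetical helper, and the model assumes a directory symlink is substituted by its target before the relative link is resolved, whereas a junction stays in the path.

```python
import ntpath


def resolve_relative_link(link_dir, relative_target):
    """Resolve a relative symlink target against the directory holding the link."""
    return ntpath.normpath(ntpath.join(link_dir, relative_target))


# Via the directory symlink: "C:\symlink" is replaced by its target first,
# so foo_link is resolved relative to "C:\test1\test2".
via_symlink = resolve_relative_link(r"C:\test1\test2", r"..\foo")

# Via the junction: the junction remains in the path as a grafted directory,
# so foo_link is resolved relative to "C:\junction".
via_junction = resolve_relative_link(r"C:\junction", r"..\foo")

print(via_symlink)   # C:\test1\foo
print(via_junction)  # C:\foo
```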

If we set up a similar scenario in Linux using either a kernel bind mount or FUSE bindfs mount, we'll observe the same behavior. The bind mount is a name grafting in the virtual filesystem, whereas a symlink simply resolves to the target path.

---
Mountpoints

It seems to me that handling all junctions as mountpoints is more consistent with how we handle DOS and UNC drives as mountpoints even when they're not volume mountpoints. For example, we can map a directory such as "C:\Users\Public" to drive "P:" or share it as "\\Server\Public". These are similar to Unix bind mounts, but in the case of DOS and UNC drives the namespace grafting is internal to the system, either as junctions in the system object namespace (e.g. "\Sessions\0\DosDevices\<Logon ID>\P:" -> "\Device\HarddiskVolume2\Users\Public") or as mappings in the UNC provider-share namespace (e.g. SMB shares, WebDAV shares, VirtualBox folder shares, and so on, all grafted under "\Device\Mup"). What's different about junction mountpoints is that they're not grafted as a root directory, whereas the syntax for DOS and UNC drives in Windows mandates that they're always the top-level root, i.e. we can't use ".." to traverse to a parent directory.

Given this broader definition of a mountpoint, os.path.ismount would no longer call _getvolumepathname. It would still return true for DOS and UNC drive root directories. Otherwise it would simply check whether the path is a junction (i.e. IO_REPARSE_TAG_MOUNT_POINT).
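
A rough sketch of that simplified check, under stated assumptions: is_drive_root and ismount_sketch are hypothetical names, the drive test is plain string handling via ntpath.splitdrive, and the junction test assumes an lstat result that exposes st_reparse_tag for an unfollowed reparse point.

```python
import ntpath
import os

IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # value from winnt.h


def is_drive_root(path):
    """True for a DOS drive root ("C:\\") or a UNC share root."""
    drive, rest = ntpath.splitdrive(path)
    if not drive:
        return False
    if drive[0] in "\\/":
        # UNC drive: the share root with or without a trailing separator.
        return rest in ("", "\\", "/")
    # DOS drive: "C:" alone is drive-relative, so require the separator.
    return rest in ("\\", "/")


def ismount_sketch(path):
    if is_drive_root(path):
        return True
    try:
        st = os.lstat(path)
    except (OSError, ValueError):
        return False
    # Assumption: st_reparse_tag is available; otherwise this is always False.
    return getattr(st, "st_reparse_tag", 0) == IO_REPARSE_TAG_MOUNT_POINT
```

Note that no volume lookup is involved: a junction counts as a mountpoint purely because of its reparse tag.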