Author eryksun
Recipients eryksun, jaraco, paul.moore, steve.dower, tim.golden, zach.ware
Date 2020-05-17.06:25:19
Message-id <1589696719.74.0.749108410619.issue40654@roundup.psfhosted.org>
In-reply-to
Content
Copying a symlink verbatim requires copying both the print name and the substitute name in the reparse buffer [1]. For a file, CopyFileExW with the COPY_FILE_COPY_SYMLINK flag implements this by enabling the symlink privilege for the thread and copying the reparse point via FSCTL_GET_REPARSE_POINT and FSCTL_SET_REPARSE_POINT. For a directory, CreateDirectoryExW is implemented similarly when lpTemplateDirectory is a symlink or mount point. (For a "\\?\Volume{GUID}\" volume mountpoint, as opposed to a bind mountpoint, CreateDirectoryExW punts to SetVolumeMountPointW, which also updates the system mountpoint manager.)
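
To make the two names concrete, here is a minimal sketch of pulling them out of the raw bytes that FSCTL_GET_REPARSE_POINT returns, following the REPARSE_DATA_BUFFER layout in [1]. The function name parse_reparse_names is made up for this example, and this is not a claim about how nt.readlink is implemented.

```python
import struct

IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C

def parse_reparse_names(buf):
    """Return (substitute_name, print_name) from a raw reparse buffer.

    Hypothetical helper for illustration; handles only symlinks and
    mount points (junctions).
    """
    # Common header: ULONG ReparseTag, USHORT ReparseDataLength,
    # USHORT Reserved.
    tag, _data_length, _reserved = struct.unpack_from('<LHH', buf, 0)
    # Both layouts continue with four USHORTs: the byte offset and byte
    # length of each name, relative to the start of PathBuffer.
    sub_off, sub_len, prn_off, prn_len = struct.unpack_from('<4H', buf, 8)
    if tag == IO_REPARSE_TAG_SYMLINK:
        path_buffer = 20  # a ULONG Flags field precedes PathBuffer
    elif tag == IO_REPARSE_TAG_MOUNT_POINT:
        path_buffer = 16  # no Flags field
    else:
        raise ValueError('unsupported reparse tag: %#x' % tag)

    def name(offset, length):
        start = path_buffer + offset
        return bytes(buf[start:start + length]).decode('utf-16-le')

    return name(sub_off, sub_len), name(prn_off, prn_len)
```

A verbatim copy has to preserve both strings returned here, which is why reading one name and recreating the link from it is lossy.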

If you can only have one or the other, the substitute name is more reliable according to the wording in [MS-FSCC] [2]. 

symlinks:

    A symbolic link has a substitute name and a print name associated 
    with it. The substitute name is a pathname (section 2.1.5) 
    identifying the target of the symbolic link. The print name SHOULD 
    be an informative pathname, suitable for display to a user, that 
    also identifies the target of the symbolic link. Either pathname 
    can contain dot directory names as specified in section 2.1.5.1. 

mount points (junctions):

    A mount point has a substitute name and a print name associated 
    with it. The substitute name is a pathname (section 2.1.5) 
    identifying the target of the mount point. The print name SHOULD 
    be an informative pathname (section 2.1.5), suitable for display 
    to a user, that also identifies the target of the mount point. 
    Neither of these pathnames can contain dot directory names.

The operative weasel word is "should", instead of a reliable "must" (RFC 2119).

An example of the power of "should" is that PowerShell doesn't even set a print name when it creates a mount point via `New-Item -ItemType Junction`. I don't agree that nt.readlink should read junctions, but it does, so the potentially missing print name is a problem. If it were just symlinks created by CreateSymbolicLinkW, the print name would be reliable, because we know that this function sets the print name to whatever was passed as lpTargetFileName. I suppose nt.readlink could fall back on using the substitute name if there's no print name.
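
A rough sketch of that fallback, in Python rather than in the C implementation. The name readlink_name is hypothetical, and the prefix handling assumes an absolute target whose substitute name is an NT path beginning with "\??\".

```python
def readlink_name(substitute_name, print_name):
    """Prefer the print name; fall back on the substitute name.

    Hypothetical helper for illustration only.
    """
    if print_name:
        return print_name
    # No print name, e.g. a junction created by PowerShell. An absolute
    # substitute name is an NT path, so map its prefix to the Win32
    # extended-path prefix.
    if substitute_name.startswith('\\??\\'):
        return '\\\\?\\' + substitute_name[4:]
    # A relative symlink target has no prefix to convert.
    return substitute_name
```

With this, a junction that lacks a print name yields an extended path, while a normal symlink still yields whatever was passed to CreateSymbolicLinkW.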

Also, if nt.readlink is used to manually resolve a broken path (e.g. ntpath._readlink_deep), and the process doesn't have long paths enabled, then the "\\?\" extended path from the substitute name is more reliable. (But one could also call _getfullpathname on the print name and convert the result to extended form if it's not already an extended path.)
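
The conversion to extended form could look something like the following sketch. It assumes the input is already fully qualified and normalized (e.g. the result of _getfullpathname), and to_extended_path is a made-up name for this example.

```python
def to_extended_path(path):
    """Convert a fully-qualified, normalized Windows path to extended form.

    Hypothetical helper; the caller is assumed to have normalized the
    path already, since extended paths bypass normalization.
    """
    if path.startswith('\\\\?\\'):
        return path  # already an extended path
    if path.startswith('\\\\.\\'):
        # Device path: only the prefix differs.
        return '\\\\?\\' + path[4:]
    if path.startswith('\\\\'):
        # UNC path: \\server\share -> \\?\UNC\server\share
        return '\\\\?\\UNC\\' + path[2:]
    # Drive path: C:\dir -> \\?\C:\dir
    return '\\\\?\\' + path
```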

If you search around, you'll find some projects using the print name and some using the substitute name to implement POSIX readlink, but using the print name is more popular.

Do you want 3.8 to revert to using the print name, at least for symlinks? (ntpath._readlink_deep would need to be modified to support long target paths.) Or would you rather that shutil used a more reliable way to copy symlinks verbatim on Windows? For example, use CopyFileExW for a file and CreateDirectoryExW for a directory.
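
As a rough idea of the second option, here is a ctypes sketch. The function copy_symlink_verbatim and its is_dir parameter are hypothetical, it is not what shutil does today, and a real implementation would live in C and need more careful error handling.

```python
import ctypes
import sys

COPY_FILE_COPY_SYMLINK = 0x00000800

def copy_symlink_verbatim(src, dst, is_dir=False):
    """Copy the link at src to dst without resolving it (Windows only).

    Hypothetical helper; the caller decides whether src is a
    directory link.
    """
    if sys.platform != 'win32':
        raise NotImplementedError('requires Windows')
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    if is_dir:
        # lpTemplateDirectory, lpNewDirectory, lpSecurityAttributes
        kernel32.CreateDirectoryExW.argtypes = (
            ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_void_p)
        ok = kernel32.CreateDirectoryExW(src, dst, None)
    else:
        # lpExistingFileName, lpNewFileName, lpProgressRoutine, lpData,
        # pbCancel, dwCopyFlags
        kernel32.CopyFileExW.argtypes = (
            ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_void_p,
            ctypes.c_void_p, ctypes.c_void_p, ctypes.c_uint32)
        ok = kernel32.CopyFileExW(src, dst, None, None, None,
                                  COPY_FILE_COPY_SYMLINK)
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
```

The caller could pick is_dir from os.lstat(src).st_file_attributes & stat.FILE_ATTRIBUTE_DIRECTORY, since a directory symlink or junction has the directory attribute set on the link itself.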

[1]: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_reparse_data_buffer
[2]: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/b41f1cbf-10df-4a47-98d4-1c52a833d913