This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author eryksun
Recipients eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Date 2019-08-13.19:46:06
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1565725567.01.0.694928000585.issue37834@roundup.psfhosted.org>
In-reply-to
Content
> I feel like that's more work than is worth us doing for something that 
> will be relatively rarely used, will live in the stdlib, and is 
> obviously something that will become outdated as Microsoft adds new 
> reparse points.

Junctions (NT 5) and symlinks (NT 6) are stable. So if os.read_reparse_point only returns the unparsed bytes, maybe add os.read_junction as well. I know other projects overload this on POSIX readlink. They're both name-surrogate reparse points, but they have different constraints and behavior.

The I/O manager tries to make a junction behave something like a hard link to a directory, with the addition of being able to link across local volumes. This is in turn relates to how it evaluates relative symbolic links. For example, if "C:/Junction" and "C:/Symlink" both target r"\\?\C:\Temp1\Temp2", and there's a relative symlink "C:/Temp1/Temp2/foo_link" that targets r"..\foo", then "C:/Junction/foo_link" references "C:/foo" but "C:/Symlink/foo_link" references "C:/Temp1/foo".
 
Another difference is with remote filesystems. SMB special cases symlinks to have the server send the reparse request over the wire to be evaluated on the client side. (Refer to [MS-SMB2] 2.2.2.1 Symbolic Link Error Response, and the subsequent section about client-side handling of this error.) So an absolute symlink on the server that targets r"\\?\C:\Windows" actually references the client's "C:/Windows" directory, whereas the same junction target would reference the server's "C:/Windows" directory. The symlink evaluation will succeed only if the client's R2L (remote to local) policy allows it. Symlinks can also target remote devices, depending on the L2R and R2R policy settings. Junctions are restricted to local devices.

> In theory, we can't follow any reparse point that isn't documented as 
> being followable and provides the target name is a stable, documented
> manner. 

To follow a reparse point, we're just calling CreateFileW the normal way, without FILE_FLAG_OPEN_REPARSE_POINT. The Windows API also does this (usually via NtOpenFile, but this has a similar  FILE_OPEN_REPARSE_POINT option) for tags it doesn't handle. That's why MoveFileExW (os.rename and os.replace) fails on one of these app-exec links. In some cases, it adds a third open attempt if the reparse point isn't handled. This is important for DeleteFileW (os.remove) and RemoveDirectoryW (os.rmdir) because we should be able to delete a bad reparse point.

> The appexec links don't do this (I just looked at the returned 
> buffer), so we really should just not follow them. They exist solely 
> so that CreateProcess internally returns a unique error code that can 
> be handled without impacting regular process start, which means we 
> *don't* want to follow them.

I know, so a regular stat() will fail. I think for an honest result, stat() should fail for a reparse point that can't be handled. Scripts can use stat(path, follow_nonlinks=False) or stat(path, follow_reparse_points=False), or however this eventually gets parameterized to force opening all reparse points.

> Now, directory junctions are far more interesting. My gut feel is that 
> we should treat them the same as symlinks (with respect to stat vs. 
> lstat) for consistency

Junctions are their own thing. They're mount points that behave like Unix volume mounts (in Windows, target the root directory of a volume device named by its "Volume{...}" name) or Unix bind mounts (in Windows, target arbitrary directories on any local volume; in Linux it's a mount created with --bind or FUSE bindfs). Bind-like junctions are also similar to DOS subst drives (e.g. "W:" -> "C:/Windows") and UNC shares. These are all mount points of one sort or another. 

OTOH, the base device names such as "//?/C:" and "//?/Volume{...}", without a specified root directory, are aliases (object symlinks) for an NT device such as r"\Device\HarddiskVolume2". These paths open the volume itself, not the mounted filesystem, so they're not like Unix mount points. They're like Unix '/dev/sda1' device paths, except in Unix, devices don't have their own namespaces, so it would be nonsense to open "/dev/sda1/".

RemoveDirectoryW for a volume mount is special cased to call DeleteVolumeMountPointW, which notifies the mount-point manager. It won't do this for a junction that targets the same volume root directory via the DOS drive-letter name -- or any other device alias for that matter (e.g. Windows 10 creates "\\?\BootPartition" as an alternative named for the system "C:" drive). So bind-like mounts are different from volume mounts, but both are different from symlinks.
History
Date User Action Args
2019-08-13 19:46:07eryksunsetrecipients: + eryksun, paul.moore, tim.golden, zach.ware, steve.dower
2019-08-13 19:46:07eryksunsetmessageid: <1565725567.01.0.694928000585.issue37834@roundup.psfhosted.org>
2019-08-13 19:46:06eryksunlinkissue37834 messages
2019-08-13 19:46:06eryksuncreate