Message349888
> Where "links" are the generic term for the set that includes
> "reparse point", "symlink", "mount point", "junction", etc.)
Why group all reparse points under the banner of 'link'? If we have a typical HSM reparse point (the vast majority of allocated tags), then all operations, such as delete and rename, act on the file itself, not simply the reparse point. We should be able to delete or rename a link without affecting the target.
In this case, there's also no chance that the reparse point is a surrogate for another path on the system, so code that walks paths doesn't have to worry about loops with regard to these reparse points. The only practical use case I can think of for detecting/opening this type of reparse point is backup software that should avoid triggering an HSM recall. For example:
https://www.ibm.com/support/knowledgecenter/en/SSEQVQ_8.1.0/client/r_opt_hsmreparsetag.html
As I've previously suggested (and this is the last time because I'm becoming a broken record), lstat() should at least be restricted to opening only name-surrogate reparse points that are supposed to be like links in that they target another path in the system. Plus it also has to open unhandled reparse points.
Personally, I'm only comfortable with opening it up to name surrogates if islink() and readlink() are still limited to just Unix-like symlinks that we can create via symlink(). Nothing changes there. It's just a restriction of how lstat() currently works. The addition of the reparse tag in the stat result enables special handling of non-symlink surrogates.
> shutil.copytree(path): Unchanged. (requires a minor fix to
> continue to recursively copy through junctions (using above test),
> but not symlinks.)
Everyone else who relies on islink(), readlink(), and symlink() to copy symlinks isn't special-casing their code to look for junctions or anything else we lump under the banner of islink(). They could code defensively in case readlink() fails for a 'link' that we can't read. But that leaves the problem of readlink() succeeding for a junction. That can cause problems if the target is passed to os.symlink(), which changes the link from a hard name grafting to a soft name grafting.
Why would we need to read the target of a junction? It's not needed for realpath() in Windows. We should only have to resolve symlinks. For example:
C:/Mount/junction/spam/eggs
junction -> Z:/bar/baz
We don't have to resolve this as "Z:/bar/baz/spam/eggs", and doing so may even be wrong for someone using it to manually resolve a relative symlink. "C:/Mount/junction/spam/eggs" is a solid path. In Unix it would not be resolved by realpath(). A solid path is needed to figure out how to create a relative symlink, or how to manually resolve one for a given path.
For example, if "foo_link" in "C:/Mount/junction/spam/eggs" targets "../../../foo", this refers to "C:/Mount/foo". On the other hand, if the junction mount point were replaced by a soft symlink, then "C:/Mount/symlink/spam/eggs" is not a solid path. "foo_link" is instead evaluated over the target path: "Z:/bar/baz/spam/eggs/foo_link", so the link resolves to "Z:/bar/foo".
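The two resolutions above can be checked lexically. This is only a minimal sketch: resolve_link_target() is a hypothetical helper, not an existing API, and it never touches the file system. It uses ntpath so that it runs on any platform.

```python
import ntpath


def resolve_link_target(link_dir, target):
    """Lexically resolve a relative link target against link_dir.

    Hypothetical helper for illustration only; nothing is opened on disk.
    """
    joined = ntpath.join(link_dir, target)
    # normpath() collapses the ".." components. Switch back to forward
    # slashes to match the paths used in the discussion.
    return ntpath.normpath(joined).replace("\\", "/")


# Junction (hard name grafting): the opened path is solid, so the
# relative target is evaluated over the path as opened.
via_junction = resolve_link_target("C:/Mount/junction/spam/eggs", "../../../foo")

# Symlink (soft name grafting): the relative target is evaluated over
# the substituted target path instead.
via_symlink = resolve_link_target("Z:/bar/baz/spam/eggs", "../../../foo")

print(via_junction)
print(via_symlink)
```

The only difference between the two calls is which directory the link is evaluated in, which is exactly what resolving the junction would throw away.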
IMO, S_IFLNK need not be set for anything other than Unix-like symbolic links. We would just need to document that on Windows, lstat opens any link-like reparse point that indicates it targets another path on the system, plus any reparse point that's not handled, but that islink() is only true for actual Unix symlinks that can be created via os.symlink() and read via os.readlink().
This preserves how islink() and readlink() currently work, while still leaving the door open to fix misbehavior in particular cases. Code, including our own code, that needs to look for the broader Windows category of "name surrogate" can examine the reparse tag. For convenience we can provide issurrogate() that checks lstat(filename).st_reparse_tag & 0x2000_0000. This can be true for directories. Also, a surrogate doesn't have to behave like a Unix "soft" symlink, i.e. it applies to "hard" mount points. In Unix, issurrogate() could just be an alias for islink() since Unix provides only one type of name surrogate.
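A rough sketch of what such a helper might look like follows. The names NAME_SURROGATE_BIT and is_name_surrogate_tag() are made up for illustration, and the code assumes that st_reparse_tag is 0 (or absent) when the file isn't a reparse point.

```python
import os
import stat

# Bit 29 of a reparse tag marks a name surrogate.
NAME_SURROGATE_BIT = 0x2000_0000


def is_name_surrogate_tag(tag):
    """Return True if the reparse tag has the name-surrogate bit set."""
    return bool(tag & NAME_SURROGATE_BIT)


def issurrogate(path):
    """Hypothetical helper: True if path is a name surrogate.

    Assumes lstat() opens the reparse point itself rather than its target.
    """
    try:
        st = os.lstat(path)
    except (OSError, ValueError):
        return False
    if os.name == "nt":
        # st_reparse_tag only exists on Windows; assume 0 when it's missing.
        return is_name_surrogate_tag(getattr(st, "st_reparse_tag", 0))
    # Unix provides only one type of name surrogate: the symlink.
    return stat.S_ISLNK(st.st_mode)
```

With this split, islink() can stay limited to Unix-like symlinks, while code that cares about junctions and other surrogates calls issurrogate() or inspects the tag directly.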
Currently the name surrogate category includes the following tags:
Microsoft name surrogate (bits 31 and 29)
    IO_REPARSE_TAG_MOUNT_POINT       0xA0000003
    IO_REPARSE_TAG_SYMLINK           0xA000000C
    IO_REPARSE_TAG_IIS_CACHE         0xA0000010
    IO_REPARSE_TAG_GLOBAL_REPARSE    0xA0000019
    IO_REPARSE_TAG_LX_SYMLINK        0xA000001D
    IO_REPARSE_TAG_WCI_TOMBSTONE     0xA000001F
    IO_REPARSE_TAG_PROJFS_TOMBSTONE  0xA0000022
Non-Microsoft name surrogate (bit 29)
    IO_REPARSE_TAG_SOLUTIONSOFT      0x2000000D
    IO_REPARSE_TAG_OSR_SAMPLE        0x20000017
    IO_REPARSE_TAG_QI_TECH_HSM       0x2000002F
    IO_REPARSE_TAG_MAXISCALE_HSM     0x20000035
    IO_REPARSE_TAG_ALERTBOOT         0x2000004C
    IO_REPARSE_TAG_NVIDIA_UNIONFS    0x20000054
IO_REPARSE_TAG_OSR_SAMPLE is used by OSR sample code in their Windows driver curriculum, so that one is unlikely to be seen in practice. I don't know anything about the other non-Microsoft tags. NVidia's UnionFS looks interesting. Using reparse points to merge file systems is probably not the most efficient way to handle that problem, but I'm sure the devil is in the details there.
> os.unlink(path): unchanged (still removes the junction, not the
> contents)
Whatever we're calling a link should be capable of being deleted via os.unlink. If we apply S_IFLNK, then it won't have S_IFDIR (at least not in the way POSIX code expects), and unlink should work on it. The current state of affairs, in which unlink/remove works on a junction that stat() reports as a directory, is inconsistent. unlink isn't specified to remove directories, so nothing that it can remove should be reported as a directory.
> shutil.rmtree(path): Will now remove a junction rather than
> recursively deleting its contents (net improvement, IMHO)
I'd like for it to remove all name-surrogate directories like CMD's `rmdir /s` does. In contrast, Unix shutil.rmtree traverses into a mount point, deletes everything, and then fails because the directory is mounted and can't be removed. That's hideous, IMO.
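To make the intended behavior concrete, here is a simplified sketch of a tree removal that deletes name surrogates without traversing them. It is not how shutil.rmtree is implemented: rmtree_no_traverse() and its helpers are hypothetical, there is no error handling or protection against races, and on Unix "surrogate" just means symlink.

```python
import os
import stat

# Bit 29 of a reparse tag marks a name surrogate.
NAME_SURROGATE_BIT = 0x2000_0000


def _is_surrogate(st):
    """True if an lstat() result describes a name surrogate."""
    if os.name == "nt":
        # Assumes st_reparse_tag is 0 (or absent) for ordinary files.
        return bool(getattr(st, "st_reparse_tag", 0) & NAME_SURROGATE_BIT)
    return stat.S_ISLNK(st.st_mode)


def _remove_surrogate(path, st):
    """Remove the surrogate itself, never its target."""
    attrs = getattr(st, "st_file_attributes", 0)
    if os.name == "nt" and attrs & stat.FILE_ATTRIBUTE_DIRECTORY:
        # Directory surrogates (junctions, directory symlinks).
        os.rmdir(path)
    else:
        os.unlink(path)


def rmtree_no_traverse(path):
    """Recursively delete path, removing surrogates instead of entering them."""
    st = os.lstat(path)
    if _is_surrogate(st):
        _remove_surrogate(path, st)
    elif stat.S_ISDIR(st.st_mode):
        for name in os.listdir(path):
            rmtree_no_traverse(os.path.join(path, name))
        os.rmdir(path)
    else:
        os.unlink(path)
```

The surrogate check comes before the directory check on purpose, so a junction that is reported as a directory is still removed rather than entered.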
Date | User | Action | Args
2019-08-16 22:14:04 | eryksun | set | recipients: + eryksun, paul.moore, tim.golden, zach.ware, steve.dower
2019-08-16 22:14:04 | eryksun | set | messageid: <1565993644.53.0.0320037706152.issue37834@roundup.psfhosted.org>
2019-08-16 22:14:04 | eryksun | link | issue37834 messages
2019-08-16 22:14:03 | eryksun | create |