Message 349839 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	eryksun
Recipients	eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Date	2019-08-16.00:47:14
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1565916435.22.0.601927911849.issue37834@roundup.psfhosted.org>
In-reply-to

Content
> # Always make the OS resolve "unknown" reparse points > ALLOWED_TO_TRAVERSE = {SYMLINK, MOUNT_POINT} > if !traverse and st.reparse_tag not in ALLOWED_TO_TRAVERSE: > return xstat(path, !traverse) To me the naming here makes sense as ALLOWED_TO_OPEN -- as in if traverse is false, meaning we opened the tag, but the tag is not in ALLOWED_TO_OPEN, then we have to reopen in order to traverse it. But then, on the second pass after ERROR_CANT_ACCESS_FILE, traverse is false, so we would also have to special case tags that should never be traversed in NOT_ALLOWED_TO_TRAVERSE = {APPEXECLINK, ...}. I mentioned this in a GitHub comment, and suggested maybe adding another internal-only parameter to check to avoid having to ever special case the app-exec-link tag. For example, we could have an explicit "open_reparse_point" parameter. It would take precedence over traverse (i.e. follow_symlinks). For now, it could be internal only. I assumed stat() would return the reparse point for all tags that fail to reparse with ERROR_CANT_ACCESS_FILE since it's not an invalid reparse point, just an unhandled one that has no other significance at the file system level. To stat(), it can never be anything other than a reparse point. Whatever relevance it has depends on some other context (e.g. a CreateProcessW call). > And the open question is just whether MOUNT_POINT should be in that > set near the end. I believe it should, since the alternative is to > force all Python developers to write special Windows-only code to > handle directory junctions. Python provides no cross-platform tooling to manipulate mount points or read what they target (e.g. something like "/dev/sda1" in Unix), so that doesn't bother me per se. A mount point is just a directory in the POSIX mindset. That doesn't mean, however, that I wouldn't like the ability to detect "name surrogate" reparse points in general to implement safer behavior for shutil.rmtree and anything else that walks a directory tree. Windows shells don't follow name surrogates (including mount points) when deleting a tree, such as `rmdir /s`. Unix `rm -rf` does follow mount points. (A process would need root access to unmout a directory anyway.) The author's of shutil.rmtree have a Unix perspective. For Windows, I'd like to change that perspective. When in Rome... If the only way to get this is to special case mount-point or name-surrogate reparse points as applicable to "follow_symlinks", then I suggest that this should be clearly documented and that we not go so far as to pretend that they're symlinks via S_IFLNK, islink, and readlink. Continue reporting mount points as directories (S_IFDIR). Continue with only supporting actual symbolic links for the triple: islink(), readlink(), and symlink(). In this case, we can copy symlinks and be certain the semantics remain the same since we're not changing the type of reparse point.

>     # Always make the OS resolve "unknown" reparse points
>    ALLOWED_TO_TRAVERSE = {SYMLINK, MOUNT_POINT}
>    if !traverse and st.reparse_tag not in ALLOWED_TO_TRAVERSE:
>        return xstat(path, !traverse)

To me the naming here makes sense as ALLOWED_TO_OPEN -- as in if traverse is false, meaning we opened the tag, but the tag is not in ALLOWED_TO_OPEN, then we have to reopen in order to traverse it. But then, on the second pass after ERROR_CANT_ACCESS_FILE, traverse is false, so we would also have to special case tags that should never be traversed in NOT_ALLOWED_TO_TRAVERSE = {APPEXECLINK, ...}.

I mentioned this in a GitHub comment, and suggested maybe adding another internal-only parameter to check to avoid having to ever special case the app-exec-link tag. For example, we could have an explicit "open_reparse_point" parameter. It would take precedence over traverse (i.e. follow_symlinks). For now, it could be internal only.

I assumed stat() would return the reparse point for all tags that fail to reparse with ERROR_CANT_ACCESS_FILE since it's not an invalid reparse point, just an unhandled one that has no other significance at the file system level. To stat(), it can never be anything other than a reparse point. Whatever relevance it has depends on some other context (e.g. a CreateProcessW call).

> And the open question is just whether MOUNT_POINT should be in that 
> set near the end. I believe it should, since the alternative is to 
> force all Python developers to write special Windows-only code to
> handle directory junctions.

Python provides no cross-platform tooling to manipulate mount points or read what they target (e.g. something like "/dev/sda1" in Unix), so that doesn't bother me per se. A mount point is just a directory in the POSIX mindset. 

That doesn't mean, however, that I wouldn't like the ability to detect "name surrogate" reparse points in general to implement safer behavior for shutil.rmtree and anything else that walks a directory tree. Windows shells don't follow name surrogates (including mount points) when deleting a tree, such as `rmdir /s`. Unix `rm -rf` does follow mount points. (A process would need root access to unmout a directory anyway.) The author's of shutil.rmtree have a Unix perspective. For Windows, I'd like to change that perspective. When in Rome...

If the only way to get this is to special case mount-point or name-surrogate reparse points as applicable to "follow_symlinks", then I suggest that this should be clearly documented and that we not go so far as to pretend that they're symlinks via S_IFLNK, islink, and readlink. Continue reporting mount points as directories (S_IFDIR). Continue with only supporting actual symbolic links for the triple: islink(), readlink(), and symlink(). In this case, we can copy symlinks and be certain the semantics remain the same since we're not changing the type of reparse point.

History
Date	User	Action	Args
2019-08-16 00:47:15	eryksun	set	recipients: + eryksun, paul.moore, tim.golden, zach.ware, steve.dower
2019-08-16 00:47:15	eryksun	set	messageid: <1565916435.22.0.601927911849.issue37834@roundup.psfhosted.org>
2019-08-16 00:47:15	eryksun	link	issue37834 messages
2019-08-16 00:47:14	eryksun	create