Message 349890 - Python tracker

Author	steve.dower
Recipients	eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Date	2019-08-16.23:59:53
Message-id	<1565999994.64.0.603744678.issue37834@roundup.psfhosted.org>
In-reply-to

Content

> Why group all reparse points under the banner of 'link'?

Because for the purposes of the list of changes beneath it, there wasn't any difference (e.g. "traverses any links supported by the OS" is more meaningful to most people, even though both of us would understand it to mean "traverses any traversable reparse points supported by the OS"). I'm not redefining them all to be the same thing, just establishing the terminology for what immediately followed.

> As I've previously suggested (and this is the last time because I'm
> becoming a broken record), lstat() should at least be restricted to
> opening only name-surrogate reparse points that are supposed to be like
> links in that they target another path in the system. Plus it also has
> to open unhandled reparse points.

Apologies for causing the repetition. Let me summarise what I believe you're suggesting as an alternate flow (bearing in mind that only os.stat() and os.lstat() are affected here):

os.lstat:
* open without following reparse points
* check the reparse tag
 - if it's a genuine symlink, return attributes of the link itself and marked S_IFLNK
 - if it's a directory junction, call os.stat instead and return that (???)
 - if it's any name-surrogate reparse point, return attributes of the link itself but not marked S_IFLNK
 - if it's any other reparse point, call os.stat instead and return that
* otherwise regular handling (using hFile or FindFirstFile, etc.)

os.stat:
* open following reparse points
* if the open fails with ERROR_CANT_ACCESS_FILE, try opening without following reparse points:
 - if it's a genuine symlink, ???
 - if it's a directory junction, ???
 - if it's any name-surrogate reparse point, ???
 - if it's any other reparse point, return attributes of the link itself
* otherwise regular handling

If you can fill in the gaps, that will help me understand exactly what you're proposing.
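
For reference, the four categories used in both lists can be told apart from the reparse tag alone. The following is a minimal sketch, not the actual implementation: `classify_reparse_tag` is a hypothetical helper name, and the constants are the Windows SDK tag values written out by hand so the snippet runs on any platform. It deliberately says nothing about what stat() or lstat() should return for each category, since that is the open question above.

```python
from typing import Optional

# Tag values from the Windows SDK headers, spelled out here so the
# sketch does not depend on platform-specific module attributes.
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # directory junctions use this tag
# Bit 29 of a reparse tag is the "name surrogate" flag.
NAME_SURROGATE_BIT = 0x2000_0000


def classify_reparse_tag(tag: Optional[int]) -> str:
    """Hypothetical helper: map a reparse tag to one of the categories above.

    `tag` is None when the file is not a reparse point at all.
    """
    if tag is None:
        return "not a reparse point"
    if tag == IO_REPARSE_TAG_SYMLINK:
        return "symlink"
    if tag == IO_REPARSE_TAG_MOUNT_POINT:
        return "junction"
    if tag & NAME_SURROGATE_BIT:
        return "name surrogate"
    return "other reparse point"
```

Symlinks and junctions are checked first because both tags also have the name-surrogate bit set; the order of the tests matters.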

>> shutil.copytree(path): Unchanged. (requires a minor fix to
>> continue to recursively copy through junctions (using above test),
>> but not symlinks.)
>
> Everyone else who relies on islink(), readlink(), and symlink() to copy
> symlinks isn't special casing their code to look for junctions or
> anything else we lump under the banner of islink(). They could code
> defensively if readlink() fails for a 'link' that we can't read. But
> that leaves the problem of readlink() succeeding for a junction. That
> can cause problems if the target is passed to os.symlink(), which
> changes the link from a hard name grafting to a soft name grafting.

Right, but is that because they deliberately want the junction to be treated like a file? Or because they want it to be treated like the directory is really right there?

os.rmdir() already does special things to behave like a junction rather than the real directory, and the islink/readlink/symlink process is going to be problematic on Windows since most users can't create symlinks. That code simply isn't going to be portable. But code that is using stat() and expecting to get the real directory ought to work, just as code using lstat() and expecting to get the link if it's been linked somehow ought to work.
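
To illustrate the stat()/lstat() expectation, here is a small sketch that uses an ordinary symlink as a stand-in for a link, because it can be created portably on POSIX (on Windows, os.symlink may fail without the required privilege). It does not show junction behaviour, which is exactly what is under discussion; `describe` and `demo_stat_vs_lstat` are hypothetical names.

```python
import os
import shutil
import stat
import tempfile


def describe(path):
    """Report what stat() (follows links) and lstat() (does not) see."""
    followed = os.stat(path)
    unfollowed = os.lstat(path)
    return {
        "stat_is_dir": stat.S_ISDIR(followed.st_mode),
        "lstat_is_link": stat.S_ISLNK(unfollowed.st_mode),
        "lstat_is_dir": stat.S_ISDIR(unfollowed.st_mode),
    }


def demo_stat_vs_lstat():
    base = tempfile.mkdtemp()
    try:
        target = os.path.join(base, "real_dir")
        os.mkdir(target)
        link = os.path.join(base, "link")
        # A symlink stands in for "a link" here; this is an assumption
        # made for the demo, not a claim about junctions.
        os.symlink(target, link, target_is_directory=True)
        return describe(link)
    finally:
        shutil.rmtree(base, ignore_errors=True)
```

Code that only wants "the directory that is really there" reads the stat() result; code that wants to know about the link itself reads the lstat() result.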

> Why would we need to read the target of a junction? It's not needed for
> realpath() in Windows. We should only have to resolve symlinks. For
> example:
> 
> ...
>
> IMO, S_IFLNK need not be set for anything other than Unix-like symbolic
> links. We would just need to document that on Windows, lstat opens any
> link-like reparse point that indicates it targets another path on the
> system, plus any reparse point that's not handled, but that islink() is
> only true for actual Unix symlinks that can be created via os.symlink()
> and read via os.readlink().

I think I understand your reasoning here now, sorry it took so long.

> This preserves how islink() and readlink() currently work, while still
> leaving the door open to fix misbehavior in particular cases. Code,
> including our own code, that needs to look for the broader Windows
> category of "name surrogate" can examine the reparse tag. For
> convenience we can provide issurrogate() that checks
> lstat(filename).st_reparse_tag & 0x2000_0000.

I'm not adding new API, even for internal use. This is enough of an edge case that os.lstat() is fine for it.
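
Code that needs the check can do it on the caller's side with os.lstat() alone. A minimal sketch, assuming the st_reparse_tag field discussed in this issue is available (it exists only on Windows, from Python 3.8); `is_name_surrogate` is a hypothetical helper, not something in the os module.

```python
import os
import stat

# Bit 29 of a reparse tag marks a "name surrogate" reparse point.
NAME_SURROGATE_BIT = 0x2000_0000


def is_name_surrogate(path):
    """Hypothetical caller-side helper; not part of the os module."""
    st = os.lstat(path)
    # st_file_attributes and st_reparse_tag only exist on Windows,
    # so fall back to 0 on other platforms and older versions.
    attrs = getattr(st, "st_file_attributes", 0)
    if not attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT:
        return False
    return bool(getattr(st, "st_reparse_tag", 0) & NAME_SURROGATE_BIT)
```

On platforms without reparse points this always returns False, so calling code does not need its own platform check.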

>> os.unlink(path): unchanged (still removes the junction, not the
>> contents)
> 
> Whatever we're calling a link should be capable of being deleted via
> os.unlink. If we apply S_IFLNK, then it won't have S_IFDIR (at least how
> POSIX code expects it), and unlink should work on it. The current state
> of affairs in which unlink/remove works on a junction, which is reported
> by stat() as a directory, is inconsistent. It's not specified to remove
> directories, so nothing that it can remove should be a directory.

I'm proposing to fix the inconsistency by fixing the flags. Your proposal is to fix the inconsistency by generating a new error in unlink()? (Just clarifying.)

>> shutil.rmtree(path): Will now remove a junction rather than
>> recursively deleting its contents (net improvement, IMHO)
> 
> I'd like for it to remove all name-surrogate directories like CMD's
> `rmdir /s` does. In contrast, Unix shutil.rmtree traverses into a mount
> point, deletes everything, and then fails because the directory is
> mounted and can't be removed. That's hideous, IMO.

Currently, shutil.rmtree on Windows traverses into junctions and deletes everything, though it then succeeds in deleting the junction. With my change, rmtree() directly on a junction now raises (could be fixed?) but rmtree on a directory containing a junction will remove the junction without touching the target directory. So I think we're both happy about this one.
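
The same pair of behaviours already exists for symlinks, which gives a way to sketch what is described here. The demo below uses a symlink in place of a junction (an assumption made so it runs without Windows-specific setup; it says nothing about how junctions behave before or after the change), and `demo_rmtree_with_link` is a hypothetical name.

```python
import os
import shutil
import tempfile


def demo_rmtree_with_link():
    """Return (direct_call_raised, target_file_survived, link_still_exists)."""
    base = tempfile.mkdtemp()
    try:
        target = os.path.join(base, "target")
        parent = os.path.join(base, "parent")
        os.mkdir(target)
        os.mkdir(parent)
        keep = os.path.join(target, "keep.txt")
        with open(keep, "w") as f:
            f.write("data")
        link = os.path.join(parent, "link")
        # Symlink used as a stand-in for a junction (assumption).
        os.symlink(target, link, target_is_directory=True)

        # rmtree() called directly on the link refuses to run.
        try:
            shutil.rmtree(link)
            direct_raised = False
        except OSError:
            direct_raised = True

        # rmtree() on the containing directory removes the link only.
        shutil.rmtree(parent)
        return direct_raised, os.path.exists(keep), os.path.lexists(link)
    finally:
        shutil.rmtree(base, ignore_errors=True)
```

The link is removed along with its parent, while the file inside the target directory is left alone.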

History
Date	User	Action	Args
2019-08-16 23:59:54	steve.dower	set	recipients: + steve.dower, paul.moore, tim.golden, zach.ware, eryksun
2019-08-16 23:59:54	steve.dower	set	messageid: <1565999994.64.0.603744678.issue37834@roundup.psfhosted.org>
2019-08-16 23:59:54	steve.dower	link	issue37834 messages
2019-08-16 23:59:53	steve.dower	create