Author eryksun
Recipients Christian Åkerström, Ethan Smith, eric.smith, eryksun, ishimoto, jaraco, living180, miss-islington, ncdave4life, pablogsal, paul.moore, pitrou, steve.dower, stutzbach, takluyver, tim.golden, zach.ware, Étienne Dupuis
Date 2019-08-21.23:59:59
Message-id <1566432000.46.0.251225212472.issue9949@roundup.psfhosted.org>
Content
I'm tentatively reopening this issue for you to consider the following point, Steve.

A real path is not always the same as a final path. There is existing code that calls `relpath(realpath(target), realpath(start))` to compute the relative path to a target for a symlink created in `start`. The final path can't be relied on for this unless the symlink is always evaluated via the final path of `start`. In particular, it cannot be relied on if the relative path traverses a junction.

What code like this needs from a realpath() implementation is a solid (real) path, not a final path. In other words, the caller wants a solidified form of `start` that can be used to compute the path to a target for a relative symlink, but one that works when accessed from `start`, not the final path of `start`. Generally this means resolving symlinks in the path, but not mount points. That's what Unix realpath() does, but of course there it's simpler because the only name surrogate in Unix is a symlink, which is never a mount point and never a directory.

Here's an example. In this first case "scripts" is a junction mount point that targets "C:/spam/etc/scripts":

    >>> eggs = r'C:\spam\dlls\eggs.dll'
    >>> scripts = r'C:\spam\scripts'

    >>> rel_eggs_right = os.path.relpath(eggs, scripts)
    >>> print(rel_eggs_right)
    ..\dlls\eggs.dll
    >>> os.symlink(rel_eggs_right, 'C:/spam/scripts/eggs_right.dll')
    >>> os.path.exists('C:/spam/scripts/eggs_right.dll')
    True

    >>> scripts_final = os.path._getfinalpathname(scripts)[4:]
    >>> print(scripts_final)
    C:\spam\etc\scripts
    >>> rel_eggs_wrong = os.path.relpath(eggs, scripts_final)
    >>> print(rel_eggs_wrong)
    ..\..\dlls\eggs.dll
    >>> os.symlink(rel_eggs_wrong, 'C:/spam/scripts/eggs_wrong.dll')
    >>> os.path.exists('C:/spam/scripts/eggs_wrong.dll')
    False

If we remove the junction and replace it with a 'soft' symlink that targets the same directory, then using the final path works, and using the given path no longer works.

    >>> print(os.readlink('C:/spam/scripts'))
    C:\spam\etc\scripts
    >>> scripts_final = os.path._getfinalpathname(scripts)[4:]
    >>> rel_eggs_right_2 = os.path.relpath(eggs, scripts_final)
    >>> print(rel_eggs_right_2)
    ..\..\dlls\eggs.dll
    >>> os.symlink(rel_eggs_right_2, 'C:/spam/scripts/eggs_right_2.dll')
    >>> os.path.exists('C:/spam/scripts/eggs_right_2.dll')
    True

    >>> rel_eggs_wrong_2 = os.path.relpath(eggs, scripts)
    >>> print(rel_eggs_wrong_2)
    ..\dlls\eggs.dll
    >>> os.symlink(rel_eggs_wrong_2, 'C:/spam/scripts/eggs_wrong_2.dll')
    >>> os.path.exists('C:/spam/scripts/eggs_wrong_2.dll')
    False

When the kernel traverses "scripts" as a soft link, it collapses to the target (i.e. "C:/spam/etc/scripts"), so the relative path that was computed from the final path is right in this case. On the other hand, if "scripts" is a mount point (junction), it's a hard (solid) component. It does not collapse to the target (the kernel even checks the junction's security descriptor, which it does not do for a symlink), so ".." in the relative symlink traverses the junction component as if it were an actual directory.
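
The difference can be modelled with plain string manipulation. This is only a sketch of the behaviour described above, not how the kernel is implemented. The function name `link_lands_at` and its parameters are made up for illustration, and it uses `ntpath` directly so it can be run on any platform without touching the file system.

```python
import ntpath

def link_lands_at(opened_dir, dir_kind, dir_target, rel_target):
    """Model where a relative symlink stored in opened_dir ends up.

    Hypothetical helper: dir_kind is 'junction' or 'symlink', and
    dir_target is the directory that opened_dir points to.
    """
    # A symlink component collapses to its target before ".." is applied.
    # A junction stays in the path as if it were a real directory.
    base = dir_target if dir_kind == 'symlink' else opened_dir
    return ntpath.normpath(ntpath.join(base, rel_target))

scripts = r'C:\spam\scripts'
target = r'C:\spam\etc\scripts'

for kind in ('junction', 'symlink'):
    for rel in (r'..\dlls\eggs.dll', r'..\..\dlls\eggs.dll'):
        print(kind, rel, '->', link_lands_at(scripts, kind, target, rel))
```

Only the combinations that land on "C:\spam\dlls\eggs.dll" correspond to the links for which `os.path.exists` returned True in the sessions above.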

What we need is an implementation of realpath("C:/spam/scripts") that returns "C:\\spam\\scripts" when "scripts" is a mount point and returns "C:\\spam\\etc\\scripts" when "scripts" is a symlink.

This means we need an implementation of realpath() that looks a lot like posixpath.realpath. Generally a mount point should be walked over like a directory, just as mount points are handled in Unix. The difference is that a mount point in Windows is allowed to target a symlink. (This is a design flaw; Unix doesn't allow it.) Since we need to know the target of a junction, we have to read the reparse point, until we hit a real directory target. As long as it targets another junction, it remains a hard component. As soon as it targets a symlink, however, it becomes a soft component that needs to be resolved. If the junction targets a name surrogate reparse point that we can't read, then our only option is to get a final path. This is dysfunctional. We should raise an exception for this case. Code can handle the exception and knowingly get a final path instead of a real path.
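
To make the proposed rules concrete, here is a rough model of the walk. It is a sketch under loud assumptions, not a proposed patch: the reparse points come from an in-memory table instead of being read from disk, the names `model_realpath`, `classify_junction`, `make_table` and `UnreadableReparsePoint` are invented for this example, and a junction chain is only checked at each target path itself, not at the intermediate components of that target.

```python
import ntpath

class UnreadableReparsePoint(OSError):
    """Hypothetical error: we reached a reparse point we cannot read."""

def _key(path):
    return ntpath.normcase(ntpath.normpath(path))

def make_table(entries):
    """entries maps path -> (kind, target); kind is 'junction' or 'symlink'."""
    return {_key(p): v for p, v in entries.items()}

def classify_junction(path, table):
    """Follow a junction chain. Return ('hard', None) if it ends at a real
    directory, or ('soft', symlink_path) if it reaches a symlink."""
    seen = set()
    while True:
        key = _key(path)
        if key in seen:
            raise OSError('junction loop at ' + path)
        seen.add(key)
        entry = table.get(key)
        if entry is None:
            return 'hard', None
        kind, target = entry
        if kind == 'symlink':
            return 'soft', path
        if kind != 'junction':
            raise UnreadableReparsePoint(path)
        path = target

def _split(path):
    drive, rest = ntpath.splitdrive(ntpath.normpath(path))
    return drive + '\\', [name for name in rest.split('\\') if name]

def model_realpath(path, table, max_links=64):
    resolved, pending = _split(path)
    links = 0
    while pending:
        candidate = ntpath.join(resolved, pending.pop(0))
        entry = table.get(_key(candidate))
        if entry is None:
            resolved = candidate          # ordinary file or directory
            continue
        kind, target = entry
        if kind == 'junction':
            state, soft_path = classify_junction(candidate, table)
            if state == 'hard':
                resolved = candidate      # walk over it like a directory
                continue
            target = soft_path            # restart at the symlink it reached
        elif kind != 'symlink':
            raise UnreadableReparsePoint(candidate)
        links += 1
        if links > max_links:
            raise OSError('too many links resolving ' + path)
        # Soft component: restart from the target (relative targets are
        # joined to the already resolved parent directory).
        resolved, head = _split(ntpath.join(resolved, target))
        pending = head + pending
    return resolved
```

With "scripts" entered in the table as a junction to "C:\spam\etc\scripts", `model_realpath('C:/spam/scripts', table)` keeps "C:\spam\scripts". With the same entry marked as a symlink, it returns "C:\spam\etc\scripts".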

This also means we can't reliably compute a real path for a remote path (UNC) because we can't manually evaluate the target of a remote junction. A remote junction is meaningless to us. If we're evaluating a UNC path and reach a junction, we have to give up on a real path and settle for a final path. We can get a final path because that lets the kernel in the server talk to our kernel to resolve any combination of mount points (handled on the server side) and symlinks (handled on our side). This case should also raise an exception. Aware code can handle it by getting a final path and taking appropriate measures.
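
From the caller's side, the fallback might look something like the following. Everything here is hypothetical: no `RealPathUnavailable` exception exists today, `is_plain_unc` is a simplistic check that deliberately ignores `\\?\` and `\\.\` device paths, and the two resolver callables are passed in so that the sketch stays independent of any particular realpath() implementation.

```python
import ntpath

class RealPathUnavailable(OSError):
    """Hypothetical exception: only a final path can be computed."""

def is_plain_unc(path):
    """Rough check for \\\\server\\share paths (device paths are excluded)."""
    drive, _ = ntpath.splitdrive(path.replace('/', '\\'))
    return (drive.startswith('\\\\')
            and not drive.startswith(('\\\\?\\', '\\\\.\\')))

def real_or_final(path, get_real, get_final):
    """Return (resolved_path, kind), where kind is 'real' or 'final'.

    get_real is expected to raise RealPathUnavailable when it has to give
    up, e.g. on reaching a junction in a UNC path.
    """
    try:
        return get_real(path), 'real'
    except RealPathUnavailable:
        # The caller knowingly settles for a final path and can take
        # appropriate measures based on the returned kind.
        return get_final(path), 'final'
```

Returning the kind alongside the path lets code that builds relative symlinks refuse to do so, or warn, when all it has is a final path.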
History
Date User Action Args
2019-08-22 00:00:00 eryksun set recipients: + eryksun, paul.moore, ishimoto, jaraco, pitrou, eric.smith, tim.golden, stutzbach, living180, takluyver, zach.ware, ncdave4life, steve.dower, Christian Åkerström, Ethan Smith, pablogsal, Étienne Dupuis, miss-islington
2019-08-22 00:00:00 eryksun set messageid: <1566432000.46.0.251225212472.issue9949@roundup.psfhosted.org>
2019-08-22 00:00:00 eryksun link issue9949 messages
2019-08-21 23:59:59 eryksun create