
Author eryksun
Recipients barneygale, eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Date 2021-04-26.02:11:46
Message-id <1619403110.48.0.339685349175.issue43936@roundup.psfhosted.org>
Content
> os.path.realpath() normalizes paths before resolving links on Windows

Normalizing the input path is required to be consistent with the Windows file API. OTOH, the kernel resolves the target path of a relative symlink in a POSIX-ly correct manner, and ntpath._readlink_deep() doesn't ensure this.
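
As a minimal illustration of what "normalizing the input path" means here, the following uses ntpath.normpath(), which collapses ".." components purely lexically. The path is a made-up example, and this only sketches the lexical step, not what realpath() does internally.

```python
import ntpath

# Lexical normalization: ".." removes the preceding component without
# checking whether that component is a symlink or a junction.
path = r"C:\work\foo\bar\..\remote"
normalized = ntpath.normpath(path)
print(normalized)
```

If "bar" were a link to a directory elsewhere, a POSIX-style resolver would follow it before applying "..", and could end up at a different "remote".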

I've attached a prototype that I wrote for a POSIX-like implementation that recursively resolves both the drive and the path. It uses the final path only as a shortcut to normalize volume GUID names as drives and the proper casing of UNC server and share names. However, it's considerably more work than the final-path approach, and more work always has the potential for more bugs. I'm providing it for the sake of discussion, or just for people to point to it as an example of what not to do... ;-)

Patching up the current implementation would probably involve extending _getfinalpathname() to support follow_symlinks=False. Aspects of the POSIX implementation would have to be adopted, but I think it can be kept relatively simple when integrated with _getfinalpathname(path, follow_symlinks=False). The latter also makes it easy to identify a UNC path. That's necessary because mountpoints should never be resolved in a UNC path, which the current implementation gets wrong.
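
To sketch why a final path makes UNC detection easy: a DOS-style final path for a network location begins with the "\\?\UNC\" prefix, while a local one begins with "\\?\" followed by the drive. The helper name below is hypothetical and isn't part of ntpath.

```python
def is_unc_final_path(final_path: str) -> bool:
    """Hypothetical helper: True if a final path names a UNC location.

    Assumes the extended-path form, e.g. \\\\?\\UNC\\server\\share\\dir.
    """
    return final_path.upper().startswith("\\\\?\\UNC\\")


print(is_unc_final_path("\\\\?\\UNC\\baz\\spam\\dir"))
print(is_unc_final_path("\\\\?\\C:\\work\\foo"))
```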

What this wouldn't support is resolving an inaccessible drive as much as possible. Mapped drives are object symlinks that expand to UNC paths that can include an arbitrary filepath on a share. Substitute drives by definition target an arbitrary filepath, and can even target other substitute and mapped drives. A final-path-only approach would leave the inaccessible drive in the result, along with any symlinks that are internal to the drive.

A final-path approach also can't support targets with rooted paths or ".." components that traverse a mountpoint. The final path will be on the mountpoint's device, which will change how such relative symlinks resolve. That said, rooted symlink targets are almost never seen in Windows, and targets that traverse a mountpoint by way of a ".." component should be rare, in principle. 

One problem is the frequent use of bind mountpoints in place of symlinks in Windows. In CMD, anyone can create a bind mountpoint via `mklink /j`. Here's a fabricated example with a mountpoint (i.e. junction) that's used where a symlink would normally be used.

    C:\
        work\
            foo\
                bar [junction -> C:\work\bar]
                remote [symlink -> \\baz\spam]
            bar\
                remote [symlink -> ..\remote]
            remote [symlink -> \\qux\eggs]

C:\work\foo\bar\remote normally resolves as follows:

    C:\work\foo\bar\remote
        -> C:\work\foo\bar + ..\remote
        -> C:\work\foo\remote
        -> \\baz\spam

Assume that \\baz\spam is down, so C:\work\foo\bar\remote can't be strictly resolved. If the non-strict algorithm relies on getting the final path of C:\work\foo\bar\remote before resolving the target of "remote", then the result for this case will be incorrect, because the junction "bar" gets replaced by its target before the relative target "..\remote" is applied:

    C:\work\foo\bar\remote
        -> C:\work\bar\remote
        -> C:\work\bar + ..\remote
        -> C:\work\remote
        -> \\qux\eggs
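
The two resolution orders can be compared with a toy model of the fabricated tree. This is only a sketch: the link table and the toy_resolve() function are hypothetical, it handles a junction only as the immediate parent directory, and it stops at the first UNC target instead of modelling a share that's down.

```python
import ntpath

# Reparse points from the fabricated tree, keyed by lower-cased location.
LINKS = {
    r"c:\work\foo\bar": ("junction", r"C:\work\bar"),
    r"c:\work\foo\remote": ("symlink", r"\\baz\spam"),
    r"c:\work\bar\remote": ("symlink", r"..\remote"),
    r"c:\work\remote": ("symlink", r"\\qux\eggs"),
}


def physical(directory):
    """Return where a directory is really stored (one junction level only)."""
    entry = LINKS.get(directory.lower())
    if entry and entry[0] == "junction":
        return entry[1]
    return directory


def toy_resolve(path, final_path_first):
    """Resolve the symlink in the last component of path.

    final_path_first=False: a relative target is applied to the parent
    directory as it was opened (the kernel's behavior described above).
    final_path_first=True: the parent is replaced by its final path first.
    """
    for _ in range(10):  # arbitrary bound to avoid looping forever
        parent, name = ntpath.split(path)
        stored_in = physical(parent)
        if final_path_first:
            parent = stored_in
        entry = LINKS.get(ntpath.join(stored_in, name).lower())
        if entry is None:
            return ntpath.join(parent, name)
        target = entry[1]
        if target.startswith("\\\\"):
            return target  # UNC target: the toy model stops here
        path = ntpath.normpath(ntpath.join(parent, target))
    raise OSError("too many levels of links")


kernel_result = toy_resolve(r"C:\work\foo\bar\remote", final_path_first=False)
final_first_result = toy_resolve(r"C:\work\foo\bar\remote", final_path_first=True)
print(kernel_result)
print(final_first_result)
```

In this model, the first call follows the normal resolution to \\baz\spam, and the second follows the incorrect one to \\qux\eggs.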