This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author eryksun
Recipients eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Date 2019-08-14.21:00:52
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1565816453.06.0.766901071197.issue37834@roundup.psfhosted.org>
In-reply-to
Content
> but suddenly adding "\\?\" to the paths breaks a lot of assumptions.

The unwritten assumption has been that readlink() is reading symlinks that get created by CreateSymbolicLinkW, which sets the print name as the DOS path that's passed to the call. In this case, readlink() can rely on the print name being the intended DOS path. I raised a concern in the case of reading junctions. There's no high-level API to create junctions, so we can't assume the print name is reliable. PowerShell's new-item doesn't even set a print name for junctions. That symlinks also are valid without a print name (in principle; I haven't come across it practice) lends additional weight to always using the substitute name.

Even if we have the DOS path, resolving paths manually is still complicated if it's a relative symlink with a reserved name (DOS device; trailing dots or spaces) as the final component or if it's a long path. Reparsing a relative symlink in the kernel doesn't reserve such names and there's no MAX_PATH limit in the kernel. So using readlink() is tricky. Fortunately realpath() in Windows doesn't require a resolve loop based on readlink(). The kernel almost always knows the final path of an opened file, and we can walk the components from the end until we find one that exists.

> My idea was to GetFinalPathName(path[4:])[4:] and if that fails

An existing file named "spam" would be a false positive for a link that targets "spam.". The internal CreateFileW call would open "spam". Also, symlinks allow remote paths, and this doesn't handle "\\\\?\\UNC" paths. More generally, a link target doesn't have to exist, so being able to access it shouldn't be a factor. I see it's also returning the result from _getfinalpathname. readlink() doesn't resolve a final, solid path. It just returns the contents of a link, which can be a relative or absolute path.

In the proposed implementation of realpath() that I helped on for issue 14094 (I wasn't aware of the previous work in issue 9949), there's an _extended_to_normal function that tries to return a normal path if possible. The length of the normal path has to be less than MAX_PATH, and _getfullpathname should return the path unchanged. GetFullPathNameW is just rule-based processing in user mode; it doesn't touch the file system.

I wish we could remove the MAX_PATH limit in this case. I think at startup we should try to call RtlAreLongPathsEnabled, even though it's not documented, and set a sys flag to indicate whether we can use long paths. Also, support a -X option and an environment variable to override automatic detection.
History
Date User Action Args
2019-08-14 21:00:53eryksunsetrecipients: + eryksun, paul.moore, tim.golden, zach.ware, steve.dower
2019-08-14 21:00:53eryksunsetmessageid: <1565816453.06.0.766901071197.issue37834@roundup.psfhosted.org>
2019-08-14 21:00:53eryksunlinkissue37834 messages
2019-08-14 21:00:52eryksuncreate