Author eryksun
Recipients eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Date 2019-08-15.01:49:09
Message-id <1565833750.46.0.540959287808.issue37834@roundup.psfhosted.org>
In-reply-to
Content
> I wish we could remove the MAX_PATH limit in this case.
>
> The problem is that we have to remove the limit in any case where the 
> resulting path might be used, which is what we're already trying to 
> encourage by supporting long paths.

Maybe it's better to ignore the MAX_PATH limit and let processes fail hard if they lack long-path support. A known and expected exception is better than unpredictable behavior (see the next paragraph for an example). That leaves the problem of a final component that's a reserved name, i.e. a DOS device name or a name with trailing dots or spaces. We have no choice but to return this case as an extended path. 
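
To make the "reserved name" case concrete, here is a rough sketch of a check on the final component. The helper name and the device list are my own simplification, not what GetFullPathNameW actually does, and the exact rules (especially for names with an extension such as "con.txt") may differ between Windows versions.

```python
# Simplified, hypothetical check; the real rules live in the Windows
# runtime library and may vary by Windows version.
_DOS_DEVICES = (
    {"CON", "PRN", "AUX", "NUL", "CONIN$", "CONOUT$"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def final_component_is_reserved(path: str) -> bool:
    """Guess whether the last component needs an extended (\\\\?\\) path."""
    name = path.replace("/", "\\").rstrip("\\").rpartition("\\")[2]
    if name in (".", ".."):
        return False
    # Trailing dots or spaces get stripped by normal path normalization.
    if name != name.rstrip(". "):
        return True
    # Legacy rule (assumption): compare the part before the first "." or
    # ":", ignoring trailing spaces and case.
    base = name.partition(".")[0].partition(":")[0].rstrip(" ").upper()
    return base in _DOS_DEVICES


for candidate in ("C:/Temp/con", "C:/Temp/spam. ", "C:/Temp/spam.txt"):
    print(candidate, final_component_is_reserved(candidate))
```

A path for which this returns True is the case that would have to keep its extended prefix.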

The intersection of this problem with SetCurrentDirectoryW (os.chdir) troubles me. Without long-path support, the current-directory buffer in the process parameters is hard-limited to MAX_PATH, and passing SetCurrentDirectoryW an extended path can't work around this. Fair enough. But SetCurrentDirectoryW still accepts a device path as the current directory, even though the docs do not explicitly allow it and the runtime library assumes it's disallowed. The combination is an ugly bug:

    >>> os.chdir('//./C:/Temp')
    >>> os.getcwd()
    '\\\\.\\C:\\Temp'
    >>> os.path._getfullpathname('/spam/eggs')
    '\\\\spam\\eggs'

    >>> os.chdir('//?/C:/Temp')
    >>> os.getcwd()
    '\\\\?\\C:\\Temp'
    >>> os.path._getfullpathname('/spam/eggs')
    '\\\\spam\\eggs'

In order to resolve a rooted path such as "/spam/eggs", the runtime library needs to figure out the current drive from the current directory. It checks for a UNC path and otherwise assumes the current directory starts with a DOS drive, since it assumes device paths aren't allowed. For a device path, it therefore grabs the first two characters as the drive name, which is "\\\\". Then, when joining the rooted path to this 'drive', the initial slash or backslash of the rooted path gets collapsed into the preceding backslash. The result is at best a broken path, and at worst an unrelated UNC path that exists.
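
As an illustration only, the following toy model reproduces the results shown in the session above. The function names are hypothetical, and the assumption that the UNC check skips the "\\.\" and "\\?\" prefixes is inferred from the observed output; this is not the actual runtime library code.

```python
# Toy model of the drive inference described above. Hypothetical helper
# names; this is not the actual Windows runtime library code.
DEVICE_PREFIXES = ("\\\\.\\", "\\\\?\\")


def guess_current_drive(cwd: str) -> str:
    """Return what the model takes to be the current drive of cwd."""
    if cwd.startswith("\\\\") and not cwd.startswith(DEVICE_PREFIXES):
        # UNC path: the drive is \\server\share.
        server_share = cwd[2:].split("\\")[:2]
        return "\\\\" + "\\".join(server_share)
    # Everything else is assumed to start with "X:", so the first two
    # characters are taken as the drive, even for a device path.
    return cwd[:2]


def resolve_rooted(cwd: str, rooted: str) -> str:
    """Join a rooted path such as "/spam/eggs" to the guessed drive."""
    drive = guess_current_drive(cwd)
    rest = rooted.replace("/", "\\")
    if drive.endswith("\\"):
        # The leading separator collapses into the preceding backslash.
        rest = rest.lstrip("\\")
    return drive + rest


print(resolve_rooted("C:\\Temp", "/spam/eggs"))
print(resolve_rooted("\\\\.\\C:\\Temp", "/spam/eggs"))
```

With a normal current directory the model yields a path on drive C:, while with a device path as the current directory it yields a path that starts with two backslashes and so looks like a UNC path.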

I think os.chdir should raise an exception when passed a device path. As justification, we can point to the documentation of SetCurrentDirectoryW, which states the following:

    Each process has a single current directory made up of two parts:

        * A disk designator that is either a drive letter followed by 
          a colon, or a server name and share name 
          (\\servername\sharename)
        * A directory on the disk designator
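
A minimal sketch of such a check, written as a wrapper rather than a change to os.chdir itself. The names is_device_path and checked_chdir are hypothetical, and ValueError is just a placeholder for whatever exception type we'd settle on.

```python
import os

# Prefixes of device paths after normalizing slashes to backslashes.
_DEVICE_PREFIXES = ("\\\\.\\", "\\\\?\\")


def is_device_path(path) -> bool:
    """Return True if path starts with \\\\.\\ or \\\\?\\ (either slash style)."""
    p = os.fspath(path)
    if isinstance(p, bytes):
        p = os.fsdecode(p)
    return p.replace("/", "\\").startswith(_DEVICE_PREFIXES)


def checked_chdir(path) -> None:
    """Hypothetical wrapper: refuse device paths, else defer to os.chdir."""
    if is_device_path(path):
        # Placeholder exception type; the real choice is open.
        raise ValueError(f"device path not allowed as current directory: {path!r}")
    os.chdir(path)


print(is_device_path("//./C:/Temp"))
print(is_device_path("C:/Temp"))
```

Note that this also rejects extended paths such as "//?/C:/Temp", which matches the documentation quoted above but is stricter than what SetCurrentDirectoryW currently accepts.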

> Perhaps the best we can do is an additional test where we 
> GetFinalPathName, strip the prefix, reopen the file, 
> GetFinalPathName again and if they match then return it 
> without the prefix. That should handle both long path 
> settings as transparently as we can.

I assume you're talking about realpath() here, toward the end where we're working with a solid path, or rather where we have at least the beginning part of the path as a solid path, up to the first component that's inaccessible.
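
For reference, the quoted proposal could be sketched roughly as follows. The resolver is passed in as a callable, since the real thing would open the path and call GetFinalPathNameByHandleW; both function names and the get_final_path parameter are hypothetical.

```python
from typing import Callable


def strip_extended_prefix(path: str) -> str:
    """Drop a leading \\\\?\\ or \\\\?\\UNC\\ prefix, if present."""
    if path.startswith("\\\\?\\UNC\\"):
        return "\\\\" + path[8:]
    if path.startswith("\\\\?\\"):
        return path[4:]
    return path


def simplify_final_path(final: str, get_final_path: Callable[[str], str]) -> str:
    """Return final without its prefix only if that round-trips.

    get_final_path is a stand-in for "open the path and ask for its
    final name"; it is expected to raise OSError on failure.
    """
    plain = strip_extended_prefix(final)
    try:
        if get_final_path(plain) == final:
            return plain
    except OSError:
        # Too long without long-path support, or otherwise inaccessible.
        pass
    # Reserved names and anything else that resolves differently keep
    # the extended prefix.
    return final
```

If reopening the stripped path fails, or resolves to something else (as a reserved name would), the extended path is returned unchanged.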

For the problem of reserved names, GetFullPathNameW is all we need. That doesn't address the MAX_PATH issue, but a long path either works or it doesn't. It's a user-mode issue; there's nothing to resolve in the kernel. If the path is too long, then CreateFileW will fail at RtlDosPathNameToRelativeNtPathName_U_WithStatus with STATUS_NAME_TOO_LONG, before making a single system call.