Author eryksun
Recipients brett.cannon, eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Date 2016-08-11.06:58:53
Message-id <1470898734.83.0.380268230276.issue27730@psf.upfronthosting.co.za>
Content
Standard users have SeChangeNotifyPrivilege, which lets them traverse a directory that they can't otherwise access, so Python should work with paths only as strings instead of trying to open a directory handle.

I think it's best to make Windows do as much of the normalization work as possible. Why reinvent the wheel instead of relying on GetFullPathNameW [1]? 

For example, you propose to trim leading and trailing spaces and trailing dots from each component, but Windows itself doesn't go that far. Leading spaces are never removed. Only the final path component has all trailing spaces and dots trimmed. From the preceding components Windows will strip one and only one trailing dot, and a trailing space is never removed. Some examples:

    >>> os.path.exists(r'C:\Temp\test\dir1\dir2\file')
    True
    >>> os.path.exists(r'C:\Temp\test\dir1\dir2\file. . . . . .')
    True
    >>> os.path.exists(r'C:\Temp\test\dir1.\dir2.\file')
    True
    >>> os.path.exists(r'C:\Temp\test\dir1..\dir2\file')
    False
    >>> os.path.exists(r'C:\Temp\test\dir1 \dir2\file')
    False
    >>> os.path.exists(r'C:\Temp\test\ dir1\dir2\file')
    False
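
The trimming rules above can be modelled in a few lines of pure Python. This is only a sketch of the behaviour described in this message, not how Windows implements it. `trim_like_windows` is a hypothetical helper, and leaving components made up entirely of dots (".", "..", "...") untouched is an assumption made to keep the sketch small.

```python
def trim_like_windows(path):
    """Model the trailing dot/space trimming described above (illustrative only)."""
    parts = path.split('\\')
    last = len(parts) - 1
    trimmed = []
    for i, part in enumerate(parts):
        if part.strip('.') == '':
            # Assumption: empty and all-dot components are handled elsewhere.
            trimmed.append(part)
        elif i == last:
            # Final component: all trailing spaces and dots are removed.
            trimmed.append(part.rstrip(' .'))
        elif part.endswith('.'):
            # Preceding components: exactly one trailing dot is removed.
            trimmed.append(part[:-1])
        else:
            # Leading and trailing spaces are kept as they are.
            trimmed.append(part)
    return '\\'.join(trimmed)
```

Applied to the paths above, it maps the second and third examples onto the first and leaves the last three pointing at different names, which matches the results shown.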

Components that consist of only "." and ".." should also be normalized:

    >>> os.path.abspath(r'C:\Temp\test\dir1\..\dir1\.\dir2\...\file')
    'C:\\Temp\\test\\dir1\\dir2\\...\\file'
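
For comparison, the portable `ntpath.normpath` collapses "." and ".." components purely on the string, without consulting the file system or calling into Windows. It is shown here only as an illustration of the expected result; it is not a substitute for letting Windows normalize the path.

```python
import ntpath

# Same input as the abspath example above; "..." is an ordinary name and is kept.
path = r'C:\Temp\test\dir1\..\dir1\.\dir2\...\file'
normalized = ntpath.normpath(path)
print(normalized)
```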

Paths with DOS devices also need to be translated beforehand, since the existence of classic DOS devices in every directory is emulated by the NT runtime library when it translates from DOS paths to native NT paths. For example:

    >>> os.path.abspath(r'C:\Temp\con')
    '\\\\.\\con'
    >>> os.path.abspath(r'C:\Temp\nul')
    '\\\\.\\nul'
    >>> os.path.abspath(r'C:\Temp\prn')
    '\\\\.\\prn'
    >>> os.path.abspath(r'C:\Temp\aux')
    '\\\\.\\aux'
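
A rough model of this translation, for illustration only. `translate_dos_device` is a hypothetical helper that looks at the final component and matches exact names, case-insensitively. The COM1-COM9 and LPT1-LPT9 entries are the other classic device names; the runtime library also accepts some variants of these names that this sketch ignores.

```python
import ntpath

# Classic DOS device names (compared case-insensitively).
DOS_DEVICES = {'con', 'nul', 'prn', 'aux'}
DOS_DEVICES |= {'com%d' % i for i in range(1, 10)}
DOS_DEVICES |= {'lpt%d' % i for i in range(1, 10)}

def translate_dos_device(path):
    """Return the device path if the final component names a DOS device."""
    name = ntpath.basename(path)
    if name.lower() in DOS_DEVICES:
        # The directory part is discarded; only the device name survives.
        return '\\\\.\\' + name
    return path
```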

GetFullPathNameW handles all of these corner cases already, so I think a simpler algorithm is to just rely on Windows to do most of the work:

* If len(path) < 260 or the path starts with L"\\\\?\\" or L"\\\\.\\", don't do anything.
* Call GetFullPathNameW to calculate the required path length.
* If the path starts with L"\\\\", over-allocate by sizeof(WCHAR) * 6. Otherwise over-allocate by sizeof(WCHAR) * 4.
* Call GetFullPathNameW again, with the buffer pointer adjusted past the over-allocation.
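* If the path is a UNC path, copy the L"\\\\?\\UNC" prefix to the start of the buffer, overwriting the first of the two leading backslashes (which is why only 6 extra characters are needed). Otherwise copy L"\\\\?\\".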
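
The steps above could be sketched in Python roughly as follows. This is not the proposed C implementation: string concatenation stands in for the over-allocated buffer, `to_extended_path` is a hypothetical name, and the normalizer is passed in as `get_full_path`. The default, `ntpath.normpath`, is only a portable stand-in so the sketch runs anywhere; on Windows the real work would be done by GetFullPathNameW. The early return for a device path coming back from normalization is an assumption that the list above doesn't cover.

```python
import ntpath

MAX_PATH = 260

def to_extended_path(path, get_full_path=ntpath.normpath):
    """Normalize a long DOS path and add the extended-length prefix (sketch)."""
    # Short paths and paths that already use a device prefix are left alone.
    if len(path) < MAX_PATH or path.startswith(('\\\\?\\', '\\\\.\\')):
        return path
    # Let the normalizer handle dots, trailing spaces, DOS devices and so on.
    full = get_full_path(path)
    # Assumption: if normalization produced a device path, return it unchanged.
    if full.startswith(('\\\\?\\', '\\\\.\\')):
        return full
    if full.startswith('\\\\'):
        # UNC path: the prefix replaces the first of the two leading backslashes.
        return '\\\\?\\UNC' + full[1:]
    return '\\\\?\\' + full
```

Taking the normalizer as a parameter keeps the prefixing logic testable on any platform and makes it obvious which part is delegated to Windows.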

Contrary to the documentation on MSDN, Windows doesn't need the \\?\ prefix to use a long path with GetFullPathNameW. On NT systems it has always worked with long paths. The implementation uses the RtlGetFullPathName_U* family of functions, which immediately wrap the input buffer in a UNICODE_STRING. Since a UNICODE_STRING stores its length in bytes as a 16-bit value, the limit is 32,767 characters.

The only MAX_PATH limit here is one that can't be avoided. The process working directory is limited to MAX_PATH, as are the per-drive working directories (stored in hidden environment variables, e.g. "=C:"). At least that's the case prior to the upcoming change in Windows 10. With the change you propose in issue 27731, Windows 10 users should be able to set a working directory that exceeds MAX_PATH. 

For example, the following demonstrates (in Windows 10.0.10586) that the value of "=Z:" is only used when its length is less than MAX_PATH and the target directory exists.

Create a long test path:

    >>> import os, shutil, ctypes
    >>> kernel32 = ctypes.WinDLL('kernel32')
    >>> path = 'Z:' + r'\test' * 50
    >>> os.makedirs('\\\\?\\' + path + r'\last\test')

A drive-relative path is resolved relative to the root directory if the current directory on the drive doesn't exist or is inaccessible:

    >>> kernel32.SetEnvironmentVariableW('=Z:', path + r'\test')
    1
    >>> os.path._getfullpathname('Z:file')
    'Z:\\file'

It also uses the root directory if the current directory on the drive exceeds MAX_PATH:

    >>> kernel32.SetEnvironmentVariableW('=Z:', path + r'\last\test')
    1
    >>> os.path._getfullpathname('Z:file')
    'Z:\\file'

It resolves correctly if the current directory can be opened and the path length doesn't exceed MAX_PATH:

    >>> kernel32.SetEnvironmentVariableW('=Z:', path + r'\last')
    1
    >>> os.path._getfullpathname('Z:file')
    'Z:\\test\\test\\test\\test\\test\\test\\test\\test\\test\\test\\
    test\\test\\test\\test\\test\\test\\test\\test\\test\\test\\test\\
    test\\test\\test\\test\\test\\test\\test\\test\\test\\test\\test\\
    test\\test\\test\\test\\test\\test\\test\\test\\test\\test\\test\\
    test\\test\\test\\test\\test\\test\\test\\last\\file'

    >>> shutil.rmtree(r'\\?\Z:\test')
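
The three cases above can be summarized as a small pure-Python model. It only restates the observed behaviour and is not the runtime library's logic. `resolve_drive_relative`, the `env` mapping and the `dir_exists` callback are hypothetical stand-ins for the process environment and the file system.

```python
MAX_PATH = 260

def resolve_drive_relative(path, env, dir_exists):
    """Resolve a drive-relative path such as 'Z:file' (illustrative model)."""
    drive, rest = path[:2], path[2:]
    # The per-drive working directory lives in a hidden variable, e.g. "=Z:".
    cwd = env.get('=' + drive.upper(), '')
    # Fall back to the root if the value is unset, too long, or can't be opened.
    if not cwd or len(cwd) >= MAX_PATH or not dir_exists(cwd):
        cwd = drive + '\\'
    return cwd.rstrip('\\') + '\\' + rest
```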

[1]: https://msdn.microsoft.com/en-us/library/aa364963