classification
Title: nt._getfinalpathname may use uninitialized memory
Type: behavior Stage: resolved
Components: Extension Modules, Windows Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: steve.dower Nosy List: eryksun, izbyshev, miss-islington, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2018-03-06 22:31 by izbyshev, last changed 2018-09-21 20:47 by steve.dower. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 6010 merged izbyshev, 2018-03-06 22:42
PR 6031 merged miss-islington, 2018-03-08 16:04
PR 6032 merged steve.dower, 2018-03-08 16:16
Messages (11)
msg313366 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-03-06 22:31
The first call of GetFinalPathNameByHandleW requests the required buffer size for the NT path (VOLUME_NAME_NT), while the second call receives the DOS path (VOLUME_NAME_DOS) in the allocated buffer. Usually, NT paths are longer than DOS ones, for example:

 NT path: \Device\HarddiskVolume2\foo
DOS path: \\?\C:\foo

Or, for UNC paths:

 NT path: \Device\Mup\server\share\foo
DOS path: \\?\UNC\server\share\foo

However, it is not always the case. A volume can be mounted to an arbitrary path, and if a drive letter is not assigned to such a volume, 
GetFinalPathNameByHandle will use the mount point path instead of C: above. This way, a DOS path can be longer than an NT path. Since the result of the second call is not checked properly, this condition won't be detected, resulting in an out-of-bounds access and use of uninitialized memory later.

Moreover, the path returned by GetFinalPathNameByHandle may change between the first and the second call, for example, because an intermediate directory was renamed. If the path becomes longer than buf_size, the same issue will occur.
msg313367 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-03-06 23:20
The inconsistent use of VOLUME_NAME_NT and VOLUME_NAME_DOS was addressed incidentally in issue 29734, but that issue is primarily about the handle leaked when GetFinalPathNameByHandleW fails. That said, you've cleanly addressed the handle leak in your PR also. I prefer the common cleanup approach that you've used here, rather than spreading the cleanup tasks around in blocks, where it can easily be overlooked.

The PR for issue 29734 also fixes some handle leaks related to win32_xstat_impl. Those leaks still need to be addressed for 3.7, and backported to 3.6.
msg313445 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-03-08 16:03
New changeset 3b20d3454e8082e07dba93617793de5dc9237828 by Steve Dower (Alexey Izbyshev) in branch 'master':
bpo-33016: Fix potential use of uninitialized memory in nt._getfinalpathname (#6010)
https://github.com/python/cpython/commit/3b20d3454e8082e07dba93617793de5dc9237828
msg313448 - (view) Author: miss-islington (miss-islington) Date: 2018-03-08 16:26
New changeset 8c163bbf370f6f6cedd2c07f7d54c9b36c97d8f2 by Miss Islington (bot) in branch '3.7':
bpo-33016: Fix potential use of uninitialized memory in nt._getfinalpathname (GH-6010)
https://github.com/python/cpython/commit/8c163bbf370f6f6cedd2c07f7d54c9b36c97d8f2
msg313463 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-03-08 22:29
Copying the comment of @eryksun from PR 6010 for reference.

Because we only try VOLUME_NAME_DOS, this function always fails for a volume that's not mounted as either a drive letter or a junction mount point. The error in this case is ERROR_PATH_NOT_FOUND. We know that the path is valid because we have a handle (in this case the file system ensures that no parent directory in the path can be unlinked or renamed). To address this, if the flags parameter isn't already VOLUME_NAME_GUID, it could switch to it and continue the while loop (no need to realloc). Otherwise if it's already VOLUME_NAME_GUID or for any other error, the call should fail.

For DOS devices in paths (e.g. "C:\Temp\NUL"), CreateFile commonly succeeds, but GetFinalPathNameByHandle commonly fails with either ERROR_INVALID_FUNCTION or ERROR_INVALID_PARAMETER. What do you think about handling this failure by calling GetFullPathName instead (e.g. "C:\Temp\NUL" => "\\.\NUL")? That way most DOS device paths won't cause this function to fail (e.g. when called by pathlib's resolve method). We could do the same if CreateFile fails with ERROR_INVALID_PARAMETER, which should only occur for CON (e.g. "C:\Temp\CON") because it needs to be opened with either generic read or generic write access.

CreatFile also commonly fails here with either ERROR_SHARING_VIOLATION (from a paging file) or ERROR_ACCESS_DENIED. But I think those are best handled by the caller, with a PermissionError exception handler. Currently pathlib's resolve method doesn't handle PermissionError like I think it should in non-strict mode. It only handles FileNotFoundError
msg313464 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-03-08 22:50
New changeset 32efcd13069a89abf007373274ee1bc0909d1996 by Steve Dower in branch '3.6':
bpo-33016: Fix potential use of uninitialized memory in nt._getfinalpathname (GH-6032)
https://github.com/python/cpython/commit/32efcd13069a89abf007373274ee1bc0909d1996
msg313465 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-03-08 22:51
Leaving this open for commit review (and buildbots) for a little while, but then I'll close it if we're all good.
msg313466 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-03-08 22:55
> Because we only try VOLUME_NAME_DOS, this function always fails for a volume that's not mounted as either a drive letter or a junction mount point.

If a volume is not mounted, users have to use \\?\ or \\.\ either directly or indirectly via a symlink or a junction to get to it, right? Do you think such uses are common enough to warrant dealing with VOLUME_NAME_GUID? Also note that pathlib currently expects DOS paths only: it will strip '\\?\' prefix in resolve(), making GUID path "invalid" (and, also, precluding direct usage of '\\?\' prefix with resolve() in other cases unless users are not prepared to deal with it).
msg313467 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-03-08 23:22
> unless users are not prepared to deal with it
ARE prepared

> What do you think about handling this failure by calling GetFullPathName instead (e.g. "C:\Temp\NUL" => "\\.\NUL")?

I think it would indeed be nice if pathlib handled such paths in its resolve(), especially since os.path.abspath() does handle them, and it looks weird that even resolve(strict=False) fails. That could be an enhancement, but note that it'll expose users to '\\.\'-prefixed paths which can't be returned from resolve() now. It is not necessary a problem because users should be prepared to handle UNC-like paths anyway.

> Currently pathlib's resolve method doesn't handle PermissionError like I think it should in non-strict mode. It only handles FileNotFoundError

That behavior doesn't look good, and it's inconsistent with POSIX resolve() which doesn't propagate any OSError in non-strict mode. I think this warrants an issue report.
msg313469 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-03-08 23:25
> We know that the path is valid because we have a handle (in this case the file system ensures that no parent directory in the path can be unlinked or renamed).

Thank you for pointing this out. I erroneously stated that the length of the path could increase between GetFinalPathNameByHandle calls because an intermediate directory could be renamed, but actually I've only checked that the last part can be renamed (or even deleted, though it'll still linger in inaccessible state until the handle is closed).
msg313593 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-03-11 10:20
> pathlib currently expects DOS paths only: it will strip '\\?\' 
> prefix in resolve()

pathlib's unqualified conversion from the extended form to classic DOS form is a bug. The resolved path may be invalid as a classic DOS path. It may be too long, a DOS device name (e.g. "nul.txt"), or end with trailing spaces or dots (e.g. "spam."). 

Here's another pathlib bug with device paths:

    >>> print(pathlib.Path(r'\\.\C:'))
    \\.\C:\
    >>> print(pathlib.Path('//./CON'))
    \\.\CON\

In the first case, the input path is the "C:" volume device, but pathlib changes it to the file-system root directory. The second case is invalid in general, though some devices will ignore a remaining path.

As background, note that the Windows runtime library classifies paths into 6 types:

    >>> ntdll = ctypes.WinDLL('ntdll')
    >>> GetDosPathNameType = ntdll.RtlDetermineDosPathNameType_U

    UNC Absolute
    >>> GetDosPathNameType(r'\\eggs\spam')
    1

    Drive Absolute
    >>> GetDosPathNameType(r'C:\spam')
    2

    Drive Relative
    >>> GetDosPathNameType('C:spam')
    3

    Rooted
    >>> GetDosPathNameType(r'\spam')
    4

    Relative
    >>> GetDosPathNameType('spam')
    5

    Local Device
    >>> GetDosPathNameType(r'\\.\C:\spam')
    6
    >>> GetDosPathNameType(r'\\?\C:\spam')
    6

A local-device path is always absolute, so it's the only way to reference a volume device by a DOS drive letter. Without the prefix, "C:" is a drive-relative path.

If the prefix is exactly "\\\\?\\" (no forward slashes), then a local-device path is an extended path. This path type never gets normalized in any way, except to replace the WinAPI prefix with NTAPI "\\??\\". For all other local-device paths, the runtime library resolves "." and ".." components, translates forward slash to backslash, and strips trailing spaces and dots from the final component. Unlike DOS paths, local-device paths do not reserve DOS device names (e.g. "NUL" or "NUL:").

pathlib should never add a trailing slash to a local-device path. Also, the is_reserved() method needs to distinguish the DOS, device ("\\\\.\\"), and extended device ("\\\\?\\") namespaces.

> it would indeed be nice if pathlib handled [device] paths in its resolve()

I suggested handling volume GUID and device paths in _getfinalpathname, so it's not special-cased just in pathlib (e.g. if we were to implement ntpath.realpath). OTOH, pathlib's resolve() method should handle high-level mitigation, such as working around bad links and permission errors in non-strict mode.

> I erroneously stated that the length of the path could increase 
> between GetFinalPathNameByHandle calls because an intermediate
> directory could be renamed

The rules for the rename operation are discussed in the documentation for NtSetInformationFile, FileRenameInformation [1], and explained in detail in File Systems Algorithms, 2.1.5.14.11 FileRenameInformation [2].

[1]: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ntifs/ns-ntifs-_file_rename_information
[2]: https://msdn.microsoft.com/en-us/library/ff469527
History
Date User Action Args
2021-03-15 17:39:20eryksunlinkissue12777 superseder
2018-09-21 20:47:00steve.dowersetstatus: open -> closed
resolution: fixed
stage: commit review -> resolved
2018-03-11 10:21:00eryksunsetmessages: + msg313593
2018-03-08 23:25:47izbyshevsetmessages: + msg313469
2018-03-08 23:22:07izbyshevsetmessages: + msg313467
2018-03-08 22:55:51izbyshevsetmessages: + msg313466
2018-03-08 22:51:17steve.dowersetassignee: steve.dower
messages: + msg313465
stage: patch review -> commit review
2018-03-08 22:50:38steve.dowersetmessages: + msg313464
2018-03-08 22:29:58izbyshevsetmessages: + msg313463
2018-03-08 16:26:46miss-islingtonsetnosy: + miss-islington
messages: + msg313448
2018-03-08 16:16:12steve.dowersetpull_requests: + pull_request5795
2018-03-08 16:04:37miss-islingtonsetpull_requests: + pull_request5794
2018-03-08 16:03:32steve.dowersetmessages: + msg313445
2018-03-06 23:20:46eryksunsetnosy: + eryksun
messages: + msg313367
2018-03-06 22:42:54izbyshevsetkeywords: + patch
stage: patch review
pull_requests: + pull_request5775
2018-03-06 22:31:37izbyshevcreate