classification
Title: resolve() on Windows makes some pathological paths unusable
Type: behavior Stage: patch review
Components: Library (Lib), Windows Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Kevin.Norris, eryksun, pitrou, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2014-08-29 13:17 by Kevin.Norris, last changed 2021-02-23 09:21 by eryksun.

Files
File name Uploaded Description Edit
22299_1.patch steve.dower, 2014-09-05 16:51 review
22299_2.patch steve.dower, 2014-09-05 17:33
Messages (28)
msg226060 - (view) Author: Kevin Norris (Kevin.Norris) Date: 2014-08-29 13:17
Run Python as an administrator:

    >>> import pathlib 
    >>> pth = pathlib.Path('//?/C:/foo.')
    >>> pth.mkdir()
    >>> pth.resolve().rmdir()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python34\lib\pathlib.py", line 1141, in rmdir
        self._accessor.rmdir(self)
      File "C:\Python34\lib\pathlib.py", line 323, in wrapped
        return strfunc(str(pathobj), *args)
    FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\foo.'
    >>> pth.rmdir()

You do not need to be an administrator so long as you can create a directory in the requested location, but the \\?\ prefix only works with absolute paths so it's easier to demonstrate in the root of the drive.
msg226061 - (view) Author: Kevin Norris (Kevin.Norris) Date: 2014-08-29 13:20
When the directory name is '...', the error is different:

    >>> pth = pathlib.Path('//?/C:/...')
    >>> pth.mkdir()
    >>> pth.resolve().rmdir()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python34\lib\pathlib.py", line 1141, in rmdir
        self._accessor.rmdir(self)
      File "C:\Python34\lib\pathlib.py", line 323, in wrapped
        return strfunc(str(pathobj), *args)
    PermissionError: [WinError 5] Access is denied: 'C:\\...'
    >>> pth.rmdir()
    >>>
msg226072 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-29 20:20
Why is it a "pathological path"? Can you explain?
msg226076 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-08-29 20:42
Is resolve() using an *A() API rather than *W()? The \\?\ prefix does not work with *A() APIs.

Also, names that are all dots are not supported by Windows at all. I'd expect mkdir() to fail on that, but the \\?\ prefix disables some validation, so it's possible that it is getting through that way.
msg226077 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-29 20:43
resolve() should use the *W APIs since it is using only functions from the os module with str objects. Perhaps you want to double-check that, since I don't have a Windows VM anymore.
msg226081 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-08-29 20:53
The \\?\ extended-path prefix bypasses normal path processing. The path is passed directly to the filesystem driver. For example, to accommodate the POSIX namespace, NTFS allows any character except NUL and slash, so it happily creates a directory named "foo.". This name is invalid in the Win32 namespace. 

resolve() should skip calling _ext_to_normal on the result of _getfinalpathname if the input path is extended.

http://hg.python.org/cpython/file/c0e311e010fc/Lib/pathlib.py#l178
msg226082 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-08-29 20:55
Removing the _ext_to_normal() call in resolve() looks like the right fix to me.
msg226084 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-29 21:02
Does one of you want to provide a patch? (with tests)
msg226428 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-09-05 16:51
Patch attached. (Kinda feel like this was too simple...)
msg226431 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-09-05 17:17
It should only skip _ext_to_normal for an already extended path, i.e. a path that starts with ext_namespace_prefix. Otherwise it needs to call _ext_to_normal. For example:

Strip the prefix in this case:

    >>> os.path._getfinalpathname('C:\\Windows')       
    '\\\\?\\C:\\Windows'

but not in this case:

    >>> os.path._getfinalpathname(r'\\?\GLOBALROOT\Device\HarddiskVolume1\Windows')
    '\\\\?\\C:\\Windows'
msg226433 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-09-05 17:20
Ah, thought it was too simple. I didn't realise that _getfinalpathname adds the prefix.

New patch soon.
msg226435 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-09-05 17:33
Strips the prefix if it wasn't in the original path - otherwise, keeps it.
msg226441 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-05 18:52
As far as I can say, the patch looks fine to me. Thanks, Steve.
msg226458 - (view) Author: Kevin Norris (Kevin.Norris) Date: 2014-09-05 21:34
I'm a little concerned about this fix.  In particular, if I understand the design of the patch correctly, it is intended to produce this behavior:

    Path('C:/foo').resolve() != Path('//?/C:/foo').resolve()

Since both paths are valid and both paths refer to the same file, some developers may find this result counterintuitive.  The Path.resolve() docs do not expressly forbid it, however.

How many developers assume Path.resolve() is always the same for the same file?
msg226462 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-09-06 01:23
Maybe for an extended path it could try _getfinalpathname without the prefix. If it isn't a valid path or the result isn't the same as _getfinalpathname including the prefix, then skip calling _ext_to_normal. For example:

    def resolve(self, path):
        s = str(path)
        if not s:
            return os.getcwd()
        if _getfinalpathname is not None:
            prefix, t = self._split_extended_path(s)
            s = _getfinalpathname(s)
            if prefix:
                try:
                    if _getfinalpathname(t) != s:
                        return s
                except FileNotFoundError:
                    return s
            return self._ext_to_normal(s)        
        # Means fallback on absolute
        return None

The 'foo.' path in this issue would keep the prefix:

    >>> Path('//?/C:/foo.').resolve()
    WindowsPath('//?/C:/foo.')
    >>> Path('//?/UNC/server/C$/foo.').resolve()
    WindowsPath('//?/UNC/server/C$/foo.')

But regular paths would remove the prefix:

    >>> Path('//?/C:/bar').resolve()
    WindowsPath('C:/bar')
    >>> Path('//?/UNC/server/C$/bar').resolve() 
    WindowsPath('//server/C$/bar')

On a related note, _split_extended_path only looks for uppercase "UNC", which makes the above resolve method fail:

    >>> Path('//?/unc/server/C$/bar').resolve()
    WindowsPath('//?/UNC/server/C$/bar')
msg226484 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-06 10:57
> Since both paths are valid and both paths refer to the same file, some developers may find this result counterintuitive.

On the other hand the proposed solution is regular. If you input an extended path, you get an extended path as output.

There are other factors that can come into play, such as hard links under Unix (and perhaps under Windows too). The recommended way to check if two paths point to the same file is still os.path.samefile().

Another approach would be for pathlib to *always* use extended paths internally on Windows absolute paths; I don't know which side effects that could have, though.

Note we could also add methods to switch from the extended to the regular form and vice-versa.
msg226492 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-09-06 15:00
Actually, I'd be inclined to never use the prefix within pathlib and then add it on if necessary when converting to a string. That would also solve a problem like:

>>> p = Path("C:\\") / ('a'*150) / ('a'*150)
>>> p.stat()
FileNotFoundError: [WinError 3] The system cannot find the path specified: ...
>>> p2 = Path("\\\\?\\" + str(p))
>>> p2.stat()
os.stat_result(...)

The hardest part about this is knowing with certainty whether it's needed. We can certainly detect most cases automatically.

Maybe we also need an extra method or a format character to force a str() with prefix? Or maybe having an obvious enough function that people can monkey-patch if necessary - the "\\?\" prefix is an edge case for most people already, and checking the total length would bring that to >99% IME.
msg226493 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-09-06 15:04
If anyone wanted to test that really long path, here's the incantation to create it:

>>> import os, pathlib
>>> os.mkdir("C:\\a")
>>> os.mkdir("C:\\a\\" + "a"*150)
>>> os.rename("C:\\a", "C:\\" + "a"*150)
>>> p = pathlib.Path("C:\\") / ("a"*150) / ("a"*150)
msg226514 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-06 20:11
> I'd be inclined to never use the prefix within pathlib and then add it on if necessary when converting to a string

That may be very surprising when that prefix appears, though... At least with explicit methods the user would have to invoke them, instead of getting unexpected results implicitly.

I don't know what Windows users think about all this, though (I uses Linux myself).
msg226516 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-09-06 20:40
It's no less expected than having OS functions fail because the path is too long.

Using it to maintain dots at the end of directory/file names is a little less safe and may break some code. Maybe pathlib should strip these if there is no a prefix? (For example, "C:\Test." == "C:\Test" == "\\?\C:\Test" != "\\?\C:\Test.")

If most (or all) of the file handling functions in Python are using *W() APIs and can support the prefix, I'd rather add it in silently if only to avoid the long path issue. It's really the sort of implementation detail that pathlib should be able to hide from the app developer and the user (Node.js does this, for example, as its node_modules hierarchies regularly exceed the max path limitation).

Maybe the best approach is to preserve the prefix if it already exists, and add it if it becomes necessary. File operations are most likely to succeed in this case, even if it may be surprising to users.
msg226519 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-06 22:25
> If most (or all) of the file handling functions in Python are using *W() APIs and can support the prefix, I'd rather add it in silently if only to avoid the long path issue.

This would only work for fully-qualified paths, right? Not relative ones.

I'm all for making things higher-level, I just want to make sure it won't break existing use cases :-)
msg226524 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-09-06 23:37
> This would only work for fully-qualified paths, right? Not relative ones.

Correct, and I think we're most of the way there with how drives are handled. Since the prefix only works with absolute paths, why not treat it as part of the drive name?
msg226533 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-07 12:12
I'm a bit worried about the consequences still. If you take "C:" and join it with e.g. "foo", you get a drive-relative path. If you take "//?/C:/" and join it with e.g. "foo", you get an absolute path (or, if you remove the drive's trailing slash, you get something that's invalid AFAIK).

So the question is: how implicit/explicit will the conversion be, and at which stages will it happen?
msg226540 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-09-07 18:03
> If you take "//?/C:/" and join it with e.g. "foo", you get an 
> absolute path (or, if you remove the drive's trailing slash, you 
> get something that's invalid AFAIK).

FYI, DOS device names such as "C:" are NT symlinks. Win32 looks for DOS device links in NT's \Global?? directory, and also per logon-session in \Sessions\0\DosDevices\LOGON_ID. 

A link named "C:foo" is actually possible:

    >>> windll.kernel32.DefineDosDeviceW(0, "C:foo", "C:\\Python34")
    1
    >>> gfpn = os.path._getfinalpathname
    >>> gfpn(r'\\?\C:foo')
    '\\\\?\\C:\\Python34'
    >>> os.listdir(r'\\?\C:foo')
    ['DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'NEWS.txt', 'python.exe', 'pythonw.exe', 'README.txt', 'Scripts', 'tcl', 'Tools']

GLOBALROOT links to the native NT root:

    >>> gfpn('\\\\?\\GLOBALROOT\\Global??\C:\\')
    '\\\\?\\C:\\'
    >>> gfpn('\\\\?\\GLOBALROOT\\Device\\HarddiskVolume1\\')   
    '\\\\?\\C:\\'
    >>> gfpn(r'\\?\GLOBALROOT\SystemRoot') 
    '\\\\?\\C:\\Windows'
    >>> p = r'\\?\GLOBALROOT\Sessions\0\DosDevices\00000000-0f341de9\C:foo'        
    >>> gfpn(p)
    '\\\\?\\C:\\Python34'

Without the \\?\ prefix, "C:foo" is relative to the current directory on the C: drive:

    >>> os.chdir('C:\\')
    >>> os.mkdir('foo')
    >>> os.listdir('C:foo')
    []

where the current directory on C: is stored in the "=C:" environment variable:

    >>> buf = (c_wchar * 100)()                
    >>> windll.kernel32.GetEnvironmentVariableW("=C:", buf)
    3
    >>> buf.value
    'C:\\'
msg226586 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-09-08 15:49
Right, what the prefix actually means is "treat this path as a blob and don't do any processing". Some of the things that 'processing' includes are:

* CWD
* invalid names ('foo.' -> 'foo')
* adjacent backslashes ('a\\b' -> 'a\b')
* forward slashes ('a/b' -> 'a\b')
* (probably) short/long file names ('progra~1' -> 'Program Files')

A nice side-effect is that you can also use path names longer than 260 characters, provided your path name is correctly normalized already.

Really, the test for whether to keep or remove the prefix should be to remove the prefix and try and resolve the path again. If it succeeds, remove the prefix; otherwise, keep it. This can only really be done as part of the resolve() call, which would address the original issue, but it may be quite a perf. hit. 

I'd still be inclined to add the prefix in str() if the final path length is greater than 260 characters, if only because we go from zero chance of it working to a non-zero chance. Unfortunately, there seems to be no way to process a long path to make it 'safe' to add the prefix (though we can do a few of the things and increase the chances) as GetFinalPathName will not work on a long path. FWIW, paths longer than 260 chars are a mess and everyone knows it, but it's really really hard to fix without breaking back-compat.
msg226796 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-11 19:14
> Really, the test for whether to keep or remove the prefix should be to 
> remove the prefix and try and resolve the path again. If it succeeds, 
> remove the prefix; otherwise, keep it. This can only really be done as 
> part of the resolve() call, which would address the original issue,
> but it may be quite a perf. hit. 

It would also be prone to race conditions. All in all it sounds like a bad idea.
I still think it should be asked for explicitly. I don't know how the method should be called, .extended() perhaps?
msg226799 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-09-11 19:50
Another alternative is to always leave the prefix there after calling resolve() (as opposed to the current behaviour which is to always remove it). If the Win32 API says that the path should include the prefix, then it should. There's no reliable way for a developer to decide that an arbitrary path should include the prefix other than by resolving it.

I still like the idea of a format character to omit the prefix, as that correctly implies that the only reason you should remove it is for displaying to the user. Alternatively, a ".without_prefix" property seems like a safer route than requiring the user to add it. Long paths are the only time you may want to add it, but even that doesn't guarantee that the path will work.
msg387556 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-23 09:21
ntpath.realpath() has since been implemented, and it addresses the problem by keeping the extended (verbatim) path prefix in the result if the input argument has it, and otherwise removing the prefix if the final path resolves correctly without it.
History
Date User Action Args
2021-02-23 09:21:35eryksunsetmessages: + msg387556
versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.4, Python 3.5
2014-09-11 19:50:24steve.dowersetmessages: + msg226799
2014-09-11 19:14:32pitrousetmessages: + msg226796
2014-09-08 15:49:13steve.dowersetmessages: + msg226586
2014-09-07 18:03:52eryksunsetmessages: + msg226540
2014-09-07 12:12:13pitrousetmessages: + msg226533
2014-09-06 23:37:40steve.dowersetmessages: + msg226524
2014-09-06 22:25:44pitrousetmessages: + msg226519
2014-09-06 20:40:30steve.dowersetmessages: + msg226516
2014-09-06 20:11:36pitrousetmessages: + msg226514
2014-09-06 15:04:24steve.dowersetmessages: + msg226493
2014-09-06 15:00:41steve.dowersetmessages: + msg226492
2014-09-06 10:57:31pitrousetmessages: + msg226484
2014-09-06 01:23:36eryksunsetmessages: + msg226462
2014-09-05 21:34:59Kevin.Norrissetmessages: + msg226458
2014-09-05 18:52:08pitrousetmessages: + msg226441
stage: needs patch -> patch review
2014-09-05 17:33:44steve.dowersetfiles: + 22299_2.patch

messages: + msg226435
2014-09-05 17:20:10steve.dowersetmessages: + msg226433
2014-09-05 17:17:17eryksunsetmessages: + msg226431
2014-09-05 16:51:28steve.dowersetfiles: + 22299_1.patch
keywords: + patch
messages: + msg226428
2014-08-29 21:02:28pitrousetstage: test needed -> needs patch
messages: + msg226084
versions: + Python 3.5
2014-08-29 20:55:39steve.dowersetmessages: + msg226082
2014-08-29 20:53:29eryksunsetnosy: + eryksun
messages: + msg226081
2014-08-29 20:43:12pitrousetmessages: + msg226077
2014-08-29 20:42:07steve.dowersetmessages: + msg226076
2014-08-29 20:20:48pitrousetnosy: + tim.golden, zach.ware, steve.dower
messages: + msg226072
2014-08-29 20:17:30terry.reedysetnosy: + pitrou

stage: test needed
2014-08-29 13:20:44Kevin.Norrissetmessages: + msg226061
2014-08-29 13:17:08Kevin.Norriscreate