Issue22299
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2014-08-29 13:17 by Kevin.Norris, last changed 2022-04-11 14:58 by admin.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
22299_1.patch | steve.dower, 2014-09-05 16:51 | review | ||
22299_2.patch | steve.dower, 2014-09-05 17:33 |
Messages (28) | |||
---|---|---|---|
msg226060 - (view) | Author: Kevin Norris (Kevin.Norris) | Date: 2014-08-29 13:17 | |
Run Python as an administrator: >>> import pathlib >>> pth = pathlib.Path('//?/C:/foo.') >>> pth.mkdir() >>> pth.resolve().rmdir() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Python34\lib\pathlib.py", line 1141, in rmdir self._accessor.rmdir(self) File "C:\Python34\lib\pathlib.py", line 323, in wrapped return strfunc(str(pathobj), *args) FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\foo.' >>> pth.rmdir() You do not need to be an administrator so long as you can create a directory in the requested location, but the \\?\ prefix only works with absolute paths so it's easier to demonstrate in the root of the drive. |
|||
msg226061 - (view) | Author: Kevin Norris (Kevin.Norris) | Date: 2014-08-29 13:20 | |
When the directory name is '...', the error is different: >>> pth = pathlib.Path('//?/C:/...') >>> pth.mkdir() >>> pth.resolve().rmdir() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Python34\lib\pathlib.py", line 1141, in rmdir self._accessor.rmdir(self) File "C:\Python34\lib\pathlib.py", line 323, in wrapped return strfunc(str(pathobj), *args) PermissionError: [WinError 5] Access is denied: 'C:\\...' >>> pth.rmdir() >>> |
|||
msg226072 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-08-29 20:20 | |
Why is it a "pathological path"? Can you explain? |
|||
msg226076 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-08-29 20:42 | |
Is resolve() using an *A() API rather than *W()? The \\?\ prefix does not work with *A() APIs. Also, names that are all dots are not supported by Windows at all. I'd expect mkdir() to fail on that, but the \\?\ prefix disables some validation, so it's possible that it is getting through that way. |
|||
msg226077 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-08-29 20:43 | |
resolve() should use the *W APIs since it is using only functions from the os module with str objects. Perhaps you want to double-check that, since I don't have a Windows VM anymore. |
|||
msg226081 - (view) | Author: Eryk Sun (eryksun) * | Date: 2014-08-29 20:53 | |
The \\?\ extended-path prefix bypasses normal path processing. The path is passed directly to the filesystem driver. For example, to accommodate the POSIX namespace, NTFS allows any character except NUL and slash, so it happily creates a directory named "foo.". This name is invalid in the Win32 namespace. resolve() should skip calling _ext_to_normal on the result of _getfinalpathname if the input path is extended. http://hg.python.org/cpython/file/c0e311e010fc/Lib/pathlib.py#l178 |
|||
msg226082 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-08-29 20:55 | |
Removing the _ext_to_normal() call in resolve() looks like the right fix to me. |
|||
msg226084 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-08-29 21:02 | |
Does one of you want to provide a patch? (with tests) |
|||
msg226428 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-09-05 16:51 | |
Patch attached. (Kinda feel like this was too simple...) |
|||
msg226431 - (view) | Author: Eryk Sun (eryksun) * | Date: 2014-09-05 17:17 | |
It should only skip _ext_to_normal for an already extended path, i.e. a path that starts with ext_namespace_prefix. Otherwise it needs to call _ext_to_normal. For example: Strip the prefix in this case: >>> os.path._getfinalpathname('C:\\Windows') '\\\\?\\C:\\Windows' but not in this case: >>> os.path._getfinalpathname(r'\\?\GLOBALROOT\Device\HarddiskVolume1\Windows') '\\\\?\\C:\\Windows' |
|||
msg226433 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-09-05 17:20 | |
Ah, thought it was too simple. I didn't realise that _getfinalpathname adds the prefix. New patch soon. |
|||
msg226435 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-09-05 17:33 | |
Strips the prefix if it wasn't in the original path - otherwise, keeps it. |
|||
msg226441 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-09-05 18:52 | |
As far as I can say, the patch looks fine to me. Thanks, Steve. |
|||
msg226458 - (view) | Author: Kevin Norris (Kevin.Norris) | Date: 2014-09-05 21:34 | |
I'm a little concerned about this fix. In particular, if I understand the design of the patch correctly, it is intended to produce this behavior: Path('C:/foo').resolve() != Path('//?/C:/foo').resolve() Since both paths are valid and both paths refer to the same file, some developers may find this result counterintuitive. The Path.resolve() docs do not expressly forbid it, however. How many developers assume Path.resolve() is always the same for the same file? |
|||
msg226462 - (view) | Author: Eryk Sun (eryksun) * | Date: 2014-09-06 01:23 | |
Maybe for an extended path it could try _getfinalpathname without the prefix. If it isn't a valid path or the result isn't the same as _getfinalpathname including the prefix, then skip calling _ext_to_normal. For example: def resolve(self, path): s = str(path) if not s: return os.getcwd() if _getfinalpathname is not None: prefix, t = self._split_extended_path(s) s = _getfinalpathname(s) if prefix: try: if _getfinalpathname(t) != s: return s except FileNotFoundError: return s return self._ext_to_normal(s) # Means fallback on absolute return None The 'foo.' path in this issue would keep the prefix: >>> Path('//?/C:/foo.').resolve() WindowsPath('//?/C:/foo.') >>> Path('//?/UNC/server/C$/foo.').resolve() WindowsPath('//?/UNC/server/C$/foo.') But regular paths would remove the prefix: >>> Path('//?/C:/bar').resolve() WindowsPath('C:/bar') >>> Path('//?/UNC/server/C$/bar').resolve() WindowsPath('//server/C$/bar') On a related note, _split_extended_path only looks for uppercase "UNC", which makes the above resolve method fail: >>> Path('//?/unc/server/C$/bar').resolve() WindowsPath('//?/UNC/server/C$/bar') |
|||
msg226484 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-09-06 10:57 | |
> Since both paths are valid and both paths refer to the same file, some developers may find this result counterintuitive. On the other hand the proposed solution is regular. If you input an extended path, you get an extended path as output. There are other factors that can come into play, such as hard links under Unix (and perhaps under Windows too). The recommended way to check if two paths point to the same file is still os.path.samefile(). Another approach would be for pathlib to *always* use extended paths internally on Windows absolute paths; I don't know which side effects that could have, though. Note we could also add methods to switch from the extended to the regular form and vice-versa. |
|||
msg226492 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-09-06 15:00 | |
Actually, I'd be inclined to never use the prefix within pathlib and then add it on if necessary when converting to a string. That would also solve a problem like: >>> p = Path("C:\\") / ('a'*150) / ('a'*150) >>> p.stat() FileNotFoundError: [WinError 3] The system cannot find the path specified: ... >>> p2 = Path("\\\\?\\" + str(p)) >>> p2.stat() os.stat_result(...) The hardest part about this is knowing with certainty whether it's needed. We can certainly detect most cases automatically. Maybe we also need an extra method or a format character to force a str() with prefix? Or maybe having an obvious enough function that people can monkey-patch if necessary - the "\\?\" prefix is an edge case for most people already, and checking the total length would bring that to >99% IME. |
|||
msg226493 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-09-06 15:04 | |
If anyone wanted to test that really long path, here's the incantation to create it: >>> import os, pathlib >>> os.mkdir("C:\\a") >>> os.mkdir("C:\\a\\" + "a"*150) >>> os.rename("C:\\a", "C:\\" + "a"*150) >>> p = pathlib.Path("C:\\") / ("a"*150) / ("a"*150) |
|||
msg226514 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-09-06 20:11 | |
> I'd be inclined to never use the prefix within pathlib and then add it on if necessary when converting to a string That may be very surprising when that prefix appears, though... At least with explicit methods the user would have to invoke them, instead of getting unexpected results implicitly. I don't know what Windows users think about all this, though (I uses Linux myself). |
|||
msg226516 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-09-06 20:40 | |
It's no less expected than having OS functions fail because the path is too long. Using it to maintain dots at the end of directory/file names is a little less safe and may break some code. Maybe pathlib should strip these if there is no a prefix? (For example, "C:\Test." == "C:\Test" == "\\?\C:\Test" != "\\?\C:\Test.") If most (or all) of the file handling functions in Python are using *W() APIs and can support the prefix, I'd rather add it in silently if only to avoid the long path issue. It's really the sort of implementation detail that pathlib should be able to hide from the app developer and the user (Node.js does this, for example, as its node_modules hierarchies regularly exceed the max path limitation). Maybe the best approach is to preserve the prefix if it already exists, and add it if it becomes necessary. File operations are most likely to succeed in this case, even if it may be surprising to users. |
|||
msg226519 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-09-06 22:25 | |
> If most (or all) of the file handling functions in Python are using *W() APIs and can support the prefix, I'd rather add it in silently if only to avoid the long path issue. This would only work for fully-qualified paths, right? Not relative ones. I'm all for making things higher-level, I just want to make sure it won't break existing use cases :-) |
|||
msg226524 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-09-06 23:37 | |
> This would only work for fully-qualified paths, right? Not relative ones. Correct, and I think we're most of the way there with how drives are handled. Since the prefix only works with absolute paths, why not treat it as part of the drive name? |
|||
msg226533 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-09-07 12:12 | |
I'm a bit worried about the consequences still. If you take "C:" and join it with e.g. "foo", you get a drive-relative path. If you take "//?/C:/" and join it with e.g. "foo", you get an absolute path (or, if you remove the drive's trailing slash, you get something that's invalid AFAIK). So the question is: how implicit/explicit will the conversion be, and at which stages will it happen? |
|||
msg226540 - (view) | Author: Eryk Sun (eryksun) * | Date: 2014-09-07 18:03 | |
> If you take "//?/C:/" and join it with e.g. "foo", you get an > absolute path (or, if you remove the drive's trailing slash, you > get something that's invalid AFAIK). FYI, DOS device names such as "C:" are NT symlinks. Win32 looks for DOS device links in NT's \Global?? directory, and also per logon-session in \Sessions\0\DosDevices\LOGON_ID. A link named "C:foo" is actually possible: >>> windll.kernel32.DefineDosDeviceW(0, "C:foo", "C:\\Python34") 1 >>> gfpn = os.path._getfinalpathname >>> gfpn(r'\\?\C:foo') '\\\\?\\C:\\Python34' >>> os.listdir(r'\\?\C:foo') ['DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'NEWS.txt', 'python.exe', 'pythonw.exe', 'README.txt', 'Scripts', 'tcl', 'Tools'] GLOBALROOT links to the native NT root: >>> gfpn('\\\\?\\GLOBALROOT\\Global??\C:\\') '\\\\?\\C:\\' >>> gfpn('\\\\?\\GLOBALROOT\\Device\\HarddiskVolume1\\') '\\\\?\\C:\\' >>> gfpn(r'\\?\GLOBALROOT\SystemRoot') '\\\\?\\C:\\Windows' >>> p = r'\\?\GLOBALROOT\Sessions\0\DosDevices\00000000-0f341de9\C:foo' >>> gfpn(p) '\\\\?\\C:\\Python34' Without the \\?\ prefix, "C:foo" is relative to the current directory on the C: drive: >>> os.chdir('C:\\') >>> os.mkdir('foo') >>> os.listdir('C:foo') [] where the current directory on C: is stored in the "=C:" environment variable: >>> buf = (c_wchar * 100)() >>> windll.kernel32.GetEnvironmentVariableW("=C:", buf) 3 >>> buf.value 'C:\\' |
|||
msg226586 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-09-08 15:49 | |
Right, what the prefix actually means is "treat this path as a blob and don't do any processing". Some of the things that 'processing' includes are: * CWD * invalid names ('foo.' -> 'foo') * adjacent backslashes ('a\\b' -> 'a\b') * forward slashes ('a/b' -> 'a\b') * (probably) short/long file names ('progra~1' -> 'Program Files') A nice side-effect is that you can also use path names longer than 260 characters, provided your path name is correctly normalized already. Really, the test for whether to keep or remove the prefix should be to remove the prefix and try and resolve the path again. If it succeeds, remove the prefix; otherwise, keep it. This can only really be done as part of the resolve() call, which would address the original issue, but it may be quite a perf. hit. I'd still be inclined to add the prefix in str() if the final path length is greater than 260 characters, if only because we go from zero chance of it working to a non-zero chance. Unfortunately, there seems to be no way to process a long path to make it 'safe' to add the prefix (though we can do a few of the things and increase the chances) as GetFinalPathName will not work on a long path. FWIW, paths longer than 260 chars are a mess and everyone knows it, but it's really really hard to fix without breaking back-compat. |
|||
msg226796 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2014-09-11 19:14 | |
> Really, the test for whether to keep or remove the prefix should be to > remove the prefix and try and resolve the path again. If it succeeds, > remove the prefix; otherwise, keep it. This can only really be done as > part of the resolve() call, which would address the original issue, > but it may be quite a perf. hit. It would also be prone to race conditions. All in all it sounds like a bad idea. I still think it should be asked for explicitly. I don't know how the method should be called, .extended() perhaps? |
|||
msg226799 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-09-11 19:50 | |
Another alternative is to always leave the prefix there after calling resolve() (as opposed to the current behaviour which is to always remove it). If the Win32 API says that the path should include the prefix, then it should. There's no reliable way for a developer to decide that an arbitrary path should include the prefix other than by resolving it. I still like the idea of a format character to omit the prefix, as that correctly implies that the only reason you should remove it is for displaying to the user. Alternatively, a ".without_prefix" property seems like a safer route than requiring the user to add it. Long paths are the only time you may want to add it, but even that doesn't guarantee that the path will work. |
|||
msg387556 - (view) | Author: Eryk Sun (eryksun) * | Date: 2021-02-23 09:21 | |
ntpath.realpath() has since been implemented, and it addresses the problem by keeping the extended (verbatim) path prefix in the result if the input argument has it, and otherwise removing the prefix if the final path resolves correctly without it. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:07 | admin | set | github: 66495 |
2021-02-23 09:21:35 | eryksun | set | messages:
+ msg387556 versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.4, Python 3.5 |
2014-09-11 19:50:24 | steve.dower | set | messages: + msg226799 |
2014-09-11 19:14:32 | pitrou | set | messages: + msg226796 |
2014-09-08 15:49:13 | steve.dower | set | messages: + msg226586 |
2014-09-07 18:03:52 | eryksun | set | messages: + msg226540 |
2014-09-07 12:12:13 | pitrou | set | messages: + msg226533 |
2014-09-06 23:37:40 | steve.dower | set | messages: + msg226524 |
2014-09-06 22:25:44 | pitrou | set | messages: + msg226519 |
2014-09-06 20:40:30 | steve.dower | set | messages: + msg226516 |
2014-09-06 20:11:36 | pitrou | set | messages: + msg226514 |
2014-09-06 15:04:24 | steve.dower | set | messages: + msg226493 |
2014-09-06 15:00:41 | steve.dower | set | messages: + msg226492 |
2014-09-06 10:57:31 | pitrou | set | messages: + msg226484 |
2014-09-06 01:23:36 | eryksun | set | messages: + msg226462 |
2014-09-05 21:34:59 | Kevin.Norris | set | messages: + msg226458 |
2014-09-05 18:52:08 | pitrou | set | messages:
+ msg226441 stage: needs patch -> patch review |
2014-09-05 17:33:44 | steve.dower | set | files:
+ 22299_2.patch messages: + msg226435 |
2014-09-05 17:20:10 | steve.dower | set | messages: + msg226433 |
2014-09-05 17:17:17 | eryksun | set | messages: + msg226431 |
2014-09-05 16:51:28 | steve.dower | set | files:
+ 22299_1.patch keywords: + patch messages: + msg226428 |
2014-08-29 21:02:28 | pitrou | set | stage: test needed -> needs patch messages: + msg226084 versions: + Python 3.5 |
2014-08-29 20:55:39 | steve.dower | set | messages: + msg226082 |
2014-08-29 20:53:29 | eryksun | set | nosy:
+ eryksun messages: + msg226081 |
2014-08-29 20:43:12 | pitrou | set | messages: + msg226077 |
2014-08-29 20:42:07 | steve.dower | set | messages: + msg226076 |
2014-08-29 20:20:48 | pitrou | set | nosy:
+ tim.golden, zach.ware, steve.dower messages: + msg226072 |
2014-08-29 20:17:30 | terry.reedy | set | nosy:
+ pitrou stage: test needed |
2014-08-29 13:20:44 | Kevin.Norris | set | messages: + msg226061 |
2014-08-29 13:17:08 | Kevin.Norris | create |