classification
Title: os.stat() on Windows succeeds for nonexistent paths with trailing spaces
Type: behavior
Stage:
Components: Library (Lib), Windows
Versions: Python 3.9, Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, gaborjbernat, laloch, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2020-04-09 14:25 by laloch, last changed 2020-04-10 13:57 by eryksun.

Messages (6)
msg366054 - Author: David Strobach (laloch) Date: 2020-04-09 14:25
On Windows (Server 2012 R2 in my case), os.stat() seems to be stripping significant trailing spaces off the path argument:

>>> import os
>>> os.stat("c:\\Program Files     ")
os.stat_result(st_mode=16749, st_ino=281474976710717, st_dev=173025906, st_nlink=1, st_uid=0, st_gid=0, st_size=8192, st_atime=1586154685, st_mtime=1586154685, st_ctime=1377178576)
>>> os.stat("c:\\Program Files\\     ")
os.stat_result(st_mode=16749, st_ino=281474976710717, st_dev=173025906, st_nlink=1, st_uid=0, st_gid=0, st_size=8192, st_atime=1586154685, st_mtime=1586154685, st_ctime=1377178576)
>>> # consequently
>>> os.path.isdir("c:\\Program Files\\     ")
True
>>> os.path.isdir("c:\\Program Files     ")
True
>>> os.scandir("c:\\Program Files     ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'c:\\Program Files     '

The same also applies to regular files, not just directories.
msg366063 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-04-09 15:19
> os.stat() seems to be stripping significant trailing spaces off 
> the path argument

The Windows file API normalizes paths to replace forward slashes with backslashes; resolve relative paths and "." and ".." components; strip trailing spaces and dots from the final component; and map reserved DOS device names in the final component of non-UNC paths to device paths (e.g. "C:/Temp/con " -> r"\\.\con").
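
As a rough illustration of the trailing-character rule only (not the full normalization, which also resolves relative paths and maps DOS device names), here is a minimal pure-Python sketch. The helper name `strip_final_component` is hypothetical, and the real API may behave differently in edge cases.

```python
import ntpath


def strip_final_component(path: str) -> str:
    """Strip trailing spaces and dots from the final path component.

    Hypothetical helper that approximates a single normalization step.
    """
    # Extended paths bypass normalization, so leave them untouched.
    if path.startswith("\\\\?\\"):
        return path
    # ntpath is pure Python, so this also runs on non-Windows systems.
    _, tail = ntpath.split(path)
    # "." and ".." components are resolved separately, not stripped.
    if tail in (".", ".."):
        return path
    prefix = path[: len(path) - len(tail)]
    return prefix + tail.rstrip(". ")


print(repr(strip_final_component("c:\\Program Files     ")))
# 'c:\\Program Files'
```

Under this approximation, the path from the report collapses to an existing directory, which is consistent with os.stat() succeeding.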

Use a "\\?\" extended device path to bypass normalization, e.g. r"\\?\C:\Program Files     ". Because an extended path doesn't get normalized in an open or create context, it should only use backslash as the path separator and should be fully qualified, without any "." and ".." components. Also, a non-device UNC path has to explicitly use the "UNC" device, e.g. "//server/share/spam... " -> r"\\?\UNC\server\share\spam... ".
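
The rules above can be sketched as a small conversion helper. This is only an illustration under stated assumptions: `to_extended_path` and `EXTENDED_PREFIX` are hypothetical names, the input must already be fully qualified, and the string handling is deliberately simple.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"


def to_extended_path(path: str) -> str:
    """Convert a fully qualified Windows path to an extended path.

    Hypothetical helper; trailing spaces and dots are kept as-is.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already extended, leave untouched
    # Extended paths are not normalized, so use backslashes only.
    path = path.replace("/", "\\")
    if path.startswith(("\\\\.\\", EXTENDED_PREFIX)):
        # Device path: drop the 4-character prefix and re-add the extended one.
        body = path[4:]
    elif path.startswith("\\\\"):
        # UNC path: name the "UNC" device explicitly.
        body = "UNC\\" + path[2:]
    else:
        drive, rest = ntpath.splitdrive(path)
        if not (len(drive) == 2 and drive[1] == ":" and rest.startswith("\\")):
            raise ValueError("path must be fully qualified")
        body = path
    # "." and ".." would be taken literally, so reject them.
    if any(part in (".", "..") for part in body.split("\\")):
        raise ValueError("path must not contain '.' or '..' components")
    return EXTENDED_PREFIX + body


print(to_extended_path("//server/share/spam... "))
```

Passing the result to os.stat() on Windows should then look up the name with its trailing characters intact instead of the stripped one.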

That said, I strongly advise against using an extended path to create or rename files to use reserved DOS device names or trailing dots and spaces. Such filenames will be inaccessible by most applications.
msg366111 - Author: gaborjbernat (gaborjbernat) * Date: 2020-04-10 08:13
While I agree that Windows is free to transform paths as it wishes, the bug reported here is that os.stat/os.path.isdir behave differently from os.scandir. Can we make them behave the same?
msg366120 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-04-10 12:06
> While I agree that Windows is free to transform paths as it wishes, 
> the bug reported here is that os.stat/os.path.isdir behave 
> differently from os.scandir. Can we make them behave the same?

os.listdir and os.scandir can be modified to behave like os.stat, but not the other way around. They differ because a "*.*" wildcard component is appended to the path that's passed to FindFirstFileW, and trailing spaces and dots only get stripped in the final path component. 

To implement this without hard-coding Windows filename rules, the path needs to be normalized via WINAPI GetFullPathNameW before appending the "*.*" component (or just "*"; the ".*" is superfluous) -- but only for normal paths, i.e. paths that do not begin with exactly "\\\\?\\". The functions that would need to be updated are _listdir_windows_no_opendir and os_scandir_impl in Modules/posixmodule.c.
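
In Python terms, the proposed logic might look like the sketch below. It is not the C code in Modules/posixmodule.c; `search_pattern` and its `normalize` parameter are hypothetical. The default, ntpath.abspath, only calls GetFullPathNameW on Windows, so on other platforms a stand-in normalizer has to be passed to see the effect.

```python
import ntpath
from typing import Callable

EXTENDED_PREFIX = "\\\\?\\"


def search_pattern(
    path: str, normalize: Callable[[str], str] = ntpath.abspath
) -> str:
    """Build the wildcard pattern used to list the directory at `path`."""
    # Normalize normal paths first; extended paths pass through untouched.
    if not path.startswith(EXTENDED_PREFIX):
        path = normalize(path)
    # Add a separator unless the path already ends with one or a drive colon.
    if path and path[-1] not in "\\/:":
        path += "\\"
    return path + "*"


# Stand-in normalizer that only strips trailing dots and spaces.
print(search_pattern("c:\\Program Files     ", normalize=lambda p: p.rstrip(". ")))
```

Because the trailing spaces are removed before the wildcard is appended, they no longer end up in a non-final component where they would be preserved.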

Another option would be to rewrite listdir and scandir to use CreateFileW and GetFileInformationByHandleEx: FileIdBothDirectoryInfo [1]. This query provides two additional fields in comparison to the classic find data: ChangeTime (Unix st_ctime) and FileId (Unix st_ino). If the file is flagged as a reparse point in its attributes, then the reparse tag is set in the EaSize field of the directory info, since extended attributes can't be set on a reparse point; see [MS-FSCC] 2.4.17 [2].

[1]: https://docs.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_both_dir_info
[2]: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/1e144bff-c056-45aa-bd29-c13d214ee2ba
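
For reference, the directory entry described in [1] could be declared with ctypes roughly as follows. This is a sketch of the record layout and of the EaSize rule only; it makes no Windows API calls, and the `reparse_tag` helper is a hypothetical name.

```python
import ctypes

FILE_ATTRIBUTE_REPARSE_POINT = 0x400


class FILE_ID_BOTH_DIR_INFO(ctypes.Structure):
    """Fixed-size part of one directory entry, following [1]."""

    _fields_ = [
        ("NextEntryOffset", ctypes.c_uint32),
        ("FileIndex", ctypes.c_uint32),
        ("CreationTime", ctypes.c_int64),
        ("LastAccessTime", ctypes.c_int64),
        ("LastWriteTime", ctypes.c_int64),
        ("ChangeTime", ctypes.c_int64),  # Unix st_ctime
        ("EndOfFile", ctypes.c_int64),
        ("AllocationSize", ctypes.c_int64),
        ("FileAttributes", ctypes.c_uint32),
        ("FileNameLength", ctypes.c_uint32),
        ("EaSize", ctypes.c_uint32),  # holds the reparse tag for reparse points
        ("ShortNameLength", ctypes.c_int8),
        ("ShortName", ctypes.c_uint16 * 12),
        ("FileId", ctypes.c_int64),  # Unix st_ino
        ("FileName", ctypes.c_uint16 * 1),  # variable-length name starts here
    ]


def reparse_tag(entry: FILE_ID_BOTH_DIR_INFO):
    """Return the reparse tag stored in EaSize, or None for ordinary files."""
    if entry.FileAttributes & FILE_ATTRIBUTE_REPARSE_POINT:
        return entry.EaSize
    return None
```

A real implementation would fill a buffer of these records through GetFileInformationByHandleEx and walk it using NextEntryOffset.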
msg366123 - Author: David Strobach (laloch) Date: 2020-04-10 13:18
Hi Eryk, thanks for your time and for the explanation.

> The Windows file API normalizes paths to replace forward slashes with backslashes; resolve relative paths and "." and ".." components; strip trailing spaces and dots from the final component; and map reserved DOS device names in the final component of non-UNC paths to device paths (e.g. "C:/Temp/con " -> r"\\.\con").

OK, I understand. I know that the Win32 documentation recommends avoiding paths with trailing spaces and that paths are subject to normalization. In that case, I'd say os.path.normpath() should perform the same normalization (GetFullPathNameW?) as os.stat() and friends do.

Currently:

>>> import os
>>> path = r"c:\Program Files "
>>> os.path.normpath(path)
'c:\\Program Files '
>>> os.path.realpath(path)
'C:\\Program Files'
msg366125 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-04-10 13:57
> I'd say os.path.normpath() should perform the same normalization
> (GetFullPathNameW?) as os.stat() and friends do.

ntpath.abspath calls GetFullPathNameW (i.e. nt._getfullpathname) in Windows, but ntpath.normpath is pure Python. I agree that normpath should trim trailing spaces and dots from the last component. It should also normalize device paths and extended paths that start with "\\\\.\\" and "\\\\?\\". An extended path only skips normalization in an open or create context.
History
Date                 User          Action  Args
2020-04-10 13:57:20  eryksun       set     messages: + msg366125
2020-04-10 13:18:28  laloch        set     messages: + msg366123
2020-04-10 12:06:33  eryksun       set     status: closed -> open; resolution: not a bug ->; messages: + msg366120; stage: resolved ->
2020-04-10 08:13:46  gaborjbernat  set     nosy: + gaborjbernat; messages: + msg366111
2020-04-09 15:19:47  eryksun       set     status: open -> closed; versions: + Python 3.9, - Python 3.6, Python 3.7; messages: + msg366063; resolution: not a bug; stage: resolved
2020-04-09 14:42:59  xtreak        set     nosy: + eryksun
2020-04-09 14:25:30  laloch        create