Title: pathlib is_reserved fails for some reserved paths on Windows
Type: behavior Stage: resolved
Components: Library (Lib), Windows Versions: Python 3.11, Python 3.10, Python 3.9
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barneygale, eryksun, lukasz.langa, miss-islington, paul.moore, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2016-08-22 09:33 by eryksun, last changed 2021-07-28 15:17 by lukasz.langa. This issue is now closed.

File name Uploaded Description Edit
issue_27827_01.patch eryksun, 2016-09-12 05:08 review
Pull Requests
URL Status Linked Edit
PR 26698 merged barneygale, 2021-06-12 20:39
PR 27421 merged miss-islington, 2021-07-28 14:28
PR 27422 merged miss-islington, 2021-07-28 14:28
Messages (13)
msg273344 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-08-22 09:33
pathlib._WindowsFlavour.is_reserved assumes Windows uses an exact match up to the file extension for reserved DOS device names. However, this misses cases involving trailing spaces and colons, such as the following examples:

Trailing colon:

    >>> pathlib.Path('C:/foo/NUL:').is_reserved()
    >>> print(os.path._getfullpathname('C:/foo/NUL:'))

Trailing spaces:

    >>> pathlib.Path('C:/foo/NUL  ').is_reserved()
    >>> print(os.path._getfullpathname('C:/foo/NUL  '))

Trailing spaces followed by a file extension:

    >>> pathlib.Path('C:/foo/NUL  .txt').is_reserved()
    >>> print(os.path._getfullpathname('C:/foo/NUL  .txt'))

Windows calls RtlIsDosDeviceName_Ustr to check whether a path represents a DOS device name. Here's a link to the reverse-engineered implementation of this function in ReactOS 4.1:

The ReactOS implementation performs the following steps:

    * Return false for a UNC or unknown path type or an empty path.
    * Strip a final ":" if present. Return false if it was the
      only character.
    * Strip trailing dot and space characters.
    * Iterate over the path in reverse. If the current character is
      a "\\" or "/" or a ":" drive letter separator (at index 1),
      then if the next character matches the first letter of a DOS
      device name, splice out the base name as a potential match.
      Else return false.
    * Return false if the first character at this point does not 
      match the first letter of a DOS device name.
    * Remove the file extension, starting at the first dot or colon.
    * Remove trailing spaces.
    * Return the name offset and length if it equals
        * "COM" or "LPT" plus a digit
        * "PRN", "AUX", "NUL, or "CON"
    * Else return false.

It seems that ":" and "." are effectively equivalent for the purposes of is_reserved. Given this is the case, it could return whether parts[-1].partition('.')[0].partition(':')[0].rstrip(' ').upper() is in self.reserved_names. Or maybe use a regex for the entire check.

If a script is running on Windows, I think the best approach is to call os.path.abspath, which calls _getfullpathname. This lets Windows itself determine if the path maps to the \\.\ device namespace. However, I realize that is_reserved is intended to be cross-platform.

By the way, the comment for this method says that r"foo\NUL" isn't reserved, but it is. Maybe the author checked by trying to open NUL in a non-existing foo directory. DOS device names are only reserved in practice when opening and creating files in existing directories (as opposed to reserved in principle with GetFullPathName, which doesn't check for a valid path). NT can thus return an error that's consistent with how DOS behaved in the 1980s -- because that's really important, you know.
msg273761 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-08-27 06:42
Also, "CONIN$" and "CONOUT$" need to be added to the list of reserved names. Prior to Windows 8 these two names are reserved only for the current directory, which for the most part also applies to "CON". 

For Windows 8+, the redesign to use a real console device means that these three console devices are handled in exactly the same way as the other reserved DOS device names. For example:

Windows 10

    >>> print(os.path.abspath('C:/Temp/conout$  : spam . eggs'))

Windows 7

    >>> print(os.path.abspath('C:/Temp/conout$  : spam . eggs'))
    C:\Temp\conout$  : spam . eggs
msg275964 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-09-12 05:08
The attached patch adds tests and the suggested enhancement to _WindowsFlavour.is_reserved. 

Shouldn't it also return True if the name contains ASCII control characters? They're only valid in NTFS stream names. Also, I think a name containing a colon that's not part of a DOS drive letter spec should be considered reserved. Otherwise it could designate an NTFS named stream (e.g. "path\filename:streamname:$DATA"), which is rarely desired and not universally supported, e.g. FAT32 doesn't support file streams. I'm thinking of a program that calls this method to ensure that a path is reasonably 'safe' for use on Windows -- i.e. isn't inherently invalid and won't do something surprising like open NUL or write to a named stream.
msg290515 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-26 08:04
Are 'COM\u0661' or 'COM\u2074' reserved names?
msg290526 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-03-26 12:54
For COM[n] and LPT[n], only ASCII 1-9 and superscript 1-3 (U+00b9, U+00b2, and U+00b3) are handled as decimal digits. For example:

    >>> print(*(ascii(chr(c)) for c in range(1, 65536)
    ...     if _getfullpathname('COM%s' % chr(c))[0] == '\\'), sep=', ')
    '1', '2', '3', '4', '5', '6', '7', '8', '9', '\xb2', '\xb3', '\xb9'

The implementation uses iswdigit in ntdll.dll. (ntdll.dll is the system DLL that has the user-mode runtime library and syscall stubs -- except the Win32k syscall stubs are in win32u.dll.) ntdll's private CRT uses the C locale (Latin-1, not just ASCII), and it classifies these superscript digits as decimal digits:

    >>> ntdll = ctypes.WinDLL('ntdll')
    >>> print(*(chr(c) for c in range(1, 65536) if ntdll.iswdigit(c)))
    0 1 2 3 4 5 6 7 8 9 ² ³ ¹

Unicode, and thus Python, does not classify these superscript digits as decimal digits, so I just hard-coded the list. 

Here's an example with an attached debugger to show the runtime library calling iswdigit:

    >>> name = 'COM\u2074'
    >>> _getfullpathname(name)

    Breakpoint 0 hit
    00007ffe`9ad89d90 ba04000000      mov     edx,4
    0:000> kc 6
    Call Site

The argument is in register rcx:

    0:000> r rcx

Skip to the ret instruction, and check the result in register rax:

    0:000> pt
    00007ffe`9ad89e40 c3              ret
    0:000> r rax
    0:000> g

Since U+2074 isn't considered a decimal digit, 'COM⁴' is not a reserved DOS device name. The system handles it as a regular filename:

msg290530 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-26 13:55
Thanks for the estimation Eryk. Can you create a pull request for your patch?
msg378151 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-07 08:41
See also bpo-36534 "tarfile: handling Windows (path) illegal characters in archive member names".
msg378153 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-07 08:43
See also bpo-37517 "Improve error messages for Windows reserved file names".
msg395708 - (view) Author: Barney Gale (barneygale) * Date: 2021-06-12 20:42
I've put Eryk's patch up as a PR:
msg398390 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-07-28 14:28
New changeset 56c1f6d7edad454f382d3ecb8cdcff24ac898a50 by Barney Gale in branch 'main':
bpo-27827: identify a greater range of reserved filename on Windows. (GH-26698)
msg398394 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-07-28 15:01
New changeset 8789add99164177f29a8cd319a834187c65ab16c by Miss Islington (bot) in branch '3.10':
bpo-27827: identify a greater range of reserved filename on Windows. (GH-26698) (GH-27421)
msg398396 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-07-28 15:15
New changeset debb751f11f5221eafcdf07a14c7e9408350ad9a by Miss Islington (bot) in branch '3.9':
bpo-27827: identify a greater range of reserved filename on Windows. (GH-26698) (#27422)
msg398397 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-07-28 15:17
Barney, thanks for pushing this across the finish line! ✨ 🍰 ✨  

And of course, Eryk for the report and original patch.
Date User Action Args
2021-07-28 15:17:51lukasz.langasetstatus: open -> closed
versions: + Python 3.11, - Python 3.8
messages: + msg398397

resolution: fixed
stage: patch review -> resolved
2021-07-28 15:15:55lukasz.langasetmessages: + msg398396
2021-07-28 15:01:56lukasz.langasetmessages: + msg398394
2021-07-28 14:28:30miss-islingtonsetpull_requests: + pull_request25950
2021-07-28 14:28:25miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request25949
2021-07-28 14:28:23lukasz.langasetnosy: + lukasz.langa
messages: + msg398390
2021-06-12 20:42:30barneygalesetmessages: + msg395708
2021-06-12 20:39:52barneygalesetnosy: + barneygale

pull_requests: + pull_request25283
stage: patch review
2021-03-13 11:29:59vstinnersetnosy: - vstinner
2021-03-12 23:16:06eryksunsetstage: needs patch -> (no value)
versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.5, Python 3.6
2020-10-07 08:43:49vstinnersetmessages: + msg378153
2020-10-07 08:41:22vstinnersetnosy: + vstinner
messages: + msg378151
2017-03-26 13:55:42serhiy.storchakasetmessages: + msg290530
2017-03-26 12:54:32eryksunsetmessages: + msg290526
2017-03-26 08:04:54serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg290515
2016-09-12 05:08:37eryksunsetfiles: + issue_27827_01.patch
keywords: + patch
messages: + msg275964
2016-08-27 06:42:44eryksunsetmessages: + msg273761
2016-08-22 09:33:48eryksuncreate