pathlib is_reserved fails for some reserved paths on Windows #72014

Closed
eryksun opened this issue Aug 22, 2016 · 13 comments
Labels
3.9 (only security fixes), 3.10 (only security fixes), 3.11 (only security fixes), OS-windows, stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@eryksun
Contributor

eryksun commented Aug 22, 2016

BPO 27827
Nosy @pfmoore, @tjguk, @ambv, @zware, @serhiy-storchaka, @eryksun, @zooba, @miss-islington, @barneygale
PRs
  • bpo-27827: pathlib: identify a greater range of reserved filename on Windows. #26698
  • [3.10] bpo-27827: identify a greater range of reserved filename on Windows. (GH-26698) #27421
  • [3.9] bpo-27827: identify a greater range of reserved filename on Windows. (GH-26698) #27422
Files
  • issue_27827_01.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-07-28.15:17:51.384>
    created_at = <Date 2016-08-22.09:33:47.875>
    labels = ['type-bug', '3.9', '3.10', '3.11', 'library', 'OS-windows']
    title = 'pathlib is_reserved fails for some reserved paths on Windows'
    updated_at = <Date 2021-07-28.15:17:51.382>
    user = 'https://github.com/eryksun'

    bugs.python.org fields:

    activity = <Date 2021-07-28.15:17:51.382>
    actor = 'lukasz.langa'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-07-28.15:17:51.384>
    closer = 'lukasz.langa'
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2016-08-22.09:33:47.875>
    creator = 'eryksun'
    dependencies = []
    files = ['44588']
    hgrepos = []
    issue_num = 27827
    keywords = ['patch']
    message_count = 13.0
    messages = ['273344', '273761', '275964', '290515', '290526', '290530', '378151', '378153', '395708', '398390', '398394', '398396', '398397']
    nosy_count = 9.0
    nosy_names = ['paul.moore', 'tim.golden', 'lukasz.langa', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'miss-islington', 'barneygale']
    pr_nums = ['26698', '27421', '27422']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27827'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @eryksun
    Contributor Author

    eryksun commented Aug 22, 2016

    pathlib._WindowsFlavour.is_reserved assumes Windows uses an exact match up to the file extension for reserved DOS device names. However, this misses cases involving trailing spaces and colons, such as the following examples:

    Trailing colon:

        >>> pathlib.Path('C:/foo/NUL:').is_reserved()
        False
        >>> print(os.path._getfullpathname('C:/foo/NUL:'))
        \\.\NUL

    Trailing spaces:

        >>> pathlib.Path('C:/foo/NUL  ').is_reserved()
        False
        >>> print(os.path._getfullpathname('C:/foo/NUL  '))
        \\.\NUL

    Trailing spaces followed by a file extension:

        >>> pathlib.Path('C:/foo/NUL  .txt').is_reserved()
        False
        >>> print(os.path._getfullpathname('C:/foo/NUL  .txt'))
        \\.\NUL

    Windows calls RtlIsDosDeviceName_Ustr to check whether a path represents a DOS device name. Here's a link to the reverse-engineered implementation of this function in ReactOS 0.4.1:

    http://code.reactos.org/browse/reactos/branches/ros-branch-0_4_1/reactos/sdk/lib/rtl/path.c?r=71210#to85

    The ReactOS implementation performs the following steps:

    * Return false for a UNC or unknown path type or an empty path.
    * Strip a final ":" if present. Return false if it was the
      only character.
    * Strip trailing dot and space characters.
    * Iterate over the path in reverse. If the current character is
      a "\\" or "/" or a ":" drive letter separator (at index 1),
      then if the next character matches the first letter of a DOS
      device name, splice out the base name as a potential match.
      Else return false.
    * Return false if the first character at this point does not 
      match the first letter of a DOS device name.
    * Remove the file extension, starting at the first dot or colon.
    * Remove trailing spaces.
    * Return the name offset and length if it equals
        * "COM" or "LPT" plus a digit
        * "PRN", "AUX", "NUL", or "CON"
    * Else return false.
    

    It seems that ":" and "." are effectively equivalent for the purposes of is_reserved. Given this is the case, it could return whether parts[-1].partition('.')[0].partition(':')[0].rstrip(' ').upper() is in self.reserved_names. Or maybe use a regex for the entire check.
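The one-line check suggested above can be sketched as a standalone function. This is a minimal illustration, not pathlib's actual implementation: `looks_reserved` and `RESERVED_NAMES` are hypothetical names, the set holds only the names from the ReactOS list (with the digit assumed to be ASCII 1-9), and `ntpath` is used so the sketch runs on any platform.

```python
import ntpath

# Names taken from the list above. Restricting the digit to ASCII 1-9
# is an assumption of this sketch.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{d}" for d in "123456789"}
    | {f"LPT{d}" for d in "123456789"}
)


def looks_reserved(path):
    """Return True if the final component of `path` looks like a DOS device name."""
    # Drop the drive first so its colon is not mistaken for a name terminator.
    _, rest = ntpath.splitdrive(path)
    name = ntpath.basename(rest)
    # Treat "." and ":" alike: cut at the first of each, then drop trailing spaces.
    stem = name.partition(".")[0].partition(":")[0].rstrip(" ")
    return stem.upper() in RESERVED_NAMES


for p in ("C:/foo/NUL:", "C:/foo/NUL  ", "C:/foo/NUL  .txt", "C:/foo/null.txt"):
    print(repr(p), looks_reserved(p))
```

The check is purely lexical, so it gives the same answer on every platform and never touches the file system.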

    If a script is running on Windows, I think the best approach is to call os.path.abspath, which calls _getfullpathname. This lets Windows itself determine if the path maps to the \\.\ device namespace. However, I realize that is_reserved is intended to be cross-platform.
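A rough sketch of the Windows-only approach, assuming a hypothetical helper named `maps_to_device_namespace`. It relies on `os.path.abspath` delegating to the system on Windows, so the answer depends on the Windows version in use; on other platforms it returns `None` because the question cannot be asked there.

```python
import os
import sys

# The device namespace prefix: backslash, backslash, dot, backslash.
DEVICE_PREFIX = "\\\\.\\"


def maps_to_device_namespace(path):
    """Hypothetical helper: let Windows decide whether `path` names a device."""
    if sys.platform != "win32":
        return None  # only Windows itself can answer this
    # On Windows, os.path.abspath goes through _getfullpathname.
    return os.path.abspath(path).startswith(DEVICE_PREFIX)


print(maps_to_device_namespace("C:/foo/NUL"))
```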

    By the way, the comment for this method says that r"foo\NUL" isn't reserved, but it is. Maybe the author checked by trying to open NUL in a non-existing foo directory. DOS device names are only reserved in practice when opening and creating files in existing directories (as opposed to reserved in principle with GetFullPathName, which doesn't check for a valid path). NT can thus return an error that's consistent with how DOS behaved in the 1980s -- because that's really important, you know.

    @eryksun eryksun added stdlib Python modules in the Lib dir OS-windows type-bug An unexpected behavior, bug, or error labels Aug 22, 2016
    @eryksun
    Contributor Author

    eryksun commented Aug 27, 2016

    Also, "CONIN$" and "CONOUT$" need to be added to the list of reserved names. Prior to Windows 8 these two names are reserved only for the current directory, which for the most part also applies to "CON".

    For Windows 8+, the redesign to use a real console device means that these three console devices are handled in exactly the same way as the other reserved DOS device names. For example:

    Windows 10

        >>> print(os.path.abspath('C:/Temp/conout$  : spam . eggs'))
        \\.\conout$

    Windows 7

        >>> print(os.path.abspath('C:/Temp/conout$  : spam . eggs'))
        C:\Temp\conout$  : spam . eggs

    @eryksun
    Contributor Author

    eryksun commented Sep 12, 2016

    The attached patch adds tests and the suggested enhancement to _WindowsFlavour.is_reserved.

    Shouldn't it also return True if the name contains ASCII control characters? They're only valid in NTFS stream names. Also, I think a name containing a colon that's not part of a DOS drive letter spec should be considered reserved. Otherwise it could designate an NTFS named stream (e.g. "path\filename:streamname:$DATA"), which is rarely desired and not universally supported, e.g. FAT32 doesn't support file streams. I'm thinking of a program that calls this method to ensure that a path is reasonably 'safe' for use on Windows -- i.e. isn't inherently invalid and won't do something surprising like open NUL or write to a named stream.
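The stricter check floated above might look something like the following. This is only a sketch of the idea, not something the attached patch does: `has_surprising_chars` is a hypothetical name, and `ntpath.splitdrive` is assumed to be an adequate way to set the drive letter's colon aside.

```python
import ntpath


def has_surprising_chars(path):
    """Hypothetical check: ASCII control characters, or a colon outside the drive spec."""
    # Remove the drive so that the colon in "C:" is not flagged.
    _, rest = ntpath.splitdrive(path)
    return any(ch == ":" or ord(ch) < 32 for ch in rest)


print(has_surprising_chars("C:/foo/bar.txt"))
print(has_surprising_chars(r"path\filename:streamname:$DATA"))
print(has_surprising_chars("C:/foo/bad\x07name"))
```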

    @serhiy-storchaka
    Member

    Are 'COM\u0661' or 'COM\u2074' reserved names?

    @eryksun
    Copy link
    Contributor Author

    eryksun commented Mar 26, 2017

    For COM[n] and LPT[n], only ASCII 1-9 and superscript 1-3 (U+00b9, U+00b2, and U+00b3) are handled as decimal digits. For example:

        >>> from nt import _getfullpathname
        >>> print(*(ascii(chr(c)) for c in range(1, 65536)
        ...     if _getfullpathname('COM%s' % chr(c))[0] == '\\'), sep=', ')
        '1', '2', '3', '4', '5', '6', '7', '8', '9', '\xb2', '\xb3', '\xb9'

    The implementation uses iswdigit in ntdll.dll. (ntdll.dll is the system DLL that has the user-mode runtime library and syscall stubs -- except the Win32k syscall stubs are in win32u.dll.) ntdll's private CRT uses the C locale (Latin-1, not just ASCII), and it classifies these superscript digits as decimal digits:

        >>> import ctypes
        >>> ntdll = ctypes.WinDLL('ntdll')
        >>> print(*(chr(c) for c in range(1, 65536) if ntdll.iswdigit(c)))
        0 1 2 3 4 5 6 7 8 9 ² ³ ¹

    Unicode, and thus Python, does not classify these superscript digits as decimal digits, so I just hard-coded the list.
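To illustrate why the list is hard-coded, here is a small sketch. `DEVICE_DIGITS` and `NUMBERED_DEVICES` are hypothetical names and the attached patch may structure this differently. It shows that `str.isdecimal` rejects the three superscripts that Windows accepts, while accepting digits such as U+0661 that Windows does not treat as part of a device name.

```python
import unicodedata

# ASCII 1-9 plus superscript one, two and three, as described above.
DEVICE_DIGITS = "123456789\xb9\xb2\xb3"

NUMBERED_DEVICES = {
    prefix + digit for prefix in ("COM", "LPT") for digit in DEVICE_DIGITS
}

# Unicode puts the superscripts in category "No" (other number), not "Nd",
# so str.isdecimal cannot be used to reproduce the Windows behaviour.
for ch in "\xb9\xb2\xb3\u0661\u2074":
    print(ascii(ch), unicodedata.category(ch), ch.isdecimal())

print("COM\xb2" in NUMBERED_DEVICES, "COM\u0661" in NUMBERED_DEVICES)
```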

    Here's an example with an attached debugger to show the runtime library calling iswdigit:

    >>> name = 'COM\u2074'
    >>> _getfullpathname(name)
    
    Breakpoint 0 hit
    ntdll!iswdigit:
    00007ffe`9ad89d90 ba04000000      mov     edx,4
    0:000> kc 6
    Call Site
    ntdll!iswdigit
    ntdll!RtlpIsDosDeviceName_Ustr
    ntdll!RtlGetFullPathName_Ustr
    ntdll!RtlGetFullPathName_UEx
    KERNELBASE!GetFullPathNameW
    python36_d!os__getfullpathname_impl
    

    The argument is in register rcx:

    0:000> r rcx
    rcx=0000000000002074
    

    Skip to the ret instruction, and check the result in register rax:

    0:000> pt
    ntdll!iswctype+0x20:
    00007ffe`9ad89e40 c3              ret
    0:000> r rax
    rax=0000000000000000
    0:000> g
    

    Since U+2074 isn't considered a decimal digit, 'COM⁴' is not a reserved DOS device name. The system handles it as a regular filename:

    'C:\\Temp\\COM⁴'
    

    @serhiy-storchaka
    Member

    Thanks for the investigation, Eryk. Can you create a pull request for your patch?

    @vstinner
    Member

    vstinner commented Oct 7, 2020

    See also bpo-36534 "tarfile: handling Windows (path) illegal characters in archive member names".

    @vstinner
    Member

    vstinner commented Oct 7, 2020

    See also bpo-37517 "Improve error messages for Windows reserved file names".

    @eryksun eryksun added 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes labels Mar 12, 2021
    @barneygale
    Mannequin

    barneygale mannequin commented Jun 12, 2021

    I've put Eryk's patch up as a PR: #26698

    @ambv
    Contributor

    ambv commented Jul 28, 2021

    New changeset 56c1f6d by Barney Gale in branch 'main':
    bpo-27827: identify a greater range of reserved filename on Windows. (GH-26698)
    56c1f6d

    @ambv
    Contributor

    ambv commented Jul 28, 2021

    New changeset 8789add by Miss Islington (bot) in branch '3.10':
    bpo-27827: identify a greater range of reserved filename on Windows. (GH-26698) (GH-27421)
    8789add

    @ambv
    Contributor

    ambv commented Jul 28, 2021

    New changeset debb751 by Miss Islington (bot) in branch '3.9':
    bpo-27827: identify a greater range of reserved filename on Windows. (GH-26698) (GH-27422)
    debb751

    @ambv
    Contributor

    ambv commented Jul 28, 2021

    Barney, thanks for pushing this across the finish line! ✨ 🍰 ✨

    And of course, Eryk for the report and original patch.

    @ambv ambv added 3.11 only security fixes and removed 3.8 only security fixes labels Jul 28, 2021
    @ambv ambv closed this as completed Jul 28, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    4 participants