Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os.path.exists() ought to return False if pathname contains NUL #77902

Closed
pacujo mannequin opened this issue May 31, 2018 · 14 comments
Closed

os.path.exists() ought to return False if pathname contains NUL #77902

pacujo mannequin opened this issue May 31, 2018 · 14 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@pacujo
Copy link
Mannequin

pacujo mannequin commented May 31, 2018

BPO 33721
Nosy @ericvsmith, @stevendaprano, @bitdancer, @serhiy-storchaka, @eryksun
PRs
  • bpo-33721: Make some os.path functions and pathlib.Path methods be tolerant to invalid paths.  #7695
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-09-18.08:34:24.841>
    created_at = <Date 2018-05-31.17:19:25.905>
    labels = ['3.8', 'type-feature', 'library']
    title = 'os.path.exists() ought to return False if pathname contains NUL'
    updated_at = <Date 2018-09-18.08:34:24.839>
    user = 'https://bugs.python.org/pacujo'

    bugs.python.org fields:

    activity = <Date 2018-09-18.08:34:24.839>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-09-18.08:34:24.841>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2018-05-31.17:19:25.905>
    creator = 'pacujo'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33721
    keywords = ['patch']
    message_count = 14.0
    messages = ['318334', '318338', '318376', '318431', '318432', '318463', '318487', '319512', '319513', '319535', '319561', '319562', '325622', '325623']
    nosy_count = 7.0
    nosy_names = ['eric.smith', 'mrabarnett', 'steven.daprano', 'r.david.murray', 'serhiy.storchaka', 'eryksun', 'pacujo']
    pr_nums = ['7695']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue33721'
    versions = ['Python 3.8']

    @pacujo
    Copy link
    Mannequin Author

    pacujo mannequin commented May 31, 2018

    os.path.exists() returns True or False for all imaginable string arguments except for one that contains NUL ("\0") (Linux). This behavior is not documented in the library. Moreover, it can easily lead to accidents if an externally supplied pathname were to contain a NUL because most test suites would not try to cover such a pathname.

    I propose os.path.exists() should return False even in this case.

    @pacujo pacujo mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels May 31, 2018
    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented May 31, 2018

    It also raises a ValueError on Windows. For other invalid paths on Windows it returns False.

    @ericvsmith
    Copy link
    Member

    To be clear: os.path.exists('a\x00b') raises ValueError on both Windows and Linux.

    I think we should just document this behavior and not change it.

    @bitdancer
    Copy link
    Member

    I seem to recall that this ValueError behavior was discussed at some length and it is the desired behavior. (At some previous point I think everything after the NUL was ignored.) I'm not really sure why it needs to be documented either, NULs are invalid in filenames, aren't they? Or are there some OSes that allow them?

    @ericvsmith
    Copy link
    Member

    I don't know of any OS that supports NULs in filenames (not that my knowledge is encyclopedic).

    My reason for suggesting we document it is that os.path.exists() returns False for otherwise invalid filenames, where something like open() raises. On Windows:

    >>> os.path.exists('c::bar')
    False
    
    >>> open('c::bar')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 22] Invalid argument: 'c::bar'

    So I do think it's a little surprising that os.path.exists() would raise with a NUL, instead of returning False. But I don't think it's worth changing the behavior, due to potential (though unlikely) breakage.

    @eryksun
    Copy link
    Contributor

    eryksun commented Jun 1, 2018

    It has to be a ValueError since the error is an invalid parameter at the Python level. Similarly, in 3.6+ when using bytes paths in Windows, for which Python uses UTF-8 as the file-system encoding, os.path.exists() may raise UnicodeDecodeError:

        >>> try:
        ...     os.path.exists(b'\xc8spam')
        ... except ValueError as e: err = e
        ...
        >>> err
        UnicodeDecodeError('utf-8', b'\xc8spam', 0, 1, 'invalid continuation byte')

    There's no practical support for NUL in paths in any OS as far as I know. In principle, the native NT API of Windows allows NUL in device names since it uses counted strings. However, Python only supports the Windows API, which uses null-terminated strings. So, practically speaking, not supporting NUL in paths is not an issue.

    Here's an example of using an NT device name that contains a NUL character. It creates a DOS 'device' named "A\x00B" that targets "C:\\Temp". It's like a SUBST drive, but with any device name instead of just a drive-letter name. I omitted the ctypes definitions for brevity.

        obja = OBJECT_ATTRIBUTES("\\??\\A\x00B")
        target = UNICODE_STRING("\\??\\C:\\Temp")
        handle = ctypes.c_void_p()
        NtCreateSymbolicLinkObject(ctypes.byref(handle), GENERIC_ALL,
            ctypes.byref(obja), ctypes.byref(target))

    Query the timestamps of the "A\x00B" device:

        info = (ctypes.c_ulonglong * 5)()
        NtQueryAttributesFile(ctypes.byref(obja), info)
    
        times = (ctypes.c_int * 4)()
        for i in range(4):
            RtlTimeToSecondsSince1970(ctypes.byref(info, i*8),
                ctypes.byref(times, i*4))
        
    Verify that the creation, last access, and last write times are the same as "C:\\Temp":
    
        s = os.stat('C:\\Temp')
        times2 = [int(x) for x in (s.st_ctime, s.st_atime, s.st_mtime)]
        >>> times[:3] == times2
        True

    (The fourth timestamp is the change time, which isn't supported in Windows Python for legacy reasons that also relate to the bizarre use of st_ctime for the creation time, instead of st_birthtime.)

    @serhiy-storchaka
    Copy link
    Member

    In earlier versions NULs in paths caused silent truncating the path, and this is considered a vulnerability.

    In 2.7 and earlier versions of 3.x a TypeError was raised in os.path.exists(). ValueError in this function (and many others) is raised since 3.5, because it looks more appropriate.

    Similar exceptions are raised perhaps in hundreds functions. It is impossible to document them all, and os.path.exists() doesn't look special enough for documenting this only for it.

    However os.path.exists() is special in the sense that this exception can be interpreted as a false value. Since os.path.exists() already catches OSError and interprets it as a false result, it is easier to add a ValueError here. I don't think this will break much code if add it only in 3.8. This will cover also the case of unencodable/undecodable paths ('\udfff', b'\x98').

    @stevendaprano
    Copy link
    Member

    Eric wrote:

    I don't know of any OS that supports NULs in filenames

    HFS, HFS Plus, and Apple File System all support NULs in filenames.

    HFS Plus volumes include a special special directory called the metadata directory, in the volume's root directory, called "\0\0\0\0HFS+ Private Data".

    https://developer.apple.com/library/archive/technotes/tn/tn1150.html#HFSPlusNames

    There are, I believe, Carbon APIs for checking for file names which do not rely on NUL-terminated strings (they use an array of Unicode characters with an explicit length), but I don't know enough about OS X APIs to know if they are current generation.

    @stevendaprano
    Copy link
    Member

    Eryk Sun says:

    It has to be a ValueError since the error is an invalid parameter at the Python level.

    How does the first follow from the second?

    Strings with NULs in them aren't errors or invalid parameters at the Python level, and they are legal file names in at least some file systems.

    Jython does this:

    >>> import os
    >>> os.path.exists('/tmp/foo\0bar')
    False
    >>> os.stat('/tmp/foo\0bar')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 2] No such file or directory: '/tmp/foo\x00bar'

    As far as I am concerned, raising ValueError is simply a bug. The documentation for the os module clearly states:

    All functions in this module raise OSError in the case of
    invalid or inaccessible file names and paths, or other
    arguments that have the correct type, but are not accepted
    by the operating system.
    

    I don't believe there is any good reason for singling out NULs for a different exception from other invalid file names like ">" on NTFS.

    This ought to be an OSError for functions like os.stat and False for os.path.exists, as Jython does.

    @serhiy-storchaka
    Copy link
    Member

    This change looks desirable to me. But it looks too large for backporting it to maintained versions.

    @serhiy-storchaka serhiy-storchaka added 3.8 only security fixes type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Jun 14, 2018
    @eryksun
    Copy link
    Contributor

    eryksun commented Jun 14, 2018

    > It has to be a ValueError since the error is an invalid
    > parameter at the Python level.

    How does the first follow from the second?

    I only meant that, as an honest error, it has to be ValueError. I didn't think about raising a fake OSError.

    Note that I didn't say the ValueError shouldn't be ignored by os.path.exists (et al). In the spirit of the current function, it probably should be ignored. For example, it returns False for paths that exist but are inaccessible.

    I don't believe there is any good reason for singling out NULs
    for a different exception from other invalid file names
    like ">" on NTFS.

    This ought to be an OSError for functions like os.stat and False
    for os.path.exists, as Jython does.

    Python can't pass a string that contains NUL characters to POSIX and Windows APIs that use null-terminated strings. That would yield wildly unpredictable results. We need this to be a reliable error. So for the low-level file I/O functions to return an OSError here, it would have to be a bit of a lie (i.e. an 'OS' error without making a system call and without an errno and/or winerror value). Maybe it could raise an InvalidFilename subclass of OSError. This could even handle some actual OS errors such as POSIX ENAMETOOLONG and Windows ERROR_INVALID_NAME, ERROR_BAD_PATHNAME, and ERROR_FILENAME_EXCED_RANGE.

    @pacujo
    Copy link
    Mannequin Author

    pacujo mannequin commented Jun 14, 2018

    Eryk Sun:

    I only meant that, as an honest error, it has to be ValueError. I didn't
    think about raising a fake OSError.

    Note that I didn't say the ValueError shouldn't be ignored by
    os.path.exists (et al). In the spirit of the current function, it
    probably should be ignored. For example, it returns False for paths that
    exist but are inaccessible.

    For the original complaint of mine, catching ValueError would work. I
    must say, though, that Steven's arguments for raising a fake OSError are
    very convincing.

    Steven D'Aprano:
    > Jython does this:
    > 
    > >>> import os
    > >>> os.path.exists('/tmp/foo\0bar')
    > False
    > >>> os.stat('/tmp/foo\0bar')
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    > OSError: [Errno 2] No such file or directory: '/tmp/foo\x00bar'
    > 
    > 
    > As far as I am concerned, raising ValueError is simply a bug. The
    > documentation for the os module clearly states:
    > 
    >     All functions in this module raise OSError in the case of
    >     invalid or inaccessible file names and paths, or other
    >     arguments that have the correct type, but are not accepted
    >     by the operating system.

    Now the question is not anymore if and how CPython should be fixed but
    if and how Jython should be fixed.

    IMO, Jython is doing the right thing. If that is not true, then Jython
    must be declared buggy.

    Maybe it could raise an InvalidFilename subclass of OSError. This
    could even handle some actual OS errors such as POSIX ENAMETOOLONG and
    Windows ERROR_INVALID_NAME, ERROR_BAD_PATHNAME, and
    ERROR_FILENAME_EXCED_RANGE.

    Maybe. You'll still need OSError.errno to hold a true error value.

    Marko

    @serhiy-storchaka
    Copy link
    Member

    New changeset 0185f34 by Serhiy Storchaka in branch 'master':
    bpo-33721: Make some os.path functions and pathlib.Path methods be tolerant to invalid paths. (bpo-7695)
    0185f34

    @serhiy-storchaka
    Copy link
    Member

    os.path.exists() and similar functions will return now False for invalid paths (non-encodable paths on Unix, non-decodable bytes paths on Windows, and paths containing null characters or bytes).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants