classification
Title: os.path.exists() ought to return False if pathname contains NUL
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, eryksun, mrabarnett, pacujo, r.david.murray, serhiy.storchaka, steven.daprano
Priority: normal Keywords: patch

Created on 2018-05-31 17:19 by pacujo, last changed 2018-09-18 08:34 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 7695 merged serhiy.storchaka, 2018-06-14 18:05
Messages (14)
msg318334 - (view) Author: (pacujo) Date: 2018-05-31 17:19
os.path.exists() returns True or False for all imaginable string arguments except for one that contains NUL ("\0") (Linux). This behavior is not documented in the library. Moreover, it can easily lead to accidents if  an externally supplied pathname were to contain a NUL because most test suites would not try to cover such a pathname.

I propose os.path.exists() should return False even in this case.
msg318338 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2018-05-31 19:21
It also raises a ValueError on Windows. For other invalid paths on Windows it returns False.
msg318376 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-06-01 09:24
To be clear: os.path.exists('a\x00b') raises ValueError on both Windows and Linux.

I think we should just document this behavior and not change it.
msg318431 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-01 16:37
I seem to recall that this ValueError behavior was discussed at some length and it is the desired behavior.  (At some previous point I think everything after the NUL was ignored.)  I'm not really sure why it needs to be documented either, NULs are invalid in filenames, aren't they?  Or are there some OSes that allow them?
msg318432 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-06-01 16:53
I don't know of any OS that supports NULs in filenames (not that my knowledge is encyclopedic).

My reason for suggesting we document it is that os.path.exists() returns False for otherwise invalid filenames, where something like open() raises. On Windows:

>>> os.path.exists('c::bar')
False

>>> open('c::bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument: 'c::bar'

So I do think it's a little surprising that os.path.exists() would raise with a NUL, instead of returning False. But I don't think it's worth changing the behavior, due to potential (though unlikely) breakage.
msg318463 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-06-01 21:03
It has to be a ValueError since the error is an invalid parameter at the Python level. Similarly, in 3.6+ when using bytes paths in Windows, for which Python uses UTF-8 as the file-system encoding, os.path.exists() may raise UnicodeDecodeError:

    >>> try:
    ...     os.path.exists(b'\xc8spam')
    ... except ValueError as e: err = e
    ...
    >>> err
    UnicodeDecodeError('utf-8', b'\xc8spam', 0, 1, 'invalid continuation byte')

There's no practical support for NUL in paths in any OS as far as I know. In principle, the native NT API of Windows allows NUL in device names since it uses counted strings. However, Python only supports the Windows API, which uses null-terminated strings. So, practically speaking, not supporting NUL in paths is not an issue.

Here's an example of using an NT device name that contains a NUL character. It creates a DOS 'device' named "A\x00B" that targets "C:\\Temp". It's like a SUBST drive, but with any device name instead of just a drive-letter name. I omitted the ctypes definitions for brevity.

    obja = OBJECT_ATTRIBUTES("\\??\\A\x00B")
    target = UNICODE_STRING("\\??\\C:\\Temp")
    handle = ctypes.c_void_p()
    NtCreateSymbolicLinkObject(ctypes.byref(handle), GENERIC_ALL,
        ctypes.byref(obja), ctypes.byref(target))

Query the timestamps of the "A\x00B" device: 

    info = (ctypes.c_ulonglong * 5)()
    NtQueryAttributesFile(ctypes.byref(obja), info)

    times = (ctypes.c_int * 4)()
    for i in range(4):
        RtlTimeToSecondsSince1970(ctypes.byref(info, i*8),
            ctypes.byref(times, i*4))
    
Verify that the creation, last access, and last write times are the same as "C:\\Temp":

    s = os.stat('C:\\Temp')
    times2 = [int(x) for x in (s.st_ctime, s.st_atime, s.st_mtime)]

    >>> times[:3] == times2
    True

(The fourth timestamp is the change time, which isn't supported in Windows Python for legacy reasons that also relate to the bizarre use of st_ctime for the creation time, instead of st_birthtime.)
msg318487 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-02 04:17
In earlier versions NULs in paths caused silent truncating the path, and this is considered a vulnerability.

In 2.7 and earlier versions of 3.x a TypeError was raised in os.path.exists(). ValueError in this function (and many others) is raised since 3.5, because it looks more appropriate.

Similar exceptions are raised perhaps in hundreds functions. It is impossible to document them all, and os.path.exists() doesn't look special enough for documenting this only for it.

However os.path.exists() is special in the sense that this exception can be interpreted as a false value. Since os.path.exists() already catches OSError and interprets it as a false result, it is easier to add a ValueError here. I don't think this will break much code if add it only in 3.8. This will cover also the case of unencodable/undecodable paths ('\udfff', b'\x98').
msg319512 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-06-14 13:18
Eric wrote:
> I don't know of any OS that supports NULs in filenames

HFS, HFS Plus, and Apple File System all support NULs in filenames.

HFS Plus volumes include a special special directory called the metadata directory, in the volume's root directory, called "\0\0\0\0HFS+ Private Data".

https://developer.apple.com/library/archive/technotes/tn/tn1150.html#HFSPlusNames

There are, I believe, Carbon APIs for checking for file names which do not rely on NUL-terminated strings (they use an array of Unicode characters with an explicit length), but I don't know enough about OS X APIs to know if they are current generation.
msg319513 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-06-14 13:25
Eryk Sun says:
> It has to be a ValueError since the error is an invalid parameter at the Python level.

How does the first follow from the second?

Strings with NULs in them aren't errors or invalid parameters at the Python level, and they are legal file names in at least some file systems.

Jython does this:

>>> import os
>>> os.path.exists('/tmp/foo\0bar')
False
>>> os.stat('/tmp/foo\0bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 2] No such file or directory: '/tmp/foo\x00bar'


As far as I am concerned, raising ValueError is simply a bug. The documentation for the os module clearly states:

    All functions in this module raise OSError in the case of
    invalid or inaccessible file names and paths, or other
    arguments that have the correct type, but are not accepted
    by the operating system.

I don't believe there is any good reason for singling out NULs for a different exception from other invalid file names like ">" on NTFS.

This ought to be an OSError for functions like os.stat and False for os.path.exists, as Jython does.
msg319535 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-14 18:11
This change looks desirable to me. But it looks too large for backporting it to maintained versions.
msg319561 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-06-14 22:35
>> It has to be a ValueError since the error is an invalid 
>> parameter at the Python level.
>
> How does the first follow from the second?

I only meant that, as an honest error, it has to be ValueError. I didn't think about raising a fake OSError.

Note that I didn't say the ValueError shouldn't be ignored by os.path.exists (et al). In the spirit of the current function, it probably should be ignored. For example, it returns False for paths that exist but are inaccessible. 

> I don't believe there is any good reason for singling out NULs 
> for a different exception from other invalid file names 
> like ">" on NTFS.
>
> This ought to be an OSError for functions like os.stat and False 
> for os.path.exists, as Jython does.

Python can't pass a string that contains NUL characters to POSIX and Windows APIs that use null-terminated strings. That would yield wildly unpredictable results. We need this to be a reliable error. So for the low-level file I/O functions to return an OSError here, it would have to be a bit of a lie (i.e. an 'OS' error without making a system call and without an `errno` and/or `winerror` value). Maybe it could raise an InvalidFilename subclass of OSError. This could even handle some actual OS errors such as POSIX ENAMETOOLONG and Windows ERROR_INVALID_NAME, ERROR_BAD_PATHNAME, and ERROR_FILENAME_EXCED_RANGE.
msg319562 - (view) Author: (pacujo) Date: 2018-06-14 22:50
Eryk Sun:
> I only meant that, as an honest error, it has to be ValueError. I didn't
> think about raising a fake OSError.
>
> Note that I didn't say the ValueError shouldn't be ignored by
> os.path.exists (et al). In the spirit of the current function, it
> probably should be ignored. For example, it returns False for paths that
> exist but are inaccessible.

For the original complaint of mine, catching ValueError would work. I
must say, though, that Steven's arguments for raising a fake OSError are
very convincing.

Steven D'Aprano:
> Jython does this:
> 
> >>> import os
> >>> os.path.exists('/tmp/foo\0bar')
> False
> >>> os.stat('/tmp/foo\0bar')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OSError: [Errno 2] No such file or directory: '/tmp/foo\x00bar'
> 
> 
> As far as I am concerned, raising ValueError is simply a bug. The
> documentation for the os module clearly states:
> 
>     All functions in this module raise OSError in the case of
>     invalid or inaccessible file names and paths, or other
>     arguments that have the correct type, but are not accepted
>     by the operating system.

Now the question is not anymore if and how CPython should be fixed but
if and how Jython should be fixed.

IMO, Jython is doing the right thing. If that is not true, then Jython
must be declared buggy.

> Maybe it could raise an InvalidFilename subclass of OSError. This
> could even handle some actual OS errors such as POSIX ENAMETOOLONG and
> Windows ERROR_INVALID_NAME, ERROR_BAD_PATHNAME, and
> ERROR_FILENAME_EXCED_RANGE.

Maybe. You'll still need OSError.errno to hold a true error value.

Marko
msg325622 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-18 08:29
New changeset 0185f34ddcf07b78feb6ac666fbfd4615d26b028 by Serhiy Storchaka in branch 'master':
 bpo-33721: Make some os.path functions and pathlib.Path methods be tolerant to invalid paths.  (#7695)
https://github.com/python/cpython/commit/0185f34ddcf07b78feb6ac666fbfd4615d26b028
msg325623 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-18 08:34
os.path.exists() and similar functions will return now False for invalid paths (non-encodable paths on Unix, non-decodable bytes paths on Windows, and paths containing null characters or bytes).
History
Date User Action Args
2018-09-18 08:34:24serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg325623

stage: patch review -> resolved
2018-09-18 08:29:04serhiy.storchakasetmessages: + msg325622
2018-06-14 22:50:48pacujosetmessages: + msg319562
2018-06-14 22:35:54eryksunsetmessages: + msg319561
2018-06-14 18:11:43serhiy.storchakasettype: behavior -> enhancement
messages: + msg319535
versions: + Python 3.8, - Python 3.6
2018-06-14 18:05:59serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request7310
2018-06-14 13:25:32steven.dapranosetmessages: + msg319513
2018-06-14 13:18:57steven.dapranosetmessages: + msg319512
2018-06-02 11:09:49steven.dapranosetnosy: + steven.daprano
2018-06-02 04:17:59serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg318487
2018-06-01 21:03:25eryksunsetnosy: + eryksun
messages: + msg318463
2018-06-01 16:53:05eric.smithsetmessages: + msg318432
2018-06-01 16:37:17r.david.murraysetnosy: + r.david.murray
messages: + msg318431
2018-06-01 09:24:03eric.smithsetnosy: + eric.smith
messages: + msg318376
2018-05-31 19:21:43mrabarnettsetnosy: + mrabarnett
messages: + msg318338
2018-05-31 17:19:25pacujocreate