Issue33721
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2018-05-31 17:19 by pacujo, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 7695 | merged | serhiy.storchaka, 2018-06-14 18:05 |
Messages (14) | |||
---|---|---|---|
msg318334 - (view) | Author: (pacujo) | Date: 2018-05-31 17:19 | |
os.path.exists() returns True or False for all imaginable string arguments except for one that contains NUL ("\0") (Linux). This behavior is not documented in the library. Moreover, it can easily lead to accidents if an externally supplied pathname were to contain a NUL because most test suites would not try to cover such a pathname. I propose os.path.exists() should return False even in this case. |
|||
msg318338 - (view) | Author: Matthew Barnett (mrabarnett) * ![]() |
Date: 2018-05-31 19:21 | |
It also raises a ValueError on Windows. For other invalid paths on Windows it returns False. |
|||
msg318376 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2018-06-01 09:24 | |
To be clear: os.path.exists('a\x00b') raises ValueError on both Windows and Linux. I think we should just document this behavior and not change it. |
|||
msg318431 - (view) | Author: R. David Murray (r.david.murray) * ![]() |
Date: 2018-06-01 16:37 | |
I seem to recall that this ValueError behavior was discussed at some length and it is the desired behavior. (At some previous point I think everything after the NUL was ignored.) I'm not really sure why it needs to be documented either, NULs are invalid in filenames, aren't they? Or are there some OSes that allow them? |
|||
msg318432 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2018-06-01 16:53 | |
I don't know of any OS that supports NULs in filenames (not that my knowledge is encyclopedic). My reason for suggesting we document it is that os.path.exists() returns False for otherwise invalid filenames, where something like open() raises. On Windows: >>> os.path.exists('c::bar') False >>> open('c::bar') Traceback (most recent call last): File "<stdin>", line 1, in <module> OSError: [Errno 22] Invalid argument: 'c::bar' So I do think it's a little surprising that os.path.exists() would raise with a NUL, instead of returning False. But I don't think it's worth changing the behavior, due to potential (though unlikely) breakage. |
|||
msg318463 - (view) | Author: Eryk Sun (eryksun) * ![]() |
Date: 2018-06-01 21:03 | |
It has to be a ValueError since the error is an invalid parameter at the Python level. Similarly, in 3.6+ when using bytes paths in Windows, for which Python uses UTF-8 as the file-system encoding, os.path.exists() may raise UnicodeDecodeError: >>> try: ... os.path.exists(b'\xc8spam') ... except ValueError as e: err = e ... >>> err UnicodeDecodeError('utf-8', b'\xc8spam', 0, 1, 'invalid continuation byte') There's no practical support for NUL in paths in any OS as far as I know. In principle, the native NT API of Windows allows NUL in device names since it uses counted strings. However, Python only supports the Windows API, which uses null-terminated strings. So, practically speaking, not supporting NUL in paths is not an issue. Here's an example of using an NT device name that contains a NUL character. It creates a DOS 'device' named "A\x00B" that targets "C:\\Temp". It's like a SUBST drive, but with any device name instead of just a drive-letter name. I omitted the ctypes definitions for brevity. obja = OBJECT_ATTRIBUTES("\\??\\A\x00B") target = UNICODE_STRING("\\??\\C:\\Temp") handle = ctypes.c_void_p() NtCreateSymbolicLinkObject(ctypes.byref(handle), GENERIC_ALL, ctypes.byref(obja), ctypes.byref(target)) Query the timestamps of the "A\x00B" device: info = (ctypes.c_ulonglong * 5)() NtQueryAttributesFile(ctypes.byref(obja), info) times = (ctypes.c_int * 4)() for i in range(4): RtlTimeToSecondsSince1970(ctypes.byref(info, i*8), ctypes.byref(times, i*4)) Verify that the creation, last access, and last write times are the same as "C:\\Temp": s = os.stat('C:\\Temp') times2 = [int(x) for x in (s.st_ctime, s.st_atime, s.st_mtime)] >>> times[:3] == times2 True (The fourth timestamp is the change time, which isn't supported in Windows Python for legacy reasons that also relate to the bizarre use of st_ctime for the creation time, instead of st_birthtime.) |
|||
msg318487 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2018-06-02 04:17 | |
In earlier versions NULs in paths caused silent truncating the path, and this is considered a vulnerability. In 2.7 and earlier versions of 3.x a TypeError was raised in os.path.exists(). ValueError in this function (and many others) is raised since 3.5, because it looks more appropriate. Similar exceptions are raised perhaps in hundreds functions. It is impossible to document them all, and os.path.exists() doesn't look special enough for documenting this only for it. However os.path.exists() is special in the sense that this exception can be interpreted as a false value. Since os.path.exists() already catches OSError and interprets it as a false result, it is easier to add a ValueError here. I don't think this will break much code if add it only in 3.8. This will cover also the case of unencodable/undecodable paths ('\udfff', b'\x98'). |
|||
msg319512 - (view) | Author: Steven D'Aprano (steven.daprano) * ![]() |
Date: 2018-06-14 13:18 | |
Eric wrote: > I don't know of any OS that supports NULs in filenames HFS, HFS Plus, and Apple File System all support NULs in filenames. HFS Plus volumes include a special special directory called the metadata directory, in the volume's root directory, called "\0\0\0\0HFS+ Private Data". https://developer.apple.com/library/archive/technotes/tn/tn1150.html#HFSPlusNames There are, I believe, Carbon APIs for checking for file names which do not rely on NUL-terminated strings (they use an array of Unicode characters with an explicit length), but I don't know enough about OS X APIs to know if they are current generation. |
|||
msg319513 - (view) | Author: Steven D'Aprano (steven.daprano) * ![]() |
Date: 2018-06-14 13:25 | |
Eryk Sun says: > It has to be a ValueError since the error is an invalid parameter at the Python level. How does the first follow from the second? Strings with NULs in them aren't errors or invalid parameters at the Python level, and they are legal file names in at least some file systems. Jython does this: >>> import os >>> os.path.exists('/tmp/foo\0bar') False >>> os.stat('/tmp/foo\0bar') Traceback (most recent call last): File "<stdin>", line 1, in <module> OSError: [Errno 2] No such file or directory: '/tmp/foo\x00bar' As far as I am concerned, raising ValueError is simply a bug. The documentation for the os module clearly states: All functions in this module raise OSError in the case of invalid or inaccessible file names and paths, or other arguments that have the correct type, but are not accepted by the operating system. I don't believe there is any good reason for singling out NULs for a different exception from other invalid file names like ">" on NTFS. This ought to be an OSError for functions like os.stat and False for os.path.exists, as Jython does. |
|||
msg319535 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2018-06-14 18:11 | |
This change looks desirable to me. But it looks too large for backporting it to maintained versions. |
|||
msg319561 - (view) | Author: Eryk Sun (eryksun) * ![]() |
Date: 2018-06-14 22:35 | |
>> It has to be a ValueError since the error is an invalid >> parameter at the Python level. > > How does the first follow from the second? I only meant that, as an honest error, it has to be ValueError. I didn't think about raising a fake OSError. Note that I didn't say the ValueError shouldn't be ignored by os.path.exists (et al). In the spirit of the current function, it probably should be ignored. For example, it returns False for paths that exist but are inaccessible. > I don't believe there is any good reason for singling out NULs > for a different exception from other invalid file names > like ">" on NTFS. > > This ought to be an OSError for functions like os.stat and False > for os.path.exists, as Jython does. Python can't pass a string that contains NUL characters to POSIX and Windows APIs that use null-terminated strings. That would yield wildly unpredictable results. We need this to be a reliable error. So for the low-level file I/O functions to return an OSError here, it would have to be a bit of a lie (i.e. an 'OS' error without making a system call and without an `errno` and/or `winerror` value). Maybe it could raise an InvalidFilename subclass of OSError. This could even handle some actual OS errors such as POSIX ENAMETOOLONG and Windows ERROR_INVALID_NAME, ERROR_BAD_PATHNAME, and ERROR_FILENAME_EXCED_RANGE. |
|||
msg319562 - (view) | Author: (pacujo) | Date: 2018-06-14 22:50 | |
Eryk Sun: > I only meant that, as an honest error, it has to be ValueError. I didn't > think about raising a fake OSError. > > Note that I didn't say the ValueError shouldn't be ignored by > os.path.exists (et al). In the spirit of the current function, it > probably should be ignored. For example, it returns False for paths that > exist but are inaccessible. For the original complaint of mine, catching ValueError would work. I must say, though, that Steven's arguments for raising a fake OSError are very convincing. Steven D'Aprano: > Jython does this: > > >>> import os > >>> os.path.exists('/tmp/foo\0bar') > False > >>> os.stat('/tmp/foo\0bar') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > OSError: [Errno 2] No such file or directory: '/tmp/foo\x00bar' > > > As far as I am concerned, raising ValueError is simply a bug. The > documentation for the os module clearly states: > > All functions in this module raise OSError in the case of > invalid or inaccessible file names and paths, or other > arguments that have the correct type, but are not accepted > by the operating system. Now the question is not anymore if and how CPython should be fixed but if and how Jython should be fixed. IMO, Jython is doing the right thing. If that is not true, then Jython must be declared buggy. > Maybe it could raise an InvalidFilename subclass of OSError. This > could even handle some actual OS errors such as POSIX ENAMETOOLONG and > Windows ERROR_INVALID_NAME, ERROR_BAD_PATHNAME, and > ERROR_FILENAME_EXCED_RANGE. Maybe. You'll still need OSError.errno to hold a true error value. Marko |
|||
msg325622 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2018-09-18 08:29 | |
New changeset 0185f34ddcf07b78feb6ac666fbfd4615d26b028 by Serhiy Storchaka in branch 'master': bpo-33721: Make some os.path functions and pathlib.Path methods be tolerant to invalid paths. (#7695) https://github.com/python/cpython/commit/0185f34ddcf07b78feb6ac666fbfd4615d26b028 |
|||
msg325623 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2018-09-18 08:34 | |
os.path.exists() and similar functions will return now False for invalid paths (non-encodable paths on Unix, non-decodable bytes paths on Windows, and paths containing null characters or bytes). |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:01 | admin | set | github: 77902 |
2018-09-18 08:34:24 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: + msg325623 stage: patch review -> resolved |
2018-09-18 08:29:04 | serhiy.storchaka | set | messages: + msg325622 |
2018-06-14 22:50:48 | pacujo | set | messages: + msg319562 |
2018-06-14 22:35:54 | eryksun | set | messages: + msg319561 |
2018-06-14 18:11:43 | serhiy.storchaka | set | type: behavior -> enhancement messages: + msg319535 versions: + Python 3.8, - Python 3.6 |
2018-06-14 18:05:59 | serhiy.storchaka | set | keywords:
+ patch stage: patch review pull_requests: + pull_request7310 |
2018-06-14 13:25:32 | steven.daprano | set | messages: + msg319513 |
2018-06-14 13:18:57 | steven.daprano | set | messages: + msg319512 |
2018-06-02 11:09:49 | steven.daprano | set | nosy:
+ steven.daprano |
2018-06-02 04:17:59 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg318487 |
2018-06-01 21:03:25 | eryksun | set | nosy:
+ eryksun messages: + msg318463 |
2018-06-01 16:53:05 | eric.smith | set | messages: + msg318432 |
2018-06-01 16:37:17 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg318431 |
2018-06-01 09:24:03 | eric.smith | set | nosy:
+ eric.smith messages: + msg318376 |
2018-05-31 19:21:43 | mrabarnett | set | nosy:
+ mrabarnett messages: + msg318338 |
2018-05-31 17:19:25 | pacujo | create |