This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: pathlib.exists on Windows raises an exception on URL like/bad input
Type:
Stage: resolved
Components: Library (Lib), Windows
Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: [Windows] OSError when testing whether pathlib.Path('*') exists (issue 35306)
Assigned To:
Nosy List: domdfcoding2, eryksun, gaborjbernat, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2021-01-07 09:23 by gaborjbernat, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (9)
msg384569 - Author: gaborjbernat (gaborjbernat) * Date: 2021-01-07 09:23
❯ py -c "from pathlib import Path; Path('http://w.org').exists()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python39\lib\pathlib.py", line 1407, in exists
    self.stat()
  File "C:\Python39\lib\pathlib.py", line 1221, in stat
    return self._accessor.stat(self)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'http:\\w.org'

The same code correctly returns False on UNIX systems.
msg384570 - Author: Paul Moore (paul.moore) * (Python committer) Date: 2021-01-07 09:33
It's an invalid filename, so it raises an exception.

You can get the same on Unix by using an invalid filename (embedded null):

>>> from pathlib import Path
>>> Path("/usr/\0").exists()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.7/pathlib.py", line 1356, in exists
    self.stat()
  File "/usr/lib64/python3.7/pathlib.py", line 1178, in stat
    return self._accessor.stat(self)
ValueError: embedded null byte

You need to be prepared for exceptions if you aren't sure you have a valid path. One thing that might be useful, I guess, is a `Path.is_valid()` function. But I don't know if all platforms have a way of asking the OS "is this a valid pathname?" So catching the exception is probably best.
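
A minimal caller-side sketch of the "catch the exception" approach. `path_exists` is a hypothetical helper, not part of pathlib. It treats any `OSError` or `ValueError` raised during the check as "does not exist", which also hides errors such as access denied, so it is only appropriate where that trade-off is acceptable.

```python
from pathlib import Path


def path_exists(path):
    """Hypothetical helper: report False instead of raising for unusable paths."""
    try:
        return Path(path).exists()
    except (OSError, ValueError):
        # OSError: e.g. WinError 123 for 'http://w.org' on affected Windows versions.
        # ValueError: embedded null byte on interpreters where exists() still raises it.
        return False


print(path_exists("http://w.org"))  # False
print(path_exists("/usr/\0"))       # False
```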
msg384571 - Author: gaborjbernat (gaborjbernat) * Date: 2021-01-07 09:39
How come the link is invalid on Windows but valid on UNIX?
msg384572 - Author: Paul Moore (paul.moore) * (Python committer) Date: 2021-01-07 09:45
"http:" isn't a valid drive letter, I'd imagine.
msg384573 - Author: Dominic Davis-Foster (domdfcoding2) Date: 2021-01-07 09:55
Paul's example with the embedded null no longer raises on Python 3.8, as Path.exists returns False on ValueError (added in gh-7695).

Path.exists already ignores some Windows-specific errors, so I don't see why it shouldn't also ignore invalid paths which can't exist.
msg384584 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-01-07 13:04
> "http:" isn't a valid drive letter, I'd imagine.

It's not a valid DOS drive, but that's not the problem. "http://w.org" is parsed as a relative path. The double slashes are replaced by a single backslash, and the Windows API tries to open the path relative to a handle for the current working directory. 
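
This reading can be checked on any OS with `PureWindowsPath`, which parses Windows-style paths without touching the filesystem: there is no drive, the path is relative, and its first component is the literal name `http:`. `PurePosixPath` yields the same components, so the difference between the platforms lies in how the OS and filesystem treat that name, not in how pathlib splits it.

```python
from pathlib import PurePosixPath, PureWindowsPath

url = "http://w.org"

win = PureWindowsPath(url)
print(repr(win.drive))    # '' -> "http:" is not taken as a drive
print(win.is_absolute())  # False -> parsed as a relative path
print(win.parts)          # ('http:', 'w.org')
print(repr(str(win)))     # 'http:\\w.org', the name in the WinError 123 message

posix = PurePosixPath(url)
print(posix.parts)        # ('http:', 'w.org')
```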

The issue is handling a path component named "http:". Some filesystems such as FAT32 reserve ":" as an invalid name character. Others such as NTFS and ReFS reserve ":" as the stream delimiter [1], and "http:" is not a valid stream name. I'm on the fence about how to handle names that the OS rejects as invalid in a boolean context (e.g. exists, isfile, etc). In one sense, returning False is reasonable because an invalid name cannot exist. But on the other hand, asking whether something that's invalid exists is a nonsense question that warrants an exception. That said, the issue has already been decided multiple times in favor of returning False, so at this point that's a pattern that should be consistently supported by the standard library.
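
The stream syntax described in [1] has the form `filename:stream name:stream type`. The hypothetical helper below only splits a single component on that syntax, to show that `http:` reads as the file name `http` followed by an empty stream name. It does not query any filesystem; the statement that such a name is rejected comes from the message above, not from this code.

```python
def split_stream_syntax(component):
    """Hypothetical: split 'name:stream:type' into its three parts.

    Returns (name, stream, stream_type). stream is None when the component
    has no ':' at all; stream_type is None when no type is given.
    """
    name, sep, rest = component.partition(":")
    if not sep:
        return name, None, None  # plain file name, no stream delimiter
    stream, _, stream_type = rest.partition(":")
    return name, stream, stream_type or None


print(split_stream_syntax("w.org"))                 # ('w.org', None, None)
print(split_stream_syntax("http:"))                 # ('http', '', None)
print(split_stream_syntax("notes.txt:meta:$DATA"))  # ('notes.txt', 'meta', '$DATA')
```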

Note that a filesystem may allow ":" as a name character, such as the VirtualBox shared-folder filesystem redirector. But the latter brings up yet another twist. Adding a redirector into the device stack, and thus including the MUP (multiple UNC provider) device, brings along more path-parsing baggage. In this case a component name with ":" in it fails as bad syntax, which gets mapped to WinAPI ERROR_BAD_PATHNAME (161), and thus C ENOENT, and ultimately FileNotFoundError in Python. This is the case regardless of the filesystem. For example, let's use the SMB redirector to set the "//localhost/C$" share for the "C:" drive as the working directory:

    >>> os.chdir('//localhost/C$')
    >>> os.stat('http://w.org')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    FileNotFoundError: [WinError 161] The specified path is invalid: 'http://w.org'

    >>> Path('http://w.org').exists()
    False
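
The last link in that chain, an `ENOENT` errno surfacing as `FileNotFoundError`, comes from the `OSError` constructor selecting a subclass from the errno value (PEP 3151). The sketch shows only that step, portably; the Windows-specific mapping of ERROR_BAD_PATHNAME (161) to `ENOENT` cannot be reproduced off Windows, so it is taken from the message rather than demonstrated. `EINVAL` is used purely as an example of an errno with no dedicated subclass.

```python
import errno

# An errno of ENOENT makes OSError() return a FileNotFoundError instance,
# the kind of error Path.exists() reports as False in the session above.
not_found = OSError(errno.ENOENT, "The specified path is invalid", "http://w.org")
print(type(not_found).__name__)  # FileNotFoundError

# An errno without a dedicated subclass stays a plain OSError, like the
# "OSError: [WinError 123]" in the first traceback.
plain = OSError(errno.EINVAL, "Invalid argument")
print(type(plain).__name__)      # OSError
```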

---

[1] https://docs.microsoft.com/en-us/windows/win32/fileio/file-streams
msg384585 - Author: Paul Moore (paul.moore) * (Python committer) Date: 2021-01-07 13:15
So I guess the key question then is whether Path.exists() should trap exceptions and interpret them as "does not exist" (on all platforms, although it looks like the null character case on Unix has now been fixed). That doesn't seem unreasonable, I guess.
msg384595 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-01-07 16:51
Yeah, I think saying "return True if it provably exists and False if existence cannot be proven (and never raise)" is a good general rule for boolean-returning functions.

This definitely raises some edge cases where we can infer from certain error codes that a path exists, but I don't think it obliges us to prioritise fixing those in order to handle more obvious cases.
msg384597 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-01-07 17:30
An alternative would be to add a "strict" parameter that defaults to False. In non-strict mode, map all OSError exceptions to a False return value. In strict mode, use _ignore_error(e) to determine whether to return False or propagate the exception. The question then is whether to add ERROR_INVALID_NAME (123) to _IGNORED_WINERRORS since the error means the name can never exist. On the other hand, ERROR_ACCESS_DENIED due to a permission error would be propagated in strict mode -- because the path's existence is unknown.
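
A minimal sketch of the proposed `strict` parameter, written as a standalone function rather than a change to pathlib. The names `exists`, `ignore_error`, `IGNORED_ERRNOS`, and `IGNORED_WINERRORS` are hypothetical stand-ins modeled on the private `_ignore_error` helper mentioned above, and the contents of the error lists are assumptions. The `stat` parameter exists only so that Windows error codes can be simulated on any OS.

```python
import errno
import os

# Hypothetical error lists; a real implementation might choose differently.
IGNORED_ERRNOS = (errno.ENOENT, errno.ENOTDIR, errno.EBADF, errno.ELOOP)
WINERROR_INVALID_NAME = 123  # ERROR_INVALID_NAME: the name can never exist
IGNORED_WINERRORS = (WINERROR_INVALID_NAME,)


def ignore_error(exc):
    """Return True if exc means the path cannot exist (not merely 'unknown')."""
    return (getattr(exc, "errno", None) in IGNORED_ERRNOS
            or getattr(exc, "winerror", None) in IGNORED_WINERRORS)


def exists(path, *, strict=False, stat=os.stat):
    """Hypothetical exists() with the proposed 'strict' switch."""
    try:
        stat(path)
    except OSError as exc:
        # Non-strict: any OSError means existence cannot be proven -> False.
        # Strict: only errors that rule the path out map to False.
        if not strict or ignore_error(exc):
            return False
        raise  # e.g. access denied: existence is unknown, so propagate
    except ValueError:
        return False  # e.g. embedded null byte: never a valid path
    return True
```

In non-strict mode this follows the "never raise" rule; in strict mode an invalid name still returns False, while a permission error propagates because it says nothing about whether the path exists.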
History
Date                 User          Action  Args
2022-04-11 14:59:40  admin         set     github: 87021
2021-03-27 05:12:40  eryksun       set     status: open -> closed
                                           superseder: [Windows] OSError when testing whether pathlib.Path('*') exists
                                           resolution: duplicate
                                           stage: resolved
2021-03-17 19:49:01  eryksun       set     components: + Library (Lib)
                                           versions: - Python 3.6, Python 3.7
2021-01-07 17:30:44  eryksun       set     messages: + msg384597
2021-01-07 16:51:41  steve.dower   set     messages: + msg384595
2021-01-07 13:15:23  paul.moore    set     messages: + msg384585
2021-01-07 13:04:38  eryksun       set     nosy: + eryksun
                                           messages: + msg384584
2021-01-07 09:55:59  domdfcoding2  set     nosy: + domdfcoding2
                                           messages: + msg384573
2021-01-07 09:45:07  paul.moore    set     messages: + msg384572
2021-01-07 09:39:34  gaborjbernat  set     messages: + msg384571
2021-01-07 09:33:57  paul.moore    set     messages: + msg384570
2021-01-07 09:23:21  gaborjbernat  create