Issue40996
Created on 2020-06-17 00:19 by mjacob, last changed 2022-04-11 14:59 by admin.
Messages (1) | |||
---|---|---|---|
msg371702 | Author: Manuel Jacob (mjacob) * | Date: 2020-06-17 00:19 | |
On Unix, file names are bytes. Python mostly prefers to use unicode for file names. On the Python <-> system boundary, os.fsencode() / os.fsdecode() are used.

In URIs, bytes can be percent-encoded. On Unix, most applications pass the percent-decoded bytes in file URIs to the file system unchanged.

The remainder of this issue description is about Unix, except for the last paragraph.

Pathlib fsencodes the path when making a file URI, so the bytes roundtrip, e.g. when passed as an argument:

% python3 -c 'import pathlib, sys; print(pathlib.Path(sys.argv[1]).as_uri())' /tmp/a$(echo -e '\xE4')
file:///tmp/a%E4

Example with curl using this URL:

% echo 'Hello, World!' > /tmp/a$(echo -e '\xE4')
% curl file:///tmp/a%E4
Hello, World!

Python 2’s urllib works the same:

% python2 -c 'from urllib import urlopen; print(repr(urlopen("file:///tmp/a%E4").read()))'
'Hello, World!\n'

However, Python 3’s urllib fails:

% python3 -c 'from urllib.request import urlopen; print(repr(urlopen("file:///tmp/a%E4").read()))'
Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1507, in open_local_file
    stats = os.stat(localfile)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/a�'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1485, in file_open
    return self.open_local_file(req)
  File "/usr/lib/python3.8/urllib/request.py", line 1524, in open_local_file
    raise URLError(exp)
urllib.error.URLError: <urlopen error [Errno 2] No such file or directory: '/tmp/a�'>

urllib.request.url2pathname() is the function that converts the path of the file URI to a file name. On Unix, it uses urllib.parse.unquote() with the default settings (UTF-8 encoding and the "replace" error handler).

I think that on Unix, the settings from os.fsdecode() should be used, so that it roundtrips with pathlib.Path.as_uri() and so that the percent-decoded bytes are passed to the file system as-is.

On Windows, I couldn’t do experiments, but using UTF-8 seems like the right thing (according to https://en.wikipedia.org/wiki/File_URI_scheme#Windows_2). I’m not sure that the "replace" error handler is a good idea. I prefer "errors should never pass silently" from the Zen of Python, but I don’t have a strong opinion on this.
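The difference between the two decoding strategies described above can be shown with a minimal sketch. It is Unix-only and is not a patch for urllib: the helper name `url_path_to_filename` is hypothetical, and it assumes that "the settings from os.fsdecode()" means percent-decoding to raw bytes with urllib.parse.unquote_to_bytes() and then calling os.fsdecode() on the result.

```python
import os
import pathlib
from urllib.parse import unquote, unquote_to_bytes


def url_path_to_filename(url_path: str) -> str:
    """Hypothetical helper (not part of urllib): percent-decode the URL path
    to raw bytes, then decode them with the file system encoding and error
    handler, which is what os.fsdecode() does."""
    return os.fsdecode(unquote_to_bytes(url_path))


url_path = "/tmp/a%E4"

# Default unquote(): UTF-8 with errors="replace". The lone byte 0xE4 is not
# valid UTF-8, so it is replaced by U+FFFD and the original byte is lost.
lossy = unquote(url_path)
print(ascii(lossy))

# fsdecode-based decoding: on Unix the byte stays recoverable, so
# os.fsencode() returns exactly the percent-decoded bytes.
name = url_path_to_filename(url_path)
print(os.fsencode(name))

# Roundtrip with pathlib, which fsencodes the path when building the URI.
# pathlib.Path.as_uri() does not access the file system.
print(pathlib.Path(name).as_uri())
```

With this decoding, the name handed to os.stat() encodes back to the same bytes that curl and Python 2 pass to the file system, whereas the default unquote() result refers to a different file name containing U+FFFD.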
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:32 | admin | set | github: 85168 |
2020-06-17 10:14:42 | vstinner | set | nosy: + vstinner |
2020-06-17 00:19:10 | mjacob | create |