At https://docs.python.org/3/library/pathlib.html#pure-paths one can read
> Each element of pathsegments can be either a string or bytes object representing a path segment; it can also be another path object:
which is a lie:
>>> pathlib.PurePath(b"/foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bru/code/cpython/Lib/pathlib.py", line 609, in __new__
    return cls._from_parts(args)
  File "/home/bru/code/cpython/Lib/pathlib.py", line 638, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/bru/code/cpython/Lib/pathlib.py", line 630, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'bytes'>
So either
(1) the doc is wrong
(2) pathlib's argument handling is wrong: it should decode bytes segments with os.fsdecode()
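To illustrate option (2), here is a minimal sketch of what decoding bytes segments could look like from the caller's side. The helper name `pure_path_from_segments` is hypothetical (it is not part of pathlib), and `PurePosixPath` is used only so that the result is the same on every platform:

```python
import os
from pathlib import PurePosixPath


def pure_path_from_segments(*segments):
    """Build a pure path, decoding any bytes segment with os.fsdecode().

    Hypothetical helper for illustration only; pathlib itself does not
    provide it.
    """
    decoded = [
        os.fsdecode(seg) if isinstance(seg, bytes) else seg
        for seg in segments
    ]
    return PurePosixPath(*decoded)


path = pure_path_from_segments(b"/foo", "bar")
print(path)
```

Whether pathlib should do this implicitly is exactly the open question; the sketch only shows that the decoding step itself is small.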
Being in doubt, I tagged both components. I'll be happy to provide a fix once you decide which solution is the right one.
I take this opportunity to share an itch: filesystem encoding on Unix cannot be reliably determined. sys.getfilesystemencoding() is only a preference and there is no guarantee that an arbitrary file will respect it. This is extensively discussed in the following thread: https://mail.python.org/pipermail/python-dev/2014-August/135873.html
What is the right way to deal with such files? If I use "surrogateescape" (see PEP 383), how can I display the resulting pseudo-Unicode path to the user? `print()` seems to use the strict error handler. Should I encode the path with "surrogateescape" or "ignore" myself beforehand?
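To make the question concrete, here is a small sketch of one possible approach, not necessarily the recommended one. It assumes UTF-8 as the filesystem encoding (hard-coded here so the example behaves the same everywhere), and `displayable` is a hypothetical helper name:

```python
# A file name that is not valid UTF-8 (Latin-1 encoded "e acute").
raw = b"caf\xe9.txt"

# PEP 383: undecodable bytes become lone surrogates instead of raising.
name = raw.decode("utf-8", errors="surrogateescape")


def displayable(name: str) -> str:
    """Return a copy of name that is safe to print.

    Hypothetical helper: recovers the original bytes, then decodes them
    again, replacing each undecodable byte with U+FFFD.
    """
    original = name.encode("utf-8", errors="surrogateescape")
    return original.decode("utf-8", errors="replace")


# The surrogate-escaped string still round-trips to the original bytes,
# so it stays usable for opening the file.
assert name.encode("utf-8", errors="surrogateescape") == raw

print(displayable(name))
```

The idea is to keep the surrogate-escaped string for filesystem calls and derive a separate, lossy string only for display.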