Message240414
At https://docs.python.org/3/library/pathlib.html#pure-paths one can read
> Each element of pathsegments can be either a string or bytes object representing a path segment; it can also be another path object:
which is a lie:
>>> pathlib.PurePath(b"/foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bru/code/cpython/Lib/pathlib.py", line 609, in __new__
    return cls._from_parts(args)
  File "/home/bru/code/cpython/Lib/pathlib.py", line 638, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/bru/code/cpython/Lib/pathlib.py", line 630, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'bytes'>
So either
(1) the doc is wrong
(2) pathlib's argument handling is wrong: it should decode bytes segments with os.fsdecode()
In doubt, I tagged both components. I'll be happy to provide a fix once you decide which solution is the right one.
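To make option (2) concrete, here is a minimal sketch of the decoding step done by the caller. The helper name `pure_path_from_segments` is hypothetical and is not part of pathlib; it only assumes that `os.fsdecode()` is an acceptable way to turn bytes segments into str. `PurePosixPath` is used so the result is the same on every platform.

```python
import os
import pathlib


def pure_path_from_segments(*segments):
    """Hypothetical helper: build a pure path from str and/or bytes segments.

    Bytes segments are decoded with os.fsdecode(), which uses the filesystem
    encoding (with the "surrogateescape" error handler on POSIX).
    """
    decoded = [os.fsdecode(s) if isinstance(s, bytes) else s for s in segments]
    return pathlib.PurePosixPath(*decoded)


# Mixed bytes and str segments are accepted by the wrapper.
p = pure_path_from_segments(b"/foo", "bar")
print(p)
```

If pathlib itself did this in its argument parsing, the documentation sentence quoted above would become true; otherwise the sentence should simply drop the mention of bytes.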
I take this opportunity to share an itch: filesystem encoding on Unix cannot be reliably determined. sys.getfilesystemencoding() is only a preference and there is no guarantee that an arbitrary file will respect it. This is extensively discussed in the following thread: https://mail.python.org/pipermail/python-dev/2014-August/135873.html
What is the right way to deal with those? If I use "surrogateescape" (see PEP 383), how can I display the fake-unicode path to the user? `print()` seems to use strict encoding. Should I encode it with "surrogateescape" or "ignore" myself beforehand?
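For illustration, here is one possible way to make such a path printable, as a sketch rather than a recommendation. Both helper names (`displayable`, `escaped`) are hypothetical, and the sketch assumes the name was originally decoded as UTF-8 with "surrogateescape"; with another filesystem encoding the `encoding` argument would have to change.

```python
def displayable(path, encoding="utf-8"):
    """Hypothetical helper: return a str that print() can always encode.

    The surrogate escapes are turned back into the original bytes, then the
    bytes are decoded again with "replace", so each undecodable byte becomes
    U+FFFD instead of a lone surrogate.
    """
    raw = path.encode(encoding, errors="surrogateescape")
    return raw.decode(encoding, errors="replace")


def escaped(path, encoding="utf-8"):
    """Hypothetical alternative: show lone surrogates as backslash escapes."""
    return path.encode(encoding, errors="backslashreplace").decode(encoding)


# A file name containing the byte 0xE9, which is not valid UTF-8 on its own.
name = b"caf\xe9".decode("utf-8", errors="surrogateescape")
print(displayable(name))
print(escaped(name))
```

Either result is for display only: it can no longer be passed back to the OS to open the file, so the original surrogate-escaped string has to be kept for that.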
Date | User | Action | Args
2015-04-10 11:11:22 | bru | set | recipients: + bru, docs@python
2015-04-10 11:11:21 | bru | set | messageid: <1428664281.99.0.802237757059.issue23904@psf.upfronthosting.co.za>
2015-04-10 11:11:21 | bru | link | issue23904 messages
2015-04-10 11:11:21 | bru | create |