This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: pathlib.PurePath does not accept bytes components
Type:
Stage: resolved
Components: Documentation, Library (Lib)
Versions: Python 3.4, Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: bru, docs@python, ncoghlan, pitrou, python-dev
Priority: normal
Keywords:

Created on 2015-04-10 11:11 by bru, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg240414 - Author: Bruno Cauet (bru) Date: 2015-04-10 11:11
At https://docs.python.org/3/library/pathlib.html#pure-paths one can read:

> Each element of pathsegments can be either a string or bytes object representing a path segment; it can also be another path object:

which is a lie:

>>> pathlib.PurePath(b"/foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bru/code/cpython/Lib/pathlib.py", line 609, in __new__
    return cls._from_parts(args)
  File "/home/bru/code/cpython/Lib/pathlib.py", line 638, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/bru/code/cpython/Lib/pathlib.py", line 630, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'bytes'>

So either
(1) the doc is wrong, or
(2) pathlib's path handling is wrong: it should decode bytes parts with os.fsdecode().
Being unsure, I tagged both components. I'll be happy to provide a fix once you decide which solution is right.
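Option (2) can also be applied on the caller's side. The sketch below assumes POSIX semantics: the bytes are decoded with os.fsdecode(), which uses the filesystem encoding and, on POSIX, the surrogateescape handler (PEP 383). The result is then handed to pathlib. The helper name path_from_bytes is hypothetical and not part of pathlib.

```python
import os
from pathlib import PurePosixPath


def path_from_bytes(raw: bytes) -> PurePosixPath:
    """Hypothetical helper: build a pure path from a bytes path.

    os.fsdecode() decodes with the filesystem encoding; on POSIX the
    surrogateescape handler maps undecodable bytes to lone surrogates,
    so os.fsencode() can restore the original bytes later.
    """
    return PurePosixPath(os.fsdecode(raw))


# Passing bytes directly is rejected, as in the traceback above.
try:
    PurePosixPath(b"/foo")
except TypeError as exc:
    print("rejected:", exc)

p = path_from_bytes(b"/foo")
print(p)         # /foo
print(bytes(p))  # b'/foo' (PurePath.__bytes__ re-encodes with os.fsencode)
```

This keeps pathlib itself text-only and confines the bytes-to-text conversion to one place.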

I'll take this opportunity to share an itch: the filesystem encoding on Unix cannot be reliably determined. sys.getfilesystemencoding() is only a preference, and there is no guarantee that an arbitrary file name respects it. This is discussed extensively in the following thread: https://mail.python.org/pipermail/python-dev/2014-August/135873.html
What is the right way to deal with such names? If I use "surrogateescape" (see PEP 383), how can I display the fake-Unicode path to the user? `print()` does seem to use strict encoding. Should I encode the path myself beforehand with "surrogateescape" or "ignore"?
msg240502 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2015-04-11 22:02
Interesting. The doc is wrong here: pathlib was designed so that it only accepts text strings.

> If I use "surrogateescape" (see PEP383) how can I display the
> fake-unicode path to the user? `print()` does seems to use strict
> encoding. Should I encode it with "surrogateescape" or "ignore" myself 
> beforehand?

Yes, you should probably encode it yourself. If you are sure your terminal can eat the original bytestring, then use "surrogateescape". Otherwise, "replace" sounds better so that the user knows there are some undecodable characters out there.
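The sketch below shows both options from the reply. It assumes the name was originally decoded with UTF-8 and the surrogateescape handler. That is common on modern Linux systems, but it is an assumption here. Both function names are hypothetical.

```python
def raw_bytes(name: str, encoding: str = "utf-8") -> bytes:
    # Undo the surrogateescape decoding to recover the original bytestring;
    # only appropriate if the terminal can accept those bytes unchanged.
    return name.encode(encoding, errors="surrogateescape")


def displayable(name: str, encoding: str = "utf-8") -> str:
    # Replace each undecodable byte sequence with U+FFFD so the user can
    # see that something could not be decoded. The result contains no lone
    # surrogates, so it can be encoded strictly as UTF-8.
    return raw_bytes(name, encoding).decode(encoding, errors="replace")


name = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")
# name == "caf\udce9.txt": a lone surrogate stands in for the byte 0xE9
shown = displayable(name)   # "caf\ufffd.txt"
original = raw_bytes(name)  # b"caf\xe9.txt"
```

On POSIX, os.fsencode(name) performs the first step using the actual filesystem encoding instead of a hard-coded one.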
msg240503 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-04-11 22:08
New changeset 7463c06f6e87 by Antoine Pitrou in branch '3.4':
Close #23904: fix pathlib documentation misleadingly mentioning that bytes objects are accepted in the PurePath constructor
https://hg.python.org/cpython/rev/7463c06f6e87

New changeset 386732087dfb by Antoine Pitrou in branch 'default':
Close #23904: fix pathlib documentation misleadingly mentioning that bytes objects are accepted in the PurePath constructor
https://hg.python.org/cpython/rev/386732087dfb
msg240504 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2015-04-11 22:09
Thanks for the report!
History
Date                 User        Action  Args
2022-04-11 14:58:15  admin       set     github: 68092
2015-04-11 22:09:05  pitrou      set     messages: + msg240504; versions: - Python 3.6
2015-04-11 22:08:46  python-dev  set     status: open -> closed; nosy: + python-dev; messages: + msg240503; resolution: fixed; stage: resolved
2015-04-11 22:02:08  pitrou      set     nosy: + ncoghlan, pitrou; messages: + msg240502
2015-04-10 11:11:21  bru         create