Issue 23904: pathlib.PurePath does not accept bytes components

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68092

classification

Title:	pathlib.PurePath does not accept bytes components
Type:		Stage:	resolved
Components:	Documentation, Library (Lib)	Versions:	Python 3.4, Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	bru, docs@python, ncoghlan, pitrou, python-dev
Priority:	normal	Keywords:

Created on 2015-04-10 11:11 by bru, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg240414 - (view)	Author: Bruno Cauet (bru) *	Date: 2015-04-10 11:11
At https://docs.python.org/3/library/pathlib.html#pure-paths one can read

> Each element of pathsegments can be either a string or bytes object representing a path segment; it can also be another path object:

which is a lie:

    >>> pathlib.PurePath(b"/foo")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/bru/code/cpython/Lib/pathlib.py", line 609, in __new__
        return cls._from_parts(args)
      File "/home/bru/code/cpython/Lib/pathlib.py", line 638, in _from_parts
        drv, root, parts = self._parse_args(args)
      File "/home/bru/code/cpython/Lib/pathlib.py", line 630, in _parse_args
        % type(a))
    TypeError: argument should be a path or str object, not <class 'bytes'>

So either
(1) the doc is wrong, or
(2) pathlib's path management fails: it should decode bytes parts with os.fsdecode().

In doubt, I tagged both components. I'll be happy to provide a fix once you decide what the right solution is.

I take this opportunity to share an itch: filesystem encoding on Unix cannot be reliably determined. sys.getfilesystemencoding() is only a preference, and there is no guarantee that an arbitrary file name will respect it. This is extensively discussed in the following thread: https://mail.python.org/pipermail/python-dev/2014-August/135873.html

What is the right way to deal with those? If I use "surrogateescape" (see PEP 383), how can I display the fake-unicode path to the user? `print()` does seem to use strict encoding. Should I encode it with "surrogateescape" or "ignore" myself beforehand?
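Until the question is settled, the decoding suggested in option (2) can be done on the caller's side. The following is only a minimal sketch: `pure_path_from_segments` is a hypothetical helper, not part of pathlib, and `PurePosixPath` is used so that the result is the same on every platform.

```python
import os
from pathlib import PurePosixPath


def pure_path_from_segments(*segments):
    """Hypothetical helper: decode bytes segments, then build a pure path.

    os.fsdecode() uses the filesystem encoding with the "surrogateescape"
    error handler, so undecodable bytes survive as lone surrogates.
    """
    decoded = [os.fsdecode(s) if isinstance(s, bytes) else s for s in segments]
    return PurePosixPath(*decoded)


path = pure_path_from_segments(b"/foo", "bar")
print(path)  # /foo/bar
```

Passing `b"/foo"` straight to the constructor still raises `TypeError`, as in the traceback above; the helper only moves the decoding step in front of pathlib.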
msg240502 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2015-04-11 22:02
Interesting. The doc is wrong here: pathlib was designed so that it only accepts text strings.

> If I use "surrogateescape" (see PEP383) how can I display the
> fake-unicode path to the user? `print()` does seem to use strict
> encoding. Should I encode it with "surrogateescape" or "ignore" myself
> beforehand?

Yes, you should probably encode it yourself. If you are sure your terminal can eat the original bytestring, then use "surrogateescape". Otherwise, "replace" sounds better so that the user knows there are some undecodable characters out there.
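As a rough illustration of the two choices, assuming a UTF-8 terminal and a made-up file name containing one Latin-1 byte (both `for_display` and `original_bytes` are hypothetical helper names, not standard library functions):

```python
def for_display(path_text, encoding="utf-8"):
    """Return a printable version; undecodable characters become '?'."""
    return path_text.encode(encoding, "replace").decode(encoding)


def original_bytes(path_text, encoding="utf-8"):
    """Recover the exact bytes the path was decoded from."""
    return path_text.encode(encoding, "surrogateescape")


# Hypothetical file name: b"\xe9" is not valid UTF-8, so decoding with
# "surrogateescape" turns it into the lone surrogate U+DCE9.
raw = b"/tmp/caf\xe9.txt"
name = raw.decode("utf-8", "surrogateescape")

print(for_display(name))     # /tmp/caf?.txt
print(original_bytes(name))  # b'/tmp/caf\xe9.txt'
```

If the terminal is known to accept the original bytes, the result of `original_bytes()` can be written to `sys.stdout.buffer` instead of going through `print()`.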
msg240503 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-04-11 22:08
New changeset 7463c06f6e87 by Antoine Pitrou in branch '3.4':
Close #23904: fix pathlib documentation misleadingly mentioning that bytes objects are accepted in the PurePath constructor
https://hg.python.org/cpython/rev/7463c06f6e87

New changeset 386732087dfb by Antoine Pitrou in branch 'default':
Close #23904: fix pathlib documentation misleadingly mentioning that bytes objects are accepted in the PurePath constructor
https://hg.python.org/cpython/rev/386732087dfb
msg240504 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2015-04-11 22:09
Thanks for the report!

History
Date	User	Action	Args
2022-04-11 14:58:15	admin	set	github: 68092
2015-04-11 22:09:05	pitrou	set	messages: + msg240504 versions: - Python 3.6
2015-04-11 22:08:46	python-dev	set	status: open -> closed nosy: + python-dev messages: + msg240503 resolution: fixed stage: resolved
2015-04-11 22:02:08	pitrou	set	nosy: + ncoghlan, pitrou messages: + msg240502
2015-04-10 11:11:21	bru	create