Message 240414 - Python tracker


Author	bru
Recipients	bru, docs@python
Date	2015-04-10.11:11:21
Message-id	<1428664281.99.0.802237757059.issue23904@psf.upfronthosting.co.za>
In-reply-to

Content

At https://docs.python.org/3/library/pathlib.html#pure-paths one can read

> Each element of pathsegments can be either a string or bytes object representing a path segment; it can also be another path object:

which is a lie:

>>> pathlib.PurePath(b"/foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bru/code/cpython/Lib/pathlib.py", line 609, in __new__
    return cls._from_parts(args)
  File "/home/bru/code/cpython/Lib/pathlib.py", line 638, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/bru/code/cpython/Lib/pathlib.py", line 630, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'bytes'>

So either
(1) the doc is wrong, or
(2) pathlib's path handling is at fault: it should decode bytes parts with os.fsdecode().
In doubt, I tagged both components. I'll be happy to provide a fix once you decide which is the right solution.
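
For illustration, option (2) could look roughly like the sketch below. The helper name `pure_path_from_segments` is hypothetical and not part of pathlib; it only shows the idea of decoding bytes segments with os.fsdecode() before handing them to PurePath.

```python
import os
import pathlib


def pure_path_from_segments(*segments):
    """Hypothetical helper: decode bytes segments, then build a PurePath."""
    decoded = [
        os.fsdecode(segment) if isinstance(segment, bytes) else segment
        for segment in segments
    ]
    return pathlib.PurePath(*decoded)


# Passing bytes directly is rejected, as in the traceback above.
try:
    pathlib.PurePath(b"/foo")
    bytes_accepted = True
except TypeError:
    bytes_accepted = False

# Decoding first gives the same path as the all-str spelling.
path = pure_path_from_segments(b"/foo", "bar")
```

Whether pathlib itself should do this decoding is exactly the open question; the sketch only moves it into the caller.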

I take this opportunity to share an itch: the filesystem encoding on Unix cannot be reliably determined. sys.getfilesystemencoding() is only a preference, and there is no guarantee that an arbitrary file name will respect it. This is extensively discussed in the following thread: https://mail.python.org/pipermail/python-dev/2014-August/135873.html
What is the right way to deal with such file names? If I use "surrogateescape" (see PEP 383), how can I display the fake-unicode path to the user? `print()` does seem to use strict encoding. Should I encode it with "surrogateescape" or "ignore" myself beforehand?
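
A minimal sketch of the display problem, assuming a UTF-8 filesystem encoding. The codec is spelled out explicitly instead of calling os.fsdecode(), so the result does not depend on the local configuration. The helper `printable` is hypothetical, and "backslashreplace" is only one possible choice for display, not an established recommendation.

```python
raw = b"/tmp/caf\xe9"  # Latin-1 encoded name, not valid UTF-8

# PEP 383: the undecodable byte becomes a lone surrogate (here U+DCE9).
name = raw.decode("utf-8", "surrogateescape")

# The original bytes can be recovered losslessly with the same handler.
roundtrip = name.encode("utf-8", "surrogateescape")


def printable(text, encoding="utf-8"):
    """Hypothetical helper: return a str that survives strict encoding."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


# Strict encoding rejects the lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(printable(name))
```

The escaped form is lossy for display purposes only; the surrogate-escaped string should still be the one passed back to the filesystem APIs.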

History
Date	User	Action	Args
2015-04-10 11:11:22	bru	set	recipients: + bru, docs@python
2015-04-10 11:11:21	bru	set	messageid: <1428664281.99.0.802237757059.issue23904@psf.upfronthosting.co.za>
2015-04-10 11:11:21	bru	link	issue23904 messages
2015-04-10 11:11:21	bru	create