classification
Title: Allow paths to be joined without worrying about a leading slash
Type: behavior
Components: Library (Lib)
Versions: Python 3.11
process
Status: open
Resolution:
Nosy List: eric.smith, eryksun, lutecki, serhiy.storchaka, smarie, veky, zbysz
Priority: normal

Created on 2021-06-18 12:58 by zbysz, last changed 2021-11-26 21:02 by eryksun.

Messages (10)
msg396058 - Author: Zbyszek Jędrzejewski-Szmek (zbysz) Date: 2021-06-18 12:58
The behavior of pathlib.Path.__truediv__(), i.e. pathlib.Path.joinpath(), is surprising when the second argument starts with a slash.

>>> pathlib.Path('/foo') / '/bar'
PosixPath('/bar')

I know that this follows the precedent set by os.path.join(), and
probably makes sense in some scenarios. But for the general operation
of "concatenating paths", it doesn't fit very well. In particular,
when concatenating multiple components this becomes even stranger:

>>> pathlib.Path('/var/tmp/instroot') / '/some/path' / '/suffix'
PosixPath('/suffix')

In my particular use case, I'm concatenating some user specified paths
relative to some root. The paths may or may not be absolute.

To avoid this pitfall, something like this is necessary:

>>> pathlib.Path('/var/tmp/instroot') / p.lstrip('/') / q.lstrip('/')

Please provide a way to do this natively. I think it'd be nice to
use // or + for this:

>>> pathlib.Path('/var/tmp/instroot') // '/some/path' // '/suffix'
PosixPath('/var/tmp/instroot/some/path/suffix')
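
Until something like this exists natively, the lstrip('/') workaround can be wrapped in a small helper. The following is only a sketch: join_below is a hypothetical name, not a pathlib API, and it assumes POSIX-style paths and does nothing about ".." components.

```python
from pathlib import PurePosixPath


def join_below(root, *parts):
    """Join parts under root, treating absolute parts as relative to root.

    Hypothetical helper for illustration; not part of pathlib.
    """
    result = PurePosixPath(root)
    for part in parts:
        # lstrip('/') removes every leading slash, so '/bar' and '//bar'
        # are both treated like 'bar'. An empty part leaves the result as is.
        result = result / str(part).lstrip('/')
    return result


joined = join_below('/var/tmp/instroot', '/some/path', '/suffix')
print(joined)  # /var/tmp/instroot/some/path/suffix
```

Using PurePosixPath keeps the sketch free of filesystem access, so it behaves the same on any platform.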
msg396449 - Author: Eric V. Smith (eric.smith) Date: 2021-06-24 02:42
You should bring this up on the python-ideas mailing list if you want some discussion.
msg396562 - Author: Vedran Čačić (veky) Date: 2021-06-27 04:44
It doesn't make sense to "concatenate" one absolute path to another. / has a simple explanation: if you start at /foo, and then do cd bar, you'll end up in /foo/bar. But if you start at /foo, and then do cd /bar, you'll end up in /bar.

You mean, some of your users write '/some/path' when they mean 'some/path'? Then the users should be educated about the difference. These are not the same, just like '../some/path' is not the same as them, and '~some/path' is again something very different.
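
For reference, the current "cd-like" behavior can be reproduced with a short sketch. It uses PurePosixPath and posixpath so that it gives the same result on any platform; the variable names are only illustrative.

```python
import posixpath
from pathlib import PurePosixPath

# Joining a relative component descends into it, like "cd bar" from /foo.
relative_join = PurePosixPath('/foo') / 'bar'

# Joining an absolute component discards the left side, like "cd /bar".
absolute_join = PurePosixPath('/foo') / '/bar'

# os.path.join() on POSIX (posixpath.join) follows the same rule.
os_path_join = posixpath.join('/foo', '/bar')

print(relative_join, absolute_join, os_path_join)  # /foo/bar /bar /bar
```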
msg396566 - Author: Serhiy Storchaka (serhiy.storchaka) Date: 2021-06-27 08:20
I understand why this problem arose. If you parse an HTTP URL, its path always starts with "/" if not empty. And you usually want to interpret it as relative to some base directory. But lstrip('/') works well here. In any case you need to have some validation to disallow "..".

I think that adding yet another operation will confuse users. And what to do with C:\foo\bar, C:foo\bar, \\?\c\foo\bar, etc?
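
A minimal sketch of the URL case, combining lstrip('/') with a check that rejects "..". The helper name url_to_fs_path and the base directory are assumptions made for illustration; a real server would likely need further checks (symlinks, Windows-style components, and so on).

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def url_to_fs_path(base_dir, url):
    """Map the path of an HTTP URL below base_dir (hypothetical helper)."""
    # Decode percent-escapes first so that an encoded ".." is also caught.
    url_path = unquote(urlsplit(url).path)
    relative = PurePosixPath(url_path.lstrip('/'))
    if '..' in relative.parts:
        raise ValueError(f"'..' is not allowed in {url_path!r}")
    return PurePosixPath(base_dir) / relative


print(url_to_fs_path('/srv/www', 'https://example.com/some/path'))
# /srv/www/some/path
```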
msg396571 - Author: Zbyszek Jędrzejewski-Szmek (zbysz) Date: 2021-06-27 10:29
> It doesn't make sense to "concatenate" one absolute path to another.

Please see the original description of the issue, or Serhiy's example. I was thinking about a case where paths are resolved relative to a container root in a filesystem. Serhiy brings up the case of a web server which concatenates paths from one namespace (URLs) to paths from another (fs paths).

> / has a simple explanation: if you start at /foo, and then do cd bar, you'll end up in /foo/bar. But if you start at /foo, and then do cd /bar, you'll end up in /bar.

You are thinking about a user doing operations in a shell. The two cases described are precisely NOT like this. In both examples, no matter what the path is, it is not allowed to go above the "root of the namespace". I.e. if you start at "/foo", and concatenate "/bar", you end up in "/foo/bar". If you are looking up "https://example.com/some/path", you want "/srv/www/some/path", etc.

>  what to do with C:\foo\bar, C:foo\bar, \\?\c\foo\bar, etc?

I think that with those paths there is no solution. They already don't work in any reasonable way:

>>> pathlib.Path('/asdf') / ("C:/some/path")
PosixPath('/asdf/C:/some/path')
>>> pathlib.Path('C:\\asdf') / ("C:/some/path")
PosixPath('C:\\asdf/C:/some/path')

Windows paths make no sense in the context of combination of namespaces and path concatenation, so I think it's fine to keep current behaviour (whatever it exactly is) for them. While the UNIX paths were designed to allow arbitrary nesting, the Windows paths were designed to always allow per-volume operations. The two concepts cannot be combined.

> In any case you need to have some validation to disallow "..".

Yes, that is a good point. In my code I do some validation that disallows ".." early on. But I think that the hypothetical //-operator should reject paths with "..".

> But lstrip('/') works well here.

It kind of does, but
- it's rather verbose
- it breaks the order, because it's on the right of the argument.

I'll write to python-ideas.
msg396588 - Author: Eryk Sun (eryksun) Date: 2021-06-27 17:00
> I was thinking about about a case where paths are resolved relative
> to a container root in a filesystem.

I can see the need for generalized 'drive' support that sets an arbitrary path prefix as the 'drive'. For example, if "/var/tmp/instroot" is a 'drive', then joining it to "/some/path" returns "/var/tmp/instroot/some/path". However, subsequently joining that result to "/suffix" would return "/var/tmp/instroot/suffix". The "/some/path" part is replaced by "/suffix". This doesn't match your example to lstrip('/') from paths joined with the proposed // operator.
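
To make the difference concrete, here is a rough sketch of those 'drive'-like semantics. Nothing like this exists in pathlib; join_with_prefix_drive is a hypothetical function, and it assumes the current path is the prefix itself or a path below it.

```python
from pathlib import PurePosixPath


def join_with_prefix_drive(prefix, current, new):
    """Join `new` onto `current`, treating `prefix` like a drive.

    Hypothetical illustration only; pathlib has no such feature.
    """
    prefix = PurePosixPath(prefix)
    new = PurePosixPath(new)
    if new.is_absolute():
        # A rooted path restarts from the 'drive', discarding whatever
        # `current` had accumulated below the prefix.
        return prefix / new.relative_to(new.anchor)
    return PurePosixPath(current) / new


root = '/var/tmp/instroot'
step1 = join_with_prefix_drive(root, root, '/some/path')
step2 = join_with_prefix_drive(root, step1, '/suffix')
print(step1)  # /var/tmp/instroot/some/path
print(step2)  # /var/tmp/instroot/suffix
```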
msg396591 - Author: Zbyszek Jędrzejewski-Szmek (zbysz) Date: 2021-06-27 17:57
> I can see the need for generalized 'drive' support that sets an arbitrary path prefix as the 'drive'. For example, if "/var/tmp/instroot" is a 'drive', then joining it to "/some/path" returns "/var/tmp/instroot/some/path". However, subsequently joining that result to "/suffix" would return "/var/tmp/instroot/suffix".

I think that the "drive concept" only makes sense on Windows. With POSIX paths, the expectation is that you can concatenate paths recursively, and consistency is much more useful than the drive concept.

One special case where you might concat multiple levels of paths is when the paths are generated through some configuration mechanism, and an occasional absolute path might sneak in, but we still want to use the same "relative" concatenation.

For example:
def path_to_some_binary_in_container(container, usr_merge_was_done=True):
    """Construct a path with support for systems with /usr/bin and legacy systems
    with /bin (https://fedoraproject.org/wiki/Features/UsrMove).
    """
    # Must be a Path (not a str) so that the / operator below joins paths.
    path_to_containers = pathlib.Path('/var/lib/machines')
    prefix = 'usr/' if usr_merge_was_done else '/'
    suffix = 'bin/some-binary'
    return path_to_containers / container / prefix / suffix

path_to_some_binary_in_container('cont1') returns PosixPath('/var/lib/machines/cont1/usr/bin/some-binary'), but path_to_some_binary_in_container('cont1', False) returns PosixPath('/bin/some-binary'). The "bug" is that '/' was used instead of ''. This is exactly the pitfall that I want to avoid:
return path_to_containers // container // prefix // suffix
will do the expected thing.
msg401563 - Author: Sylvain Marie (smarie) Date: 2021-09-10 08:38
+1 on this, I am totally in line with the original post.

The / operator semantics should not imply any notion of drive or of "cd" command on a shell. It should simply stick to "concatenate", i.e. "create child path" (and actually this is what the doc states https://docs.python.org/3/library/pathlib.html#operators )

Thanks Zbyszek for putting this on the table!
msg407076 - Author: (lutecki) Date: 2021-11-26 17:38
So how should this work?
I'm testing this simple example on Windows:

a = Path("/a/b")
b = Path("c/d")

and b / a gives me WindowsPath('/a/b'). So I'm like "ok, a seems like absolute, I will test for that" but on Windows a.is_absolute() is False.
???

Regards
msg407094 - Author: Eryk Sun (eryksun) Date: 2021-11-26 21:02
> and b / a gives me WindowsPath('/a/b'). So I'm like "ok, a seems
> like absolute, I will test for that" but on Windows a.is_absolute()
> is False.

Path.is_absolute() is true if a path has a root and, for a Windows path, also a drive.

In C++, the filesystem::path is_absolute() method is similar. In Windows it's true if both has_root_name() (i.e. a drive) and has_root_directory() are true. In POSIX it just depends on has_root_directory(). pathlib.Path is also consistent with C++ filesystem::path with regard to appending paths with the slash operator [1]. I would prefer for it to remain so.

FYI, in Windows, "/a/b" is resolved using the drive of the process current working directory, which may be a UNC path. The drive of a UNC path is the share path, such as "\\server\share".
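
These rules can be checked from any platform with the pure path classes. The sketch below only inspects strings and never touches the filesystem; the variable names are illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

rooted = PureWindowsPath('/a/b')              # root, but no drive
with_drive = PureWindowsPath('C:/a/b')        # drive and root
unc = PureWindowsPath('//server/share/a/b')   # the share path is the drive

print(repr(rooted.drive), repr(rooted.root), rooted.is_absolute())
# '' '\\' False
print(repr(with_drive.drive), with_drive.is_absolute())
# 'C:' True
print(repr(unc.drive), unc.is_absolute())
# '\\\\server\\share' True

# A POSIX path only needs a root to be absolute.
print(PurePosixPath('/a/b').is_absolute())    # True
```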

Another type of relative path in Windows is a drive-relative path, which applies to drive-letter drives only. For example:

>>> Path('C:spam') / "eggs"
WindowsPath('C:spam/eggs')
>>> Path('C:spam') / "/eggs"
WindowsPath('C:/eggs')

"C:spam" is relative to the current working directory on drive "C:". The API gets the working directory for the target drive either from the process current working directory, if it's a path on the target drive, or from an "=<drive letter>:" environment variable, such as "=Z:". (The Windows API allows environment variable names to begin with "=".) If the process current working directory is on a different drive, and the environment variable for the target drive isn't set, the API defaults to using the root directory on the target drive. Setting these per-drive working directory environment variables is up to the application. Python's os.chdir() supports them.

---
[1] https://en.cppreference.com/w/cpp/filesystem/path/append
History
Date                 User              Action  Args
2021-11-26 21:02:59  eryksun           set     messages: + msg407094
2021-11-26 17:38:36  lutecki           set     nosy: + lutecki
                                               messages: + msg407076
2021-09-10 08:38:22  smarie            set     nosy: + smarie
                                               messages: + msg401563
2021-06-27 17:57:10  zbysz             set     messages: + msg396591
2021-06-27 17:00:26  eryksun           set     nosy: + eryksun
                                               messages: + msg396588
2021-06-27 10:29:29  zbysz             set     messages: + msg396571
2021-06-27 08:20:20  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg396566
2021-06-27 04:44:12  veky              set     nosy: + veky
                                               messages: + msg396562
2021-06-24 02:42:12  eric.smith        set     nosy: + eric.smith
                                               messages: + msg396449
2021-06-18 12:58:26  zbysz             create