classification
Title: os.path.join misjoins paths
Type: behavior
Stage: resolved
Components: Library (Lib), Windows
Versions: Python 3.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.smith, eryksun, mesheb82, paul.moore, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2017-07-11 20:43 by mesheb82, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (12)
msg298181 - (view) Author: mesheb82 (mesheb82) Date: 2017-07-11 20:43
I'm trying to join paths on Windows with data taken from a user generated file.  In doing so, I came across:

    >>> os.path.join('dir1', '/dir2')
    '/dir2'

I'd expect an error or:

    'dir1\\dir2'

This has been tested and is consistent with Python 2.7.13 and 3.6.1.
msg298184 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-07-11 21:59
This is as documented - see https://docs.python.org/3.6/library/os.path.html#os.path.join (" If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component"). In this case, "/dir2" is an absolute path as it starts with a slash.
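
The documented rule can be checked on any platform by calling the `ntpath` and `posixpath` modules directly instead of `os.path`. The following is a minimal sketch; the component names are placeholders chosen for illustration:

```python
import ntpath
import posixpath

# A later component that starts with a separator discards everything before it.
nt_reset = ntpath.join('dir1', '/dir2', 'file')
posix_reset = posixpath.join('dir1', '/dir2', 'file')

# Without a leading separator, the components are simply joined.
nt_plain = ntpath.join('dir1', 'dir2', 'file')
posix_plain = posixpath.join('dir1', 'dir2', 'file')

print(nt_reset, posix_reset)
print(nt_plain, posix_plain)
```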
msg298186 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-07-11 22:27
This differs slightly from WinAPI PathCchCombineEx, which fails the example case as an invalid parameter. If the second path is rooted but without a drive or UNC share, then if the first path is relative it must be at least drive relative (e.g. "C:dir1"). Should Python's documented behavior change in 3.7 to match PathCchCombineEx in this case?
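
For illustration only, the stricter rule described here could be approximated in pure Python roughly as follows. `strict_join` is a hypothetical helper, not part of the standard library and not a reimplementation of PathCchCombineEx; it only rejects the single case discussed (a rooted second path with no drive, joined onto a first path that has neither a drive nor a root) and otherwise defers to `ntpath.join`:

```python
import ntpath

SEPS = ('\\', '/')

def strict_join(first, second):
    """Hypothetical helper: raise instead of silently dropping `first`."""
    first_drive, first_rest = ntpath.splitdrive(first)
    second_drive, second_rest = ntpath.splitdrive(second)

    # Second path is rooted but names no drive or UNC share.
    second_rooted_only = not second_drive and second_rest[:1] in SEPS
    # First path has no drive and does not start with a separator.
    first_fully_relative = not first_drive and first_rest[:1] not in SEPS

    if second_rooted_only and first_fully_relative:
        raise ValueError('cannot join rooted path %r onto relative path %r'
                         % (second, first))
    return ntpath.join(first, second)
```

With this sketch, `strict_join('C:dir1', '/dir2')` keeps the drive as `ntpath.join` does, while `strict_join('dir1', '/dir2')` raises `ValueError`.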
msg298198 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-07-12 06:38
Arguably it isn't even against the documented behavior, since a component starting with a slash an absolute path.

I'd be in favor of preserving the drive when encountering a component starting with a separator. Not sure of the value in changing the behavior in older versions - apparently I've never encountered this before, and I feel like I should have.
msg298199 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-07-12 06:39
> since a component starting with a slash *is not* an absolute path.
msg298202 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-12 08:01
I'm afraid that failing on os.path.join('', '/path') or os.path.join('.', '/path') could break a lot of code.

> I'd be in favor of preserving the drive when encountering a component starting with a separator.

Already done (issue19456).

>>> import ntpath
>>> ntpath.join('c:foo', '/bar')
'c:/bar'
msg298238 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2017-07-12 17:22
We absolutely cannot change this to give an error if the second or a subsequent parameter is absolute. I have code that reads user-named config files. If the path is relative, it's relative to a config directory, but it's allowed to be absolute:

config_filename = os.path.join(config_dir, user_supplied_name)
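
A minimal sketch of that pattern, using `posixpath` so the result is the same on every platform. The function name `resolve_config_path` and the example paths are hypothetical, chosen only for illustration:

```python
import posixpath

def resolve_config_path(config_dir, user_supplied_name):
    # An absolute user-supplied name replaces config_dir;
    # a relative one is appended to it.
    return posixpath.join(config_dir, user_supplied_name)

relative_case = resolve_config_path('/etc/myapp', 'site.cfg')
absolute_case = resolve_config_path('/etc/myapp', '/home/user/site.cfg')
print(relative_case)
print(absolute_case)
```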
msg298241 - (view) Author: mesheb82 (mesheb82) Date: 2017-07-12 18:38
Testing on Python 2.7.12 through Windows 10 bash (so Linux), I find an inconsistency with the documented statement "If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component"

>>> os.path.join('dir', 'C:/dir2')
'dir/C:/dir2'

To me, this is very similar to the original problem (Windows 10 Python 2.7.13 and 3.6.1):

>>> os.path.join('dir1', '/dir2')
'/dir2'

I would argue that on Windows, '/dir2' is not an absolute path.  Testing from cmd and powershell on Windows 10 from `C:`
>>> cd /dir2
C:/dir2

I do agree, though, that it is a terrible idea to not respect the second parameter in:
    os.path.join(absolute_path_or_local_path, absolute_path)

I think the question is what is considered an absolute path and does that change depending on the OS?
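
One way to see that the answer depends on the OS rules in use is to run both path flavours side by side. This is a small sketch that calls `posixpath` and `ntpath` directly, so it does not depend on the platform it runs on; the variable names are illustrative:

```python
import ntpath
import posixpath

path = 'C:/dir2'

# POSIX rules: no leading '/', so the path is relative.
posix_is_abs = posixpath.isabs(path)
# Windows rules: a drive followed by a separator, so the path is absolute.
nt_is_abs = ntpath.isabs(path)

# join() follows the same classification in each module.
posix_joined = posixpath.join('dir', path)
nt_joined = ntpath.join('dir', path)

print(posix_is_abs, posix_joined)
print(nt_is_abs, nt_joined)
```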
msg298308 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-07-13 19:04
There's absolutely no risk of ignoring later parameters or raising a ValueError here, so please don't let those cloud the discussion.

The behaviour of Python 3.6 seems to be correct for every case except:
    >>> os.path.join("C:\\dir1", "D:dir2")
    D:dir2
    (expected D:\dir1\dir2)

However, that's an incredible edge case that virtually nobody relies on and I'm sure nobody expects.

The other combinations of relative and absolute paths seem to be correct. I'm not convinced that changing the behaviour of Python 2.7 significantly improves either the maintainability or security of that release, so unless someone wants to argue about that I'm closing this as not a bug. (And if someone *does* want to argue about it, don't bother arguing with me :) )
msg298313 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-07-13 20:16
The difference compared to PathCchCombineEx stems from the following snippet in ntpath.join:

            if p_path and p_path[0] in seps:
                # Second path is absolute
                if p_drive or not result_drive:
                    result_drive = p_drive
                result_path = p_path

The case that PathCchCombineEx fails is that the second path is rooted [*] but neither UNC nor drive-absolute (i.e. p_drive is empty) and the first path is relative but neither rooted nor drive-relative. When the second path is rooted but not absolute, PathCchCombineEx requires the joined path to use the root of the first path as determined by PathCchStripToRoot. The latter fails for a completely relative path (i.e. no root or drive), as it rightly should. The question is whether the join operation itself should fail because the first path has no root. Python makes a different choice, but it isn't necessarily wrong.

[*] 
Path Type      | Example
====================================
Relative       | file
---------------|--------------------
Rooted         | \file
Drive-Relative | C:file
====================================
Drive-Absolute | C:\file 
UNC            | \\server\share\file
====================================
Extended       | \\?\C:\file
Device         | \\.\C:

In Windows, rooted paths are relative to the current drive or network share, but Python still classifies them as absolute. In contrast, C++ path::is_absolute() requires both has_root_name and has_root_directory in Windows, so L"\\dir" is classified as a relative path.
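
The first five rows of the table can be reproduced with `ntpath.splitdrive`. The sketch below is only an approximation: `classify` is a hypothetical helper, and it makes no attempt to tell extended or device paths apart from UNC paths.

```python
import ntpath

SEPS = ('\\', '/')

def classify(path):
    """Hypothetical helper mapping a path to a row of the table above."""
    drive, rest = ntpath.splitdrive(path)
    rooted = rest[:1] in SEPS

    # A "drive" that starts with two separators is a UNC-style prefix.
    # Extended and device paths also land here in this sketch.
    if len(drive) >= 2 and drive[0] in SEPS and drive[1] in SEPS:
        return 'UNC'
    if drive:
        return 'Drive-Absolute' if rooted else 'Drive-Relative'
    return 'Rooted' if rooted else 'Relative'

for example in ('file', '\\file', 'C:file', 'C:\\file',
                '\\\\server\\share\\file'):
    print(example, '->', classify(example))
```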
msg298330 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-07-14 03:27
BTW, I don't see why one would expect join(r"C:\dir1", "D:dir2") to return r"D:\dir1\dir2" instead of "D:dir2". Python's result is in agreement with Windows PathCchCombineEx. Paths on different drives should not be combined. The first path has to be ignored:

            elif p_drive and p_drive != result_drive:
                if p_drive.lower() != result_drive.lower():
                    # Different drives => ignore the first path entirely
                    result_drive = p_drive
                    result_path = p_path
                    continue
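
The effect of that branch can be observed by calling `ntpath.join` directly, which works on any platform. A short sketch with illustrative variable names:

```python
import ntpath

# Different drives: the first path is ignored entirely.
different_drive = ntpath.join('C:\\dir1', 'D:dir2')

# Same drive: the drive-relative second path is appended to the first.
same_drive = ntpath.join('C:\\dir1', 'C:dir2')

# Same drive in a different case: still joined, using the later drive spelling.
same_drive_other_case = ntpath.join('C:\\dir1', 'c:dir2')

print(different_drive)
print(same_drive)
print(same_drive_other_case)
```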
msg298341 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-07-14 05:58
Fair point. I was thinking of how chdir handles it, but of course that's relative to the cwd on D, so the rest of the path on C is ignored. D:dir2 is correct.
History
Date                 User              Action  Args
2022-04-11 14:58:48  admin             set     github: 75089
2017-07-14 05:58:23  steve.dower       set     messages: + msg298341
2017-07-14 03:27:41  eryksun           set     messages: + msg298330
2017-07-13 20:16:14  eryksun           set     messages: + msg298313
2017-07-13 19:04:06  steve.dower       set     status: open -> closed; resolution: not a bug; messages: + msg298308; stage: test needed -> resolved
2017-07-12 18:38:27  mesheb82          set     messages: + msg298241
2017-07-12 17:22:02  eric.smith        set     nosy: + eric.smith; messages: + msg298238
2017-07-12 08:01:38  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg298202
2017-07-12 06:39:19  steve.dower       set     messages: + msg298199
2017-07-12 06:38:28  steve.dower       set     versions: + Python 3.7, - Python 2.7, Python 3.6; resolution: not a bug -> (no value); messages: + msg298198; type: behavior; stage: resolved -> test needed
2017-07-11 22:27:04  eryksun           set     status: closed -> open; nosy: + eryksun; messages: + msg298186
2017-07-11 21:59:18  paul.moore        set     status: open -> closed; resolution: not a bug; messages: + msg298184; stage: resolved
2017-07-11 20:43:54  mesheb82          create