Title: Pathlib incorrectly merges strings.
Type: behavior Stage: resolved
Components: IO, Library (Lib), Windows Versions: Python 3.6
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Roffild, eric.smith, eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2018-11-13 00:06 by Roffild, last changed 2018-11-13 19:03 by eryksun. This issue is now closed.

Messages (4)
msg329776 - Author: Roffild (Roffild) Date: 2018-11-13 00:06
import os
print(os.path.join("C:/123\\345", "\\", "folder///filename.bin"))
import pathlib
print(pathlib.PureWindowsPath("C:/123\\345", "\\", "folder///filename.bin"))


Expected result for Windows:

The number of slashes should be controlled by the library. Replacing / with \ should also depend on the OS.
msg329779 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-11-13 00:27
As far as which path components are returned, I think this is working as designed. The documentation for os.path.join says:

If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.

On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered.

pathlib's documentation for creating PurePath objects says:

When several absolute paths are given, the last is taken as an anchor (mimicking os.path.join()’s behaviour)
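
The documented rule can be checked on any platform by calling the Windows flavour directly. This is a minimal sketch using the reporter's three components; ntpath is the module that backs os.path on Windows, and the variable names are only for illustration:

```python
import ntpath
from pathlib import PureWindowsPath

parts = ("C:/123\\345", "\\", "folder///filename.bin")

# "\\" is a rooted component, so "/123\\345" is thrown away,
# but the drive "C:" from the first component is kept.
joined = ntpath.join(*parts)
print(joined)  # C:\folder///filename.bin

# PureWindowsPath applies the same anchoring rule and also
# collapses repeated separators and converts "/" to "\".
pure = PureWindowsPath(*parts)
print(pure)  # C:\folder\filename.bin
```

Note that os.path.join only concatenates; it does not normalize slashes, which is why the triple slash survives in the first result.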
msg329783 - Author: Roffild (Roffild) Date: 2018-11-13 01:06
I need to assemble a single path from several strings, in a way that depends on the OS.

It is logical to expect the same behavior as in Java:
Converts a path string, or a sequence of strings that when joined form a path string, to a Path. If more does not specify any elements then the value of the first parameter is the path string to convert. If more specifies one or more elements then each non-empty string, including first, is considered to be a sequence of name elements (see Path) and is joined to form a path string. The details as to how the Strings are joined is provider specific but typically they will be joined using the name-separator as the separator. For example, if the name separator is "/" and getPath("/foo","bar","gus") is invoked, then the path string "/foo/bar/gus" is converted to a Path. A Path representing an empty path is returned if first is the empty string and more does not contain any non-empty strings.

My temporary fix is something like this:
print("\\".join(["C:/123\\345", "\\", "folder///filename.bin"]).replace("/", "\\").replace("\\\\", "\\").replace("\\\\", "\\"))
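
For comparison, a similar result might be reached with pathlib itself by stripping leading separators from the later components, so that none of them is treated as rooted. This is only a sketch: join_as_relative is a hypothetical helper, not part of pathlib, and it assumes that later components never carry their own drive letter (a component such as "D:/x" would still replace the anchor).

```python
from pathlib import PureWindowsPath


def join_as_relative(first, *more):
    """Hypothetical helper: treat every later component as relative to `first`."""
    # Strip leading separators so no later component looks rooted.
    stripped = (part.lstrip("/\\") for part in more)
    # Drop components that became empty, such as a lone "\\".
    return PureWindowsPath(first, *(part for part in stripped if part))


result = join_as_relative("C:/123\\345", "\\", "folder///filename.bin")
print(result)  # C:\123\345\folder\filename.bin
```

PureWindowsPath is used here so the sketch behaves the same on every OS; pathlib.PurePath would pick the flavour of the running system instead.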
msg329861 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-11-13 19:03
os.path.join and pathlib are working as designed and documented. Similarly in POSIX we have the following:

    >>> import os
    >>> p = os.path.join("/123/345", "/", "folder///filename.bin")
    >>> print(p)
    /folder///filename.bin
    >>> print(os.path.normpath(p))
    /folder/filename.bin

The difference is that in Windows there's no root ("/") filesystem, but instead a set of DOS devices (e.g. "C:", "CON:") and UNC shares (e.g. r"\\server\share"), so the Windows implementation of join() uses the drive from the already-joined components in order to resolve a rooted component.
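
The drive handling described above can be illustrated side by side. This is a small sketch with made-up paths (the server and share names are placeholders); posixpath and ntpath are imported explicitly so that it runs the same way on any OS:

```python
import ntpath
import posixpath

# POSIX: a rooted component restarts the path at the single root "/".
posix_joined = posixpath.join("/123/345", "/folder")
print(posix_joined)  # /folder

# Windows: a rooted component restarts the path,
# but the drive or UNC share already joined is kept.
drive_joined = ntpath.join("C:\\123\\345", "\\folder")
unc_joined = ntpath.join("\\\\server\\share\\dir", "\\folder")
print(drive_joined)  # C:\folder
print(unc_joined)    # \\server\share\folder
```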
Date                 User        Action  Args
2018-11-13 19:03:31  eryksun     set     status: open -> closed; nosy: + eryksun; messages: + msg329861; resolution: not a bug; stage: resolved
2018-11-13 01:06:32  Roffild     set     messages: + msg329783
2018-11-13 00:27:20  eric.smith  set     nosy: + eric.smith; messages: + msg329779
2018-11-13 00:06:16  Roffild     create