classification
Title: Support preserving path meaning in os.path.normpath() and abspath()
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barneygale, eryksun, terry.reedy
Priority: normal Keywords: patch

Created on 2021-06-04 21:21 by barneygale, last changed 2021-06-14 03:35 by eryksun.

Pull Requests
URL Status Linked Edit
PR 26694 open barneygale, 2021-06-12 17:45
Messages (7)
msg395122 - (view) Author: Barney Gale (barneygale) * Date: 2021-06-04 21:21
>>> os.path.normpath('a/./b/../c//.')
'a/c'
>>> pathlib.Path('a/./b/../c//.')
PosixPath('a/b/../c')

pathlib takes care not to change the meaning of the path when normalising. That means preserving '..' entries, as these can't be simplified without resolving symlinks etc.

normpath(), on the other hand, /always/ eliminates '..' entries, which can change the meaning of the path.
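
A minimal sketch of how the meaning can change, assuming a POSIX system where symlinks can be created. The directory names (`target`, `sub`, `base`, `link`) and the helper `demo_symlink_meaning()` are made up for illustration and are not part of the report:

```python
import os
import tempfile


def demo_symlink_meaning():
    """Compare resolving 'base/link/..' with collapsing it textually."""
    with tempfile.TemporaryDirectory() as root:
        # Resolve the temp dir itself so relpath() below is stable.
        root = os.path.realpath(root)
        os.makedirs(os.path.join(root, "target", "sub"))
        os.mkdir(os.path.join(root, "base"))
        # base/link is a symlink to target/sub (hypothetical layout).
        os.symlink(os.path.join(root, "target", "sub"),
                   os.path.join(root, "base", "link"))

        path = os.path.join(root, "base", "link", "..")
        # realpath() follows the symlink before applying '..'.
        resolved = os.path.realpath(path)
        # normpath() removes 'link/..' without looking at the filesystem.
        collapsed = os.path.normpath(path)
        return os.path.relpath(resolved, root), os.path.relpath(collapsed, root)


print(demo_symlink_meaning())
```

Here the resolved path ends up in `target`, while the textually collapsed path ends up in `base`, so the two refer to different directories.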

We could add a new argument to `normpath()` and `abspath()` that leaves '..' entries intact. This was closed as "won't fix" back in bpo-2289, but I think it's worth re-considering.

This enhancement would be helpful for my longer-term work to make pathlib an OOP wrapper of os + os.path, rather than a parallel implementation.
msg395672 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-06-12 00:38
I think you should propose this for discussion on the python-ideas list to try for more support. If you can, say more about why this should be reconsidered.
msg395697 - (view) Author: Barney Gale (barneygale) * Date: 2021-06-12 18:29
Thanks Terry, I've added a topic here: https://discuss.python.org/t/pathlib-and-os-path-code-duplication-and-feature-parity/9239

The bit about `normpath()` is towards the middle of the post.
msg395698 - (view) Author: Barney Gale (barneygale) * Date: 2021-06-12 18:47
For this bug specifically, the pathlib docs describe the desirable behaviour:

<quote>

Spurious slashes and single dots are collapsed, but double dots ('..') are not, since this would change the meaning of a path in the face of symbolic links:

>>> PurePath('foo//bar')
PurePosixPath('foo/bar')
>>> PurePath('foo/./bar')
PurePosixPath('foo/bar')
>>> PurePath('foo/../bar')
PurePosixPath('foo/../bar')

(a naïve approach would make PurePosixPath('foo/../bar') equivalent to PurePosixPath('bar'), which is wrong if foo is a symbolic link to another directory)

</quote>
msg395714 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-06-12 21:05
> single dots are collapsed

For pathlib, I've previously discussed a desire to retain a leading dot component from the initializing path. This could be implemented in strict mode for normpath(). 

A leading dot is significant in the path of an executable in a search context, such as the first item in the args sequence of subprocess.Popen(). For example, if "./spam" is normalized as "spam", then the system will search the PATH directories instead of the current working directory. It's also important to note that in Windows a leading dot component is required in order to preclude searching even if the path contains slashes. For example, CreateProcessW() and SearchPathW() will try to resolve r"spam\eggs.exe" against every directory in the search path, whereas r".\spam\eggs.exe" is explicitly relative to just the current working directory.
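
A small sketch of the POSIX side of this, assuming the usual execvp-style rule that PATH is searched only when the program name contains no slash. The helper `posix_would_search_path()` is hypothetical and only models that rule; it does not call the system:

```python
import posixpath


def posix_would_search_path(program):
    """Model the execvp-style rule: PATH is searched only if there is no '/'."""
    return "/" not in program


original = "./spam"
# normpath() drops the leading dot component.
normalized = posixpath.normpath(original)

print(original, posix_would_search_path(original))
print(normalized, posix_would_search_path(normalized))
```

The original string is relative to the current working directory, but the normalized string would be looked up in the PATH directories instead.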

Retaining a leading dot component can also be important in Windows in order to disambiguate a drive-relative path from a named data stream of a file that has a single-letter filename. For example, "C:spam" is a file named "spam" in the working path on drive "C:" (e.g. "C:spam" -> r"C:\working\path\spam"), but r".\C:spam" is a data stream named "spam" in a file named "C" in the current working directory.
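
The drive-versus-stream distinction can be seen with ntpath.splitdrive(), which is importable on any platform. This is only an illustrative sketch; the variable names are made up, and it shows how the strings are parsed rather than how Windows opens them:

```python
import ntpath

drive_relative = "C:spam"      # "spam" relative to the working path on drive C:
stream_in_cwd = r".\C:spam"    # stream "spam" of a file named "C" in the cwd

# With the leading dot, no drive is parsed out of the path.
drive_a, rest_a = ntpath.splitdrive(drive_relative)
drive_b, rest_b = ntpath.splitdrive(stream_in_cwd)

print(repr(drive_a), repr(rest_a))
print(repr(drive_b), repr(rest_b))
```

If normalisation removes the leading dot component, the second string becomes indistinguishable from the first.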
msg395715 - (view) Author: Barney Gale (barneygale) * Date: 2021-06-12 21:14
I think I agree

How would you feel about two new arguments? Following `os.curdir` and `os.pardir` names:

def normpath(path, *, keep_curdir=False, keep_pardir=False):
    ...
msg395770 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-06-14 03:35
I think separate keep_curdir and keep_pardir options over-complicate the signature. Also, I'd prefer to remove a dot component if it's not the first component since there's no reason to keep it.
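
As a rough sketch of the single "strict" behaviour described above (keep '..' entries, keep a dot only when it is the first component of a relative path, collapse everything else), assuming POSIX-style paths only. The name `normpath_preserving()` is hypothetical; this is not the code in PR 26694 and it ignores Windows drives and UNC paths:

```python
def normpath_preserving(path):
    """Hypothetical POSIX-only normalisation that never removes '..'."""
    if not path:
        return "."
    is_abs = path.startswith("/")
    # Exactly two leading slashes are kept, as posixpath.normpath() does.
    if path.startswith("//") and not path.startswith("///"):
        root = "//"
    else:
        root = "/" if is_abs else ""

    kept = []
    for part in path.split("/"):
        if not part:
            continue  # collapse repeated and trailing slashes
        if part == "." and (is_abs or kept):
            continue  # drop any dot that is not the leading component
        kept.append(part)  # '..' is always kept

    return (root + "/".join(kept)) or "."


print(normpath_preserving("a/./b/../c//."))
print(normpath_preserving("./spam"))
```

A single keyword-only flag on normpath() could select this behaviour, which avoids the two-option signature while still covering both cases.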

If you plan to use normpath() in pathlib, then the case for special_prefixes in ntpath.normpath() should be removed. Actually, it never should have been added. IIRC it was added as a workaround for a buggy implementation that's no longer an issue. Only \\?\ is special, and that's only when opening/accessing a path. It's not special in GetFullPathNameW(), as is called by ntpath.abspath() in Windows. This needlessly introduces inconsistency for ntpath.abspath() calls in Windows vs Unix.
History
Date User Action Args
2021-06-14 03:35:12 eryksun set messages: + msg395770
2021-06-12 21:14:41 barneygale set messages: + msg395715
2021-06-12 21:05:45 eryksun set nosy: + eryksun; messages: + msg395714
2021-06-12 18:47:34 barneygale set messages: + msg395698
2021-06-12 18:29:42 barneygale set messages: + msg395697
2021-06-12 17:45:03 barneygale set keywords: + patch; stage: patch review; pull_requests: + pull_request25279
2021-06-12 00:38:41 terry.reedy set nosy: + terry.reedy; messages: + msg395672; versions: + Python 3.11
2021-06-04 21:21:19 barneygale create