classification
Title: Support different modes in posixpath.realpath()
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barneygale, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-05-11 18:54 by serhiy.storchaka, last changed 2021-04-08 00:42 by barneygale.

Files
File name Uploaded Description
realpath_mode.patch serhiy.storchaka, 2016-05-11 18:54
Messages (2)
msg265335 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-11 18:54
Currently posixpath.realpath() does not raise an exception when it encounters a broken link. Instead, it just leaves the broken link name and the following path components unresolved. This is dangerous, since the broken link name can be collapsed with a following ".." and the resulting valid path can point to the wrong location. This may even be a security issue.
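The collapse described above is the same purely lexical step that posixpath.normpath() performs. A minimal illustration, assuming a hypothetical layout in which /base/broken is a dangling link to sub/dir/missing:

```python
import posixpath

# normpath() is purely lexical: "broken/.." cancels out without ever
# checking whether "broken" is a symlink.
collapsed = posixpath.normpath("/base/broken/../link")
print(collapsed)  # /base/link

# If "broken" were a link to "sub/dir/missing" (hypothetical layout),
# following it first would put ".." at /base/sub/dir, so the path would
# actually name /base/sub/dir/link, a different location.
```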

On the other hand, Path.resolve() raises an exception when it encounters a broken link. This is not always desirable, and there is a wish to make it more lenient. See issue19717 for more information.

The readlink utility from GNU coreutils has three modes for resolving a file path:

       -f, --canonicalize
              canonicalize by following every symlink in every component of the given name recursively; all but the last component must exist

       -e, --canonicalize-existing
              canonicalize by following every symlink in every component of the given name recursively, all components must exist

       -m, --canonicalize-missing
              canonicalize by following every symlink in every component of the given name recursively, without requirements on components existence
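The three modes differ only in which missing components are tolerated. The following is a sketch of those semantics against a toy in-memory filesystem, not the attached patch: the mode names MISSING, ALL_BUT_LAST and EXISTING, the function canonicalize() and the dict-based filesystem model are all hypothetical, and ENOTDIR (a non-directory used as a directory) is not modeled.

```python
import posixpath

# Hypothetical mode names for this sketch, mirroring readlink's options.
MISSING = "missing"            # like `readlink -m`
ALL_BUT_LAST = "all_but_last"  # like `readlink -f`
EXISTING = "existing"          # like `readlink -e`


def canonicalize(fs, path, mode=MISSING, max_links=40):
    """Resolve an absolute path against a toy in-memory filesystem.

    fs maps absolute paths to "dir", "file" or ("link", target);
    a path absent from fs does not exist. ENOTDIR is not modeled.
    """
    pending = [c for c in path.split("/") if c]  # components left to walk
    resolved = "/"
    links_followed = 0
    while pending:
        name = pending.pop(0)
        is_last = not pending
        if name == ".":
            continue
        if name == "..":
            # ".." applies to the already-resolved prefix, so it can never
            # cancel out an unresolved symlink name.
            resolved = posixpath.dirname(resolved)
            continue
        candidate = posixpath.join(resolved, name)
        entry = fs.get(candidate)
        if entry is None:
            if mode == EXISTING or (mode == ALL_BUT_LAST and not is_last):
                raise FileNotFoundError(candidate)
            resolved = candidate  # tolerated missing component
        elif isinstance(entry, tuple):
            links_followed += 1
            if links_followed > max_links:
                raise OSError("too many levels of symbolic links: " + path)
            target = entry[1]
            if target.startswith("/"):
                resolved = "/"
            # Splice the link target in front of the remaining components;
            # a relative target is interpreted against the link's directory.
            pending = [c for c in target.split("/") if c] + pending
        else:
            resolved = candidate
    return resolved


fs = {
    "/base": "dir",
    "/base/file": "file",
    "/base/link": ("link", "file"),
    "/base/sub": "dir",
    "/base/sub/dir": "dir",
    "/base/broken": ("link", "sub/dir/missing"),
}
print(canonicalize(fs, "/base/broken/../link"))  # /base/sub/dir/link
print(canonicalize(fs, "/base/link", EXISTING))   # /base/file
```

Because links are expanded before ".." is applied, the dangling link is followed rather than collapsed; ALL_BUT_LAST and EXISTING would both reject "/base/broken/../link" because "missing" is not the final component.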

The current behavior of posixpath.realpath() matches `readlink -m` (apart from one minor detail). The behavior of Path.resolve() matches `readlink -e`.

The proposed preliminary patch implements support for all three modes in posixpath.realpath(): CAN_MISSING, CAN_ALL_BUT_LAST and CAN_EXISTING. It exactly matches the behavior of readlink. The default mode is CAN_MISSING.

There is a minor behavior difference in the default mode. Given a file "file", a link "link" that points to "file", and a broken link "broken", "broken/../link" was previously resolved to "link" and is now resolved to "file".

The patch lacks documentation. A ternary flag does not look like the best API; a binary flag would be better, but I don't know which mode can be dropped. CAN_MISSING is needed for compatibility, but it looks less useful and may be insecure (though no more so than normpath()). CAN_EXISTING and CAN_ALL_BUT_LAST are needed in different cases. I think that in many cases CAN_ALL_BUT_LAST is actually what is needed instead of CAN_MISSING.

Once this issue is resolved, the solution will be adopted for Path.resolve().
msg390500 - Author: Barney Gale (barneygale) * Date: 2021-04-08 00:42
Just stumbled upon this issue after submitting a PR: https://github.com/python/cpython/pull/25264

In my PR, `strict=False` is like `--canonicalize-missing`, and `strict=True` is like `--canonicalize-existing`.

Looks like our patches are along similar lines. I've missed a trick by not calling `stat()` to trigger the ELOOP error.
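The `stat()` trick works because stat() follows symlinks and lets the kernel detect cycles, whereas lstat() only inspects the link itself. A minimal demonstration, assuming a POSIX system where a two-link cycle makes stat() fail with ELOOP:

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a")
    b = os.path.join(tmp, "b")
    os.symlink(b, a)  # a -> b
    os.symlink(a, b)  # b -> a, closing the cycle

    # lstat() does not follow the link, so it succeeds.
    os.lstat(a)

    # stat() follows links; the kernel gives up and reports ELOOP.
    try:
        os.stat(a)
        loop_errno = None
    except OSError as exc:
        loop_errno = exc.errno

print(loop_errno == errno.ELOOP)  # True
```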
History
Date User Action Args
2021-04-08 00:42:45 barneygale set nosy: + barneygale
messages: + msg390500
2016-05-11 18:54:44 serhiy.storchaka create