This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: C realpath not used by os.path.realpath
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, eric.araujo, matejcik, neologix, pitrou, rosslagerwall, santoso.wijaya, vstinner
Priority: normal Keywords: patch

Created on 2011-08-20 23:47 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
i12801.patch rosslagerwall, 2011-08-21 05:04
realpath.path vstinner, 2012-02-21 23:51 review
Messages (20)
msg142583 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-20 23:47
We have our own quirky implementation of C realpath() in posixpath.py.
It would probably be simpler, safer and more efficient to simply call the POSIX function instead (but it most be exposed somewhere, by posixmodule.c I suppose).
msg142587 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-20 23:57
The Python implementation (os.path.realpath) was introduced 10 years ago by the issue #461781.

The Python implemtation has known bugs:

 - #9949: os.path.realpath on Windows does not follow symbolic links
 - #11397: os.path.realpath() may produce incorrect results

See also the issue #1298813: _Py_wrealpath() must use canonicalize_file_name() if available.
msg142591 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-08-21 05:04
An initial patch.

The first problem is that before, os.path.realpath('') == os.path.realpath('.')
but this is not true for realpath(3).

Thus, the test suite does not run because it relies on this behaviour.
msg142619 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-08-21 13:53
+   Return a string representing the canonicalized pathname of *path*.

This sounds a bit strange to me; “Return a string representing the canonical version of *path*”.

Would it be a good idea to use the full docstring of posixpath.realpath to os.realpath and then just do “realpath = os.realpath” in posixpath?
msg142909 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-08-24 20:11
> It would probably be simpler, safer and more efficient to simply call
> the POSIX function instead (but it most be exposed somewhere, by
> posixmodule.c I suppose).

Indeed.

> Would it be a good idea to use the full docstring of posixpath.realpath
> to os.realpath and then just do “realpath = os.realpath” in posixpath?

Yes. That would also be a way to add the '.' default argument.

> An initial patch.

I'm actually concerned because realpath doesn't seem as portable as I first thought. Here's what my man page says:
"""
OSIX.1-2001 says that the behavior if resolved_path is NULL is implementation-defined.  POSIX.1-2008 specifies the behavior described in this page.
"""

So I'm not sure that all implementations support passing a NULL buffer as argument.

Also:
"""
Solaris may return a relative pathname when the path argument is relative.
"""

Well, I guess we can also commit this patch and see if a buildbot breaks...
msg142913 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-24 20:18
i12801.patch is not correct: on Windows, you should never encode a filename to bytes. Use PyUnicode_Decode() + _Py_wrealpath(), or a #ifdef (as many other posixmodule functions).

I prefer to reuse _Py_wrealpath because we will have to add special cases: realpath(NULL) and also maybe canonicalize_file_name().
msg142925 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-24 21:22
Patch to get #ifdef REALPATH_SUPPORT_NULL:

diff --git a/configure.in b/configure.in
--- a/configure.in
+++ b/configure.in
@@ -1539,6 +1539,17 @@ if test "$have_pthread_t" = yes ; then
 #endif
   ])
 fi
+
+AC_MSG_CHECKING(for realpath with NULL)
+realpath_support_null=no
+AC_COMPILE_IFELSE([
+  AC_LANG_PROGRAM([[#include <stdlib.h>]], [[char *p = realpath(__FILE__, NULL);]])
+],[realpath_support_null=yes],[])
+AC_MSG_RESULT($realpath_support_null)
+if test "$have_pthread_t" = yes ; then
+  AC_DEFINE(REALPATH_SUPPORT_NULL, 1,
+  [Define if the realpath() function supports NULL as the second argument])
+fi
 CC="$ac_save_cc"
 
 AC_SUBST(OTHER_LIBTOOL_OPT)
msg142929 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-24 21:28
> I prefer to reuse _Py_wrealpath because we will have
> to add special cases: realpath(NULL) and also maybe
> canonicalize_file_name().

I read that canonicalize_file_name(name) just calls realpath(name, NULL), so we can maybe avoid this GNU exception.

I realized that _Py_wrealpath() returns wchar_t*, so it's maybe better to take the first approach (#ifdef MS_WINDOWS).
msg142932 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-08-24 21:44
> Patch to get #ifdef REALPATH_SUPPORT_NULL:

I'm not really familiar with autotools, but I have the impression that
this will only check that the given code snippet compiles (and it
will), and we want to check that a NULL buffer is supported at
runtime.
Am I missing something?
msg142943 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-24 22:59
Le mercredi 24 août 2011 23:45:00, vous avez écrit :
> Charles-François Natali <neologix@free.fr> added the comment:
> > Patch to get #ifdef REALPATH_SUPPORT_NULL:
> I'm not really familiar with autotools, but I have the impression that
> this will only check that the given code snippet compiles (and it
> will), and we want to check that a NULL buffer is supported at
> runtime.
> Am I missing something?

Oh, you are right. I don't know autotools, I'm trying to learn.
msg143152 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-29 16:07
Well, if we use two different paths based on the libc version, it might not be a good idea, since behaviour can be different in some cases.
It would be nice to know if some modern platforms have a non-compliant realpath().
msg143163 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-08-29 17:22
> Well, if we use two different paths based on the libc version, it might not be a good idea, since behaviour can be different in some cases.

Indeed.

> It would be nice to know if some modern platforms have a non-compliant realpath().

Alas, it doesn't seem to hold for OpenBSD:
http://old.nabble.com/Make-realpath(3)-conform-to-SUSv4-td32031895.html

A patch supporting NULL was committed two months ago, which means we
probably can't push this forward.

I've been quite disappointed by POSIX lately...
msg143164 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-29 17:24
> Alas, it doesn't seem to hold for OpenBSD:
> http://old.nabble.com/Make-realpath(3)-conform-to-SUSv4-td32031895.html
> 
> A patch supporting NULL was committed two months ago, which means we
> probably can't push this forward.
> 
> I've been quite disappointed by POSIX lately...

:-)
msg143165 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-08-29 17:25
> I've been quite disappointed by POSIX lately...

POSIX the standard, or the implementers??
msg143167 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-08-29 17:55
> POSIX the standard, or the implementers??
>

Both :-)

For those wondering why we can't use PATH_MAX (ignoring the buffer
overallocation), here's why:

https://www.securecoding.cert.org/confluence/display/cplusplus/FIO02-CPP.+Canonicalize+path+names+originating+from+untrusted+sources
"""
Avoid using this function. It is broken by design since (unless using
the non-standard resolved_path == NULL feature) it is impossible to
determine a suitable size for the output buffer, resolved_path.
According to POSIX a buffer of size PATH_MAX suffices, but PATH_MAX
need not be a defined constant, and may have to be obtained using
pathconf(3). And asking pathconf(3) does not really help, since on the
one hand POSIX warns that the result of pathconf(3) may be huge and
unsuitable for mallocing memory. And on the other hand pathconf(3) may
return -1 to signify that PATH_MAX is not bounded.
The libc4 and libc5 implementation contains a buffer overflow (fixed
in libc-5.4.13). As a result, set-user-ID programs like mount(8) need
a private version."""
msg146640 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-30 14:57
Another problem with the POSIX realpath(3): it can fail with ENOENT if the target path doesn't exist, whereas the current Python implementation "succeeds" even if a component in the path doesn't exist.
Note the quotes around "succeeds", because I'm not sure whether this behavior is a good thing: the result is just plain wrong. I wonder whether we should break backwards compatibility on this particular point...
msg153916 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-02-21 23:51
realpath.patch:

 - Add os.realpath()
 - posixpath.realpath() uses os.realpath(), or os.getcwd() if the path is an empty string
 - os.realpath() uses canonicalize_file_name() if available, or use realpath() with a buffer of MAXPATHLEN bytes

Don't use realpath(path, NULL) because it is not possible to test if realpath() supports NULL: OpenBSD did segfault if the buffer was NULL. Testing the support in configure is not a good idea because it depends on the version of the C library. You may use a more recent C library to compile Python for example.

genericpath, macpath and os2emxpath are not patched to use os.realpath(), even if it is available. I don't know if these modules should be patched. ntpath uses the native Windows function (if available): GetFullPathNameW().
msg154247 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-02-25 13:08
>  - os.realpath() uses canonicalize_file_name() if available, or use realpath() with a buffer of MAXPATHLEN bytes

MAXPATHLEN is not necessarily defined (e.g. on the Hurd): if it's not
defined, it is set either to MAX_PATH (if it's defined an greater than
1024), or arbitrarily to 1024.
Thus, we can't pass it to realpath(3), since we don't know for sure
that realpath(3) won't return a string greater than MAXPATHLEN, which
could result in a buffer overflow.

Also, there's the problem that realpath(3) returns ENOENT, whereas
os.path.realpath() doesn't:

without patch:
$ python -c "import os; os.path.realpath('/nosuchfile')"

with patch:
$ ./python -c "import os; os.path.realpath('/nosuchfile')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/cf/python/cpython/Lib/posixpath.py", line 385, in realpath
    return os.realpath(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/nosuchfile'
msg223878 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-24 19:42
Shall we close this, since realpath(3) is fundamentally broken, and pathlib now does The Right Thing?
msg223883 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-24 19:53
I agree, Python should not use the C function realpath() for all the reasons give in the issue.
History
Date User Action Args
2022-04-11 14:57:20adminsetgithub: 57010
2014-07-24 19:53:55vstinnersetstatus: open -> closed
resolution: not a bug
messages: + msg223883
2014-07-24 19:42:57neologixsetmessages: + msg223878
2012-02-26 08:07:21Arfreversetnosy: + Arfrever
2012-02-25 13:08:13neologixsetmessages: + msg154247
2012-02-21 23:51:28vstinnersetfiles: + realpath.path

messages: + msg153916
2011-10-30 14:57:10neologixsetmessages: + msg146640
2011-08-29 17:55:44neologixsetmessages: + msg143167
2011-08-29 17:25:21rosslagerwallsetmessages: + msg143165
2011-08-29 17:24:01pitrousetmessages: + msg143164
2011-08-29 17:22:18neologixsetmessages: + msg143163
2011-08-29 16:07:12pitrousetmessages: + msg143152
2011-08-29 15:41:40matejciksetnosy: + matejcik
2011-08-24 22:59:21vstinnersetmessages: + msg142943
2011-08-24 21:45:00neologixsetmessages: + msg142932
2011-08-24 21:31:04santoso.wijayasetnosy: + santoso.wijaya
2011-08-24 21:28:33vstinnersetmessages: + msg142929
2011-08-24 21:22:15vstinnersetmessages: + msg142925
2011-08-24 20:18:44vstinnersetmessages: + msg142913
2011-08-24 20:11:58neologixsetmessages: + msg142909
2011-08-21 13:53:39eric.araujosetnosy: + eric.araujo
messages: + msg142619
2011-08-21 05:04:04rosslagerwallsetfiles: + i12801.patch
keywords: + patch
messages: + msg142591
2011-08-20 23:57:43vstinnersetmessages: + msg142587
2011-08-20 23:47:30pitroucreate