classification
Title: os.link(..., follow_symlinks=True) broken on Linux
Type: behavior Stage: patch review
Components: Extension Modules Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, jo-he, larry, serhiy.storchaka, takluyver
Priority: normal Keywords: patch

Created on 2019-07-17 23:04 by jo-he, last changed 2021-03-23 17:43 by takluyver.

Pull Requests
URL Status Linked Edit
PR 14843 open jo-he, 2019-07-18 14:53
PR 24997 open takluyver, 2021-03-23 17:43
Messages (4)
msg348086 - (view) Author: Jo Henke (jo-he) * Date: 2019-07-17 23:04
Regarding link() POSIX states (https://pubs.opengroup.org/onlinepubs/9699919799/functions/link.html):

  "If path1 names a symbolic link, it is implementation-defined whether link() follows the symbolic link, or creates a new link to the symbolic link itself."

In Linux, link() does _not_ follow symlinks (http://man7.org/linux/man-pages/man2/link.2.html):

  "By default, linkat(), does not dereference oldpath if it is a symbolic link (like link())."

But Python 3 assumes the opposite to be always the case:

  https://github.com/python/cpython/blob/8cb65d1381b027f0b09ee36bfed7f35bb4dec9a9/Modules/posixmodule.c#L3517

...which suits e.g. NetBSD (https://netbsd.gw.com/cgi-bin/man-cgi?link+2):

  "When operating on a symlink, link() resolves the symlink and creates a hard link on the target."


Therefore, I recommend to always call linkat(), if the platform provides it. That's the modern superset of link() with clearly defined behavior.


Here are some commands to reproduce the issue on Linux (should hard link 'file' -> 'link', but tries '/tmp/symlink' -> 'link'):

~$ : >file
~$ ln -s "$PWD/file" /tmp/symlink
~$ strace -e link,linkat python -c 'import os; os.link("/tmp/symlink", "link", follow_symlinks=True)'
link("/tmp/symlink", "link")            = -1 EXDEV (Cross-device link)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
OSError: [Errno 18] Cross-device link: '/tmp/symlink' -> 'link'
+++ exited with 1 +++


For comparison, calling linkat() without AT_SYMLINK_FOLLOW results in the same error:

~$ strace -e link,linkat python -c 'import os; os.link("/tmp/symlink", "link", follow_symlinks=False)'
linkat(AT_FDCWD, "/tmp/symlink", AT_FDCWD, "link", 0) = -1 EXDEV (Cross-device link)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
OSError: [Errno 18] Cross-device link: '/tmp/symlink' -> 'link'
+++ exited with 1 +++


Currently, the only way to call linkat() with AT_SYMLINK_FOLLOW from Python, is to provide a directory file descriptor != AT_FDCWD:

~$ strace -e link,linkat python -c 'import os; d=os.open(".", 0); os.link("/tmp/symlink", "link", dst_dir_fd=d, follow_symlinks=True)'
linkat(AT_FDCWD, "/tmp/symlink", 3, "link", AT_SYMLINK_FOLLOW) = 0
+++ exited with 0 +++
msg348103 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-07-18 07:10
Not always linkat() can be used instead of link(). But on Windows (where there is no linkat()) os.link() creates a new link to the symbolic link itself. This is yet one argument for making follow_symlinks=False by default and changing the default behavior of os.link() on NetBSD.
msg348110 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-07-18 11:18
> on Windows (where there is no linkat()) os.link() creates a new link
> to the symbolic link itself. 

Yes, CreateHardLinkW opens the source file by calling NtOpenFile with the option FILE_OPEN_REPARSE_POINT. So the behavior is follow_symlinks=False. 

Note, however, that Windows has distinct file and directory reparse points, so we can't hardlink to a directory symlink, or any other type of directory reparse point such as a junction mountpoint. In Unix, follow_symlinks=False (if implemented) allows creating a hardlink to a symlink that targets a directory.

Also, I noticed that I can pass follow_symlinks=False in Windows, but this should raise NotImplementedError. It's supposed to be checked via follow_symlinks_specified().
msg348111 - (view) Author: Jo Henke (jo-he) * Date: 2019-07-18 11:23
The problem that POSIX does not define the behavior of link() regarding symlinks (and that Unix implementations differ indeed), is independent from Python's os.link() defaults.

Since it makes no sense to call link(), when linkat() is available, I propose this change:

--- a/Modules/posixmodule.c
+++ b/Modules/posixmodule.c
@@ -3512,15 +3512,11 @@ os_link_impl(PyObject *module, path_t *src, path_t *dst, int src_dir_fd,
 #else
     Py_BEGIN_ALLOW_THREADS
 #ifdef HAVE_LINKAT
-    if ((src_dir_fd != DEFAULT_DIR_FD) ||
-        (dst_dir_fd != DEFAULT_DIR_FD) ||
-        (!follow_symlinks))
-        result = linkat(src_dir_fd, src->narrow,
-            dst_dir_fd, dst->narrow,
-            follow_symlinks ? AT_SYMLINK_FOLLOW : 0);
-    else
+    result = linkat(src_dir_fd, src->narrow, dst_dir_fd, dst->narrow,
+                    follow_symlinks ? AT_SYMLINK_FOLLOW : 0);
+#else
+    result = link(src->narrow, dst->narrow);
 #endif /* HAVE_LINKAT */
-        result = link(src->narrow, dst->narrow);
     Py_END_ALLOW_THREADS

     if (result)


This fix also simplifies the code, and should be safe for back-porting.

Whether Python's defaults should be changed regarding Windows and those (mostly obsolete) Unix platforms that do not provide linkat(), is another question.
History
Date User Action Args
2021-03-23 17:43:01takluyversetnosy: + takluyver
pull_requests: + pull_request23755
2021-03-12 23:23:24eryksunsetcomponents: + Extension Modules, - Library (Lib)
versions: + Python 3.10, - Python 3.5, Python 3.6, Python 3.7
2019-07-18 14:53:36jo-hesetkeywords: + patch
stage: patch review
pull_requests: + pull_request14634
2019-07-18 11:23:07jo-hesetmessages: + msg348111
2019-07-18 11:18:03eryksunsetmessages: + msg348110
2019-07-18 07:10:03serhiy.storchakasetnosy: + eryksun, serhiy.storchaka, larry
messages: + msg348103
2019-07-17 23:04:55jo-hecreate