classification
Title: os.walk always follows Windows junctions
Type: enhancement Stage:
Components: Library (Lib), Windows Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: 29248 Superseder:
Assigned To: Nosy List: craigh, eric.fahlgren, eryksun, jamercee, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2015-02-07 20:59 by craigh, last changed 2021-05-02 08:29 by eryksun.

Files
File name            Uploaded
issue23407.patch     craigh, 2016-09-25 17:09
issue23407-2.patch   craigh, 2016-09-25 18:33
issue23407-3.patch   craigh, 2017-01-14 14:54
issue23407-4.patch   craigh, 2017-01-14 15:36
issue23407-5.patch   craigh, 2017-03-10 22:01
Messages (16)
msg235531 - Author: Craig Holmquist (craigh) Date: 2015-02-07 20:59
os.walk follows Windows junctions even if followlinks is False:

>>> import os
>>> appdata = os.environ['LOCALAPPDATA']
>>> for root, dirs, files in os.walk(appdata, followlinks=False):
...	print(root)

C:\Users\Test\AppData\Local
C:\Users\Test\AppData\Local\Apple
C:\Users\Test\AppData\Local\Apple\Apple Software Update
C:\Users\Test\AppData\Local\Apple Computer
C:\Users\Test\AppData\Local\Apple Computer\iTunes
C:\Users\Test\AppData\Local\Application Data
C:\Users\Test\AppData\Local\Application Data\Apple
C:\Users\Test\AppData\Local\Application Data\Apple\Apple Software Update
C:\Users\Test\AppData\Local\Application Data\Apple Computer
C:\Users\Test\AppData\Local\Application Data\Apple Computer\iTunes
C:\Users\Test\AppData\Local\Application Data\Application Data
C:\Users\Test\AppData\Local\Application Data\Application Data\Apple
C:\Users\Test\AppData\Local\Application Data\Application Data\Apple\Apple Software Update
C:\Users\Test\AppData\Local\Application Data\Application Data\Apple Computer
C:\Users\Test\AppData\Local\Application Data\Application Data\Apple Computer\iTunes
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Apple
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Apple\Apple Software Update
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Apple Computer
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Apple Computer\iTunes
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Application Data
[...]

For directory symbolic links, os.walk seems to have the correct behavior.  However, Windows 7 (at least) employs junctions instead of symlinks in situations like the default user profile layout, i.e. the "Application Data" junction shown above.

I also noticed that, for junctions, os.path.islink returns False but os.stat and os.lstat return different results.
msg235533 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-02-07 23:36
To check for a link on Windows, os.walk calls ntpath.islink, which calls os.lstat. Currently the os.lstat implementation only sets S_IFLNK for symbolic links. attribute_data_to_stat could also check for junctions (IO_REPARSE_TAG_MOUNT_POINT). For consistency, os.readlink should also read junctions (rdb->MountPointReparseBuffer).

islink
https://hg.python.org/cpython/file/7b493dbf944b/Lib/ntpath.py#l239

attribute_data_to_stat
https://hg.python.org/cpython/file/7b493dbf944b/Modules/posixmodule.c#l1515

win_readlink
https://hg.python.org/cpython/file/7b493dbf944b/Modules/posixmodule.c#l10056

REPARSE_DATA_BUFFER
https://hg.python.org/cpython/file/7b493dbf944b/Modules/winreparse.h#l11
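A minimal Python sketch of the check described above, assuming Python 3.8+ on Windows, where os.lstat() results carry an st_reparse_tag attribute (the attribute is absent on other platforms, so the sketch falls back to 0). is_link_or_junction is a hypothetical helper, not an existing ntpath function: it keeps the S_IFLNK test and additionally accepts IO_REPARSE_TAG_MOUNT_POINT.

```python
import os
import stat

# Tag value from the Windows SDK (winnt.h); Python 3.8+ also exposes it
# as stat.IO_REPARSE_TAG_MOUNT_POINT.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003


def is_link_or_junction(path) -> bool:
    """Hypothetical helper: True for symlinks (S_IFLNK) and for junctions."""
    try:
        st = os.lstat(path)
    except (OSError, ValueError):
        return False
    if stat.S_ISLNK(st.st_mode):
        # The same S_IFLNK test that os.path.islink relies on.
        return True
    # st_reparse_tag only exists on Windows (3.8+); treat it as 0 elsewhere.
    return getattr(st, "st_reparse_tag", 0) == IO_REPARSE_TAG_MOUNT_POINT
```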
msg277385 - Author: Craig Holmquist (craigh) Date: 2016-09-25 17:09
The attached patch changes _Py_attribute_data_to_stat to set S_IFLNK for both symlinks and junctions, and changes win_readlink to return the target path for junctions (IO_REPARSE_TAG_MOUNT_POINT) as well as symlinks.

I'm not sure what to do as far as adding a test--either Python needs a way to create junctions or the test needs to rely on the ones Windows creates by default.

Incidentally, the existing win_readlink doesn't always work correctly with symbolic links, either (this is from 3.5.2):  

>>> import os
>>> os.readlink(r'C:\Users\All Users')
'\x00\x00f\x00\u0201\x00\x02\x00\x00\x00f\x00\x00\x00'

The problem is that PrintNameOffset is an offset in bytes, so it needs to be divided by sizeof(WCHAR) if you're going to add it to a WCHAR pointer (https://msdn.microsoft.com/en-us/library/windows/hardware/ff552012(v=vs.85).aspx).  Some links still seem to work correctly because PrintNameOffset is 0.  The attached patch fixes this problem also--I wasn't sure if I should open a separate issue for it.
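To illustrate the offset bug, here is a Python sketch that parses a synthetic reparse buffer with struct. It assumes the documented REPARSE_DATA_BUFFER layout: an 8-byte header, four USHORT offset/length fields, a ULONG Flags field only in the symlink variant, then PathBuffer, with all offsets and lengths counted in bytes from the start of PathBuffer. build_symlink_buffer, print_name and print_name_buggy are hypothetical helpers, not CPython code; the buggy variant imitates adding a byte offset to a WCHAR pointer, which effectively doubles it.

```python
import struct

IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction
IO_REPARSE_TAG_SYMLINK = 0xA000000C

_HEADER = struct.Struct("<IHH")  # ReparseTag, ReparseDataLength, Reserved
_NAMES = struct.Struct("<HHHH")  # SubstituteNameOffset/Length, PrintNameOffset/Length
_FLAGS = struct.Struct("<I")     # Flags: present only in the symlink variant


def _path_buffer_start(tag: int) -> int:
    """Byte position of PathBuffer within the whole reparse buffer."""
    if tag == IO_REPARSE_TAG_SYMLINK:
        return _HEADER.size + _NAMES.size + _FLAGS.size
    if tag == IO_REPARSE_TAG_MOUNT_POINT:
        return _HEADER.size + _NAMES.size
    raise ValueError("unsupported reparse tag")


def print_name(buf: bytes) -> str:
    """Correct: PrintNameOffset and PrintNameLength are byte counts."""
    (tag,) = struct.unpack_from("<I", buf, 0)
    _, _, offset, length = _NAMES.unpack_from(buf, _HEADER.size)
    start = _path_buffer_start(tag) + offset
    return buf[start:start + length].decode("utf-16-le")


def print_name_buggy(buf: bytes) -> str:
    """Imitates adding the byte offset to a WCHAR pointer, which scales it by
    sizeof(WCHAR) == 2. Only correct when PrintNameOffset is 0."""
    (tag,) = struct.unpack_from("<I", buf, 0)
    _, _, offset, length = _NAMES.unpack_from(buf, _HEADER.size)
    start = _path_buffer_start(tag) + offset * 2
    return buf[start:start + length].decode("utf-16-le", errors="replace")


def build_symlink_buffer(substitute: str, printed: str, print_first: bool) -> bytes:
    """Hypothetical helper: build a synthetic symlink reparse buffer."""
    sub = substitute.encode("utf-16-le")
    prn = printed.encode("utf-16-le")
    if print_first:
        path, sub_off, prn_off = prn + sub, len(prn), 0
    else:
        path, sub_off, prn_off = sub + prn, 0, len(sub)
    body = _NAMES.pack(sub_off, len(sub), prn_off, len(prn)) + _FLAGS.pack(0) + path
    # ReparseDataLength counts the bytes after the 8-byte header.
    return _HEADER.pack(IO_REPARSE_TAG_SYMLINK, len(body), 0) + body
```

When the print name is stored first (PrintNameOffset == 0), print_name_buggy happens to return the right string, which is consistent with the observation that some links still work; when the substitute name comes first, it reads past the intended position.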
msg277389 - Author: Craig Holmquist (craigh) Date: 2016-09-25 18:20
Actually, it looks like there is already a way to create junctions and a test for them in test_os.  However, it includes this line:

    # Junctions are not recognized as links.
    self.assertFalse(os.path.islink(self.junction))

That suggests the old behavior is intentional--does anyone know why?
msg277393 - Author: Craig Holmquist (craigh) Date: 2016-09-25 18:33
Updated patch with changes to Win32JunctionTests.
msg285273 - Author: Eric Fahlgren (eric.fahlgren) * Date: 2017-01-11 21:23
> # Junctions are not recognized as links.
> self.assertFalse(os.path.islink(self.junction))

If the above comment is intended as a statement of fact, then it's inconsistent with the implementation of Py_DeleteFileW (https://hg.python.org/cpython/file/v3.6.0/Modules/posixmodule.c#l4178).
msg285301 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-01-12 10:52
I opened issue 29248 for the os.readlink bug and issue 29250 for the inconsistency between os.path.islink and os.stat.

Handling junctions as links is new behavior, so I've changed this issue to be an enhancement for 3.7.

If the notion of a link is generalized to junctions, then maybe it should be further generalized to include all name-surrogate reparse tags [1]. Currently, for Microsoft tags this includes:

    IO_REPARSE_TAG_MOUNT_POINT (junctions)
    IO_REPARSE_TAG_SYMLINK
    IO_REPARSE_TAG_IIS_CACHE

For non-Microsoft tags it includes:

    IO_REPARSE_TAG_SOLUTIONSOFT
    IO_REPARSE_TAG_OSR_SAMPLE
    IO_REPARSE_TAG_QI_TECH_HSM
    IO_REPARSE_TAG_MAXISCALE_HSM

The last two are outliers. HSM isn't the kind of immediate, fast access that one would expect from a symbolic link. All other HSM tags aren't categorized as name surrogates.

[1]: https://msdn.microsoft.com/en-us/library/aa365511
msg285351 - Author: Craig Holmquist (craigh) Date: 2017-01-12 23:00
Can you point me toward any documentation on the additional tags you want to support?  Searching for IO_REPARSE_TAG_IIS_CACHE mostly seems to yield header files that define it (and nothing at all on MSDN), and the non-Microsoft tags just yield a few results each.

(For comparison, the junction and symbolic link tags yield 10K+ results each.)

Junctions are created with each user's home directory so they exist on every Windows system, even if the user never explicitly creates them.  The additional tags seem like they're far less common and much less well-documented.
msg285355 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-01-13 00:38
I simply listed the tags that have the name-surrogate bit set out of those defined in km\ntifs.h. 

To keep things simple, it might be better to only include Microsoft tags (i.e. bit 31 is set). That way we don't have to deal with the REPARSE_GUID_DATA_BUFFER struct that's used for non-Microsoft reparse points.
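The two bit tests discussed in this thread could be modelled in Python as below. The masks mirror the IsReparseTagMicrosoft (bit 31) and IsReparseTagNameSurrogate (bit 29) macros from winnt.h; the tag constants are Windows SDK values, and is_link_candidate is a hypothetical helper expressing the "Microsoft name surrogates only" restriction suggested above, not an existing API.

```python
# Bit masks corresponding to winnt.h's IsReparseTagMicrosoft and
# IsReparseTagNameSurrogate macros.
MICROSOFT_BIT = 0x80000000       # bit 31
NAME_SURROGATE_BIT = 0x20000000  # bit 29

# A few tag values from the Windows SDK headers.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_HSM = 0xC0000004
IO_REPARSE_TAG_DEDUP = 0x80000013


def is_microsoft_tag(tag: int) -> bool:
    return bool(tag & MICROSOFT_BIT)


def is_name_surrogate(tag: int) -> bool:
    return bool(tag & NAME_SURROGATE_BIT)


def is_link_candidate(tag: int) -> bool:
    """Hypothetical policy: Microsoft-owned name surrogates only, so that
    REPARSE_GUID_DATA_BUFFER never has to be parsed."""
    return is_microsoft_tag(tag) and is_name_surrogate(tag)
```

Under this policy junctions and symlinks qualify, while Microsoft tags without the name-surrogate bit (such as HSM or dedup) and any non-Microsoft tag do not.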
msg285458 - Author: Craig Holmquist (craigh) Date: 2017-01-14 06:16
FWIW, the only name-surrogate tags in the user-mode SDK headers (specifically winnt.h) are IO_REPARSE_TAG_MOUNT_POINT and IO_REPARSE_TAG_SYMLINK, as of at least the Windows 8.1 SDK.
msg285486 - Author: Craig Holmquist (craigh) Date: 2017-01-14 14:54
Here's a new patch:  now, _Py_attribute_data_to_stat and Py_DeleteFileW will just use the IsReparseTagNameSurrogate macro to determine if the file is a link, so os.walk etc. will know not to follow them.  os.readlink, however, will only work with junctions and symbolic links; otherwise it will raise ValueError with "unsupported reparse tag".

This way, there's a basic level of support for all name-surrogate tags, but os.readlink only works with the ones whose internal structure is (semi-) documented.
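A Python model of the split policy described above (not the patch's actual C code): stat-level link detection accepts any name-surrogate tag, while readlink only dispatches to the two known REPARSE_DATA_BUFFER variants and otherwise raises ValueError("unsupported reparse tag"). stat_reports_link and readlink_buffer_kind are hypothetical names, and the extra tag used in the usage check is an illustrative value, not a real registered tag.

```python
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C
NAME_SURROGATE_BIT = 0x20000000  # as tested by IsReparseTagNameSurrogate


def stat_reports_link(tag: int) -> bool:
    """Proposed behavior: S_IFLNK for every name-surrogate reparse point."""
    return bool(tag & NAME_SURROGATE_BIT)


def readlink_buffer_kind(tag: int) -> str:
    """Proposed behavior: readlink only parses the (semi-)documented layouts."""
    if tag == IO_REPARSE_TAG_SYMLINK:
        return "SymbolicLinkReparseBuffer"
    if tag == IO_REPARSE_TAG_MOUNT_POINT:
        return "MountPointReparseBuffer"
    raise ValueError("unsupported reparse tag")
```

The consequence is that a walker would refuse to descend into any name surrogate, even though os.readlink can only report a target for junctions and symlinks.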
msg285488 - Author: Craig Holmquist (craigh) Date: 2017-01-14 15:36
New patch with spaces instead of tabs
msg285499 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-01-14 20:11
Craig, can you add a patch for issue 29248, including a test based on the "All Users" link?
msg356677 - Author: Jim Carroll (jamercee) * Date: 2019-11-15 14:33
I can confirm the os.walk() behavior still exists on 3.8. I'm just curious about the status of the patch.
msg356882 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-11-18 16:53
At a minimum, it needs to be turned into a GitHub PR.

We've made some significant changes in this area in 3.8, so possibly the best available code is now in shutil.rmtree (or shutil.copytree) rather than the older patch files.
msg392674 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-05-02 08:29
Windows implements filesystem symlinks and mountpoints as name-surrogate reparse points. Python 3.8 introduced behavior changes to how reparse points are supported, but the stat st_mode value still sets S_IFLNK only for actual symlinks, not for mountpoints. This ensures that if os.path.islink() is true, it's safe to read its target and copy it via os.readlink() and os.symlink().
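The invariant described above (if os.path.islink() is true, the target can be read with os.readlink() and recreated with os.symlink()) can be sketched as a small copy helper. copy_link is hypothetical, not a standard-library function; on Windows, creating symlinks may additionally require the appropriate privilege or Developer Mode.

```python
import os


def copy_link(src, dst):
    """Hypothetical helper: recreate the symlink at src as dst.

    Relies on the invariant that os.path.islink(src) being true means
    os.readlink(src) returns a target os.symlink() can reuse.
    """
    if not os.path.islink(src):
        raise ValueError(f"{src!r} is not a symbolic link")
    target = os.readlink(src)
    # Windows distinguishes file and directory symlinks; os.path.isdir follows
    # the link, so this reflects the target type (False if the link dangles).
    # The flag is ignored on POSIX.
    os.symlink(target, dst, target_is_directory=os.path.isdir(src))
```

Because a mountpoint does not report as a link, this helper would reject it rather than silently turning it into a symlink.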

A mountpoint is not equivalent to a symlink in a few cases, so it shouldn't always be handled the same or copied as a symlink. The major difference is that mountpoints in a remote path are evaluated by the server, whereas symlinks in a remote path are evaluated by the client. Also, during path parsing, the target of a symlink replaces the opened path, but mountpoints are retained in the opened path (except if the target path contains a symlink, but that's broken in remote paths and should be avoided). This means that relative ".." components and rooted paths in a relative symlink target will traverse a mountpoint as if it's just a directory in the opened path. That's an important distinction, but in practice I'd steer someone away from relying on it, especially if a filesystem is mounted in multiple locations (e.g. on both a DOS drive and a directory), else resolution of the symlink will depend on which mountpoint is used.

It's best to handle mountpoints as if they're symlinks when deleting a tree because the way they're implemented as reparse points doesn't prevent loops. However, when walking a tree, you may or may not want to traverse a mountpoint. If it's traversed, a seen set() can be used to remember previously traversed directories, in order to prevent loops. As Steve mentioned, look to the implementation of shutil.rmtree() as an example. 

However, don't look to shutil.copytree() since it's wrong. The is_symlink() method of a scandir() entry is only true for an actual symlink, not a mountpoint, so the extra check that copytree() does is redundant. I think it was left in by mistake when the plan was to handle mountpoints as symlinks. It would be nice if we could copy a mountpoint instead of traversing it in copytree(), but the private implementation of _winapi.CreateJunction() isn't well-behaved and tested enough to be promoted into the standard library as something like os.mount().
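The loop-avoidance approach described in this message (optionally traversing mountpoints while remembering already-visited directories in a set) might look roughly like the generator below. It is a sketch, not os.walk's or shutil.rmtree's implementation: walk and _is_mountpoint are hypothetical names, mountpoint detection assumes Python 3.8+ on Windows (st_reparse_tag is absent elsewhere, so nothing is treated as a mountpoint), and loop detection assumes (st_dev, st_ino) uniquely identifies a directory, which is typically true on NTFS but may not hold on every filesystem.

```python
import os

# Tag for junctions/mountpoints (winnt.h).
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003


def _is_mountpoint(entry: os.DirEntry) -> bool:
    # st_reparse_tag only exists on Windows (3.8+); treat it as 0 elsewhere.
    st = entry.stat(follow_symlinks=False)
    return getattr(st, "st_reparse_tag", 0) == IO_REPARSE_TAG_MOUNT_POINT


def walk(top, follow_mountpoints=False):
    """Hypothetical os.walk-like generator (top-down, never follows symlinks).

    Mountpoints are listed in `dirs` but only descended into when
    follow_mountpoints is true. A set of (st_dev, st_ino) pairs ensures each
    directory is yielded at most once, which breaks loops.
    """
    seen = set()
    stack = [top]
    while stack:
        root = stack.pop()
        try:
            st = os.stat(root)  # follows mountpoints: identifies the real directory
        except OSError:
            continue
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # already visited via another path
        seen.add(key)
        dirs, files, descend = [], [], []
        try:
            with os.scandir(root) as it:
                for entry in it:
                    if entry.is_dir():
                        dirs.append(entry.name)
                        if entry.is_symlink():
                            continue  # never follow symlinks
                        if _is_mountpoint(entry) and not follow_mountpoints:
                            continue
                        descend.append(entry.path)
                    else:
                        files.append(entry.name)
        except OSError:
            continue  # unreadable directory: skip it entirely
        yield root, dirs, files
        # The stack is LIFO, so push in reverse to visit in listing order.
        stack.extend(reversed(descend))
```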
History
Date                 User           Action  Args
2021-05-02 08:31:34  eryksun        link    issue44008 superseder
2021-05-02 08:29:08  eryksun        set     messages: + msg392674
2021-05-02 08:28:45  eryksun        set     messages: - msg389286
2021-03-22 09:00:50  eryksun        set     messages: + msg389286
                                            versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.7
2019-11-18 16:53:43  steve.dower    set     messages: + msg356882
2019-11-15 14:33:24  jamercee       set     nosy: + jamercee
                                            messages: + msg356677
2019-08-22 04:55:31  eryksun        unlink  issue29250 dependencies
2017-03-10 22:01:42  craigh         set     files: + issue23407-5.patch
2017-01-14 20:11:02  eryksun        set     dependencies: + os.readlink fails on Windows
                                            messages: + msg285499
2017-01-14 15:36:30  craigh         set     files: + issue23407-4.patch
                                            messages: + msg285488
2017-01-14 14:54:07  craigh         set     files: + issue23407-3.patch
                                            messages: + msg285486
2017-01-14 06:16:35  craigh         set     messages: + msg285458
2017-01-13 00:38:24  eryksun        set     messages: + msg285355
2017-01-12 23:00:38  craigh         set     messages: + msg285351
2017-01-12 10:53:22  eryksun        link    issue29250 dependencies
2017-01-12 10:52:28  eryksun        set     type: behavior -> enhancement
                                            messages: + msg285301
                                            versions: + Python 3.7, - Python 3.4, Python 3.5
2017-01-11 21:23:26  eric.fahlgren  set     nosy: + eric.fahlgren
                                            messages: + msg285273
2016-09-25 18:33:37  craigh         set     files: + issue23407-2.patch
                                            messages: + msg277393
2016-09-25 18:20:52  craigh         set     messages: + msg277389
2016-09-25 17:09:37  craigh         set     files: + issue23407.patch
                                            keywords: + patch
                                            messages: + msg277385
2015-02-07 23:36:37  eryksun        set     versions: + Python 3.5
                                            nosy: + tim.golden, eryksun, zach.ware, steve.dower
                                            messages: + msg235533
                                            components: + Windows
2015-02-07 20:59:23  craigh         create