Title: Expose placeholder reparse points in Windows
Type: enhancement
Stage: resolved
Components: Library (Lib), Windows
Versions: Python 3.9, Python 3.8
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2020-01-12 14:27 by eryksun, last changed 2020-01-14 18:08 by eryksun. This issue is now closed.

Messages (3)
msg359850 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-01-12 14:27
Windows 10 apparently defaults to disguising placeholder reparse points in python.exe processes, but exposes them to cmd.exe and powershell.exe processes. 

A common example is a user's OneDrive folder, which extensively uses placeholder reparse points for files and directories. The placeholder file attributes include FILE_ATTRIBUTE_REPARSE_POINT, FILE_ATTRIBUTE_OFFLINE, and FILE_ATTRIBUTE_SPARSE_FILE, and the reparse tags are in the set IO_REPARSE_TAG_CLOUD[_1-F] (0x9000[0-F]01A). Currently, we don't see any of this information in a python.exe process when we call FindFirstFile[Ex]W, GetFileAttributesW, or query file information on a file opened with FILE_FLAG_OPEN_REPARSE_POINT, such as when we call os.lstat.
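
As an illustration of what the attributes and tags above would look like from Python if placeholders were exposed, here is a minimal sketch. The helper names (`placeholder_flags`, `is_cloud_tag`, `describe`) are hypothetical and not part of any library; the sketch assumes `st_file_attributes` and `st_reparse_tag` (Windows only, the latter since Python 3.8) and falls back to 0 elsewhere.

```python
import os
import stat

# Base tag from the message above; the _1 to _F variants differ only in
# bits 12-15, i.e. 0x9000[0-F]01A.
IO_REPARSE_TAG_CLOUD = 0x9000001A

# Attribute flags named in the message, in the same order.
_FLAGS = (
    ("REPARSE_POINT", stat.FILE_ATTRIBUTE_REPARSE_POINT),
    ("OFFLINE", stat.FILE_ATTRIBUTE_OFFLINE),
    ("SPARSE_FILE", stat.FILE_ATTRIBUTE_SPARSE_FILE),
)


def placeholder_flags(attrs):
    """Return the names of the placeholder-related flags set in attrs."""
    return [name for name, bit in _FLAGS if attrs & bit]


def is_cloud_tag(tag):
    """True if tag is IO_REPARSE_TAG_CLOUD or one of its _1.._F variants."""
    return (tag & 0xFFFF0FFF) == IO_REPARSE_TAG_CLOUD


def describe(path):
    """Summarize what os.lstat reports for path (hypothetical helper)."""
    st = os.lstat(path)
    attrs = getattr(st, "st_file_attributes", 0)  # Windows only
    tag = getattr(st, "st_reparse_tag", 0)        # Windows only, 3.8+
    return {"flags": placeholder_flags(attrs), "cloud_tag": is_cloud_tag(tag)}
```

In a process where placeholders are disguised, `describe` on a OneDrive placeholder would be expected to report no flags and no cloud tag, which is the behavior described above.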

The behavior is determined by the process or per-thread placeholder-compatibility mode. The process mode can be queried via RtlQueryProcessPlaceholderCompatibilityMode [1]. The documentation says that "[m]ost Windows applications see exposed placeholders by default". I don't know what criteria Windows is using here, but in my tests with python.exe and a simple command-line test program, the default mode is PHCM_DISGUISE_PLACEHOLDER.
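
Querying the process mode could be sketched with ctypes roughly as follows. The numeric PHCM_* values are taken from the Windows headers as best understood here and should be treated as assumptions; the function name `query_process_placeholder_mode` is hypothetical. On non-Windows systems, or on Windows versions without this ntdll export, the sketch returns None.

```python
import ctypes
import sys

# Assumed values from the Windows SDK/WDK headers.
PHCM_NAMES = {
    0: "PHCM_APPLICATION_DEFAULT",
    1: "PHCM_DISGUISE_PLACEHOLDER",
    2: "PHCM_EXPOSE_PLACEHOLDERS",
}


def query_process_placeholder_mode():
    """Return the process placeholder-compatibility mode, or None if
    the query function is unavailable."""
    if sys.platform != "win32":
        return None
    try:
        query = ctypes.WinDLL("ntdll").RtlQueryProcessPlaceholderCompatibilityMode
    except (AttributeError, OSError):
        return None  # export not present on this Windows version
    query.restype = ctypes.c_byte  # the function returns a signed CHAR
    query.argtypes = []
    return query()


def mode_name(mode):
    """Map a mode value to its PHCM_* name; negative values are errors."""
    if mode is None:
        return "unavailable"
    return PHCM_NAMES.get(mode, "error or unknown (%d)" % mode)
```

Calling `mode_name(query_process_placeholder_mode())` in python.exe would, per the tests described above, be expected to give PHCM_DISGUISE_PLACEHOLDER.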

Should Python provide some way to call RtlSetProcessPlaceholderCompatibilityMode [2] to set PHCM_EXPOSE_PLACEHOLDERS mode for the current process? Should os.lstat be modified to temporarily expose placeholders -- for the current thread only -- via RtlSetThreadPlaceholderCompatibilityMode [3]? We can dynamically link to this ntdll function via GetProcAddress. It returns the previous mode, which we can restore after querying the file.

msg359982 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-01-14 17:13
Given the minimum version requirement, I'd rather this support go into a third-party library. (Seems like a great candidate for a context manager, too.)

Recalling our debates about symlinks, I'd have to say that nothing about placeholder files qualifies them as links, regardless of whether Powershell puts "l" in the attributes summary :)

The ecosystem could really do with a Windows-aware filesystem library for this kind of support (and I might already be working on one occasionally, pitching it as a MSFT-supported package, which is why it's not public yet).

I'd rather keep the standard library as lowest common denominator for system interactions, particularly for behaviour like this that is either automatic+surprising or manual+platform-specific and insufficiently compelling (os.add_dll_directory being an example of something that was sufficiently compelling).

So I'm going to mark this as rejected, and steal the idea for my own library :D

msg359985 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-01-14 18:08
Okay, a well-known third-party library will work if a script/application really needs this information. I just wanted to bring it up for consideration because I saw an issue for cross-platform PowerShell 6 [1] where it was decided to disable placeholder disguising, but that particular decision was motivated by the need to remain compatible with Windows PowerShell 5.


> Recalling our debates about symlinks, I'd have to say that nothing 
> about placeholder files qualifies them as links, regardless of 
> whether Powershell puts "l" in the attributes summary :)

Certainly. A link (broadly speaking, including Unix-style symlinks and mount points) has to be a name surrogate. These OneDrive reparse points do not have the [N]ame surrogate bit set. It's not even allowed to be set because they have the [D]irectory bit set, which allows the directory entry in the filesystem to contain files. This is explained in km\ntifs.h:

    // The reparse tags are a ULONG. The 32 bits are laid out as follows:
    //   3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
    //   1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
    //  +-+-+-+-+-----------------------+-------------------------------+
    //  |M|R|N|D|     Reserved bits     |       Reparse Tag Value       |
    //  +-+-+-+-+-----------------------+-------------------------------+
    // M is the Microsoft bit. When set to 1, it denotes a tag owned by Microsoft.
    //   All ISVs must use a tag with a 0 in this position.
    //   Note: If a Microsoft tag is used by non-Microsoft software, the
    //   behavior is not defined.
    // R is reserved.  Must be zero for non-Microsoft tags.
    // N is name surrogate. When set to 1, the file represents another named
    //   entity in the system.
    // D is the directory bit. When set to 1, indicates that any directory
    //   with this reparse tag can have children. Has no special meaning when used
    //   on a non-directory file. Not compatible with the name surrogate bit.
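
The bit layout in the header comment can be checked against concrete tags with a small decoder. This is only a sketch; `ReparseTag` and `decode_reparse_tag` are hypothetical names, and the masks follow directly from the diagram (bits 31, 30, 29, 28 and the low 16 bits).

```python
from typing import NamedTuple


class ReparseTag(NamedTuple):
    microsoft: bool       # M, bit 31
    reserved: bool        # R, bit 30
    name_surrogate: bool  # N, bit 29
    directory: bool       # D, bit 28
    value: int            # low 16 bits, the reparse tag value


def decode_reparse_tag(tag):
    """Split a 32-bit reparse tag into the fields shown in the diagram."""
    return ReparseTag(
        microsoft=bool(tag & 0x80000000),
        reserved=bool(tag & 0x40000000),
        name_surrogate=bool(tag & 0x20000000),
        directory=bool(tag & 0x10000000),
        value=tag & 0xFFFF,
    )


# IO_REPARSE_TAG_CLOUD: Microsoft and directory bits set, not a name surrogate.
cloud = decode_reparse_tag(0x9000001A)
# IO_REPARSE_TAG_SYMLINK: Microsoft and name-surrogate bits set.
symlink = decode_reparse_tag(0xA000000C)
```

Decoding 0x9000001A gives M and D set with N clear, while 0xA000000C gives M and N set with D clear, which matches the distinction drawn above between placeholders and links.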

Date                 User         Action  Args
2020-01-14 18:08:32  eryksun      set     messages: + msg359985
2020-01-14 17:13:17  steve.dower  set     status: open -> closed; resolution: rejected; messages: + msg359982; stage: needs patch -> resolved
2020-01-12 14:27:40  eryksun      create