Author eryksun
Recipients Deniz Bozyigit, eryksun, gary ruben, giampaolo.rodola, paul.moore, r.david.murray, steve.dower, tim.golden, zach.ware
Date 2019-02-19.03:27:27
Message-id <1550546848.14.0.155565185689.issue33935@roundup.psfhosted.org>
In-reply-to
Content
Here's a WebDAV example:

    net use Z: \\live.sysinternals.com\tools

Both st_dev (volume serial number) and st_ino (file number) are 0 for all files on this drive. _getfinalpathname also fails, since WebDAV doesn't support the default FILE_NAME_NORMALIZED flag (i.e. replacing 8.3 short names in the path with normal names). We need a realpath() that reasonably handles cases where _getfinalpathname fails. See issue 14094.

samefile() has to go back into ntpath.py on Windows. The generic implementation relies on the POSIX guarantee that the tuple (st_dev, st_ino) uniquely identifies a file, which Windows doesn't provide.

I suggest the following for samefile(). If either st_ino or st_dev differs, return False. If they're equal and both non-zero, return True. Otherwise compare the final paths: if st_ino is zero, compare the entire paths; if st_ino is non-zero, compare only the drives. (This supports the unusual case of hardlinks on a volume that has no serial number.) The final paths can come from realpath() if issue 14094 is resolved. For example:

    # Intended for ntpath.py, where os, realpath and splitdrive are already in scope.
    def samefile(fn1, fn2):
        """Test whether two file names reference the same file"""
        s1 = os.stat(fn1)
        s2 = os.stat(fn2)
        if s1.st_ino != s2.st_ino or s1.st_dev != s2.st_dev:
            return False
        if s1.st_ino and s1.st_dev:
            return True
        rp1, rp2 = realpath(fn1), realpath(fn2)
        if s1.st_ino:
            return splitdrive(rp1)[0] == splitdrive(rp2)[0]
        return rp1 == rp2
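
To make the cases above easier to check without a WebDAV drive, here is a minimal sketch of the same decision logic as a pure function. The name same_by_ids and its parameters are hypothetical and not part of ntpath: the dev/ino arguments stand in for st_dev/st_ino, and the path arguments stand in for whatever realpath() would return. ntpath.splitdrive is used so that it behaves the same on any platform.

```python
import ntpath


def same_by_ids(dev1, ino1, path1, dev2, ino2, path2):
    """Hypothetical helper mirroring the proposed samefile() logic."""
    # Any difference in either identifier means different files.
    if ino1 != ino2 or dev1 != dev2:
        return False
    # Equal and both non-zero: trust the (st_dev, st_ino) pair.
    if ino1 and dev1:
        return True
    # File number is usable but the volume has no serial number:
    # only the drive part of the final paths needs to match.
    if ino1:
        return ntpath.splitdrive(path1)[0] == ntpath.splitdrive(path2)[0]
    # No usable file number (e.g. WebDAV): compare the whole final paths.
    return path1 == path2
```

With both identifiers zero, as on the WebDAV drive, the result depends entirely on the final paths, which is why a realpath() that tolerates _getfinalpathname failures matters here.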

For sameopenfile(), it's trivial to extend _getfinalpathname to accept a file descriptor as its argument, in which case it simply calls _get_osfhandle to get the handle instead of opening the file via CreateFileW. This loses the flexibility of realpath(), but below I propose extending the range of paths supported by _getfinalpathname.

---

Note that the root directory in FAT32 is file number 0. For NTFS, the file number is never 0, since the high word of the 64-bit file reference number is a sequence number that begins at 1.
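
As an illustration of why an NTFS file number can't be zero, here is a small sketch that splits a 64-bit file reference number into its parts. It assumes the usual layout of a 16-bit sequence number in the high word and a 48-bit file record number in the remaining bits; the function name and the sample value are made up for the example.

```python
def split_file_reference(file_ref):
    """Split a 64-bit NTFS file reference number into (sequence, record).

    Assumed layout: high 16 bits are the sequence number, low 48 bits
    are the file record number.
    """
    sequence = (file_ref >> 48) & 0xFFFF
    record = file_ref & 0xFFFFFFFFFFFF
    return sequence, record


# Illustrative value only: sequence number 1, record number 42.
sample = (1 << 48) | 42
print(split_file_reference(sample))
```

Because the sequence number starts at 1, the high word is always non-zero, so the combined value is non-zero whatever the record number is.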

For local drives the volume serial number shouldn't be zero. It's possible to manually change it to zero, but that's intentional mischief. There's a small chance that two drives have the same serial number, and an even smaller chance that we get an (st_dev, st_ino) match that's a false positive. I'm not happy with that, however small the probability, but I don't know a simple way to address the problem.

For local storage, Windows does have a device number that's similar to POSIX st_dev. It's actually three numbers -- device type (16-bit), device number (32-bit), and partition number (32-bit) -- that taken together constitute an 80-bit ID. The problem is that we have to query IOCTL_STORAGE_GET_DEVICE_NUMBER directly using a handle for the volume device. Getting a handle for the volume can be expensive since we may be starting from a file handle or have a volume that's mounted as a filesystem junction. Plus this lacks generality since it's not implemented by MUP (Multiple UNC Provider, the proxy device for UNC providers) -- not even for a local SMB share such as "\\localhost\C$". 
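
For a sense of what such an ID would look like, here is a sketch that packs the three numbers into a single 80-bit integer. The function name and the field order (device type in the top bits) are assumptions for illustration; the values would come from IOCTL_STORAGE_GET_DEVICE_NUMBER, which this sketch does not call.

```python
def pack_storage_id(device_type, device_number, partition_number):
    """Pack (16-bit type, 32-bit device, 32-bit partition) into 80 bits.

    Hypothetical helper; the field order is an arbitrary choice.
    """
    if not 0 <= device_type < (1 << 16):
        raise ValueError("device_type must fit in 16 bits")
    if not 0 <= device_number < (1 << 32):
        raise ValueError("device_number must fit in 32 bits")
    if not 0 <= partition_number < (1 << 32):
        raise ValueError("partition_number must fit in 32 bits")
    return (device_type << 64) | (device_number << 32) | partition_number


# Example: device type 7 (FILE_DEVICE_DISK), disk 0, partition 2.
print(hex(pack_storage_id(7, 0, 2)))
```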

To improve reliability for corner cases, _getfinalpathname could be extended to try all path types, with and without normalization. Start with the DOS name. Next try the GUID name (i.e. for a device that supports the mountpoint manager but isn't auto-mounted as a DOS drive or file-system junction), and finally the NT name (i.e. for a device that doesn't support the mountpoint manager, such as an ImDisk virtual disk, the named-pipe file system, or the mailslot file system). For an NT path, _getfinalpathname can try to manually resolve a normal mountpoint via QueryDosDeviceW, but limit this to just drive-letter names and well-known global names from "HKLM\System\CurrentControlSet\Control\Session Manager\DOS Devices" (e.g. PIPE, MAILSLOT). If there's no normal mountpoint, prefix the NT path with the "\\?\GLOBALROOT" link. For example, a file named "\spam" on the devices "\Device\ImDisk0" (a ramdisk, say it's mounted as R:), "\Device\NamedPipe", and "\Device\NtOnly" would resolve to "\\?\R:\spam", "\\?\PIPE\spam", and "\\?\GLOBALROOT\Device\NtOnly\spam".
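
The last step of that fallback, turning an NT path into a "\\?\" path, could be sketched roughly as follows. The function name and the mountpoints mapping are hypothetical; the mapping stands in for whatever QueryDosDeviceW reports for drive letters and the well-known global names, and this sketch matches device names exactly (case-sensitively) for simplicity.

```python
def nt_to_extended_path(nt_path, mountpoints):
    """Map an NT path to an extended path using known mountpoints.

    mountpoints: dict of NT device name -> DOS name (hypothetical
    stand-in for the results of QueryDosDeviceW).
    """
    for device, dos_name in mountpoints.items():
        # Match the device itself or a path below it, so that a name
        # like HarddiskVolume1 does not match HarddiskVolume10.
        if nt_path == device or nt_path.startswith(device + "\\"):
            return "\\\\?\\" + dos_name + nt_path[len(device):]
    # No normal mountpoint: fall back to the GLOBALROOT link.
    return "\\\\?\\GLOBALROOT" + nt_path


mountpoints = {"\\Device\\ImDisk0": "R:", "\\Device\\NamedPipe": "PIPE"}
for path in ("\\Device\\ImDisk0\\spam",
             "\\Device\\NamedPipe\\spam",
             "\\Device\\NtOnly\\spam"):
    print(nt_to_extended_path(path, mountpoints))
```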