
classification
Title: os.path.samefile incorrect results for shadow copies
Type: behavior Stage: needs patch
Components: Windows Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, nijave, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2022-02-16 01:45 by nijave, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description
shadow-repro.py nijave, 2022-02-16 01:50
Messages (4)
msg413307 - Author: Nick Venenga (nijave) Date: 2022-02-16 01:45
shutil.copy fails to copy a file from a shadow copy back to its original file because os.path.samefile returns True for the two paths. os.path.samefile doesn't reliably detect that these files are different, since it relies on st_dev and st_ino, which are the same for both files.

>>> sc = pathlib.Path('//?/GLOBALROOT/Device/HarddiskVolumeShadowCopy3/test.file')
>>> o = pathlib.Path("V:/test.file")
>>> os.path.samefile(sc, o)
True
>>> os.stat(sc)
os.stat_result(st_mode=33206, st_ino=3458764513820579328, st_dev=1792739134, st_nlink=1, st_uid=0, st_gid=0, st_size=1, st_atime=1644973968, st_mtime=1644974052, st_ctime=1644973968)
>>> os.stat(o)
os.stat_result(st_mode=33206, st_ino=3458764513820579328, st_dev=1792739134, st_nlink=1, st_uid=0, st_gid=0, st_size=2, st_atime=1644973968, st_mtime=1644974300, st_ctime=1644973968)
>>> open(sc, "r").read()
'1'
>>> open(o, "r").read()
'12'

The example above shows the shadow copy file and the original file. Their st_mode, st_ino, and st_dev are the same, but their size, modification time, and contents differ.
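
For reference, a minimal sketch of why the comparison comes out True, assuming os.path.samefile reduces to os.path.samestat on the two os.stat results (which is how CPython's genericpath implements it). The two SimpleNamespace objects are hypothetical stand-ins that carry only the relevant fields from the output above, so this runs without a shadow copy:

```python
import os.path
from types import SimpleNamespace

# Hypothetical stand-ins for os.stat(sc) and os.stat(o), using the values above.
shadow = SimpleNamespace(st_ino=3458764513820579328, st_dev=1792739134,
                         st_size=1, st_mtime=1644974052)
original = SimpleNamespace(st_ino=3458764513820579328, st_dev=1792739134,
                           st_size=2, st_mtime=1644974300)

# samestat() only compares st_ino and st_dev; size and mtime are ignored.
same = os.path.samestat(shadow, original)

# The fields that actually distinguish the two files play no part in the check.
differs = (shadow.st_size, shadow.st_mtime) != (original.st_size, original.st_mtime)

print(same, differs)
```

Both values are true here: the files are reported as the same even though their size and modification time disagree.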
msg413309 - Author: Nick Venenga (nijave) Date: 2022-02-16 01:50
This script can reproduce the issue.

The computer must be a Windows computer with the Volume Shadow Copy Service enabled.
The computer must have shadow storage added to the drive being used.
The script changes the host machine by creating a shadow copy (permission to create shadow copies is required).
msg413331 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2022-02-16 12:48
Python uses the volume serial number (VSN) and file ID for st_dev and st_ino. The OS allows the file ID to be 0 if the filesystem doesn't support file IDs. Also, it does not require or force the VSN to be a unique ID in the system, though if it's not 0 it's almost always a random 32-bit number for which the chance of collision is vanishingly small (notwithstanding a volume shadow copy, apparently). This means that (st_dev, st_ino) by itself is not sufficient to check whether two paths are the same file in Windows.

Proposal:

When comparing two file paths, if their (st_dev, st_ino) values differ, then they're not the same file. If their (st_dev, st_ino) values are the same, use the final NT paths from calling GetFinalPathNameByHandleW() with the flags VOLUME_NAME_NT | FILE_NAME_NORMALIZED. If only one of the paths supports FILE_NAME_NORMALIZED, then they're not the same file. If neither supports FILE_NAME_NORMALIZED, fall back on VOLUME_NAME_NT | FILE_NAME_OPENED. If either st_dev is 0 or st_ino is 0, the files are the same only if the final NT paths are the same. Else split out each device path. If the device paths are the same, then the paths are the same file. Otherwise they're different files.

We should probably special case the comparison of a multiple UNC provider path with a local volume path. For example r'\\localhost\C$\Windows' is the same as r'C:\Windows'. The corresponding NT paths are r'\Device\Mup\localhost\C$\Windows' and typically r'\Device\HarddiskVolume2\Windows'. The special case is that when one of the device paths is "\Device\Mup", the two device paths are not required to be the same. Of course, this is given that the (st_dev, st_ino) values are the same, and neither st_dev nor st_ino is zero.

That said, we would need to exclude volume shadow copies from the special case. I suppose we could just look for "VolumeShadowCopy" in the device name. Maybe we can do better. I've noticed that querying IOCTL_STORAGE_GET_DEVICE_NUMBER fails for a volume shadow copy, but that's probably going overboard.
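
A rough sketch of the device-path rule described above, written against plain NT path strings so it runs on any platform. It assumes the caller has already checked that the (st_dev, st_ino) values are equal and nonzero. The function names are hypothetical, and "HarddiskVolume5" in the usage below is an invented device name for the base volume:

```python
def device_name(nt_path):
    """Return the lowercased device name from r'\\Device\\{name}[\\...]', or None."""
    parts = nt_path.split('\\', 3)
    if len(parts) > 2 and parts[0] == '' and parts[1].lower() == 'device' and parts[2]:
        return parts[2].lower()
    return None


def same_device(nt_path1, nt_path2):
    """Compare the device parts of two final NT paths.

    Assumes (st_dev, st_ino) already match and neither value is 0.
    """
    d1, d2 = device_name(nt_path1), device_name(nt_path2)
    if d1 is None or d2 is None:
        return False
    if d1 == d2:
        return True
    # A shadow copy shares the serial number of its base volume, so it is
    # never treated as the same device as another volume.
    if 'volumeshadowcopy' in d1 or 'volumeshadowcopy' in d2:
        return False
    # Special case: a path that goes through the multiple UNC provider may
    # refer to a local volume, so the device names need not match.
    return 'mup' in (d1, d2)


unc = same_device(r'\Device\Mup\localhost\C$\Windows',
                  r'\Device\HarddiskVolume2\Windows')
shadow = same_device(r'\Device\HarddiskVolumeShadowCopy3\test.file',
                     r'\Device\HarddiskVolume5\test.file')
print(unc, shadow)
```

Matching on the substring "VolumeShadowCopy" is the simple heuristic mentioned above; it is not a guaranteed way to identify a shadow copy device.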
msg413385 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2022-02-17 04:52
Sample implementation:

    import os
    import msvcrt
    import win32file 

    def samefile(f1, f2):
        """Test whether two paths refer to the same file or directory."""
        s1 = os.stat(f1)
        s2 = os.stat(f2)
        return _common_same_file(f1, f2, s1, s2)


    def sameopenfile(fd1, fd2):
        """Test whether two file descriptors refer to the same file."""
        s1 = os.fstat(fd1)
        s2 = os.fstat(fd2)
        return _common_same_file(fd1, fd2, s1, s2)


    def _common_same_file(f1, f2, s1, s2):
        if s1.st_ino != s2.st_ino or s1.st_dev != s2.st_dev:
            return False

        # (st_dev, st_ino) may be insufficient on its own. Use the final
        # NT path of each file to refine the comparison.
        p = _get_final_nt_paths([f1, f2])
        
        # The stat result is unreliable if the volume serial number (st_dev)
        # or file ID (st_ino) is 0.
        if 0 in (s1.st_dev, s1.st_ino):
            if None in p:
                return False
            return p[0] == p[1]

        # A volume shadow copy has the same volume serial number as the
        # base volume. In this case, the device names have to be compared.
        d = _get_device_names(p)
        if any('volumeshadowcopy' in n for n in d if n):
            return d[0] == d[1]

        return True


    def _get_final_nt_paths(files):
        result = []
        nt_normal = 0x2 # VOLUME_NAME_NT | FILE_NAME_NORMALIZED
        nt_opened = 0xA # VOLUME_NAME_NT | FILE_NAME_OPENED
        for f in files:
            p = None
            if f is not None:
                try:
                    p = _getfinalpathname(f, nt_normal)
                except OSError:
                    try:
                        p = _getfinalpathname(f, nt_opened)
                    except OSError:
                        pass
            result.append(p)
        return result


    def _get_device_names(paths):
        # Look for "\Device\{device name}[\]".
        result = []
        for p in paths:
            d = None
            if p is not None:
                q = p.split('\\', 3)
                if len(q) > 2 and q[1].lower() == 'device' and q[2]:
                    d = q[2].lower()
            result.append(d)
        return result


    def _getfinalpathname(p, flags=0):
        try:
            if isinstance(p, int):
                h = msvcrt.get_osfhandle(p)
            else:
                h = win32file.CreateFile(p, 0, 0, None, win32file.OPEN_EXISTING,
                        win32file.FILE_FLAG_BACKUP_SEMANTICS, None)
            return win32file.GetFinalPathNameByHandle(h, flags)
        except win32file.error as e:
            strerror = e.strerror.rstrip('\r\n .')
            raise OSError(0, strerror, p, e.winerror) from None
History
Date | User | Action | Args
2022-04-11 14:59:56 | admin | set | github: 90919
2022-02-17 04:52:54 | eryksun | set | messages: + msg413385
2022-02-16 12:48:19 | eryksun | set | versions: + Python 3.10, Python 3.11; nosy: + eryksun; messages: + msg413331; stage: needs patch
2022-02-16 01:50:19 | nijave | set | files: + shadow-repro.py; messages: + msg413309
2022-02-16 01:45:13 | nijave | create |