classification
Title: shutil.copyfile throws incorrect SameFileError on Google Drive File Stream
Type: behavior
Stage:
Components: Library (Lib), Windows
Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.5, Python 3.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Deniz Bozyigit, eryksun, giampaolo.rodola, paul.moore, r.david.murray, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2018-06-22 00:38 by Deniz Bozyigit, last changed 2018-07-11 07:55 by serhiy.storchaka.

Messages (6)
msg320199 - (view) Author: Deniz Bozyigit (Deniz Bozyigit) Date: 2018-06-22 00:38
When using shutil.copyfile on the Google Drive File Stream file system, an incorrect SameFileError can occur.

MWE (assuming foo.txt exists in your google drive G:\\):
>>> f1 = 'G:\\My Drive\\foo.txt'
>>> f2 = 'G:\\My Drive\\foo2.txt'
>>> import shutil
>>> shutil.copyfile(f1, f2)
>>> shutil.copyfile(f1, f2)

--> The last line throws an incorrect SameFileError. In comparison, executing the same code on a different file system (e.g. a local hard drive) results in no errors.

More details described here: https://github.com/jupyter/notebook/issues/3615

The error originates in the library, in the function samestat in genericpath.py: Google Drive File Stream reports st_ino == 0, which makes os.path.samefile(f1, f2) == True for any files f1 and f2 on Google Drive File Stream.
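
The effect can be reproduced without a Google Drive volume. A minimal sketch, assuming stat results whose st_ino is 0 and whose st_dev is the same for every file on the volume (the fake_stat helper, the chosen numbers, and the local samestat copy are illustrative, not library code):

```python
from types import SimpleNamespace


def samestat(s1, s2):
    # Same comparison that samestat performs before any patch.
    return s1.st_ino == s2.st_ino and s1.st_dev == s2.st_dev


def fake_stat(ino, dev):
    # Hypothetical stand-in for an os.stat_result; only two fields matter.
    return SimpleNamespace(st_ino=ino, st_dev=dev)


# Two unrelated files on a volume that reports inode 0 for everything.
foo = fake_stat(ino=0, dev=1234)
foo2 = fake_stat(ino=0, dev=1234)
print(samestat(foo, foo2))  # True, although the files differ

# On a volume with real identifiers the same check tells them apart.
print(samestat(fake_stat(11, 5), fake_stat(12, 5)))  # False
```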

I propose the following patch, which currently works for me:

--- genericpath.py      2018-06-22 02:14:27.145744900 +0200
+++ genericpath_new.py  2018-06-22 02:10:44.485961100 +0200
@@ -86,8 +86,11 @@
 # describing the same file?
 def samestat(s1, s2):
     """Test whether two stat buffers reference the same file"""
-    return (s1.st_ino == s2.st_ino and
-            s1.st_dev == s2.st_dev)
+    return (s1.st_ino != 0 and
+            s2.st_ino != 0 and
+            s1.st_ino == s2.st_ino and
+            s1.st_dev == s2.st_dev)
+


 # Are two filenames really pointing to the same file?
msg320201 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-22 02:00
That patch would cause references to the same file on a google drive to report that the files were different.
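
A rough sketch of that false negative, assuming the volume reports st_ino == 0 for every file (patched_samestat is a local copy of the proposed comparison, and the SimpleNamespace objects are hypothetical stand-ins for stat results):

```python
from types import SimpleNamespace


def patched_samestat(s1, s2):
    # Local copy of the comparison proposed in the patch above.
    return (s1.st_ino != 0 and
            s2.st_ino != 0 and
            s1.st_ino == s2.st_ino and
            s1.st_dev == s2.st_dev)


# Assumed values: the same file, stat'ed twice, on a volume with inode 0.
first_look = SimpleNamespace(st_ino=0, st_dev=7)
second_look = SimpleNamespace(st_ino=0, st_dev=7)

print(patched_samestat(first_look, second_look))  # False: a false negative
```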

I'd say this is a bug in Google Drive's posix emulation.

I'm not sure there's a good answer here, because even if every other element of the stat were equal, that wouldn't mean it was the same file: if you use copystat as well as copyfile the stats would otherwise be equal.

I think we should close this as a third party bug.
msg320202 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-06-22 02:07
David, this is a bug in Python, and the proposed patch is insufficient. We use the volume serial number as st_dev, and this is not guaranteed to be unique in Windows, and may be 0, just as a file's index number is also allowed to be 0. Both possibilities are well documented. With a WebDAV volume you will find that both numbers are 0. Using the VSN and file index as if they're the same as POSIX st_dev and st_ino is technically wrong. There is no guarantee that this tuple uniquely identifies a file in Windows.
msg320204 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-22 02:25
OK.  Finding a solution for this (other than never raising samefile on such systems and documenting it as a limitation) may be difficult.
msg320210 - (view) Author: Deniz Bozyigit (Deniz Bozyigit) Date: 2018-06-22 05:30
Hi, thank you for looking into this. I'm aware that the patch shown is not an ideal solution, merely a workaround to get my Jupyter operational.

A pointer toward a workable solution could be the _samefile function in shutil, which wraps os.path.samefile:

def _samefile(src, dst):
    # Macintosh, Unix.
    if hasattr(os.path, 'samefile'):
        try:
            return os.path.samefile(src, dst)
        except OSError:
            return False

    # All other platforms: check for same pathname.
    return (os.path.normcase(os.path.abspath(src)) ==
            os.path.normcase(os.path.abspath(dst)))


I understand that the implicit platform differentiation done here (see the comment line) is no longer valid, since os.path.samefile is now available on Windows. For Windows, the file name comparison implemented here could be workable (perhaps even moved into os.path.samefile?), provided the platform is identified correctly.
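
In isolation, the file name comparison could look roughly like this. It is only a sketch: same_pathname is a hypothetical name, and a comparison by name cannot detect hard links or other aliases of one file.

```python
import os


def same_pathname(src, dst):
    # Compare normalized absolute path names instead of st_ino/st_dev.
    return (os.path.normcase(os.path.abspath(src)) ==
            os.path.normcase(os.path.abspath(dst)))


print(same_pathname("foo.txt", os.path.join(".", "foo.txt")))  # True
print(same_pathname("foo.txt", "foo2.txt"))                    # False
```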
msg320243 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-06-22 16:03
For Windows it would be best (though slower) to pass the paths through os._getfinalpathname before comparison. Detecting that function is an easy way to get the platform right, too.
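
A minimal sketch of that idea, not a tested fix. Assumptions: the private helper is imported from the nt module (where CPython's Windows builds define it; it is undocumented and may change), same_final_path is a hypothetical name, and on other platforms the sketch falls back to os.path.realpath.

```python
import os

try:
    # Detecting the function doubles as the platform check.
    from nt import _getfinalpathname
except ImportError:
    _getfinalpathname = None


def same_final_path(src, dst):
    try:
        if _getfinalpathname is not None:
            # Windows: ask the OS for the final path of each name.
            src, dst = _getfinalpathname(src), _getfinalpathname(dst)
        else:
            # Elsewhere: resolve symlinks instead.
            src, dst = os.path.realpath(src), os.path.realpath(dst)
    except OSError:
        # For example, the destination does not exist yet.
        return False
    return os.path.normcase(src) == os.path.normcase(dst)
```

Treating an OSError as "not the same file" mirrors what shutil._samefile already does for os.path.samefile.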

Unfortunately, the MSDN docs don't make clear that the VSN can be modified, and even go as far as saying you can compare the two values that Eryk has pointed out you shouldn't :(

The problem is that the values on Windows are coming directly from the filesystem, and apparently there's no requirement that they actually be provided by the filesystem...
History
Date                 User              Action  Args
2018-07-11 07:55:46  serhiy.storchaka  set     type: crash -> behavior
2018-06-25 22:24:19  giampaolo.rodola  set     nosy: + giampaolo.rodola
2018-06-22 16:03:30  steve.dower       set     messages: + msg320243
2018-06-22 05:30:49  Deniz Bozyigit    set     messages: + msg320210
2018-06-22 02:25:34  r.david.murray    set     messages: + msg320204
2018-06-22 02:07:35  eryksun           set     nosy: + eryksun; messages: + msg320202
2018-06-22 02:00:36  r.david.murray    set     nosy: + r.david.murray; messages: + msg320201
2018-06-22 00:39:10  Deniz Bozyigit    set     title: shutil.copyfile throws incorrect SameFileError -> shutil.copyfile throws incorrect SameFileError on Google Drive File Stream
2018-06-22 00:38:26  Deniz Bozyigit    create