New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test.support.unlink issue on Windows platform #51692
Comments
On Windows there are tiny delay between call to os.unlink and real file test.support.unlink(filename)
f = open(filename, 'wb') Proposed solution: wait in support.unlink for end of deletion asking Also test.test_linecache:LineCacheTests.test_checkcache should be fixed
Both patches for trunk and py3k is attached. |
patch for python 2 is attached |
Problem can be reproduced with several run of test_bufio in both python |
I'm fairly skeptical that your analysis is right. Can you prove that a |
You right, problem is not in os.unlink (Windows call DeleteFile) itself. I have TortoiseSVN (very popular Explorer extension to work with So, sometimes I can see race condition:
I see this situation in sysinternals Process Monitor tool. |
Another report of something looking very similar on bpo-7712. |
Note: the fix in test_linecache.py is useless. The "with open(source_name, 'w') as source:" context manager is in charge of closing the file on __exit__. |
Andrew: There have been changes committed within the last week to bpo-7712, which Florent suggested might be related. Can you please re-test this to see if it still exists, and if it does, bug me and I'll try to get some more movement on this. |
It would be nice to see standalone test case that illustrates the problem. |
Florent, sorry. |
This is basically a rerun of this discussion a couple of years ago: http://mail.python.org/pipermail/python-dev/2008-April/078333.html The problem certainly still happens against trunk -- I have a semi-aggressive test-harness which can cause it to reproduce pretty much on-demand. I proposed an approach here: http://mail.python.org/pipermail/python-dev/2008-April/078339.html but when I started digging into test_support it all got a bit hairy because -- naturally -- test.support.unlink is used in a *lot* of places. In short, there's still a problem to be fixed. I believe that a rename-unlink dance would solve it, but only at the cost of affecting a lot of tests. |
It is unlikely that it will go further then discussion unless this bug can be reliably reproduced to be debugged. If not testcase when at least Process Monitor log would be helpful. |
In one window run the attached script (assumes you have pywin32 installed) with a parameter of the directory the TESTFN file will end up in. Then run, eg, test_zipfile in another window. For me: c:\temp> watch_dir.py C:\work_in_progress\make-snapshots\trunk\python\Lib C:\work_in_progress\make-snapshots\trunk\python\Lib> ..\pcbuild\python.exe -m test.test_zipfile Obviously, you'd have to change the path to be wherever you're running the test suite from. The watch_dir script sits there looking for file activity, then takes and releases a delete-share handle on the file. It's enough to disrupt certain tests (such as test_zipfile) pretty much every time. Other tests are affected less, or only the first few times. Not sure why, but it's certainly enough to reproduce the general effect of TortoiseSVN or indexer or virus checker. |
If the problem with the fix is that lots of tests use test_support.unlink, then I don't see why the rename dance can't be implemented in test_support.unlink. (Possibly conditioned on whether or not the tests are running on a windows platform.) Dealing with unlink problems is why that method exists in the first place. There are probably places in the test suite that *don't* use test_support.unlink, though, so fixing test_support.unlink will not necessarily fix all of the problems. We'll have to fix those other tests (probably by using the new test_support.unlink) as we find them. An actual patch will need a test that doesn't rely on win32file (ctypes would be OK). It may be necessary to rename to a unique filename, too. (To be clear, I think a unit test that reproduces the problem by doing an open with FILE_SHARE_DELETE is fine, we don't need a test that reproduces the race condition itself. The windows experts will correct me if I'm wrong :) I'm changing the stage to patch needed because it seems to me that using a technique like rename that doesn't introduce additional delays into the test suite is to be preferred. |
I put together a trivial patch against the 2.7 trunk (basically: I added At this point several things occur to me:
I'm willing to do the legwork of moving all the tests use, eg, support.unlink, grep "os\.unlink" *.py | wc -l
I'd like to hear opinions before I move forward with a non-trivial patch For the sake of completeness, I attach a tiny test case which shows that the |
When I was working on a routine to checkout/patch/build/test/cleanup Python (see https://svn.jaraco.com/jaraco/python/jaraco.develop, and particularly scripts/test-windows-symlink-patch.py), I ran into a similar problem during the cleanup step. I tried using shutil.rmtree to clean up the folder that was checked out, but I repeatedly got 'access denied' exceptions. I ended up working around the problem by using subprocess and cmd.exe's "rmdir /s /q". I think this demonstrates three facets to this problem:
|
|
I'm afraid that the problem doesn't lie in the unlink: DeleteFile Making os.unlink on Windows more robust may be a good http://bugs.python.org/file16869 TJG |
Then we shouldn't use DeleteFile in the first place to delete the file,
It certainly will help this case also. It would detect that the file is |
I see what you're getting at. I'm getting to the end of my day Would you agree that py3k is the only target branch worth aiming for? |
Most certainly, yes. |
I closed bpo-9627 as a duplicate of this - the buildbot failure referenced there was most likely due to something else holding open a temporary file that the test suite thought was closed. |
As I can see cygwin's unlink_nt uses Nt* functions family (NtSetInformationFile etc) which are part of Windows DDK. Do you like to use them or prefer SDK ones (say SetFileInformationByHandle)? In second case python unlink can borrow deletion schema from cygwin for modern Windows versions (Vista+) and return to legacy trivial DeleteFile call if OS is WinXP. |
FWIW, Mercurial uses the following dance: (Mercurial is under the GPL, so we can't copy that code verbatim; but it can serve as an inspiration) |
For clarity, while making unlink more robust is no bad thing, the error occurs when the unlink *succeeds* but a subsequent create of the same name fails. This happens when an indexer, Virus scanner or TortoiseSvn etc. has opened the file with SHARE_DELETE. This allows a DeleteFile to succeed but continues to hold an open handle on the file. A following test which uses an identically named file (such as the global TESTFN) will fail if not enough time has elapsed for the background process to release its handle. A good candidate to see this in action is the test for tarfile. I did start to undertake a conversion of TESTFN to a named temporary, but it started to sprawl all over the place and came up against a number of corner cases (eg where tests deliberately wanted two filenames to be the same) so I gave up. |
How about renaming to a unique random name just before the unlink(), as |
Well http://bugs.python.org/issue7443#msg102833 outlines the problems I encountered while trying to do essentially that. Nothing insurmountable, but definitely bigger than simply adding one line of code. Looks to me like there are two avenues of approach (and, given the chatter on this issue, several people willing to address them):
Opinions? I'm willing to do either. |
Well, since I'm not a Windows expert, I can only give an intuitive (as an aside, for higher-level variants of OS functions, the shutil may |
I vote for fixing the test package. File system "extensions" may track and record this activity. To use DropBox as an example, doing the rename and delete will cause the renamed and deleted file to be recorded. Just my opinion, but the code path to delete a file should be as short as possible. Making lots of other OS calls just doesn't seem right. I understand the wish to have a reliable unlink call but I'd be uncomfortable with a workaround that may be visible around the edges. |
I'll throw in my 2 cents as to a possible way forward:
|
FYI: implementation of unlink already has multiple calls for unicode version to process symlinks. Ansi version is the simple DeleteFileA call. Now I'm working on unlink implementations partially borrowed from cygwin as Martin suggested. I still not have a final patch covering all known cases. Perhaps I can introduce it next week. |
See bpo-15496 for an alternative approach to solving this problem, at least in the test suite - as noted in that issue, the rename dance isn't sufficient when the problem gets triggered by a different sequence like: unlink(file_in_parent_dir)
unlink(parent_dir) # This fails if file is still locked |
This has been covered off by work done with the test.support package including context managers for temporary files / directories, and a waitfor mechanism which waits some time if a file can't be accessed. |
The support package is fixed (I presume :), but there was also a desire expressed to expose this functionality in shutil, since it can arise in "normal" Python programs as well as in the test suite. Given that the fix has been successful in the test suite, shouldn't we open an issue to expose it in shutil? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: