classification
Title: test.support.unlink issue on Windows platform
Type: behavior Stage: needs patch
Components: Tests, Windows Versions: Python 3.3
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: tim.golden Nosy List: asvetlov, brian.curtin, chris.jerdonek, flox, jafo, jaraco, jdigital, loewis, nadeem.vawda, ncoghlan, pitrou, r.david.murray, techtonik, tim.golden
Priority: high Keywords: needs review, patch

Created on 2009-12-05 15:20 by asvetlov, last changed 2013-07-31 15:26 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
python3.patch asvetlov, 2009-12-05 15:20 patch for py3k review
python2.patch asvetlov, 2009-12-05 15:21 patch for trunk review
watch_dir.py tim.golden, 2010-04-09 08:34
test-case.py tim.golden, 2010-04-11 14:11
Messages (34)
msg95991 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-12-05 15:20
On Windows there are tiny delay between call to os.unlink and real file 
removing. Periodically it leads to unittest crashes in cases like this:

test.support.unlink(filename)
f = open(filename, 'wb')

Proposed solution: wait in support.unlink for end of deletion asking 
os.stat for removed file (only if os.name == 'nt', of course).

Also test.test_linecache:LineCacheTests.test_checkcache should be fixed 
- this one miss to close last opened file and Windows cannot remove it.

Both patches for trunk and py3k is attached.
msg95992 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-12-05 15:21
patch for python 2 is attached
msg95993 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-12-05 15:32
Problem can be reproduced with several run of test_bufio in both python 
versions. trunk contains more tests (+1 test case) so it's easer to catch 
'access denied' exception on it.
msg96004 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-12-05 20:12
I'm fairly skeptical that your analysis is right. Can you prove that a
file still exists after unlink returns?
msg96012 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-12-05 23:09
You right, problem is not in os.unlink (Windows call DeleteFile) itself.

I have TortoiseSVN (very popular Explorer extension to work with 
Subversion) installed. 
This tool run TSVNCache.exe process to update own data in background.
TSVNCache.exe receive NotifyChangeDirectory events for svn checkout 
folders. 
support.TESTFN is tempfile name in current working directory. If I try 
to run unittests from some folder from python checkout TSVNCache.exe get 
change notify and analyze it.

So, sometimes I can see race condition:
- python: os.unlink(file)
- TSVNCache.exe: get change event
- TSVNCache.exe: query changes
- python: open(file) - oops, TSVNCache.exe still processing notification 
and lock deleted file by holding opened handles to it.
According to MSDN for DeleteFile function:
The DeleteFile function marks a file for deletion on close. Therefore, 
the file deletion does not occur until the last handle to the file is 
closed. Subsequent calls to CreateFile to open the file fail with 
ERROR_ACCESS_DENIED.
We get exactly same.
- TSVNCache.exe finish of change update and release handle. File is 
actually deleted.

I see this situation in sysinternals Process Monitor tool.
Probability of race condition is tiny but non-zero. Intensive 
create/drop/create again sequences can catch this one.
msg99089 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-09 07:15
Another report of something looking very similar on #7712.
msg99093 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-09 08:05
Note: the fix in test_linecache.py is useless.

The "with open(source_name, 'w') as source:" context manager is in charge of closing the file on __exit__.
msg101355 - (view) Author: Sean Reifschneider (jafo) * (Python committer) Date: 2010-03-20 02:19
Andrew: There have been changes committed within the last week to #7712, which Florent suggested might be related.  Can you please re-test this to see if it still exists, and if it does, bug me and I'll try to get some more movement on this.
msg102049 - (view) Author: anatoly techtonik (techtonik) Date: 2010-04-01 00:28
It would be nice to see standalone test case that illustrates the problem.
msg102137 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2010-04-01 22:56
Florent, sorry. 
I have no Windows workstation now, so I cannot follow this issue.
msg102685 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2010-04-09 06:40
This is basically a rerun of this discussion a couple of years ago:

  http://mail.python.org/pipermail/python-dev/2008-April/078333.html

The problem certainly still happens against trunk -- I have a semi-aggressive test-harness which can cause it to reproduce pretty much on-demand. I proposed an approach here:

  http://mail.python.org/pipermail/python-dev/2008-April/078339.html

but when I started digging into test_support it all got a bit hairy because -- naturally -- test.support.unlink is used in a *lot* of places. In short, there's still a problem to be fixed. I believe that a rename-unlink dance would solve it, but only at the cost of affecting a lot of tests.
msg102696 - (view) Author: anatoly techtonik (techtonik) Date: 2010-04-09 08:11
It is unlikely that it will go further then discussion unless this bug can be reliably reproduced to be debugged. If not testcase when at least Process Monitor log would be helpful.
msg102698 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2010-04-09 08:34
In one window run the attached script (assumes you have pywin32 installed) with a parameter of the directory the TESTFN file will end up in. Then run, eg, test_zipfile in another window. For me:

c:\temp> watch_dir.py C:\work_in_progress\make-snapshots\trunk\python\Lib

C:\work_in_progress\make-snapshots\trunk\python\Lib> ..\pcbuild\python.exe -m test.test_zipfile

Obviously, you'd have to change the path to be wherever you're running the test suite from.

The watch_dir script sits there looking for file activity, then takes and releases a delete-share handle on the file. It's enough to disrupt certain tests (such as test_zipfile) pretty much every time. Other tests are affected less, or only the first few times. Not sure why, but it's certainly enough to reproduce the general effect of TortoiseSVN or indexer or virus checker.
msg102720 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-09 13:05
If the problem with the fix is that lots of tests use test_support.unlink, then I don't see why the rename dance can't be implemented in test_support.unlink.  (Possibly conditioned on whether or not the tests are running on a windows platform.)  Dealing with unlink problems is why that method exists in the first place.

There are probably places in the test suite that *don't* use test_support.unlink, though, so fixing test_support.unlink will not necessarily fix all of the problems. We'll have to fix those other tests (probably by using the new test_support.unlink) as we find them.

An actual patch will need a test that doesn't rely on win32file (ctypes would be OK).  It may be necessary to rename to a unique filename, too. (To be clear, I think a unit test that reproduces the problem by doing an open with FILE_SHARE_DELETE is fine, we don't need a test that reproduces the race condition itself.  The windows experts will correct me if I'm wrong :)

I'm changing the stage to patch needed because it seems to me that using a technique like rename that doesn't introduce additional delays into the test suite is to be preferred.
msg102833 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2010-04-11 14:11
I put together a trivial patch against the 2.7 trunk (basically: I added
a os.rename before the os.remove in test_support.unlink) and reran my
test harness with test_zipfile... and it still failed because, of course,
test_zipfile calls shutil.rmtree which bypasses the test_support.unlink
altogether etc. etc.

At this point several things occur to me:

1) There's little point in targetting the 2.x tree since 2.7 is due
out any time now and is effectively end-of-line for 2.x and this
isn't a release-blocker. Therefore whatever effort is brought to bear
should be against the 3.x latest

2) This is a repeatable but relatively minority case: it only applies
to Windows boxes and only to those where some behind-the-scenes process
is holding this kind of lock on files for long enough to affect the
tests. I'd certainly like to nail it but...

3) The amount of work -- and intrusion in the tests -- is quite substantial.
Just looking [*] for straight os.unlink instances, without even considering
shutil use gives 71 instances; os.remove gives another 90. That's even
without the issues of the tests for shutil and os in any case.

I'm willing to do the legwork of moving all the tests use, eg, support.unlink,
support.rmtree and so on, but quis custodiet? who'll test the tests?

grep "os\.unlink" *.py | wc -l
grep "os\.remove" *.py | wc -l

4) All that said, the result should be a cleaner, more controllable test environment,
at least for temp files. Another solution would be to rework the use of
TESTFN on Windows to use a new temporary file every time. But that would be as much
work and more than the unlink / rmtree work above.

I'd like to hear opinions before I move forward with a non-trivial patch
for this.

For the sake of completeness, I attach a tiny test case which shows that the
rename/remove approach should in fact work for the symptom we're seeing.
msg102837 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2010-04-11 14:24
When I was working on a routine to checkout/patch/build/test/cleanup Python (see https://svn.jaraco.com/jaraco/python/jaraco.develop, and particularly scripts/test-windows-symlink-patch.py), I ran into a similar problem during the cleanup step. I tried using shutil.rmtree to clean up the folder that was checked out, but I repeatedly got 'access denied' exceptions.

I ended up working around the problem by using subprocess and cmd.exe's "rmdir /s /q".

I think this demonstrates three facets to this problem:

1) It doesn't just affect the test suite. It happens to other Python programs that are using shutil.rmtree (and possibly remove/unlink) to remove files that are in use.

2) It doesn't have to be that way. At the very least, there could (should?) be a function in Python that replicates 'rmdir /s /q', which is not subject to the 'access denied' error.

3) We could use subprocess and cmd.exe to perform the unlink and rmtree operations in the test suite to bypass the transient failures until Python supports the behavior natively.
msg102860 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-04-11 17:57
1. I agree that we should fix the unlinking problem on Windows. I also think that such a fix should be independent of the test suite - many people run into failed unlink problems.

2. Tim already said it, but I repeat: the common theory is that the culprit for this kind of problem is software like virus checkers, desktop search spiders, Tortoise, ...

3. I'm not convinced that "rmdir/s/q" *really* solves the problem reliably. Because it's a timing issue, it may be that the additional startup cost to invoke rmdir was enough to let the virus scanner win the race, so that rmdir actually had no problems with removing the file.

4. I believe the official party line for removing files on Windows is this: "If DeleteFile fails, move the file to the trash bin (of the disk), and use NtSetInformationFile to set the delete disposition for the file". See cygwin's unlink_nt for an elaborate implementation of unlinking:

http://tinyurl.com/y7w6rrj
msg102863 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2010-04-11 18:38
I'm afraid that the problem doesn't lie in the unlink: DeleteFile
succeeds. The problem is that the file is only marked for delete
until such time as the last SHARE_DELETE handle on it is closed.
Until that time, an attempt to (re)create the file for anything
other than SHARE_DELETE will fail. As you say, it's a timing
issue.

Making os.unlink on Windows more robust may be a good
idea, but it's not going to help this issue. See my test-case.py
on an earlier message for reproduction:

http://bugs.python.org/file16869

TJG
msg102869 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-04-11 19:12
> I'm afraid that the problem doesn't lie in the unlink: DeleteFile
> succeeds. The problem is that the file is only marked for delete
> until such time as the last SHARE_DELETE handle on it is closed.

Then we shouldn't use DeleteFile in the first place to delete the file,
but instead CreateFile, with DELETE access (and FILE_SHARE_DELETE
sharing). If that fails, we need to move the file to the bin
(see unlink_nt for details).

> Making os.unlink on Windows more robust may be a good
> idea, but it's not going to help this issue. See my test-case.py
> on an earlier message for reproduction:

It certainly will help this case also. It would detect that the file is
still open, and move it into the trash bin.
msg102873 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2010-04-11 19:31
> Then we shouldn't use DeleteFile in the first place to delete the file,
> but instead CreateFile, with DELETE access (and FILE_SHARE_DELETE
> sharing). If that fails, we need to move the file to the bin
> (see unlink_nt for details).

I see what you're getting at. I'm getting to the end of my day
here, but I'll try to put a patch together for posixmodule.c
when I can, if no-one else gets there first.

Would you agree that py3k is the only target branch worth aiming for?
msg102880 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-04-11 20:06
> Would you agree that py3k is the only target branch worth aiming for?

Most certainly, yes.
msg114124 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-08-17 13:39
I closed issue 9627 as a duplicate of this - the buildbot failure referenced there was most likely due to something else holding open a temporary file that the test suite thought was closed.
msg132203 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2011-03-26 01:25
As I can see cygwin's unlink_nt uses Nt* functions family (NtSetInformationFile etc) which are part of Windows DDK.

Do you like to use them or prefer SDK ones (say SetFileInformationByHandle)?

In second case python unlink can borrow deletion schema from cygwin for modern Windows versions (Vista+) and return to legacy trivial DeleteFile call if OS is WinXP.
msg132326 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-27 14:14
FWIW, Mercurial uses the following dance:
http://selenic.com/repo/hg/file/463aca32a937/mercurial/windows.py#l296

(Mercurial is under the GPL, so we can't copy that code verbatim; but it can serve as an inspiration)
msg132330 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2011-03-27 14:23
For clarity, while making unlink more robust is no bad thing, the error occurs when the unlink *succeeds* but a subsequent create of the same name fails. This happens when an indexer, Virus scanner or TortoiseSvn etc. has opened the file with SHARE_DELETE. This allows a DeleteFile to succeed but continues to hold an open handle on the file. A following test which uses an identically named file (such as the global TESTFN) will fail if not enough time has elapsed for the background process to release its handle. A good candidate to see this in action is the test for tarfile.

I did start to undertake a conversion of TESTFN to a named temporary, but it started to sprawl all over the place and came up against a number of corner cases (eg where tests deliberately wanted two filenames to be the same) so I gave up.
msg132333 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-27 14:33
> I did start to undertake a conversion of TESTFN to a named temporary,
> but it started to sprawl all over the place and came up against a
> number of corner cases (eg where tests deliberately wanted two
> filenames to be the same) so I gave up.

How about renaming to a unique random name just before the unlink(), as
Mercurial does?
Would it alleviate the problem or am I missing something?
msg132352 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2011-03-27 17:16
Well http://bugs.python.org/issue7443#msg102833 outlines the problems I encountered while trying to do essentially that. Nothing insurmountable, but definitely bigger than simply adding one line of code.

Looks to me like there are two avenues of approach (and, given the chatter on this issue, several people willing to address them):

* Patch Py_DeleteFileW in posixmodule.c so that it renames before deleting: should solve the problem overall but obviously has a possible wider impact, in general and on performance in particular. This rename might be a simple rename-to-guid or something more sophisticated such as the rename-to-recycler which cygwin uses.

* Patch support.unlink in the test package to do the rename dance on the basis that it'll fix at least some of the problems with less impact overall.

Opinions? I'm willing to do either.
msg132589 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-30 13:49
> * Patch Py_DeleteFileW in posixmodule.c so that it renames before
> deleting: should solve the problem overall but obviously has a
> possible wider impact, in general and on performance in particular.
> This rename might be a simple rename-to-guid or something more
> sophisticated such as the rename-to-recycler which cygwin uses.
> 
> * Patch support.unlink in the test package to do the rename dance on
> the basis that it'll fix at least some of the problems with less
> impact overall.
> 
> Opinions? I'm willing to do either.

Well, since I'm not a Windows expert, I can only give an intuitive
opinion. I think that we should start with patching support.unlink();
tweaking Py_DeleteFile() so as to do something more sophisticated than a
simple removal sounds like it could hide some behaviour change that
could hit some legitimate uses.

(as an aside, for higher-level variants of OS functions, the shutil may
be an appropriate recipient)
msg132766 - (view) Author: Jeff Dean (jdigital) Date: 2011-04-01 21:08
> * Patch Py_DeleteFileW in posixmodule.c so that it renames before
> deleting: should solve the problem overall but obviously has a
> possible wider impact, in general and on performance in particular.
> This rename might be a simple rename-to-guid or something more
> sophisticated such as the rename-to-recycler which cygwin uses.
> 
> * Patch support.unlink in the test package to do the rename dance on
> the basis that it'll fix at least some of the problems with less
> impact overall.
> 
> Opinions? I'm willing to do either.

I vote for fixing the test package.

File system "extensions" may track and record this activity.  To use DropBox as an example, doing the rename and delete will cause the renamed and deleted file to be recorded.

Just my opinion, but the code path to delete a file should be as short as possible.  Making lots of other OS calls just doesn't seem right.

I understand the wish to have a reliable unlink call but I'd be uncomfortable with a workaround that may be visible around the edges.
msg132787 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-04-02 04:33
I'll throw in my 2 cents as to a possible way forward:

- fix test.support.unlink in all current maintenance branches (2.7, 3.1, 3.2, default)
- expose the feature to users as "shutil.rmfile" for 3.3
msg132793 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2011-04-02 09:57
FYI: implementation of unlink already has multiple calls for unicode version to process symlinks. Ansi version is the simple DeleteFileA call.
Now we have different behavior for unicode and ansi variants as I can see.

Now I'm working on unlink implementations partially borrowed from cygwin as Martin suggested. I still not have a final patch covering all known cases. Perhaps I can introduce it next week.
msg167191 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-02 05:11
See #15496 for an alternative approach to solving this problem, at least in the test suite - as noted in that issue, the rename dance isn't sufficient when the problem gets triggered by a different sequence like:

unlink(file_in_parent_dir)
unlink(parent_dir)  # This fails if file is still locked
msg194002 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2013-07-31 15:12
This has been covered off by work done with the test.support package including context managers for temporary files / directories, and a waitfor mechanism which waits some time if a file can't be accessed.
msg194003 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-31 15:26
The support package is fixed (I presume :), but there was also a desire expressed to expose this functionality in shutil, since it can arise in "normal" Python programs as well as in the test suite.  Given that the fix has been successful in the test suite, shouldn't we open an issue to expose it in shutil?
History
Date User Action Args
2013-07-31 15:26:00r.david.murraysetmessages: + msg194003
2013-07-31 15:12:19tim.goldensetstatus: open -> closed
resolution: works for me
messages: + msg194002
2012-09-15 16:38:31brian.curtinlinkissue15946 superseder
2012-08-02 05:11:48ncoghlansetmessages: + msg167191
2012-08-02 03:04:08chris.jerdoneksetnosy: + chris.jerdonek
2011-05-16 22:22:58nadeem.vawdasetnosy: + nadeem.vawda
2011-04-02 09:57:49asvetlovsetmessages: + msg132793
2011-04-02 04:33:01ncoghlansetmessages: + msg132787
2011-04-01 21:08:51jdigitalsetnosy: + jdigital
messages: + msg132766
2011-03-30 13:49:40pitrousetmessages: + msg132589
2011-03-27 17:16:45tim.goldensetmessages: + msg132352
2011-03-27 14:33:49pitrousetmessages: + msg132333
2011-03-27 14:23:37tim.goldensetmessages: + msg132330
2011-03-27 14:14:40pitrousetnosy: + pitrou
messages: + msg132326
2011-03-26 01:40:54asvetlovsetversions: + Python 3.3, - Python 2.7, Python 3.2
2011-03-26 01:25:26asvetlovsetmessages: + msg132203
2010-08-17 13:39:20ncoghlansetnosy: + ncoghlan
messages: + msg114124
2010-08-17 13:37:41ncoghlanlinkissue9627 superseder
2010-08-06 15:02:45tim.goldensetassignee: tim.golden
2010-04-11 20:06:54loewissetmessages: + msg102880
2010-04-11 19:31:56tim.goldensetmessages: + msg102873
2010-04-11 19:12:23loewissetmessages: + msg102869
2010-04-11 18:38:07tim.goldensetmessages: + msg102863
2010-04-11 17:57:55loewissetmessages: + msg102860
2010-04-11 14:24:50jaracosetmessages: + msg102837
2010-04-11 14:12:01tim.goldensetfiles: + test-case.py

messages: + msg102833
2010-04-09 13:05:32r.david.murraysetnosy: + r.david.murray

messages: + msg102720
stage: patch review -> needs patch
2010-04-09 08:34:47tim.goldensetfiles: + watch_dir.py

messages: + msg102698
2010-04-09 08:11:34techtoniksetmessages: + msg102696
2010-04-09 06:40:57tim.goldensetnosy: + tim.golden
messages: + msg102685
2010-04-09 01:13:20jaracosetnosy: + jaraco
2010-04-01 22:56:46asvetlovsetmessages: + msg102137
2010-04-01 00:28:06techtoniksetnosy: + techtonik
messages: + msg102049
2010-03-20 02:19:52jafosetnosy: + jafo
messages: + msg101355
2010-02-09 08:05:50floxsetmessages: + msg99093
2010-02-09 07:15:54floxsetpriority: low -> high
nosy: + brian.curtin, flox
messages: + msg99089

2009-12-05 23:09:04asvetlovsetmessages: + msg96012
2009-12-05 20:12:58loewissetnosy: + loewis
messages: + msg96004
2009-12-05 19:56:57brett.cannonsetpriority: low
keywords: + needs review
stage: patch review
2009-12-05 15:32:44asvetlovsetmessages: + msg95993
2009-12-05 15:21:23asvetlovsetfiles: + python2.patch

messages: + msg95992
2009-12-05 15:20:02asvetlovcreate