Issue7443
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009-12-05 15:20 by asvetlov, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
python3.patch | asvetlov, 2009-12-05 15:20 | patch for py3k | review | |
python2.patch | asvetlov, 2009-12-05 15:21 | patch for trunk | review | |
watch_dir.py | tim.golden, 2010-04-09 08:34 | |||
test-case.py | tim.golden, 2010-04-11 14:11 |
Messages (34) | |||
---|---|---|---|
msg95991 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2009-12-05 15:20 | |
On Windows there are tiny delay between call to os.unlink and real file removing. Periodically it leads to unittest crashes in cases like this: test.support.unlink(filename) f = open(filename, 'wb') Proposed solution: wait in support.unlink for end of deletion asking os.stat for removed file (only if os.name == 'nt', of course). Also test.test_linecache:LineCacheTests.test_checkcache should be fixed - this one miss to close last opened file and Windows cannot remove it. Both patches for trunk and py3k is attached. |
|||
msg95992 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2009-12-05 15:21 | |
patch for python 2 is attached |
|||
msg95993 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2009-12-05 15:32 | |
Problem can be reproduced with several run of test_bufio in both python versions. trunk contains more tests (+1 test case) so it's easer to catch 'access denied' exception on it. |
|||
msg96004 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-12-05 20:12 | |
I'm fairly skeptical that your analysis is right. Can you prove that a file still exists after unlink returns? |
|||
msg96012 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2009-12-05 23:09 | |
You right, problem is not in os.unlink (Windows call DeleteFile) itself. I have TortoiseSVN (very popular Explorer extension to work with Subversion) installed. This tool run TSVNCache.exe process to update own data in background. TSVNCache.exe receive NotifyChangeDirectory events for svn checkout folders. support.TESTFN is tempfile name in current working directory. If I try to run unittests from some folder from python checkout TSVNCache.exe get change notify and analyze it. So, sometimes I can see race condition: - python: os.unlink(file) - TSVNCache.exe: get change event - TSVNCache.exe: query changes - python: open(file) - oops, TSVNCache.exe still processing notification and lock deleted file by holding opened handles to it. According to MSDN for DeleteFile function: The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED. We get exactly same. - TSVNCache.exe finish of change update and release handle. File is actually deleted. I see this situation in sysinternals Process Monitor tool. Probability of race condition is tiny but non-zero. Intensive create/drop/create again sequences can catch this one. |
|||
msg99089 - (view) | Author: Florent Xicluna (flox) * | Date: 2010-02-09 07:15 | |
Another report of something looking very similar on #7712. |
|||
msg99093 - (view) | Author: Florent Xicluna (flox) * | Date: 2010-02-09 08:05 | |
Note: the fix in test_linecache.py is useless. The "with open(source_name, 'w') as source:" context manager is in charge of closing the file on __exit__. |
|||
msg101355 - (view) | Author: Sean Reifschneider (jafo) * | Date: 2010-03-20 02:19 | |
Andrew: There have been changes committed within the last week to #7712, which Florent suggested might be related. Can you please re-test this to see if it still exists, and if it does, bug me and I'll try to get some more movement on this. |
|||
msg102049 - (view) | Author: anatoly techtonik (techtonik) | Date: 2010-04-01 00:28 | |
It would be nice to see standalone test case that illustrates the problem. |
|||
msg102137 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2010-04-01 22:56 | |
Florent, sorry. I have no Windows workstation now, so I cannot follow this issue. |
|||
msg102685 - (view) | Author: Tim Golden (tim.golden) * | Date: 2010-04-09 06:40 | |
This is basically a rerun of this discussion a couple of years ago: http://mail.python.org/pipermail/python-dev/2008-April/078333.html The problem certainly still happens against trunk -- I have a semi-aggressive test-harness which can cause it to reproduce pretty much on-demand. I proposed an approach here: http://mail.python.org/pipermail/python-dev/2008-April/078339.html but when I started digging into test_support it all got a bit hairy because -- naturally -- test.support.unlink is used in a *lot* of places. In short, there's still a problem to be fixed. I believe that a rename-unlink dance would solve it, but only at the cost of affecting a lot of tests. |
|||
msg102696 - (view) | Author: anatoly techtonik (techtonik) | Date: 2010-04-09 08:11 | |
It is unlikely that it will go further then discussion unless this bug can be reliably reproduced to be debugged. If not testcase when at least Process Monitor log would be helpful. |
|||
msg102698 - (view) | Author: Tim Golden (tim.golden) * | Date: 2010-04-09 08:34 | |
In one window run the attached script (assumes you have pywin32 installed) with a parameter of the directory the TESTFN file will end up in. Then run, eg, test_zipfile in another window. For me: c:\temp> watch_dir.py C:\work_in_progress\make-snapshots\trunk\python\Lib C:\work_in_progress\make-snapshots\trunk\python\Lib> ..\pcbuild\python.exe -m test.test_zipfile Obviously, you'd have to change the path to be wherever you're running the test suite from. The watch_dir script sits there looking for file activity, then takes and releases a delete-share handle on the file. It's enough to disrupt certain tests (such as test_zipfile) pretty much every time. Other tests are affected less, or only the first few times. Not sure why, but it's certainly enough to reproduce the general effect of TortoiseSVN or indexer or virus checker. |
|||
msg102720 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2010-04-09 13:05 | |
If the problem with the fix is that lots of tests use test_support.unlink, then I don't see why the rename dance can't be implemented in test_support.unlink. (Possibly conditioned on whether or not the tests are running on a windows platform.) Dealing with unlink problems is why that method exists in the first place. There are probably places in the test suite that *don't* use test_support.unlink, though, so fixing test_support.unlink will not necessarily fix all of the problems. We'll have to fix those other tests (probably by using the new test_support.unlink) as we find them. An actual patch will need a test that doesn't rely on win32file (ctypes would be OK). It may be necessary to rename to a unique filename, too. (To be clear, I think a unit test that reproduces the problem by doing an open with FILE_SHARE_DELETE is fine, we don't need a test that reproduces the race condition itself. The windows experts will correct me if I'm wrong :) I'm changing the stage to patch needed because it seems to me that using a technique like rename that doesn't introduce additional delays into the test suite is to be preferred. |
|||
msg102833 - (view) | Author: Tim Golden (tim.golden) * | Date: 2010-04-11 14:11 | |
I put together a trivial patch against the 2.7 trunk (basically: I added a os.rename before the os.remove in test_support.unlink) and reran my test harness with test_zipfile... and it still failed because, of course, test_zipfile calls shutil.rmtree which bypasses the test_support.unlink altogether etc. etc. At this point several things occur to me: 1) There's little point in targetting the 2.x tree since 2.7 is due out any time now and is effectively end-of-line for 2.x and this isn't a release-blocker. Therefore whatever effort is brought to bear should be against the 3.x latest 2) This is a repeatable but relatively minority case: it only applies to Windows boxes and only to those where some behind-the-scenes process is holding this kind of lock on files for long enough to affect the tests. I'd certainly like to nail it but... 3) The amount of work -- and intrusion in the tests -- is quite substantial. Just looking [*] for straight os.unlink instances, without even considering shutil use gives 71 instances; os.remove gives another 90. That's even without the issues of the tests for shutil and os in any case. I'm willing to do the legwork of moving all the tests use, eg, support.unlink, support.rmtree and so on, but quis custodiet? who'll test the tests? grep "os\.unlink" *.py | wc -l grep "os\.remove" *.py | wc -l 4) All that said, the result should be a cleaner, more controllable test environment, at least for temp files. Another solution would be to rework the use of TESTFN on Windows to use a new temporary file every time. But that would be as much work and more than the unlink / rmtree work above. I'd like to hear opinions before I move forward with a non-trivial patch for this. For the sake of completeness, I attach a tiny test case which shows that the rename/remove approach should in fact work for the symptom we're seeing. |
|||
msg102837 - (view) | Author: Jason R. Coombs (jaraco) * | Date: 2010-04-11 14:24 | |
When I was working on a routine to checkout/patch/build/test/cleanup Python (see https://svn.jaraco.com/jaraco/python/jaraco.develop, and particularly scripts/test-windows-symlink-patch.py), I ran into a similar problem during the cleanup step. I tried using shutil.rmtree to clean up the folder that was checked out, but I repeatedly got 'access denied' exceptions. I ended up working around the problem by using subprocess and cmd.exe's "rmdir /s /q". I think this demonstrates three facets to this problem: 1) It doesn't just affect the test suite. It happens to other Python programs that are using shutil.rmtree (and possibly remove/unlink) to remove files that are in use. 2) It doesn't have to be that way. At the very least, there could (should?) be a function in Python that replicates 'rmdir /s /q', which is not subject to the 'access denied' error. 3) We could use subprocess and cmd.exe to perform the unlink and rmtree operations in the test suite to bypass the transient failures until Python supports the behavior natively. |
|||
msg102860 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2010-04-11 17:57 | |
1. I agree that we should fix the unlinking problem on Windows. I also think that such a fix should be independent of the test suite - many people run into failed unlink problems. 2. Tim already said it, but I repeat: the common theory is that the culprit for this kind of problem is software like virus checkers, desktop search spiders, Tortoise, ... 3. I'm not convinced that "rmdir/s/q" *really* solves the problem reliably. Because it's a timing issue, it may be that the additional startup cost to invoke rmdir was enough to let the virus scanner win the race, so that rmdir actually had no problems with removing the file. 4. I believe the official party line for removing files on Windows is this: "If DeleteFile fails, move the file to the trash bin (of the disk), and use NtSetInformationFile to set the delete disposition for the file". See cygwin's unlink_nt for an elaborate implementation of unlinking: http://tinyurl.com/y7w6rrj |
|||
msg102863 - (view) | Author: Tim Golden (tim.golden) * | Date: 2010-04-11 18:38 | |
I'm afraid that the problem doesn't lie in the unlink: DeleteFile succeeds. The problem is that the file is only marked for delete until such time as the last SHARE_DELETE handle on it is closed. Until that time, an attempt to (re)create the file for anything other than SHARE_DELETE will fail. As you say, it's a timing issue. Making os.unlink on Windows more robust may be a good idea, but it's not going to help this issue. See my test-case.py on an earlier message for reproduction: http://bugs.python.org/file16869 TJG |
|||
msg102869 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2010-04-11 19:12 | |
> I'm afraid that the problem doesn't lie in the unlink: DeleteFile > succeeds. The problem is that the file is only marked for delete > until such time as the last SHARE_DELETE handle on it is closed. Then we shouldn't use DeleteFile in the first place to delete the file, but instead CreateFile, with DELETE access (and FILE_SHARE_DELETE sharing). If that fails, we need to move the file to the bin (see unlink_nt for details). > Making os.unlink on Windows more robust may be a good > idea, but it's not going to help this issue. See my test-case.py > on an earlier message for reproduction: It certainly will help this case also. It would detect that the file is still open, and move it into the trash bin. |
|||
msg102873 - (view) | Author: Tim Golden (tim.golden) * | Date: 2010-04-11 19:31 | |
> Then we shouldn't use DeleteFile in the first place to delete the file, > but instead CreateFile, with DELETE access (and FILE_SHARE_DELETE > sharing). If that fails, we need to move the file to the bin > (see unlink_nt for details). I see what you're getting at. I'm getting to the end of my day here, but I'll try to put a patch together for posixmodule.c when I can, if no-one else gets there first. Would you agree that py3k is the only target branch worth aiming for? |
|||
msg102880 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2010-04-11 20:06 | |
> Would you agree that py3k is the only target branch worth aiming for? Most certainly, yes. |
|||
msg114124 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2010-08-17 13:39 | |
I closed issue 9627 as a duplicate of this - the buildbot failure referenced there was most likely due to something else holding open a temporary file that the test suite thought was closed. |
|||
msg132203 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2011-03-26 01:25 | |
As I can see cygwin's unlink_nt uses Nt* functions family (NtSetInformationFile etc) which are part of Windows DDK. Do you like to use them or prefer SDK ones (say SetFileInformationByHandle)? In second case python unlink can borrow deletion schema from cygwin for modern Windows versions (Vista+) and return to legacy trivial DeleteFile call if OS is WinXP. |
|||
msg132326 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-03-27 14:14 | |
FWIW, Mercurial uses the following dance: http://selenic.com/repo/hg/file/463aca32a937/mercurial/windows.py#l296 (Mercurial is under the GPL, so we can't copy that code verbatim; but it can serve as an inspiration) |
|||
msg132330 - (view) | Author: Tim Golden (tim.golden) * | Date: 2011-03-27 14:23 | |
For clarity, while making unlink more robust is no bad thing, the error occurs when the unlink *succeeds* but a subsequent create of the same name fails. This happens when an indexer, Virus scanner or TortoiseSvn etc. has opened the file with SHARE_DELETE. This allows a DeleteFile to succeed but continues to hold an open handle on the file. A following test which uses an identically named file (such as the global TESTFN) will fail if not enough time has elapsed for the background process to release its handle. A good candidate to see this in action is the test for tarfile. I did start to undertake a conversion of TESTFN to a named temporary, but it started to sprawl all over the place and came up against a number of corner cases (eg where tests deliberately wanted two filenames to be the same) so I gave up. |
|||
msg132333 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-03-27 14:33 | |
> I did start to undertake a conversion of TESTFN to a named temporary, > but it started to sprawl all over the place and came up against a > number of corner cases (eg where tests deliberately wanted two > filenames to be the same) so I gave up. How about renaming to a unique random name just before the unlink(), as Mercurial does? Would it alleviate the problem or am I missing something? |
|||
msg132352 - (view) | Author: Tim Golden (tim.golden) * | Date: 2011-03-27 17:16 | |
Well http://bugs.python.org/issue7443#msg102833 outlines the problems I encountered while trying to do essentially that. Nothing insurmountable, but definitely bigger than simply adding one line of code. Looks to me like there are two avenues of approach (and, given the chatter on this issue, several people willing to address them): * Patch Py_DeleteFileW in posixmodule.c so that it renames before deleting: should solve the problem overall but obviously has a possible wider impact, in general and on performance in particular. This rename might be a simple rename-to-guid or something more sophisticated such as the rename-to-recycler which cygwin uses. * Patch support.unlink in the test package to do the rename dance on the basis that it'll fix at least some of the problems with less impact overall. Opinions? I'm willing to do either. |
|||
msg132589 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-03-30 13:49 | |
> * Patch Py_DeleteFileW in posixmodule.c so that it renames before > deleting: should solve the problem overall but obviously has a > possible wider impact, in general and on performance in particular. > This rename might be a simple rename-to-guid or something more > sophisticated such as the rename-to-recycler which cygwin uses. > > * Patch support.unlink in the test package to do the rename dance on > the basis that it'll fix at least some of the problems with less > impact overall. > > Opinions? I'm willing to do either. Well, since I'm not a Windows expert, I can only give an intuitive opinion. I think that we should start with patching support.unlink(); tweaking Py_DeleteFile() so as to do something more sophisticated than a simple removal sounds like it could hide some behaviour change that could hit some legitimate uses. (as an aside, for higher-level variants of OS functions, the shutil may be an appropriate recipient) |
|||
msg132766 - (view) | Author: Jeff Dean (jdigital) | Date: 2011-04-01 21:08 | |
> * Patch Py_DeleteFileW in posixmodule.c so that it renames before > deleting: should solve the problem overall but obviously has a > possible wider impact, in general and on performance in particular. > This rename might be a simple rename-to-guid or something more > sophisticated such as the rename-to-recycler which cygwin uses. > > * Patch support.unlink in the test package to do the rename dance on > the basis that it'll fix at least some of the problems with less > impact overall. > > Opinions? I'm willing to do either. I vote for fixing the test package. File system "extensions" may track and record this activity. To use DropBox as an example, doing the rename and delete will cause the renamed and deleted file to be recorded. Just my opinion, but the code path to delete a file should be as short as possible. Making lots of other OS calls just doesn't seem right. I understand the wish to have a reliable unlink call but I'd be uncomfortable with a workaround that may be visible around the edges. |
|||
msg132787 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2011-04-02 04:33 | |
I'll throw in my 2 cents as to a possible way forward: - fix test.support.unlink in all current maintenance branches (2.7, 3.1, 3.2, default) - expose the feature to users as "shutil.rmfile" for 3.3 |
|||
msg132793 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2011-04-02 09:57 | |
FYI: implementation of unlink already has multiple calls for unicode version to process symlinks. Ansi version is the simple DeleteFileA call. Now we have different behavior for unicode and ansi variants as I can see. Now I'm working on unlink implementations partially borrowed from cygwin as Martin suggested. I still not have a final patch covering all known cases. Perhaps I can introduce it next week. |
|||
msg167191 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2012-08-02 05:11 | |
See #15496 for an alternative approach to solving this problem, at least in the test suite - as noted in that issue, the rename dance isn't sufficient when the problem gets triggered by a different sequence like: unlink(file_in_parent_dir) unlink(parent_dir) # This fails if file is still locked |
|||
msg194002 - (view) | Author: Tim Golden (tim.golden) * | Date: 2013-07-31 15:12 | |
This has been covered off by work done with the test.support package including context managers for temporary files / directories, and a waitfor mechanism which waits some time if a file can't be accessed. |
|||
msg194003 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2013-07-31 15:26 | |
The support package is fixed (I presume :), but there was also a desire expressed to expose this functionality in shutil, since it can arise in "normal" Python programs as well as in the test suite. Given that the fix has been successful in the test suite, shouldn't we open an issue to expose it in shutil? |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:55 | admin | set | github: 51692 |
2013-07-31 15:26:00 | r.david.murray | set | messages: + msg194003 |
2013-07-31 15:12:19 | tim.golden | set | status: open -> closed resolution: works for me messages: + msg194002 |
2012-09-15 16:38:31 | brian.curtin | link | issue15946 superseder |
2012-08-02 05:11:48 | ncoghlan | set | messages: + msg167191 |
2012-08-02 03:04:08 | chris.jerdonek | set | nosy:
+ chris.jerdonek |
2011-05-16 22:22:58 | nadeem.vawda | set | nosy:
+ nadeem.vawda |
2011-04-02 09:57:49 | asvetlov | set | messages: + msg132793 |
2011-04-02 04:33:01 | ncoghlan | set | messages: + msg132787 |
2011-04-01 21:08:51 | jdigital | set | nosy:
+ jdigital messages: + msg132766 |
2011-03-30 13:49:40 | pitrou | set | messages: + msg132589 |
2011-03-27 17:16:45 | tim.golden | set | messages: + msg132352 |
2011-03-27 14:33:49 | pitrou | set | messages: + msg132333 |
2011-03-27 14:23:37 | tim.golden | set | messages: + msg132330 |
2011-03-27 14:14:40 | pitrou | set | nosy:
+ pitrou messages: + msg132326 |
2011-03-26 01:40:54 | asvetlov | set | versions: + Python 3.3, - Python 2.7, Python 3.2 |
2011-03-26 01:25:26 | asvetlov | set | messages: + msg132203 |
2010-08-17 13:39:20 | ncoghlan | set | nosy:
+ ncoghlan messages: + msg114124 |
2010-08-17 13:37:41 | ncoghlan | link | issue9627 superseder |
2010-08-06 15:02:45 | tim.golden | set | assignee: tim.golden |
2010-04-11 20:06:54 | loewis | set | messages: + msg102880 |
2010-04-11 19:31:56 | tim.golden | set | messages: + msg102873 |
2010-04-11 19:12:23 | loewis | set | messages: + msg102869 |
2010-04-11 18:38:07 | tim.golden | set | messages: + msg102863 |
2010-04-11 17:57:55 | loewis | set | messages: + msg102860 |
2010-04-11 14:24:50 | jaraco | set | messages: + msg102837 |
2010-04-11 14:12:01 | tim.golden | set | files:
+ test-case.py messages: + msg102833 |
2010-04-09 13:05:32 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg102720 stage: patch review -> needs patch |
2010-04-09 08:34:47 | tim.golden | set | files:
+ watch_dir.py messages: + msg102698 |
2010-04-09 08:11:34 | techtonik | set | messages: + msg102696 |
2010-04-09 06:40:57 | tim.golden | set | nosy:
+ tim.golden messages: + msg102685 |
2010-04-09 01:13:20 | jaraco | set | nosy:
+ jaraco |
2010-04-01 22:56:46 | asvetlov | set | messages: + msg102137 |
2010-04-01 00:28:06 | techtonik | set | nosy:
+ techtonik messages: + msg102049 |
2010-03-20 02:19:52 | jafo | set | nosy:
+ jafo messages: + msg101355 |
2010-02-09 08:05:50 | flox | set | messages: + msg99093 |
2010-02-09 07:15:54 | flox | set | priority: low -> high nosy: + brian.curtin, flox messages: + msg99089 |
2009-12-05 23:09:04 | asvetlov | set | messages: + msg96012 |
2009-12-05 20:12:58 | loewis | set | nosy:
+ loewis messages: + msg96004 |
2009-12-05 19:56:57 | brett.cannon | set | priority: low keywords: + needs review stage: patch review |
2009-12-05 15:32:44 | asvetlov | set | messages: + msg95993 |
2009-12-05 15:21:23 | asvetlov | set | files:
+ python2.patch messages: + msg95992 |
2009-12-05 15:20:02 | asvetlov | create |