classification
Title: harden directory removal for tests on Windows
Type: enhancement Stage: resolved
Components: Tests, Windows Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brian.curtin Nosy List: brian.curtin, chris.jerdonek, jeremy.kloth, jkloth, loewis, ncoghlan, pitrou, python-dev, tim.golden
Priority: normal Keywords: patch

Created on 2012-07-30 02:14 by jkloth, last changed 2012-08-13 22:28 by brian.curtin. This issue is now closed.

Files
File name Uploaded Description Edit
support.diff jkloth, 2012-07-30 02:14 Patch for test.support review
support.diff jkloth, 2012-08-02 14:44 Patch without ctypes review
support.diff jkloth, 2012-08-04 21:15 v3: updated comments
Messages (23)
msg166853 - (view) Author: Jeremy Kloth (jkloth) * Date: 2012-07-30 02:14
Currently, removing directories during testing on Windows w/NTFS can causing sporadic failures due to access denied errors.  This is caused by other processes getting a handle to the directory itself (change notifications) or a handle to a file within the directory.  The most notable offender is the Indexing Service.

Most (but not all!) external programs can be configured to ignore the development directory.  For example, the Indexing Service can be disabled or have those directories ignored.  TortoiseSVN is another offender that can exclude directories but each directory needs to be listed separately so it is easy to forgot one.  On my machine I have programs that simply do not have an option to ignore any directories thus causing some grief during testing.

The attached patch to test.support eliminates the need to disable or and ignores to any programs (tested on a Win7-x64 i7-3770K@4.3GHz).

It achieves this by checking for the removal of the directory before returning to the caller.  It performs an exponential backoff timeout loop that amounts to a total of ~1 second in the worst case.  If the directory is not removed from the filesystem by then, it will probably be in error anyway.  However, the loop is seldom executed more than once.
msg166862 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2012-07-30 07:33
This is a (near) duplicate of issue7443, I think.
msg166892 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-07-30 12:42
> This is a (near) duplicate of issue7443, I think.

Partially so it seems.  However, my testing with Process Monitor (from
sysinterals) shows that most of the access denied errors occur when
removing directories.

The blind rename-remove dance doesn't work when removing entire
directory trees as now the renamed file/directory might be held
pending delete when the removing parent directory.

For some background for newcomers, see http://support.microsoft.com/kb/159199

For testing, I used regrtest -F -j6 test_import

Without patching it would fail consistently (albeit randomly).
msg167132 - (view) Author: Jeremy Kloth (jkloth) * Date: 2012-08-01 14:17
I must also add that the proposed solution works well within the test suite as the access denied error can also occur when creating subsequent files, not just removing them.

This solution eliminates the need to wrap all creation calls with access denied handling, a huge plus IMHO.
msg167197 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2012-08-02 07:38
I'm +1 on the approach in principle. I'm tentative about using ctypes
for this just because I don't believe we use it anywhere else. But at
the least I suggest applying the patch to see how Jeremy's buildbot
behaves. If there are wider objections to ctypes we could always work up
a C patch. (In 3.3+ we could use the new _winapi module).
msg167200 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-02 10:02
My inclination is to say that using ctypes is a reasonable option for improving Windows buildbot stability in the near term, but we'd probably want to move this into _winapi long term.

Adding Antoine & MvL to get their opinion.
msg167201 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-02 10:20
I wonder why it couldn't use os.listdir to find out whether the directory is empty, and os.stat to find out whether a specific file or directory still exists.
msg167213 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-02 13:06
> On my machine I have programs that simply do not have an option to
> ignore any directories thus causing some grief during testing.

What are those programs exactly? A buildbot should ideally have a lean system install that does not interfere with the tests running.
msg167217 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-02 13:49
> What are those programs exactly? A buildbot should ideally have a lean system install that does not interfere with the tests running.

My development machine has add'l programs, not the buildbot machine.
Sorry is there was any confusion.  I get the same occasional access
denied errors when running the tests locally.
msg167220 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-02 14:01
> Tim Golden added the comment:
> I'm tentative about using ctypes this just because I don't believe we use it anywhere else.

ctypes is already used later in test_support so I figured it was fine
to use for this as well.
msg167222 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-02 14:08
> I wonder why it couldn't use os.listdir to find out whether the directory is empty, and os.stat to find out whether a specific file or directory still exists.

It is possible to do the same thing with just os.listdir.  The use of
the Find*File functions was chosen to eliminate duplicating the wait
function for the two different cases.  Also,  it is just less resource
and time intensive to use the Find*File API directly (no need to build
an entire list when just testing if any entries exist).
msg167223 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-02 14:10
> > I wonder why it couldn't use os.listdir to find out whether the directory is empty, and os.stat to find out whether a specific file or directory still exists.
> 
> It is possible to do the same thing with just os.listdir.  The use of
> the Find*File functions was chosen to eliminate duplicating the wait
> function for the two different cases.  Also,  it is just less resource
> and time intensive to use the Find*File API directly (no need to build
> an entire list when just testing if any entries exist).

But it's certainly much easier to maintain using regular Python APIs. We
would use ctypes if the functionality wasn't accessible from pure
Python, but since it is, we don't want to reinvent the wheel.
msg167226 - (view) Author: Jeremy Kloth (jkloth) * Date: 2012-08-02 14:44
OK, here is another patch that uses just os.listdir
msg167232 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-02 15:24
I fail to see the point of not using os.stat. IIUC, Windows will delete the file once the last handle is been closed. Since stat will close any handle it temporarily gets, it will not prolong the live of the file; the file will still go away when the last process has closed it.

Performance is not an issue at all here, since we are waiting for the deletion of the file anyway. So checking whether the file is in the directory listing is fine with me as well. Unless someone can demonstrate how os.stat can prevent removal of the file, I'd like to see the comment corrected, though.
msg167441 - (view) Author: Jeremy Kloth (jkloth) * Date: 2012-08-04 21:15
I've updated the comment in the patch to reflect Martin's concern.

Martin is partially correct in that the handle opened in the stat() call will not prolong the pending status.  It is due to the fact that it does not open the handle with any sharing mode set, thus effectively blocking any other process from grabbing another handle to the file while the stat function has its handle open.
msg167445 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-04 21:47
With the latest changes, is there anything left preventing the
inclusion of this patch?

Without some change, the Win64 buildbot is relatively irrelevant as it
is nearly always in a state of failure due to these errors.
msg167446 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2012-08-04 21:53
> Without some change, the Win64 buildbot is relatively irrelevant as it
is nearly always in a state of failure due to these errors.

Not that some change isn't necessary, but what else are you running on your build slave? I ran a Windows 2008 R2 x64 slave for some time and it never had issues around file/directory removal. I only had to decommission it because the physical machine became unreliable.
msg167448 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-04 21:58
> Not that some change isn't necessary, but what else are you running on your build slave? I ran a Windows 2008 R2 x64 slave for some time and it never had issues around file/directory removal. I only had to decommission it because the physical machine became unreliable.

The errors only started happening after upgrading the HD from a PATA
Ultra133 to an SATA3 SSD.  The super-fast HD is allowing for these
timing errors to show through.
msg168148 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-13 21:50
Brian, Tim, do you think this should be committed?
msg168150 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2012-08-13 21:54
The latest patch to test.support looks reasonable. Go for it.
msg168151 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2012-08-13 22:02
Fine with me
msg168154 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-08-13 22:13
New changeset fcad4566910b by Brian Curtin in branch '3.2':
Fix #15496. Add directory removal helpers to make Windows tests more reliable. Patch by Jeremy Kloth
http://hg.python.org/cpython/rev/fcad4566910b
msg168157 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-08-13 22:27
New changeset c863dadc65eb by Brian Curtin in branch '2.7':
Fix #15496. Add directory removal helpers to make Windows tests more reliable. Patch by Jeremy Kloth
http://hg.python.org/cpython/rev/c863dadc65eb
History
Date User Action Args
2012-08-13 22:28:00brian.curtinsetstatus: open -> closed
assignee: brian.curtin
resolution: fixed
components: + Windows
stage: patch review -> resolved
2012-08-13 22:27:16python-devsetmessages: + msg168157
2012-08-13 22:13:13python-devsetnosy: + python-dev
messages: + msg168154
2012-08-13 22:02:58tim.goldensetmessages: + msg168151
2012-08-13 21:54:25brian.curtinsetmessages: + msg168150
2012-08-13 21:50:50pitrousetmessages: + msg168148
versions: - Python 3.4
2012-08-04 21:58:18jeremy.klothsetmessages: + msg167448
2012-08-04 21:53:49brian.curtinsetmessages: + msg167446
2012-08-04 21:47:39jeremy.klothsetmessages: + msg167445
2012-08-04 21:15:53jklothsetfiles: + support.diff

messages: + msg167441
2012-08-02 15:24:21loewissetmessages: + msg167232
2012-08-02 14:44:41jklothsetfiles: + support.diff

messages: + msg167226
2012-08-02 14:10:06pitrousetmessages: + msg167223
2012-08-02 14:08:03jeremy.klothsetmessages: + msg167222
2012-08-02 14:01:34jeremy.klothsetmessages: + msg167220
2012-08-02 13:49:01jeremy.klothsetmessages: + msg167217
2012-08-02 13:06:13pitrousetmessages: + msg167213
2012-08-02 10:20:39loewissetmessages: + msg167201
2012-08-02 10:02:38ncoghlansetnosy: + loewis, pitrou
messages: + msg167200
2012-08-02 07:38:25tim.goldensetmessages: + msg167197
2012-08-02 05:03:15ncoghlansetnosy: + ncoghlan
2012-08-02 03:03:53chris.jerdoneksetnosy: + chris.jerdonek
2012-08-01 14:37:46pitrousetnosy: + brian.curtin
stage: patch review

versions: + Python 2.7, Python 3.2
2012-08-01 14:17:23jklothsetmessages: + msg167132
2012-07-30 12:42:02jeremy.klothsetnosy: + jeremy.kloth
messages: + msg166892
2012-07-30 07:33:41tim.goldensetnosy: + tim.golden
messages: + msg166862
2012-07-30 02:14:55jklothcreate