msg166853 - (view) |
Author: Jeremy Kloth (jkloth) * |
Date: 2012-07-30 02:14 |
Currently, removing directories during testing on Windows w/NTFS can causing sporadic failures due to access denied errors. This is caused by other processes getting a handle to the directory itself (change notifications) or a handle to a file within the directory. The most notable offender is the Indexing Service.
Most (but not all!) external programs can be configured to ignore the development directory. For example, the Indexing Service can be disabled or have those directories ignored. TortoiseSVN is another offender that can exclude directories but each directory needs to be listed separately so it is easy to forgot one. On my machine I have programs that simply do not have an option to ignore any directories thus causing some grief during testing.
The attached patch to test.support eliminates the need to disable or and ignores to any programs (tested on a Win7-x64 i7-3770K@4.3GHz).
It achieves this by checking for the removal of the directory before returning to the caller. It performs an exponential backoff timeout loop that amounts to a total of ~1 second in the worst case. If the directory is not removed from the filesystem by then, it will probably be in error anyway. However, the loop is seldom executed more than once.
|
msg166862 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2012-07-30 07:33 |
This is a (near) duplicate of issue7443, I think.
|
msg166892 - (view) |
Author: Jeremy Kloth (jeremy.kloth) |
Date: 2012-07-30 12:42 |
> This is a (near) duplicate of issue7443, I think.
Partially so it seems. However, my testing with Process Monitor (from
sysinterals) shows that most of the access denied errors occur when
removing directories.
The blind rename-remove dance doesn't work when removing entire
directory trees as now the renamed file/directory might be held
pending delete when the removing parent directory.
For some background for newcomers, see http://support.microsoft.com/kb/159199
For testing, I used regrtest -F -j6 test_import
Without patching it would fail consistently (albeit randomly).
|
msg167132 - (view) |
Author: Jeremy Kloth (jkloth) * |
Date: 2012-08-01 14:17 |
I must also add that the proposed solution works well within the test suite as the access denied error can also occur when creating subsequent files, not just removing them.
This solution eliminates the need to wrap all creation calls with access denied handling, a huge plus IMHO.
|
msg167197 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2012-08-02 07:38 |
I'm +1 on the approach in principle. I'm tentative about using ctypes
for this just because I don't believe we use it anywhere else. But at
the least I suggest applying the patch to see how Jeremy's buildbot
behaves. If there are wider objections to ctypes we could always work up
a C patch. (In 3.3+ we could use the new _winapi module).
|
msg167200 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2012-08-02 10:02 |
My inclination is to say that using ctypes is a reasonable option for improving Windows buildbot stability in the near term, but we'd probably want to move this into _winapi long term.
Adding Antoine & MvL to get their opinion.
|
msg167201 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-08-02 10:20 |
I wonder why it couldn't use os.listdir to find out whether the directory is empty, and os.stat to find out whether a specific file or directory still exists.
|
msg167213 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-08-02 13:06 |
> On my machine I have programs that simply do not have an option to
> ignore any directories thus causing some grief during testing.
What are those programs exactly? A buildbot should ideally have a lean system install that does not interfere with the tests running.
|
msg167217 - (view) |
Author: Jeremy Kloth (jeremy.kloth) |
Date: 2012-08-02 13:49 |
> What are those programs exactly? A buildbot should ideally have a lean system install that does not interfere with the tests running.
My development machine has add'l programs, not the buildbot machine.
Sorry is there was any confusion. I get the same occasional access
denied errors when running the tests locally.
|
msg167220 - (view) |
Author: Jeremy Kloth (jeremy.kloth) |
Date: 2012-08-02 14:01 |
> Tim Golden added the comment:
> I'm tentative about using ctypes this just because I don't believe we use it anywhere else.
ctypes is already used later in test_support so I figured it was fine
to use for this as well.
|
msg167222 - (view) |
Author: Jeremy Kloth (jeremy.kloth) |
Date: 2012-08-02 14:08 |
> I wonder why it couldn't use os.listdir to find out whether the directory is empty, and os.stat to find out whether a specific file or directory still exists.
It is possible to do the same thing with just os.listdir. The use of
the Find*File functions was chosen to eliminate duplicating the wait
function for the two different cases. Also, it is just less resource
and time intensive to use the Find*File API directly (no need to build
an entire list when just testing if any entries exist).
|
msg167223 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-08-02 14:10 |
> > I wonder why it couldn't use os.listdir to find out whether the directory is empty, and os.stat to find out whether a specific file or directory still exists.
>
> It is possible to do the same thing with just os.listdir. The use of
> the Find*File functions was chosen to eliminate duplicating the wait
> function for the two different cases. Also, it is just less resource
> and time intensive to use the Find*File API directly (no need to build
> an entire list when just testing if any entries exist).
But it's certainly much easier to maintain using regular Python APIs. We
would use ctypes if the functionality wasn't accessible from pure
Python, but since it is, we don't want to reinvent the wheel.
|
msg167226 - (view) |
Author: Jeremy Kloth (jkloth) * |
Date: 2012-08-02 14:44 |
OK, here is another patch that uses just os.listdir
|
msg167232 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-08-02 15:24 |
I fail to see the point of not using os.stat. IIUC, Windows will delete the file once the last handle is been closed. Since stat will close any handle it temporarily gets, it will not prolong the live of the file; the file will still go away when the last process has closed it.
Performance is not an issue at all here, since we are waiting for the deletion of the file anyway. So checking whether the file is in the directory listing is fine with me as well. Unless someone can demonstrate how os.stat can prevent removal of the file, I'd like to see the comment corrected, though.
|
msg167441 - (view) |
Author: Jeremy Kloth (jkloth) * |
Date: 2012-08-04 21:15 |
I've updated the comment in the patch to reflect Martin's concern.
Martin is partially correct in that the handle opened in the stat() call will not prolong the pending status. It is due to the fact that it does not open the handle with any sharing mode set, thus effectively blocking any other process from grabbing another handle to the file while the stat function has its handle open.
|
msg167445 - (view) |
Author: Jeremy Kloth (jeremy.kloth) |
Date: 2012-08-04 21:47 |
With the latest changes, is there anything left preventing the
inclusion of this patch?
Without some change, the Win64 buildbot is relatively irrelevant as it
is nearly always in a state of failure due to these errors.
|
msg167446 - (view) |
Author: Brian Curtin (brian.curtin) * |
Date: 2012-08-04 21:53 |
> Without some change, the Win64 buildbot is relatively irrelevant as it
is nearly always in a state of failure due to these errors.
Not that some change isn't necessary, but what else are you running on your build slave? I ran a Windows 2008 R2 x64 slave for some time and it never had issues around file/directory removal. I only had to decommission it because the physical machine became unreliable.
|
msg167448 - (view) |
Author: Jeremy Kloth (jeremy.kloth) |
Date: 2012-08-04 21:58 |
> Not that some change isn't necessary, but what else are you running on your build slave? I ran a Windows 2008 R2 x64 slave for some time and it never had issues around file/directory removal. I only had to decommission it because the physical machine became unreliable.
The errors only started happening after upgrading the HD from a PATA
Ultra133 to an SATA3 SSD. The super-fast HD is allowing for these
timing errors to show through.
|
msg168148 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-08-13 21:50 |
Brian, Tim, do you think this should be committed?
|
msg168150 - (view) |
Author: Brian Curtin (brian.curtin) * |
Date: 2012-08-13 21:54 |
The latest patch to test.support looks reasonable. Go for it.
|
msg168151 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2012-08-13 22:02 |
Fine with me
|
msg168154 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-08-13 22:13 |
New changeset fcad4566910b by Brian Curtin in branch '3.2':
Fix #15496. Add directory removal helpers to make Windows tests more reliable. Patch by Jeremy Kloth
http://hg.python.org/cpython/rev/fcad4566910b
|
msg168157 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-08-13 22:27 |
New changeset c863dadc65eb by Brian Curtin in branch '2.7':
Fix #15496. Add directory removal helpers to make Windows tests more reliable. Patch by Jeremy Kloth
http://hg.python.org/cpython/rev/c863dadc65eb
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:33 | admin | set | github: 59701 |
2012-08-13 22:28:00 | brian.curtin | set | status: open -> closed assignee: brian.curtin resolution: fixed components:
+ Windows stage: patch review -> resolved |
2012-08-13 22:27:16 | python-dev | set | messages:
+ msg168157 |
2012-08-13 22:13:13 | python-dev | set | nosy:
+ python-dev messages:
+ msg168154
|
2012-08-13 22:02:58 | tim.golden | set | messages:
+ msg168151 |
2012-08-13 21:54:25 | brian.curtin | set | messages:
+ msg168150 |
2012-08-13 21:50:50 | pitrou | set | messages:
+ msg168148 versions:
- Python 3.4 |
2012-08-04 21:58:18 | jeremy.kloth | set | messages:
+ msg167448 |
2012-08-04 21:53:49 | brian.curtin | set | messages:
+ msg167446 |
2012-08-04 21:47:39 | jeremy.kloth | set | messages:
+ msg167445 |
2012-08-04 21:15:53 | jkloth | set | files:
+ support.diff
messages:
+ msg167441 |
2012-08-02 15:24:21 | loewis | set | messages:
+ msg167232 |
2012-08-02 14:44:41 | jkloth | set | files:
+ support.diff
messages:
+ msg167226 |
2012-08-02 14:10:06 | pitrou | set | messages:
+ msg167223 |
2012-08-02 14:08:03 | jeremy.kloth | set | messages:
+ msg167222 |
2012-08-02 14:01:34 | jeremy.kloth | set | messages:
+ msg167220 |
2012-08-02 13:49:01 | jeremy.kloth | set | messages:
+ msg167217 |
2012-08-02 13:06:13 | pitrou | set | messages:
+ msg167213 |
2012-08-02 10:20:39 | loewis | set | messages:
+ msg167201 |
2012-08-02 10:02:38 | ncoghlan | set | nosy:
+ loewis, pitrou messages:
+ msg167200
|
2012-08-02 07:38:25 | tim.golden | set | messages:
+ msg167197 |
2012-08-02 05:03:15 | ncoghlan | set | nosy:
+ ncoghlan
|
2012-08-02 03:03:53 | chris.jerdonek | set | nosy:
+ chris.jerdonek
|
2012-08-01 14:37:46 | pitrou | set | nosy:
+ brian.curtin stage: patch review
versions:
+ Python 2.7, Python 3.2 |
2012-08-01 14:17:23 | jkloth | set | messages:
+ msg167132 |
2012-07-30 12:42:02 | jeremy.kloth | set | nosy:
+ jeremy.kloth messages:
+ msg166892
|
2012-07-30 07:33:41 | tim.golden | set | nosy:
+ tim.golden messages:
+ msg166862
|
2012-07-30 02:14:55 | jkloth | create | |