classification
Title: test_startfile crash on Windows 7 AMD64
Type: crash Stage: resolved
Components: Tests Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: chris.jerdonek, georg.brandl, jeremy.kloth, jkloth, ncoghlan, pitrou, python-dev, sbt, vstinner
Priority: normal Keywords: buildbot, patch

Created on 2012-08-01 13:30 by ncoghlan, last changed 2017-05-11 00:43 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
test_startfile.diff jkloth, 2012-08-21 13:22 review
Pull Requests
URL Status Linked Edit
PR 1537 merged vstinner, 2017-05-10 16:30
Messages (16)
msg167126 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-01 13:30
regrtest bailed out completely on one of the Windows 7 builders due to a problem with accessing the temporary working directory:

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/410/steps/test/logs/stdio

It's seems a touch suspicious that test_fork1 just so happened to be the test executed right before test_startfile blew up.

Not marking as a blocker yet, since my initial assumption is that it's a race condition with a particular sequence of tests rather than a problem in Python itself. I'll change my mind if it happens again on the rebuild I just forced.
msg167128 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-01 13:36
Well, test_startfile seems to be a common issue on that buildbot.
msg167133 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-01 14:17
> Well, test_startfile seems to be a common issue on that buildbot.

I wouldn't really call it common (twice in the last 30 runs).

However I would wager that the PermissionError is being caused by an
unknown (without having Process Monitor running when the error occurs)
process temporarily grabbing a handle to a file/directory within the
tests temporary directory.

This just reinforces the need for the solution in issue15496 to be applied.
msg167184 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-02 01:16
I have just completed an upgrade of the Win64 buildbot slave with a
faster hard drive and the issue is now much more pronounced.
msg168537 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-18 22:53
It crashed again, despite issue15496 being fixed:
http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/496
msg168543 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-19 00:29
Unfortunately, this is a legitimate failure of the test.  The test
(actually the support code) is attempting to remove a directory that
is the current directory of an active process.  The test has
documented this issue and attempted to work around it by adding a
sleep().

To fix this test there a few options:
1) increase the sleep timeout
2) change the current working directory before calling os.startfile()
(possible that of the Python interpreter itself)
3) code a "wait for child process" function using ctypes and the
Toolhelp API (Process32First/Next)
4) change os.startfile() to use ShellExecuteEx and use the hProcess
handle as the return value and use that with os.waitpid()

#4 is the most accurate, but does introduce an API change (the
introduction of a return value to os.startfile)
#3 is more work but may be helpful for other tests
#2 will definitely work in this case but does violate the testing sandbox
#1 is the least disruptive, but is really just avoiding the root cause

I would like to see #4 but realize that it may be too late for 3.3.
#3 is not bad either as it has impact only on the test suite.  But any
of the above should fix this particular case.
msg168553 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-08-19 10:31
I think the reason that it is only this buildbot which fails is that the other Windows buildbots don't use multiple processes.  Therefore they don't use a different dir for each test.

> 4) change os.startfile() to use ShellExecuteEx and use the hProcess
> handle as the return value and use that with os.waitpid()

Would this cause a handle leak if os.waitpid() is not used?
msg168576 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-19 14:32
> I think the reason that it is only this buildbot which fails is that the other Windows buildbots don't use multiple processes.  Therefore they don't use a different dir for each test.

That might be it.  Also the failure possibly only happens when
multiple builds are being run thus slowing down process creation and
termination.

> Would this cause a handle leak if os.waitpid() is not used?

It seems so, yes.

So to expand on #4:
4a) create a new handle type that closes the handles on dealloc
4b) return the process ID instead using GetProcessId() and callers
interested in waiting would then need to use _winapi.OpenProcess() to
convert it to a handle for os.waitpid() or
_winapi.WaitForSingleObject()
4c) add a third optional argument to os.startfile() "mode" that mimics
the mode semantics of the os.spawn*() functions
msg168577 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-19 14:43
I think the two "simple and stupid" solutions (#1 and #2) have a certain charm myself :) Especially #1, which is the simplest of all.
msg168578 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2012-08-19 15:26
However #1 is the reason that is bug exists in the first place.  The
designer of the test guessed wrong on the "magic value" for the
timeout.  There will never be a correct timeout value as it varies
from machine to machine and from workload to workload on a given
machine.  For any value that is picked, there exists a scenario where
it will fail.

#2 is certainly a viable work-around and it appears that other tests
(notably the test for fork/exec) do similar so it wouldn't be
unprecedented

#3 is really only useful if other tests need a "wait for process"
helper on Windows.

#4 really just highlights a deficiency with os.startfile() so I'm fine
with deferring that to a feature request for 3.4.

I'll cook up a patch implementing #2 unless anyone else is feeling ambitious.
msg168767 - (view) Author: Jeremy Kloth (jkloth) * Date: 2012-08-21 13:22
Here is the patch implementing option #2
msg170520 - (view) Author: Jeremy Kloth (jkloth) * Date: 2012-09-15 15:00
This test is still intermittently failing on the AMD64 Windows7 SP1 buildbot: 

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/630

Any chance the patch could be committed?
msg170534 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-15 22:13
New changeset bc5c5b79b7e1 by Antoine Pitrou in branch '3.2':
Issue #15526: try to fix test_startfile's inability to clean up after itself in time.
http://hg.python.org/cpython/rev/bc5c5b79b7e1

New changeset 1704deb7e6d7 by Antoine Pitrou in branch 'default':
Issue #15526: try to fix test_startfile's inability to clean up after itself in time.
http://hg.python.org/cpython/rev/1704deb7e6d7
msg170535 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-15 22:15
Ok, hopefully there'll no more be failures now.
msg293463 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-10 23:23
New changeset 3837d9797c14c13d170256143c841d29645e772a by Victor Stinner in branch '2.7':
bpo-15526: test_startfile changes the cwd (#1537)
https://github.com/python/cpython/commit/3837d9797c14c13d170256143c841d29645e772a
msg293468 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-11 00:43
I backported the fix to Python 2.7 to fix bpo-30334.
History
Date User Action Args
2017-05-11 00:43:00vstinnersetmessages: + msg293468
2017-05-10 23:23:21vstinnersetnosy: + vstinner
messages: + msg293463
2017-05-10 16:30:47vstinnersetpull_requests: + pull_request1636
2012-12-10 15:21:57sbtsetstatus: open -> closed
2012-09-15 22:15:36pitrousetresolution: fixed
stage: resolved
messages: + msg170535
versions: + Python 3.2
2012-09-15 22:13:45python-devsetnosy: + python-dev
messages: + msg170534
2012-09-15 15:00:22jklothsetmessages: + msg170520
2012-08-21 13:22:08jklothsetfiles: + test_startfile.diff

nosy: + jkloth
messages: + msg168767

keywords: + patch
2012-08-19 15:26:20jeremy.klothsetmessages: + msg168578
2012-08-19 14:43:24pitrousetmessages: + msg168577
2012-08-19 14:32:26jeremy.klothsetmessages: + msg168576
2012-08-19 10:31:37sbtsetnosy: + sbt
messages: + msg168553
2012-08-19 00:29:51jeremy.klothsetmessages: + msg168543
2012-08-18 22:53:36pitrousetmessages: + msg168537
title: regrtest crash on Windows 7 AMD64 -> test_startfile crash on Windows 7 AMD64
2012-08-02 03:02:58chris.jerdoneksetnosy: + chris.jerdonek
2012-08-02 01:16:58jeremy.klothsetmessages: + msg167184
2012-08-01 14:17:33jeremy.klothsetmessages: + msg167133
2012-08-01 13:36:16pitrousetnosy: + jeremy.kloth, pitrou
messages: + msg167128
2012-08-01 13:30:03ncoghlancreate