harden directory removal for tests on Windows #59701

Closed
jkloth opened this issue Jul 30, 2012 · 23 comments
Labels
OS-windows tests Tests in the Lib/test dir type-feature A feature request or enhancement

Comments

@jkloth
Contributor

jkloth commented Jul 30, 2012

BPO 15496
Nosy @loewis, @ncoghlan, @pitrou, @tjguk, @jkloth, @briancurtin, @cjerdonek
Files
  • support.diff: Patch for test.support
  • support.diff: Patch without ctypes
  • support.diff: v3: updated comments
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/briancurtin'
    closed_at = <Date 2012-08-13.22:28:00.495>
    created_at = <Date 2012-07-30.02:14:55.041>
    labels = ['type-feature', 'tests', 'OS-windows']
    title = 'harden directory removal for tests on Windows'
    updated_at = <Date 2012-08-13.22:28:00.494>
    user = 'https://github.com/jkloth'

    bugs.python.org fields:

    activity = <Date 2012-08-13.22:28:00.494>
    actor = 'brian.curtin'
    assignee = 'brian.curtin'
    closed = True
    closed_date = <Date 2012-08-13.22:28:00.495>
    closer = 'brian.curtin'
    components = ['Tests', 'Windows']
    creation = <Date 2012-07-30.02:14:55.041>
    creator = 'jkloth'
    dependencies = []
    files = ['26590', '26663', '26693']
    hgrepos = []
    issue_num = 15496
    keywords = ['patch']
    message_count = 23.0
    messages = ['166853', '166862', '166892', '167132', '167197', '167200', '167201', '167213', '167217', '167220', '167222', '167223', '167226', '167232', '167441', '167445', '167446', '167448', '168148', '168150', '168151', '168154', '168157']
    nosy_count = 9.0
    nosy_names = ['loewis', 'ncoghlan', 'pitrou', 'tim.golden', 'jkloth', 'brian.curtin', 'jeremy.kloth', 'chris.jerdonek', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue15496'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

    @jkloth
    Contributor Author

    jkloth commented Jul 30, 2012

    Currently, removing directories during testing on Windows w/NTFS can cause sporadic failures due to access denied errors. This is caused by other processes getting a handle to the directory itself (change notifications) or a handle to a file within the directory. The most notable offender is the Indexing Service.

    Most (but not all!) external programs can be configured to ignore the development directory. For example, the Indexing Service can be disabled or have those directories ignored. TortoiseSVN is another offender that can exclude directories, but each directory needs to be listed separately, so it is easy to forget one. On my machine I have programs that simply do not have an option to ignore any directories, thus causing some grief during testing.

    The attached patch to test.support eliminates the need to disable any programs or add ignores to them (tested on a Win7-x64 i7-3770K@4.3GHz).

    It achieves this by checking for the removal of the directory before returning to the caller. It performs an exponential backoff timeout loop that amounts to a total of ~1 second in the worst case. If the directory is not removed from the filesystem by then, it will probably be in error anyway. However, the loop is seldom executed more than once.
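For readers without the attachment, the wait loop described above could look roughly like the following. This is a minimal sketch, not the code from the attached patch: the helper name `wait_until_removed`, its parameters, and the 1 ms starting delay are assumptions chosen so that the doubling delays add up to about one second.

```python
import os
import time


def wait_until_removed(path, first_delay=0.001, limit=1.0, sleep=time.sleep):
    """Return True once `path` no longer appears in its parent directory.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    parent, name = os.path.split(os.path.abspath(path))
    delay = first_delay
    while True:
        # The entry is gone as soon as the parent's listing stops showing it.
        if name not in os.listdir(parent):
            return True
        # Give up once the next delay would reach the limit. With the
        # defaults the sleeps are 1 ms, 2 ms, ..., 512 ms: about 1.023 s.
        if delay >= limit:
            return False
        sleep(delay)
        delay *= 2


def remove_dir(path):
    """Remove an empty directory and wait for the removal to take effect."""
    os.rmdir(path)
    if not wait_until_removed(path):
        raise OSError("directory still present after waiting: %r" % (path,))
```

The `sleep` parameter exists only so the timing can be checked without actually waiting; in normal use the default `time.sleep` applies.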

    @jkloth jkloth added tests Tests in the Lib/test dir type-feature A feature request or enhancement labels Jul 30, 2012
    @tjguk
    Member

    tjguk commented Jul 30, 2012

    This is a (near) duplicate of bpo-7443, I think.

    @jeremykloth
    Mannequin

    jeremykloth mannequin commented Jul 30, 2012

    > This is a (near) duplicate of bpo-7443, I think.

    Partially so, it seems. However, my testing with Process Monitor (from
    Sysinternals) shows that most of the access denied errors occur when
    removing directories.

    The blind rename-remove dance doesn't work when removing entire
    directory trees, as the renamed file/directory might still be held
    pending delete when removing the parent directory.
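One reading of the "rename-remove dance" is sketched below. The helper name `rename_then_remove` and the temporary naming scheme are assumptions for illustration and are not taken from bpo-7443 or from any patch in this issue.

```python
import os
import uuid


def rename_then_remove(path):
    """Rename `path` to a unique sibling name, then delete that name.

    Hypothetical illustration of the rename-remove approach.
    """
    parent = os.path.dirname(os.path.abspath(path))
    doomed = os.path.join(parent, "deleting-" + uuid.uuid4().hex)
    # After the rename the original name is free for reuse straight away.
    os.rename(path, doomed)
    # On Windows, if another process still holds a handle, this only marks
    # the file delete-pending; the `doomed` entry stays in `parent` until
    # that handle is closed.
    os.unlink(doomed)
    return doomed
```

This frees the original name, but the renamed entry still lives in the same parent directory, so a following `os.rmdir(parent)` can fail while the delete is pending. That is the tree-removal case described above.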

    For some background for newcomers, see http://support.microsoft.com/kb/159199

    For testing, I used regrtest -F -j6 test_import

    Without patching it would fail consistently (albeit randomly).

    @jkloth
    Contributor Author

    jkloth commented Aug 1, 2012

    I must also add that the proposed solution works well within the test suite as the access denied error can also occur when creating subsequent files, not just removing them.

    This solution eliminates the need to wrap all creation calls with access denied handling, a huge plus IMHO.

    @tjguk
    Member

    tjguk commented Aug 2, 2012

    I'm +1 on the approach in principle. I'm tentative about using ctypes
    for this just because I don't believe we use it anywhere else. But at
    the least I suggest applying the patch to see how Jeremy's buildbot
    behaves. If there are wider objections to ctypes we could always work up
    a C patch. (In 3.3+ we could use the new _winapi module).

    @ncoghlan
    Contributor

    ncoghlan commented Aug 2, 2012

    My inclination is to say that using ctypes is a reasonable option for improving Windows buildbot stability in the near term, but we'd probably want to move this into _winapi long term.

    Adding Antoine & MvL to get their opinion.

    @loewis
    Mannequin

    loewis mannequin commented Aug 2, 2012

    I wonder why it couldn't use os.listdir to find out whether the directory is empty, and os.stat to find out whether a specific file or directory still exists.

    @pitrou
    Member

    pitrou commented Aug 2, 2012

    > On my machine I have programs that simply do not have an option to
    > ignore any directories thus causing some grief during testing.

    What are those programs exactly? A buildbot should ideally have a lean system install that does not interfere with the tests running.

    @jeremykloth
    Mannequin

    jeremykloth mannequin commented Aug 2, 2012

    > What are those programs exactly? A buildbot should ideally have a lean system install that does not interfere with the tests running.

    My development machine has add'l programs, not the buildbot machine.
    Sorry if there was any confusion. I get the same occasional access
    denied errors when running the tests locally.

    @jeremykloth
    Mannequin

    jeremykloth mannequin commented Aug 2, 2012

    Tim Golden added the comment:
    > I'm tentative about using ctypes for this just because I don't believe we use it anywhere else.

    ctypes is already used later in test_support so I figured it was fine
    to use for this as well.

    @jeremykloth
    Mannequin

    jeremykloth mannequin commented Aug 2, 2012

    > I wonder why it couldn't use os.listdir to find out whether the directory is empty, and os.stat to find out whether a specific file or directory still exists.

    It is possible to do the same thing with just os.listdir. The use of
    the Find*File functions was chosen to eliminate duplicating the wait
    function for the two different cases. Also, it is just less resource
    and time intensive to use the Find*File API directly (no need to build
    an entire list when just testing if any entries exist).

    @pitrou
    Member

    pitrou commented Aug 2, 2012

    > I wonder why it couldn't use os.listdir to find out whether the directory is empty, and os.stat to find out whether a specific file or directory still exists.

    > It is possible to do the same thing with just os.listdir. The use of
    > the Find*File functions was chosen to eliminate duplicating the wait
    > function for the two different cases. Also, it is just less resource
    > and time intensive to use the Find*File API directly (no need to build
    > an entire list when just testing if any entries exist).

    But it's certainly much easier to maintain using regular Python APIs. We
    would use ctypes if the functionality wasn't accessible from pure
    Python, but since it is, we don't want to reinvent the wheel.

    @jkloth
    Contributor Author

    jkloth commented Aug 2, 2012

    OK, here is another patch that uses just os.listdir

    @loewis
    Mannequin

    loewis mannequin commented Aug 2, 2012

    I fail to see the point of not using os.stat. IIUC, Windows will delete the file once the last handle has been closed. Since stat will close any handle it temporarily gets, it will not prolong the life of the file; the file will still go away when the last process has closed it.

    Performance is not an issue at all here, since we are waiting for the deletion of the file anyway. So checking whether the file is in the directory listing is fine with me as well. Unless someone can demonstrate how os.stat can prevent removal of the file, I'd like to see the comment corrected, though.

    @jkloth
    Contributor Author

    jkloth commented Aug 4, 2012

    I've updated the comment in the patch to reflect Martin's concern.

    Martin is partially correct in that the handle opened in the stat() call will not prolong the pending-delete status. The problem is rather that stat() does not open its handle with any sharing mode set, which effectively blocks any other process from obtaining a handle to the file while the stat function has its own handle open.

    @jeremykloth
    Mannequin

    jeremykloth mannequin commented Aug 4, 2012

    With the latest changes, is there anything left preventing the
    inclusion of this patch?

    Without some change, the Win64 buildbot is relatively irrelevant as it
    is nearly always in a state of failure due to these errors.

    @briancurtin
    Member

    > Without some change, the Win64 buildbot is relatively irrelevant as it
    > is nearly always in a state of failure due to these errors.

    Not that some change isn't necessary, but what else are you running on your build slave? I ran a Windows 2008 R2 x64 slave for some time and it never had issues around file/directory removal. I only had to decommission it because the physical machine became unreliable.

    @jeremykloth
    Mannequin

    jeremykloth mannequin commented Aug 4, 2012

    > Not that some change isn't necessary, but what else are you running on your build slave? I ran a Windows 2008 R2 x64 slave for some time and it never had issues around file/directory removal. I only had to decommission it because the physical machine became unreliable.

    The errors only started happening after upgrading the HD from a PATA
    Ultra133 to an SATA3 SSD. The super-fast HD is allowing for these
    timing errors to show through.

    @pitrou
    Member

    pitrou commented Aug 13, 2012

    Brian, Tim, do you think this should be committed?

    @briancurtin
    Member

    The latest patch to test.support looks reasonable. Go for it.

    @tjguk
    Member

    tjguk commented Aug 13, 2012

    Fine with me

    @python-dev
    Mannequin

    python-dev mannequin commented Aug 13, 2012

    New changeset fcad4566910b by Brian Curtin in branch '3.2':
    Fix bpo-15496. Add directory removal helpers to make Windows tests more reliable. Patch by Jeremy Kloth
    http://hg.python.org/cpython/rev/fcad4566910b

    @python-dev
    Mannequin

    python-dev mannequin commented Aug 13, 2012

    New changeset c863dadc65eb by Brian Curtin in branch '2.7':
    Fix bpo-15496. Add directory removal helpers to make Windows tests more reliable. Patch by Jeremy Kloth
    http://hg.python.org/cpython/rev/c863dadc65eb

    @briancurtin briancurtin self-assigned this Aug 13, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022