Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_compileall fails randomly on Windows when tests are run in parallel #81568

Closed
vstinner opened this issue Jun 24, 2019 · 18 comments
Closed
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-windows tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

BPO 37387
Nosy @tim-one, @pfmoore, @vstinner, @tjguk, @jkloth, @zware, @eryksun, @zooba, @gvanrossum, @pablogsal, @shreyanavigyan
Superseder
  • bpo-47089: Avoid sporadic failure of test_compileall on Windows
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2022-04-02.21:56:40.718>
    created_at = <Date 2019-06-24.13:40:26.039>
    labels = ['3.10', 'tests', '3.9', 'OS-windows', '3.11']
    title = 'test_compileall fails randomly on Windows when tests are run in parallel'
    updated_at = <Date 2022-04-02.21:56:40.717>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2022-04-02.21:56:40.717>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-04-02.21:56:40.718>
    closer = 'vstinner'
    components = ['Tests', 'Windows']
    creation = <Date 2019-06-24.13:40:26.039>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37387
    keywords = []
    message_count = 18.0
    messages = ['346402', '359474', '392227', '392310', '392315', '392323', '392335', '392336', '392337', '392344', '392346', '392352', '392385', '392425', '392444', '392498', '416562', '416596']
    nosy_count = 11.0
    nosy_names = ['tim.peters', 'paul.moore', 'vstinner', 'tim.golden', 'jkloth', 'zach.ware', 'eryksun', 'steve.dower', 'Guido.van.Rossum', 'pablogsal', 'shreyanavigyan']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '47089'
    type = None
    url = 'https://bugs.python.org/issue37387'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @vstinner
    Copy link
    Member Author

    I'm not sure that it's a good practice to rewrite .pyc files of the stdlib while tests are run in parallel.

    Maybe the test should copy all .py files of the stdlib into a temporary directory and work there, to not impact other tests run in parallel.

    Example on AMD64 Windows7 SP1 3.8:
    https://buildbot.python.org/all/#/builders/208/builds/105

    ======================================================================
    FAIL: test_no_args_respects_force_flag (test.test_compileall.CommmandLineTestsWithSourceEpoch)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\test_py_compile.py", line 30, in wrapper
        return fxn(*args, **kwargs)
      File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\test_py_compile.py", line 20, in wrapper
        return fxn(*args, **kwargs)
      File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\test_compileall.py", line 322, in test_no_args_respects_force_flag
        self.assertRunOK('-f', PYTHONPATH=self.directory)
      File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\test_compileall.py", line 271, in assertRunOK
        rc, out, err = script_helper.assert_python_ok(
      File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\support\script_helper.py", line 157, in assert_python_ok
        return _assert_python(True, *args, **env_vars)
      File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\support\script_helper.py", line 143, in _assert_python
        res.fail(cmd_line)
      File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\support\script_helper.py", line 70, in fail
        raise AssertionError("Process return code is %d\n"
    AssertionError: Process return code is 1
    command line: ['C:\\buildbot.python.org\\3.8.kloth-win64\\build\\PCbuild\\amd64\\python_d.exe', '-X', 'faulthandler', '-S', '-m', 'compileall', '-f']

    stdout:
    ---
    (... truncated stdout ...)ling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\ipaddress.py'...
    Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\keyword.py'...
    Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\linecache.py'...
    Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\locale.py'...
    (...)
    *** PermissionError: [WinError 5] Access is denied: 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\pycache\\site.cpython-38.pyc.25436544' -> 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\pycache\\site.cpython-38.pyc'
    (...)
    Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\zipapp.py'...
    Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\zipfile.py'...
    Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\zipimport.py'...
    ---

    stderr:
    ---

    ---

    ----------------------------------------------------------------------

    Ran 81 tests in 25.827s

    FAILED (failures=1, skipped=2)
    test test_compileall failed

    @vstinner vstinner added 3.9 only security fixes tests Tests in the Lib/test dir labels Jun 24, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 6, 2020

    x86 Windows7 3.8:
    https://buildbot.python.org/all/#/builders/223/builds/59

    FAIL: test_no_args_respects_force_flag (test.test_compileall.CommmandLineTestsNoSourceEpoch)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\test_py_compile.py", line 20, in wrapper
        return fxn(*args, **kwargs)
      File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\test_py_compile.py", line 20, in wrapper
        return fxn(*args, **kwargs)
      File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\test_compileall.py", line 322, in test_no_args_respects_force_flag
        self.assertRunOK('-f', PYTHONPATH=self.directory)
      File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\test_compileall.py", line 271, in assertRunOK
        rc, out, err = script_helper.assert_python_ok(
      File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\support\script_helper.py", line 157, in assert_python_ok
        return _assert_python(True, *args, **env_vars)
      File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\support\script_helper.py", line 143, in _assert_python
        res.fail(cmd_line)
      File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\support\script_helper.py", line 70, in fail
        raise AssertionError("Process return code is %d\n"
    AssertionError: Process return code is 1
    command line: ['D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\PCbuild\\win32\\python_d.exe', '-X', 'faulthandler', '-S', '-m', 'compileall', '-f']

    stdout:
    ---
    (...)

    Compiling 'D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\lib\\socketserver.py'...

    Compiling 'D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\lib\\sre_compile.py'...

    *** PermissionError: [WinError 5] Access is denied: 'D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\lib\\pycache\\sre_compile.cpython-38.pyc.12753880' -> 'D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\lib\\pycache\\sre_compile.cpython-38.pyc'

    @gvanrossum
    Copy link
    Member

    Well, in the normal course of running multiple Python programs on the same machine, pyc files may be rewritten, and this should not cause crashes. So perhaps there’s a real bug here?

    @shreyanavigyan
    Copy link
    Mannequin

    shreyanavigyan mannequin commented Apr 29, 2021

    I've experienced that when a module is imported in the interpreter that module file (.py, .pyc, .pyd) cannot be deleted, renamed, rewritten, etc. When that session is closed only then the file can be deleted, renamed, rewritten, all that stuff. Running tests in parallel means running 2 or more tests at the same time right? Therefore may be when test_compileall tries to rewrite the .pyc it results in a error because the file may be open as a module in another test that is also being ran at the same time. This should also clarify why test_compileall fails randomly and not all the time.

    @shreyanavigyan
    Copy link
    Mannequin

    shreyanavigyan mannequin commented Apr 29, 2021

    I don't think it's a bug at all. It's a feature of Python that's causing this. And since it's not a bug I don't think copying the files in temporary directory would help at all because Windows is not giving the error. Python is giving the error to Python itself!

    @shreyanavigyan
    Copy link
    Mannequin

    shreyanavigyan mannequin commented Apr 29, 2021

    @vstinner What do you think? I'm not sure if there's a solution to this.

    @tim-one
    Copy link
    Member

    tim-one commented Apr 29, 2021

    @Sheyvan, whether it's possible to delete (rename, etc) an open file is a property not of Python, but of the operating system. Windows doesn't allow it; Linux (for example) does.

    It's generally considered to be "a bug" in CPython's implementation whenever it contains code that _assumes_ such things are safe (which is, alas, very easy for a Linux programmer to do by accident).

    So that test_compileall fails in this way is inarguably "a bug". Guido's larger point is harder to address, though. In theory, we control everything our test suite does, but little of what arbitrary users do.

    @shreyanavigyan
    Copy link
    Mannequin

    shreyanavigyan mannequin commented Apr 29, 2021

    What should we do? Should we skip test_compileall if the test suite is run in parallel?

    @tim-one
    Copy link
    Member

    tim-one commented Apr 29, 2021

    A "good" solution would be one that runs the test in such a way that it doesn't fail only on Windows ;-)

    There are presumably many ways that could be accomplished, including ugly ones. For example, if test_compileall is in the collection of tests to be run, and it's a parallel run, run it first by itself, before starting anything else in parallel.

    No, I'm not suggesting we do that. Just trying to get across that there are _always_ ways to worm around the bug du jour.

    I don't know enough about when and why CPython decides to replace .pyc files now to make another suggestion worth anyone else's time to evaluate. Victor already has something else in mind, though.

    @eryksun
    Copy link
    Contributor

    eryksun commented Apr 29, 2021

    I don't know enough about when and why CPython decides to
    replace .pyc files

    It's straight-forward in the case of py_compile.compile():

        >>> pyc = py_compile.compile('test.py')
        >>> f = open(pyc)
        >>> try: py_compile.compile('test.py')
        ... except OSError as e: print(e)
        ...
        [WinError 5] Access is denied: '__pycache__\\test.cpython-310.pyc.1914201873840' -> '__pycache__\\test.cpython-310.pyc'

    Rewriting the file uses importlib._bootstrap_external._write_atomic(), which writes the PYC to a temporary name and then tries to replace the existing file via os.replace().

    Replacing an existing filename will fail with access denied in Windows if the target filename exists and is already open, mapped as image/data, readonly, or a directory. Notably it's not a sharing violation in the case of an open file, which means it doesn't matter whether the open(s) share delete access or not. In Windows 10 the NT API has a new file rename operation that supports a flag to replace an open file, but in this case we're still stuck with the problem that existing opens have to share delete access. Most opens could share delete access without a problem, but most don't.

    @eryksun eryksun added OS-windows 3.10 only security fixes 3.11 only security fixes labels Apr 29, 2021
    @shreyanavigyan
    Copy link
    Mannequin

    shreyanavigyan mannequin commented Apr 29, 2021

    What is the name of the new function introduced in Windows 10 NT API? Should we use it if the version is detected as Windows 10 (through a callback to a platform function or through GetVersion API)?

    @eryksun
    Copy link
    Contributor

    eryksun commented Apr 29, 2021

    What is the name of the new function introduced in Windows 10 NT API?
    Should we use it if the version is detected as Windows 10 (through a
    callback to a platform function or through GetVersion API)?

    No, _bootstrap_external._write_atomic() or py_compile.compile() could implement the typical retry with exponential backoff to address a race condition. Otherwise I don't see what could be done currently in the main code. It would have to be addressed in the test code.

    Regarding the new support for POSIX semantics, the base NT API isn't organized as a separate function for every operation or query, so it's not a new function name per se. Each kernel object type (e.g. File) has many information classes and a few query/set functions such as NtQueryInformationFile() and NtSetInformationFile(). Windows 10 added file information classes that support the ability to replace/unlink an open file, including FileLinkInformationEx (link), FileRenameInformationEx (relink), and FileDispositionInformationEx (unlink). WinAPI DeleteFileW() and RemoveDirectoryW() were updated to use the latter. I expect that CreateHardlinkW() and MoveFileExW() will be updated as well. It's only POSIX-'ish' semantics because the shared read-write-delete model is deeply ingrained in the NT file API. Opens have to opt in by sharing delete access, and most don't.

    @shreyanavigyan
    Copy link
    Mannequin

    shreyanavigyan mannequin commented Apr 30, 2021

    Though I've never been able to reproduce this test failure. Maybe it's fixed (maybe it was a side effect of another test)?

    @vstinner
    Copy link
    Member Author

    I proposed a way to fix the issue:

    "Maybe the test should copy all .py files of the stdlib into a temporary directory and work there, to not impact other tests run in parallel."

    Shreyan Avigyan:

    Though I've never been able to reproduce this test failure. Maybe it's fixed (maybe it was a side effect of another test)?

    It's a race condition. It's hard to reproduce in a reliable way, but I'm sure that it has not been fixed.

    The bug happens when one process tries to recreate the .pyc, whereas another process has the old .pyc file open. You can maybe make the issue more likely by adding a sleep in the code reading pyc content.

    For example, in the FileLoader.get_data() method of importlib._bootstrap_external. Hacking importlib is non trivial, since the Python code is compiled as a frozen module.

    @shreyanavigyan
    Copy link
    Mannequin

    shreyanavigyan mannequin commented Apr 30, 2021

    Ok

    @tim-one
    Copy link
    Member

    tim-one commented Apr 30, 2021

    Yes, test_compileall can still fail for this reason on Windows. From a run just now with -j0 (same as -j10 on this box, which has 8 logical cores: a -j value <= 0 is treated the same as "2 + number of logical cores"):

    """
    Compiling 'C:\\Code\\Python\\lib\\types.py'...

    *** PermissionError: [WinError 5] Access is denied: 'C:\\Code\\Python\\lib\\pycache\\types.cpython-310.pyc.2205988433776' -> 'C:\\Code\\Python\\lib\\pycache\\types.cpython-310.pyc'
    """

    I did nothing in particular to try to provoke it. "It's random." So is the specific .py fail it fails on.

    @jkloth
    Copy link
    Contributor

    jkloth commented Apr 2, 2022

    bpo-47089 is a duplicate of this issue and is fixed. This issue should be closed as well.

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 2, 2022

    I close this issue as a duplicate of bpo-47089.

    @vstinner vstinner closed this as completed Apr 2, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-windows tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants