classification
Title: test_compileall fails randomly on Windows when tests are run in parallel
Type: Stage:
Components: Tests, Windows Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Guido.van.Rossum, eryksun, jkloth, pablogsal, paul.moore, shreyanavigyan, steve.dower, tim.golden, tim.peters, vstinner, zach.ware
Priority: normal Keywords:

Created on 2019-06-24 13:40 by vstinner, last changed 2021-04-30 18:48 by tim.peters.

Messages (16)
msg346402 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-24 13:40
I'm not sure that it's a good practice to rewrite .pyc files of the stdlib while tests are run in parallel.

Maybe the test should copy all .py files of the stdlib into a temporary directory and work there, to not impact other tests run in parallel.

Example on AMD64 Windows7 SP1 3.8:
https://buildbot.python.org/all/#/builders/208/builds/105

======================================================================
FAIL: test_no_args_respects_force_flag (test.test_compileall.CommmandLineTestsWithSourceEpoch)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\test_py_compile.py", line 30, in wrapper
    return fxn(*args, **kwargs)
  File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\test_py_compile.py", line 20, in wrapper
    return fxn(*args, **kwargs)
  File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\test_compileall.py", line 322, in test_no_args_respects_force_flag
    self.assertRunOK('-f', PYTHONPATH=self.directory)
  File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\test_compileall.py", line 271, in assertRunOK
    rc, out, err = script_helper.assert_python_ok(
  File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\support\script_helper.py", line 157, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\support\script_helper.py", line 143, in _assert_python
    res.fail(cmd_line)
  File "C:\buildbot.python.org\3.8.kloth-win64\build\lib\test\support\script_helper.py", line 70, in fail
    raise AssertionError("Process return code is %d\n"
AssertionError: Process return code is 1
command line: ['C:\\buildbot.python.org\\3.8.kloth-win64\\build\\PCbuild\\amd64\\python_d.exe', '-X', 'faulthandler', '-S', '-m', 'compileall', '-f']

stdout:
---
(... truncated stdout ...)ling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\ipaddress.py'...
Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\keyword.py'...
Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\linecache.py'...
Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\locale.py'...
(...)
*** PermissionError: [WinError 5] Access is denied: 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\__pycache__\\site.cpython-38.pyc.25436544' -> 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\__pycache__\\site.cpython-38.pyc'
(...)
Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\zipapp.py'...
Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\zipfile.py'...
Compiling 'C:\\buildbot.python.org\\3.8.kloth-win64\\build\\lib\\zipimport.py'...
---

stderr:
---

---

----------------------------------------------------------------------

Ran 81 tests in 25.827s

FAILED (failures=1, skipped=2)
test test_compileall failed
msg359474 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-06 22:20
x86 Windows7 3.8:
https://buildbot.python.org/all/#/builders/223/builds/59

FAIL: test_no_args_respects_force_flag (test.test_compileall.CommmandLineTestsNoSourceEpoch)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\test_py_compile.py", line 20, in wrapper
    return fxn(*args, **kwargs)
  File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\test_py_compile.py", line 20, in wrapper
    return fxn(*args, **kwargs)
  File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\test_compileall.py", line 322, in test_no_args_respects_force_flag
    self.assertRunOK('-f', PYTHONPATH=self.directory)
  File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\test_compileall.py", line 271, in assertRunOK
    rc, out, err = script_helper.assert_python_ok(
  File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\support\script_helper.py", line 157, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\support\script_helper.py", line 143, in _assert_python
    res.fail(cmd_line)
  File "D:\cygwin\home\db3l\buildarea\3.8.bolen-windows7\build\lib\test\support\script_helper.py", line 70, in fail
    raise AssertionError("Process return code is %d\n"
AssertionError: Process return code is 1
command line: ['D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\PCbuild\\win32\\python_d.exe', '-X', 'faulthandler', '-S', '-m', 'compileall', '-f']

stdout:
---
(...)

Compiling 'D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\lib\\socketserver.py'...

Compiling 'D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\lib\\sre_compile.py'...

*** PermissionError: [WinError 5] Access is denied: 'D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\lib\\__pycache__\\sre_compile.cpython-38.pyc.12753880' -> 'D:\\cygwin\\home\\db3l\\buildarea\\3.8.bolen-windows7\\build\\lib\\__pycache__\\sre_compile.cpython-38.pyc'
msg392227 - (view) Author: Guido van Rossum (Guido.van.Rossum) Date: 2021-04-28 15:44
Well, in the normal course of running multiple Python programs on the same machine, pyc files may be rewritten, and this should not cause crashes. So perhaps there’s a real bug here?
msg392310 - (view) Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-04-29 13:54
I've experienced that when a module is imported in the interpreter that module file (.py, .pyc, .pyd) cannot be deleted, renamed, rewritten, etc.   When that session is closed only then the file can be deleted, renamed, rewritten, all that stuff. Running tests in parallel means running 2 or more tests at the same time right? Therefore may be when test_compileall tries to rewrite the .pyc it results in a error because the file may be open as a module in another test that is also being ran at the same time. This should also clarify why test_compileall fails randomly and not all the time.
msg392315 - (view) Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-04-29 14:23
I don't think it's a bug at all. It's a feature of Python that's causing this. And since it's not a bug I don't think copying the files in temporary directory would help at all because Windows is not giving the error. Python is giving the error to Python itself!
msg392323 - (view) Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-04-29 15:47
@vstinner What do you think? I'm not sure if there's a solution to this.
msg392335 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2021-04-29 17:21
@Sheyvan, whether it's possible to delete (rename, etc) an open file is a property not of Python, but of the operating system. Windows doesn't allow it; Linux (for example) does.

It's generally considered to be "a bug" in CPython's implementation whenever it contains code that _assumes_ such things are safe (which is, alas, very easy for a Linux programmer to do by accident).

So that test_compileall fails in this way is inarguably "a bug". Guido's larger point is harder to address, though. In theory, we control everything our test suite does, but little of what arbitrary users do.
msg392336 - (view) Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-04-29 17:27
What should we do? Should we skip test_compileall if the test suite is run in parallel?
msg392337 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2021-04-29 17:48
A "good" solution would be one that runs the test in such a way that it doesn't fail only on Windows ;-)

There are presumably many ways that could be accomplished, including ugly ones. For example, if test_compileall is in the collection of tests to be run, and it's a parallel run, run it first by itself, before starting anything else in parallel.

No, I'm not suggesting we do that. Just trying to get across that there are _always_ ways to worm around the bug du jour.

I don't know enough about when and why CPython decides to replace .pyc files now to make another suggestion worth anyone else's time to evaluate. Victor already has something else in mind, though.
msg392344 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-04-29 19:09
> I don't know enough about when and why CPython decides to 
> replace .pyc files 

It's straight-forward in the case of py_compile.compile():

    >>> pyc = py_compile.compile('test.py')
    >>> f = open(pyc)
    >>> try: py_compile.compile('test.py')
    ... except OSError as e: print(e)
    ...
    [WinError 5] Access is denied: '__pycache__\\test.cpython-310.pyc.1914201873840' -> '__pycache__\\test.cpython-310.pyc'

Rewriting the file uses `importlib._bootstrap_external._write_atomic()`, which writes the PYC to a temporary name and then tries to replace the existing file via os.replace(). 

Replacing an existing filename will fail with access denied in Windows if the target filename exists and is already open, mapped as image/data, readonly, or a directory. Notably it's not a sharing violation in the case of an open file, which means it doesn't matter whether the open(s) share delete access or not. In Windows 10 the NT API has a new file rename operation that supports a flag to replace an open file, but in this case we're still stuck with the problem that existing opens have to share delete access. Most opens could share delete access without a problem, but most don't.
msg392346 - (view) Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-04-29 19:40
What is the name of the new function introduced in Windows 10 NT API? Should we use it if the version is detected as Windows 10 (through a callback to a platform function or through GetVersion API)?
msg392352 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-04-29 20:49
> What is the name of the new function introduced in Windows 10 NT API? 
> Should we use it if the version is detected as Windows 10 (through a
> callback to a platform function or through GetVersion API)?

No, _bootstrap_external._write_atomic() or py_compile.compile() could implement the typical retry with exponential backoff to address a race condition. Otherwise I don't see what could be done currently in the main code. It would have to be addressed in the test code.

Regarding the new support for POSIX semantics, the base NT API isn't organized as a separate function for every operation or query, so it's not a new function name per se. Each kernel object type (e.g. File) has many information classes and a few query/set functions such as NtQueryInformationFile() and NtSetInformationFile(). Windows 10 added file information classes that support the ability to replace/unlink an open file, including FileLinkInformationEx (link), FileRenameInformationEx (relink), and FileDispositionInformationEx (unlink). WinAPI DeleteFileW() and RemoveDirectoryW() were updated to use the latter. I expect that CreateHardlinkW() and MoveFileExW() will be updated as well. It's only POSIX-'ish' semantics because the shared read-write-delete model is deeply ingrained in the NT file API. Opens have to opt in by sharing delete access, and most don't.
msg392385 - (view) Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-04-30 06:19
Though I've never been able to reproduce this test failure. Maybe it's fixed (maybe it was a side effect of another test)?
msg392425 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-30 12:03
I proposed a way to fix the issue:

"Maybe the test should copy all .py files of the stdlib into a temporary directory and work there, to not impact other tests run in parallel."


Shreyan Avigyan:
> Though I've never been able to reproduce this test failure. Maybe it's fixed (maybe it was a side effect of another test)?

It's a race condition. It's hard to reproduce in a reliable way, but I'm sure that it has not been fixed.

The bug happens when one process tries to recreate the .pyc, whereas another process has the old .pyc file open. You can maybe make the issue more likely by adding a sleep in the code reading pyc content.

For example, in the FileLoader.get_data() method of importlib._bootstrap_external. Hacking importlib is non trivial, since the Python code is compiled as a frozen module.
msg392444 - (view) Author: Shreyan Avigyan (shreyanavigyan) * Date: 2021-04-30 14:30
Ok
msg392498 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2021-04-30 18:48
Yes, test_compileall can still fail for this reason on Windows. From a run just now with -j0 (same as -j10 on this box, which has 8 logical cores: a -j value <= 0 is treated the same as "2 + number of logical cores"):

"""
Compiling 'C:\\Code\\Python\\lib\\types.py'...

*** PermissionError: [WinError 5] Access is denied: 'C:\\Code\\Python\\lib\\__pycache__\\types.cpython-310.pyc.2205988433776' -> 'C:\\Code\\Python\\lib\\__pycache__\\types.cpython-310.pyc'
"""

I did nothing in particular to try to provoke it. "It's random." So is the specific .py fail it fails on.
History
Date User Action Args
2021-04-30 18:48:41tim.peterssetmessages: + msg392498
2021-04-30 14:30:30shreyanavigyansetmessages: + msg392444
2021-04-30 12:03:50vstinnersetmessages: + msg392425
2021-04-30 06:19:59shreyanavigyansetmessages: + msg392385
2021-04-29 20:49:09eryksunsetmessages: + msg392352
2021-04-29 19:40:18shreyanavigyansetmessages: + msg392346
2021-04-29 19:09:21eryksunsetversions: + Python 3.10, Python 3.11
nosy: + paul.moore, tim.golden, eryksun, zach.ware, steve.dower

messages: + msg392344

components: + Windows
2021-04-29 17:48:05tim.peterssetmessages: + msg392337
2021-04-29 17:27:04shreyanavigyansetmessages: + msg392336
2021-04-29 17:21:16tim.peterssetnosy: + tim.peters
messages: + msg392335
2021-04-29 15:47:33shreyanavigyansetmessages: + msg392323
2021-04-29 14:23:59shreyanavigyansetmessages: + msg392315
2021-04-29 13:54:09shreyanavigyansetnosy: + shreyanavigyan
messages: + msg392310
2021-04-28 15:44:38Guido.van.Rossumsetnosy: + Guido.van.Rossum
messages: + msg392227
2020-01-06 22:20:27vstinnersetmessages: + msg359474
2019-06-24 16:14:43jklothsetnosy: + jkloth
2019-06-24 13:40:26vstinnercreate