Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python3 threading.Event().wait time is twice as large as Python27 #85471

Closed
SD mannequin opened this issue Jul 14, 2020 · 29 comments
Closed

Python3 threading.Event().wait time is twice as large as Python27 #85471

SD mannequin opened this issue Jul 14, 2020 · 29 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-windows performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@SD
Copy link
Mannequin

SD mannequin commented Jul 14, 2020

BPO 41299
Nosy @pfmoore, @pitrou, @vstinner, @tjguk, @ned-deily, @zware, @eryksun, @zooba, @pablogsal, @miss-islington, @sweeneyde, @lunixbochs
PRs
  • bpo-41299: fix 16ms jitter in Windows threading timeouts #26568
  • [3.9] bpo-41299: Reduce lag in Windows threading timeouts by using a higher precision time source (GH-26568) #26579
  • [3.10] bpo-41299: Reduce lag in Windows threading timeouts by using a higher precision time source (GH-26568) #26580
  • bpo-41299: Fix EnterNonRecursiveMutex() #28548
  • bpo-41299: QueryPerformanceFrequency() cannot fail #28552
  • bpo-41299: Mark private thread_nt.h functions as static #28553
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-09-24.23:17:07.296>
    created_at = <Date 2020-07-14.22:15:57.850>
    labels = ['3.9', '3.10', 'performance', '3.11', 'library', 'OS-windows']
    title = 'Python3 threading.Event().wait time is twice as large as Python27'
    updated_at = <Date 2021-09-24.23:17:07.295>
    user = 'https://bugs.python.org/SD'

    bugs.python.org fields:

    activity = <Date 2021-09-24.23:17:07.295>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-09-24.23:17:07.296>
    closer = 'vstinner'
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2020-07-14.22:15:57.850>
    creator = 'SD'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41299
    keywords = ['patch']
    message_count = 29.0
    messages = ['373660', '373675', '373680', '373694', '373695', '373707', '373736', '373737', '374070', '387880', '395220', '395237', '395283', '395284', '395665', '395777', '395780', '395781', '395788', '395789', '395835', '395848', '396194', '402564', '402565', '402592', '402593', '402595', '402596']
    nosy_count = 13.0
    nosy_names = ['paul.moore', 'pitrou', 'vstinner', 'tim.golden', 'ned.deily', 'zach.ware', 'eryksun', 'steve.dower', 'pablogsal', 'miss-islington', 'Dennis Sweeney', 'SD', 'lunixbochs2']
    pr_nums = ['26568', '26579', '26580', '28548', '28552', '28553']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue41299'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @SD
    Copy link
    Mannequin Author

    SD mannequin commented Jul 14, 2020

    The overhead in Python 3 for threading.Event().wait() is much larger than Python 2. I am trying to run a script at 60hz which worked correctly in Python2 but does not in Python 3. Here is a minimal example to demonstrate:

    #!/usr/bin/env python

    import threading
    import time
    
    def sample_thread(stop_ev):
        while not stop_ev.is_set():
            t2 = time.time() 
            stop_ev.wait(0.016999959945)
            print((time.time() - t2))
    
    def main():
        stop_ev = threading.Event()
        sample_t = threading.Thread(target=sample_thread, args=(stop_ev, ))
        sample_t.start()
    
        # Other stuff here, sleep is just dummy
        time.sleep(14)
    
        stop_ev.set()
    
        print('End reached.')
    
    if __name__ == '__main__':
        main()

    Python 2.7.0 consistently prints :

    0.0169999599457
    0.0169999599457
    0.0170001983643
    0.0169999599457
    0.0169999599457
    0.0169999599457
    0.0169999599457
    0.0169999599457

    Python 3.8.2 waits much longer

    0.031026363372802734
    0.0320279598236084
    0.031026363372802734
    0.031026840209960938
    0.031527042388916016
    0.031026601791381836
    0.03103041648864746
    0.03302431106567383

    @SD SD mannequin added 3.8 only security fixes performance Performance or resource usage labels Jul 14, 2020
    @ned-deily
    Copy link
    Member

    Please provide the results of running the following with the two pythons you are using:

    python -m test.pythoninfo

    For what it's worth, running on macOS 10.15.5, I see a slight difference with more variability in the Python 3 results but nothing as pronounced as what you see. Also the Python 3 results were very similar across recent versions of 3.7.x, 3.8.x, and pre-release 3.9.

    $ /usr/local/bin/python2.7 ~/test_41299.py | sort
    0.017021894455
    0.0170249938965
    0.017028093338
    0.017028093338
    0.0170290470123
    0.0170290470123
    0.0170290470123
    0.0170300006866
    0.017030954361
    0.0170311927795
    [...]
    0.0190448760986
    0.0190479755402
    0.0190498828888
    0.0190510749817
    0.0190589427948
    0.019061088562
    0.0190629959106
    0.019079208374
    0.0190861225128
    0.0190970897675
    End reached.0.0158078670502
    
    $ /usr/local/bin/python3.8 ~/test_41299.py | sort
    0.009722232818603516
    0.017028093338012695
    0.017033100128173828
    0.017034053802490234
    0.017034053802490234
    0.017035961151123047
    0.017037153244018555
    0.017037153244018555
    0.01703786849975586
    0.017038822174072266
    [...]
    0.021351099014282227
    0.021355152130126953
    0.021355152130126953
    0.02135610580444336
    0.021361112594604492
    0.021363258361816406
    0.02136397361755371
    0.021379947662353516
    0.02138543128967285
    0.021442890167236328
    End reached.

    @sweeneyde
    Copy link
    Member

    I reproduced something similar on Python 3.9.0b1, Windows 64-bit version:

    py -m pyperf timeit -s "import threading; E = threading.Event()" "E.wait(<NUMBER>)"
    

    NUMBER Mean +- std dev
    -------------------------------------------
    0.0 5.79 us +- 0.13 us
    0.000000000001 15.6 ms +- 0.1 ms
    0.001 15.6 ms +- 0.1 ms
    0.01 15.6 ms +- 0.6 ms
    0.013 15.5 ms +- 0.6 ms
    0.015 15.9 ms +- 0.9 ms
    0.016 25.2 ms +- 0.5 ms
    0.017 31.2 ms +- 0.2 ms
    0.018 31.2 ms +- 0.4 ms
    0.025 31.2 ms +- 1.0 ms
    0.05 62.2 ms +- 0.8 ms
    0.1 109 ms +- 2 ms
    0.2 201 ms +- 3 ms
    0.5 500 ms +- 0 ms
    1.0 1.00 sec +- 0.00 sec

    On the smaller scale, it looks quantized to multiples of ~15ms (?), but then it gets more accurate as the times get larger. I don't think it's a measurement error since the first measurement manages microseconds.

    Perhaps this is just an OS-level thread-scheduling issue?

    @SD
    Copy link
    Mannequin Author

    SD mannequin commented Jul 15, 2020

    I have tested this both on vs code terminal and cygwin for windows.

    3.8.2 :

    python -m test.pythoninfo
    Python debug information
    ========================

    Py_DEBUG: No (sys.gettotalrefcount() missing)
    _decimal.__libmpdec_version__: 2.4.2
    builtins.float.double_format: IEEE, little-endian
    builtins.float.float_format: IEEE, little-endian
    datetime.datetime.now: 2020-07-15 09:09:34.936153
    expat.EXPAT_VERSION: expat_2.2.8
    locale.encoding: cp1252
    os.cpu_count: 12
    os.environ[APPDATA]: C:\Users\XXXX\AppData\Roaming
    os.environ[COMPUTERNAME]: XXXX
    os.environ[COMSPEC]: C:\Windows\system32\cmd.exe
    os.environ[HOMEDRIVE]: C:
    os.environ[HOMEPATH]: \Users\XXXX
    os.environ[LANG]: en_GB.UTF-8
    os.environ[PATHEXT]: .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.PY;.PYW;.CPL
    os.environ[PATH]: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\libnvvp;C:\Program Files\SlikSvn\bin;C:\Python\Python38\Scripts\;C:\Python\Python38\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\dotnet\;C:\Program Files\TortoiseSVN\bin;C:\Program Files\CMake\bin;C:\Program Files (x86)\IncrediBuild;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Users\XXXX\AppData\Local\Microsoft\WindowsApps;;C:\Users\XXXX\AppData\Local\Programs\Microsoft VS Code\bin
    os.environ[SYSTEMROOT]: C:\Windows
    os.environ[TEMP]: C:\Users\XXXX\AppData\Local\Temp
    os.environ[TMP]: C:\Users\XXXX\AppData\Local\Temp
    os.environ[USERPROFILE]: C:\Users\XXXX
    os.environ[WINDIR]: C:\Windows
    os.getcwd:
    os.login: XXXX
    os.name: nt
    os.supports_bytes_environ: False
    os.supports_effective_ids: []
    os.supports_fd: ['stat', 'truncate']
    os.supports_follow_symlinks: ['stat']
    os.umask: 000
    platform.architecture: 64bit WindowsPE
    platform.platform: Windows-10-10.0.18362-SP0
    platform.python_implementation: CPython
    pymem.allocator: pymalloc
    pymem.with_pymalloc: True
    socket.hostname:
    sqlite3.sqlite_version: 3.28.0
    sqlite3.version: 2.6.0
    ssl.HAS_SNI: True
    ssl.OPENSSL_VERSION: OpenSSL 1.1.1d 10 Sep 2019
    ssl.OPENSSL_VERSION_INFO: (1, 1, 1, 4, 15)
    ssl.OP_ALL: -0x7fffffac
    ssl.OP_NO_TLSv1_1: 0x10000000
    ssl.SSLContext.maximum_version: TLSVersion.MAXIMUM_SUPPORTED
    ssl.SSLContext.minimum_version: TLSVersion.MINIMUM_SUPPORTED
    ssl.SSLContext.options: Options.OP_NO_SSLv3|OP_CIPHER_SERVER_PREFERENCE|OP_ENABLE_MIDDLEBOX_COMPAT|OP_NO_COMPRESSION|OP_ALL
    ssl.SSLContext.protocol: _SSLMethod.PROTOCOL_TLS
    ssl.SSLContext.verify_mode: VerifyMode.CERT_NONE
    ssl.default_https_context.maximum_version: TLSVersion.MAXIMUM_SUPPORTED
    ssl.default_https_context.minimum_version: TLSVersion.MINIMUM_SUPPORTED
    ssl.default_https_context.options: Options.OP_NO_SSLv3|OP_CIPHER_SERVER_PREFERENCE|OP_ENABLE_MIDDLEBOX_COMPAT|OP_NO_COMPRESSION|OP_ALL
    ssl.default_https_context.protocol: _SSLMethod.PROTOCOL_TLS
    ssl.default_https_context.verify_mode: VerifyMode.CERT_REQUIRED
    ssl.stdlib_context.maximum_version: TLSVersion.MAXIMUM_SUPPORTED
    ssl.stdlib_context.minimum_version: TLSVersion.MINIMUM_SUPPORTED
    ssl.stdlib_context.options: Options.OP_NO_SSLv3|OP_CIPHER_SERVER_PREFERENCE|OP_ENABLE_MIDDLEBOX_COMPAT|OP_NO_COMPRESSION|OP_ALL
    ssl.stdlib_context.protocol: _SSLMethod.PROTOCOL_TLS
    ssl.stdlib_context.verify_mode: VerifyMode.CERT_NONE
    subprocess._USE_POSIX_SPAWN: False
    sys.api_version: 1013
    sys.builtin_module_names: ('_abc', '_ast', '_bisect', '_blake2', '_codecs', '_codecs_cn', '_codecs_hk', '_codecs_iso2022', '_codecs_jp', '_codecs_kr', '_codecs_tw', '_collections', '_contextvars', '_csv', '_datetime', '_functools', '_heapq', '_imp', '_io', '_json', '_locale', '_lsprof', '_md5', '_multibytecodec', '_opcode', '_operator', '_pickle', '_random', '_sha1', '_sha256', '_sha3', '_sha512', '_signal', '_sre', '_stat', '_statistics', '_string', '_struct', '_symtable', '_thread', '_tracemalloc', '_warnings', '_weakref', '_winapi', '_xxsubinterpreters', 'array', 'atexit', 'audioop', 'binascii', 'builtins', 'cmath', 'errno', 'faulthandler', 'gc', 'itertools', 'marshal', 'math', 'mmap', 'msvcrt', 'nt', 'parser', 'sys', 'time', 'winreg', 'xxsubtype', 'zlib')
    sys.byteorder: little
    sys.dont_write_bytecode: False
    sys.executable: C:\Python\Python38\python.exe
    sys.filesystem_encoding: utf-8/surrogatepass
    sys.flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0, dev_mode=False, utf8_mode=0)
    sys.float_info: sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
    sys.float_repr_style: short
    sys.hash_info: sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
    sys.hexversion: 50856688
    sys.implementation: namespace(cache_tag='cpython-38', hexversion=50856688, name='cpython', version=sys.version_info(major=3, minor=8, micro=2, releaselevel='final', serial=0))
    sys.int_info: sys.int_info(bits_per_digit=30, sizeof_digit=4)
    sys.maxsize: 9223372036854775807
    sys.maxunicode: 1114111
    sys.path: ['C:\\Python\\Python38\\python38.zip', 'C:\\Python\\Python38\\DLLs', 'C:\\Python\\Python38\\lib', 'C:\\Python\\Python38', 'C:\\Users\\XXXX\\AppData\\Roaming\\Python\\Python38\\site-packages', 'C:\\Python\\Python38\\lib\\site-packages']
    sys.platform: win32
    sys.prefix: C:\Python\Python38
    sys.stderr.encoding: utf-8/backslashreplace
    sys.stdin.encoding: utf-8/surrogateescape
    sys.stdout.encoding: utf-8/surrogateescape
    sys.thread_info: sys.thread_info(name='nt', lock=None, version=None)
    sys.version: 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)]
    sys.version_info: sys.version_info(major=3, minor=8, micro=2, releaselevel='final', serial=0)
    sys.windowsversion: sys.getwindowsversion(major=10, minor=0, build=18363, platform=2, service_pack='')
    sys.winver: 3.8
    sysconfig[prefix]: C:\Python\Python38
    test_socket.HAVE_SOCKET_ALG: False
    test_socket.HAVE_SOCKET_CAN: False
    test_socket.HAVE_SOCKET_CAN_ISOTP: False
    test_socket.HAVE_SOCKET_QIPCRTR: False
    test_socket.HAVE_SOCKET_RDS: False
    test_socket.HAVE_SOCKET_VSOCK: False
    test_support.IPV6_ENABLED: True
    test_support._is_gui_available: True
    test_support.python_is_optimized: False
    time.altzone: 14400
    time.daylight: 1
    time.get_clock_info(monotonic): namespace(adjustable=False, implementation='GetTickCount64()', monotonic=True, resolution=0.015625)
    time.get_clock_info(perf_counter): namespace(adjustable=False, implementation='QueryPerformanceCounter()', monotonic=True, resolution=1e-07)
    time.get_clock_info(process_time): namespace(adjustable=False, implementation='GetProcessTimes()', monotonic=True, resolution=1e-07)
    time.get_clock_info(thread_time): namespace(adjustable=False, implementation='GetThreadTimes()', monotonic=True, resolution=1e-07)
    time.get_clock_info(time): namespace(adjustable=True, implementation='GetSystemTimeAsFileTime()', monotonic=False, resolution=0.015625)
    time.time: 1594818575.0377405
    time.timezone: 18000
    time.tzname: ('Eastern Standard Time', 'Eastern Summer Time')
    tkinter.TCL_VERSION: 8.6
    tkinter.TK_VERSION: 8.6
    tkinter.info_patchlevel: 8.6.9
    windows.RtlAreLongPathsEnabled: True
    windows.dll_path: C:\Python\Python38\python38.dll
    zlib.ZLIB_RUNTIME_VERSION: 1.2.11
    zlib.ZLIB_VERSION: 1.2.11

    2.7.18 :

    python -m test.pythoninfo
    Python debug information
    ========================

    Py_DEBUG: No (sys.gettotalrefcount() missing)
    builtins.float.double_format: IEEE, little-endian
    builtins.float.float_format: IEEE, little-endian
    datetime.datetime.now: 2020-07-15 09:28:00.404000
    expat.EXPAT_VERSION: expat_2.2.8
    locale.encoding: cp1252
    os.cwd: C:\Python\Python27
    os.environ[APPDATA]: C:\Users\XXXX\AppData\Roaming
    os.environ[COMPUTERNAME]:
    os.environ[COMSPEC]: C:\Windows\system32\cmd.exe
    os.environ[HOMEDRIVE]: C:
    os.environ[HOMEPATH]: \Users\XXXX
    os.environ[PATHEXT]: .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.PY;.PYW
    os.environ[PATH]: C:\Program Files\ConEmu\ConEmu\Scripts;C:\Program Files\ConEmu;C:\Program Files\ConEmu\ConEmu;C:\Python\Python384\Scripts\;C:\Python\Python384\;C:\Users\XXXX\AppData\Local\Microsoft\WindowsApps;;C:\Users\XXXX\AppData\Local\Programs\Microsoft VS Code\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\libnvvp;C:\Program Files\SlikSvn\bin;C:\Python\Python38\Scripts\;C:\Python\Python38\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\dotnet\;C:\Program Files\TortoiseSVN\bin;C:\Program Files\CMake\bin;C:\Program Files (x86)\IncrediBuild;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR
    os.environ[SYSTEMROOT]: C:\Windows
    os.environ[TEMP]: C:\Users\XXXX\AppData\Local\Temp
    os.environ[TMP]: C:\Users\XXXX\AppData\Local\Temp
    os.environ[USERPROFILE]: C:\Users\XXXX
    os.environ[WINDIR]: C:\Windows
    os.name: nt
    os.umask: 000
    platform.architecture: 64bit WindowsPE
    platform.platform: Windows-10-10.0.18362
    platform.python_implementation: CPython
    socket.hostname:
    sqlite3.sqlite_version: 3.28.0
    sqlite3.version: 2.6.0
    ssl.HAS_SNI: True
    ssl.OPENSSL_VERSION: OpenSSL 1.0.2t 10 Sep 2019
    ssl.OPENSSL_VERSION_INFO: (1, 0, 2, 20, 15)
    ssl.OP_ALL: -0x7ffffc01
    ssl.OP_NO_TLSv1_1: 0x10000000
    ssl.SSLContext(PROTOCOL_TLS).options: OP_NO_SSLv3|OP_NO_SSLv2|OP_CIPHER_SERVER_PREFERENCE|OP_SINGLE_DH_USE|OP_SINGLE_ECDH_USE|OP_NO_COMPRESSION|OP_ALL
    ssl.SSLContext(PROTOCOL_TLS).protocol: PROTOCOL_TLS
    ssl.SSLContext(PROTOCOL_TLS).verify_mode: CERT_NONE
    ssl.default_https_context.options: OP_NO_SSLv3|OP_NO_SSLv2|OP_CIPHER_SERVER_PREFERENCE|OP_SINGLE_DH_USE|OP_SINGLE_ECDH_USE|OP_NO_COMPRESSION|OP_ALL
    ssl.default_https_context.protocol: PROTOCOL_TLS
    ssl.default_https_context.verify_mode: CERT_REQUIRED
    ssl.stdlib_context.options: OP_NO_SSLv3|OP_NO_SSLv2|OP_CIPHER_SERVER_PREFERENCE|OP_SINGLE_DH_USE|OP_SINGLE_ECDH_USE|OP_NO_COMPRESSION|OP_ALL
    ssl.stdlib_context.protocol: PROTOCOL_TLS
    ssl.stdlib_context.verify_mode: CERT_NONE
    sys.api_version: 1013
    sys.builtin_module_names: ('__builtin__', '__main__', '_ast', '_bisect', '_codecs', '_codecs_cn', '_codecs_hk', '_codecs_iso2022', '_codecs_jp', '_codecs_kr', '_codecs_tw', '_collections', '_csv', '_functools', '_heapq', '_hotshot', '_io', '_json', '_locale', '_lsprof', '_md5', '_multibytecodec', '_random', '_sha', '_sha256', '_sha512', '_sre', '_struct', '_subprocess', '_symtable', '_warnings', '_weakref', '_winreg', 'array', 'audioop', 'binascii', 'cPickle', 'cStringIO', 'cmath', 'datetime', 'errno', 'exceptions', 'future_builtins', 'gc', 'imageop', 'imp', 'itertools', 'marshal', 'math', 'mmap', 'msvcrt', 'nt', 'operator', 'parser', 'signal', 'strop', 'sys', 'thread', 'time', 'xxsubtype', 'zipimport', 'zlib')
    sys.byteorder: little
    sys.dont_write_bytecode: False
    sys.executable: C:\Python\Python27\python.exe
    sys.filesystem_encoding: mbcs
    sys.flags: sys.flags(debug=0, py3k_warning=0, division_warning=0, division_new=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, tabcheck=0, verbose=0, unicode=0, bytes_warning=0, hash_randomization=0)
    sys.float_info: sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
    sys.float_repr_style: short
    sys.hexversion: 34018032
    sys.maxsize: 9223372036854775807
    sys.maxunicode: 65535
    sys.path: ['', 'C:\\Windows\\SYSTEM32\\python27.zip', 'C:\\Python\\Python27\\DLLs', 'C:\\Python\\Python27\\lib', 'C:\\Python\\Python27\\lib\\plat-win', 'C:\\Python\\Python27\\lib\\lib-tk', 'C:\\Python\\Python27', 'C:\\Python\\Python27\\lib\\site-packages']
    sys.platform: win32
    sys.prefix: C:\Python\Python27
    sys.stderr.encoding: cp850
    sys.stdin.encoding: cp850
    sys.stdout.encoding: cp850
    sys.version: 2.7.18 (v2.7.18:8d21aa21f2, Apr 20 2020, 13:25:05) [MSC v.1500 64 bit (AMD64)]
    sys.version_info: sys.version_info(major=2, minor=7, micro=18, releaselevel='final', serial=0)
    sys.windowsversion: sys.getwindowsversion(major=6, minor=2, build=9200, platform=2, service_pack='')
    sys.winver: 2.7
    sysconfig[prefix]: C:\Python\Python27
    test_support.IPV6_ENABLED: True
    test_support._is_gui_available: True
    test_support.python_is_optimized: False
    time.altzone: 14400
    time.daylight: 1
    time.time: 1594819680.4
    time.timezone: 18000
    time.tzname: ('Eastern Standard Time', 'Eastern Summer Time')
    tkinter.TCL_VERSION: 8.5
    tkinter.TK_VERSION: 8.5
    zlib.ZLIB_VERSION: 1.2.11

    @ned-deily
    Copy link
    Member

    Thanks for the version info.

    Windows and/or threading experts: any ideas?

    @ned-deily ned-deily added stdlib Python modules in the Lib dir OS-windows labels Jul 15, 2020
    @zooba
    Copy link
    Member

    zooba commented Jul 15, 2020

    At a guess, it's probably the signal emulation (a.k.a. Ctrl+C support). We could save some time by checking the requested handle first without blocking, and only beginning the blocking call if it's not available.

    @eryksun
    Copy link
    Contributor

    eryksun commented Jul 16, 2020

    On the smaller scale, it looks quantized to multiples of ~15ms (?),
    but then it gets more accurate as the times get larger. I don't
    think it's a measurement error since the first measurement manages
    microseconds.

    NT's default system timer resolution for thread dispatching is 15.625 ms. If a wait is for 17 ms, expect to see actual wait times mostly in the range 17 ms up to 31.25 ms. A few outliers may wait longer because the due time on the kernel wait is when the thread is ready to be dispatched, but other threads may preempt it.

    The system timer resolution can be increased to about 1 ms (or 500 us with the undocumented NtSetSystemTime system call). But stricter timing still requires combining a dispatcher wait for long course-grained waits with a performance-counter busy wait for short precise waits.

    ---

    But something else is also involved here. I lowered the system timer resolution to 1 ms, and this allowed time.sleep(0.017) to wait for 17-18 ms in about 95% of cases. But the wait for acquiring a _thread.lock stubbornly refused to cooperate. Ultimately it's just calling WaitForSingleObjectEx on a semaphore, so I used ctypes to make a simple alternative lock via CreateSemaphoreW, ReleaseSemaphore, and WaitForSingleObject. This simple implementation performed exactly like the time.sleep wait with regard to the system timer resolution, so the difference is in the _thread.lock wait. I traced it to the following code in EnterNonRecursiveMutex in Python/thread_nt.h:

            /* wait at least until the target */
            ULONGLONG now, target = GetTickCount64() + milliseconds;
            while (mutex->locked) {
                if (PyCOND_TIMEDWAIT(&mutex->cv, &mutex->cs, (long long)milliseconds*1000) < 0) {
                    result = WAIT_FAILED;
                    break;
                }
                now = GetTickCount64();
                if (target <= now)
                    break;
                milliseconds = (DWORD)(target-now);
            }

    GetTickCount64 is documented to be "limited to the resolution of the system timer", but what they don't say is that changing the resolution of the system timer has no effect on the minimum increment of the tick count. It still increments by 15-16 ms even if the system timer resolution is set to 1 ms.

    OTOH, the value of QueryInterruptTime 1 and QueryUnbiasedInterruptTime 2 is incremented with the timer interrupt. The biased version 1 is preferred for long waits that may straddle a system sleep or hibernate, but it's only available in Windows 10. I patched EnterNonRecursiveMutex to call QueryInterruptTime instead of GetTickCount64, and this did enable increased precision when waiting on a lock. For example (patched behavior):

        >>> lock = _thread.allocate_lock()
        >>> lock.acquire()
        True
        >>> setup = 'from __main__ import lock'
        >>> stmt = 'lock.acquire(True, 0.017)'

    15.625 ms system timer:

        >>> timeit.timeit(stmt, setup, number=1000)
        30.173713599999985

    1 ms system timer:

        >>> with set_timer_resolution(0.001):
        ...     timeit.timeit(stmt, setup, number=1000)
        ...
        17.66828049999998

    That said, increasing the timer resolution is discouraged in most cases, so we may simply document that lock waits are limited to the default system timer resolution of 15.625 ms, and increasing the system timer resolution has no effect on this limit.

    ---

    Note that this is unrelated to cancel support via Ctrl+C. Windows Python has no support for canceling a wait on a _thread.lock. It's just a single-object wait in _PyCOND_WAIT_MS, not a multiple-object wait that we set up to include the SIGINT event when called on the main thread (or a variant that I like, which queues a user APC to the main thread for SIGINT instead of using an event, and switches to using alertable waits with SleepEx, WaitForSingleObjectEx, and without needing a wait slot in WaitForMultipleObjectsEx). It's possible to implement a Ctrl+C cancel as long as the lock implementation waits on a kernel semaphore object. However, some effort has gone into developing a different implementation based on condition variables and SRW locks. I don't know whether there's a way to cancel SleepConditionVariableSRW, or whether maybe a different implementation could be used for _thread.lock instead of sharing an implementation with the GIL.

    @eryksun
    Copy link
    Contributor

    eryksun commented Jul 16, 2020

    or 500 us with the undocumented NtSetSystemTime system call

    Umm... that should be NtSetTimerResolution. ;-)

    @SD
    Copy link
    Mannequin Author

    SD mannequin commented Jul 21, 2020

    So there is no advised way around this then. I am currently using time.sleep() to hit my 60hz.

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 1, 2021

    So there is no advised way around this then.

    Changing the implementation of EnterNonRecursiveMutex() to use WinAPI QueryInterruptTime() instead of GetTickCount64() would allow increasing the wait precision by setting the system timer resolution to the minimum supported period via timeBeginPeriod() [1]. It seems reasonable to at least support doing this.

    That said, changing the system timer resolution shouldn't be encouraged since it affects the entire system and can increase power consumption. If you have results in 2.7 similar what's shown in msg373660, check clockres in a separate terminal to see the current timer interval [2].

    ---
    [1] https://docs.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
    [2] https://docs.microsoft.com/en-us/sysinternals/downloads/clockres

    @eryksun eryksun added 3.9 only security fixes 3.10 only security fixes labels Mar 1, 2021
    @lunixbochs
    Copy link
    Mannequin

    lunixbochs mannequin commented Jun 6, 2021

    I just ran into this. GetTickCount64() is a bad choice even without improving the Windows timer resolution, as every mutex wait will have 16ms of jitter. Here are some lock.acquire(timeout=0.001) times measured with time.perf_counter():

    elapsed=21.215ms
    elapsed=30.960ms
    elapsed=21.686ms
    elapsed=30.998ms
    elapsed=30.794ms

    Here's the same lock.acquire(timeout=0.001) with CPython patched to use QueryPerformanceCounter() instead of GetTickCount64(). Notice this is less overhead than even the original post's Python 2.x times.

    elapsed=9.554ms
    elapsed=14.516ms
    elapsed=13.985ms
    elapsed=13.434ms
    elapsed=13.724ms

    Here's the QueryPerformanceCounter() test in a timeBeginPeriod(1) block:

    elapsed=1.135ms
    elapsed=1.204ms
    elapsed=1.189ms
    elapsed=1.052ms
    elapsed=1.052ms

    I'd like to submit a PR to fix the underlying issue by switching to QueryPerformanceCounter() in EnterNonRecursiveMutex().

    QueryInterruptTime() is a bad candidate because it's only supported on Windows 10, and CPython still supports Windows 8. Improvements based on QueryPerformanceCounter() can be backported to at least 3.8 (3.8 dropped Windows XP support, which was the last Windows version where QueryPerformanceCounter() could fail).

    I checked and the only other use of GetTickCount64() seems to be in time.monotonic(). Honestly I would vote to change time.monotonic() to QueryPerformanceCounter() as well, as QueryPerformanceCounter() can no longer fail on any Windows newer than XP (which is no longer supported by Python), but that's probably a topic for a new BPO.

    @lunixbochs lunixbochs mannequin added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes and removed 3.9 only security fixes 3.10 only security fixes labels Jun 6, 2021
    @lunixbochs lunixbochs mannequin added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes labels Jun 6, 2021
    @lunixbochs
    Copy link
    Mannequin

    lunixbochs mannequin commented Jun 6, 2021

    Ok, I filed a PR for this. I used pytime's interface to avoid duplicating the QueryPerformanceFrequency() code.

    I found a StackOverflow answer that says QueryPerformance functions will only fail if you pass in an unaligned pointer: https://stackoverflow.com/a/27258700

    Per that, I used Py_FatalError to catch this case, as there is probably something wrong with the compiler at that point, and the other recovery options would be likely to result in incorrect program behavior (e.g. dead lock).

    @zooba
    Copy link
    Member

    zooba commented Jun 7, 2021

    Change looks good. 3.8 is security fixes only at this stage, but it can go back to 3.9

    @zooba zooba removed 3.8 only security fixes labels Jun 7, 2021
    @zooba
    Copy link
    Member

    zooba commented Jun 7, 2021

    New changeset 449e6f0 by Ryan Hileman in branch 'main':
    bpo-41299: Reduce lag in Windows threading timeouts by using a higher precision time source (GH-26568)
    449e6f0

    @miss-islington
    Copy link
    Contributor

    New changeset 57b3ca7 by Miss Islington (bot) in branch '3.9':
    bpo-41299: Reduce lag in Windows threading timeouts by using a higher precision time source (GH-26568)
    57b3ca7

    @vstinner
    Copy link
    Member

    I suggest to remove the (now <= 0) check.

            _PyTime_t now = _PyTime_GetPerfCounter();
            if (now <= 0) {
                Py_FatalError("_PyTime_GetPerfCounter() == 0");
            }

    Please don't add such Py_FatalError() in the middle of a function. If this case happens, it means that Python immediately exits and creates a debug report (ex: coredump on Unix, popup on Windows asking to open a debugger), and worse: it exits the process immediately.

    I'm not aware of anyone reporting that _PyTime_GetPerfCounter() returns 0. Do you fear that the clock is broken and *always* report 0? If yes, please add at check when Python starts. But why do you think that it could happen?

    @lunixbochs
    Copy link
    Mannequin

    lunixbochs mannequin commented Jun 14, 2021

    I agree with not throwing fatal errors, but that check is unlikely to actually be hit, and you removed the startup checks covering the underlying clocks here: ae6cd7c

    I think if the time source is broken, a user would probably prefer an exception or fatal error to the deadlock they will otherwise get (as time not advancing means it's impossible to timeout), so it doesn't make sense to remove the check without doing something else about it.

    There are three places win_perf_counter_frequency() can fail:

    win_perf_counter_frequency(LONGLONG *pfrequency, int raise)

    I mention the QueryPerformanceFrequency error case here (stack misalignment): https://bugs.python.org/issue41299#msg395237

    The other option, besides startup checks, would be to better parameterize the timer used (mentioned in bpo-44328): Prefer QueryUnbiasedInterruptTimePrecise if available (Win 10+ via GetProcAddress), then QPC, then QueryUnbiasedInterruptTime (which has the original bug wrt jitter but should never be used as QPC is unlikely to be broken).

    @lunixbochs
    Copy link
    Mannequin

    lunixbochs mannequin commented Jun 14, 2021

    Perhaps the simplest initial fix would be to move that check down to PyThread__init_thread() in the same file. I'm not sure what the cpython convention for that kind of init error is, would it just be the same Py_FatalError block or is there a better pattern?

    @vstinner
    Copy link
    Member

    I'm fine with adding _PyTime_Init() again, and maybe only implement checks on Windows.

    On Linux, I only saw a single failure in _PyTime_Init() once, when Python was run in a sandbox, and the time syscalls were not allowed. It was a corner case about the new 64-bit time_t syscalls on a 32-bit system (container):
    https://bugzilla.redhat.com/show_bug.cgi?id=1887445

    @vstinner
    Copy link
    Member

    I mention the QueryPerformanceFrequency error case here (stack misalignment): https://bugs.python.org/issue41299#msg395237

    Do you think that pytime.c has the bug? I don't think so.

    There are three places win_perf_counter_frequency() can fail:

    win_perf_counter_frequency(LONGLONG *pfrequency, int raise)

    In theory yes, in practice we got zero error reports. So it sounds like it cannot happen.

    I don't think that it's a good pratice to start to add checks in all functions using a clock "just in case" the clock might fail.

    @lunixbochs
    Copy link
    Mannequin

    lunixbochs mannequin commented Jun 14, 2021

    Do you think that pytime.c has the bug? I don't think so.

    No, a misaligned stack would be an issue in the caller or compiler, not pytime.c. I have hit misaligned stack in practice, but it should be rare enough to check on init only.

    In theory yes, in practice we got zero error reports. So it sounds like it cannot happen.
    I don't think that it's a good practice to start to add checks in all functions using a clock "just in case" the clock might fail.

    My read is that as long as we're not confident enough to remove those checks from pytime.c, a caller should assume they're reachable. If the pytime checks need to stay, adding a Windows only pytime init check to make sure that locks won't deadlock sounds fine to me.

    @vstinner
    Copy link
    Member

    My read is that as long as we're not confident enough to remove those checks from pytime.c, a caller should assume they're reachable. If the pytime checks need to stay, adding a Windows only pytime init check to make sure that locks won't deadlock sounds fine to me.

    If you consider that the risk is non-zero, you can add a check at startup.

    Do you expect that a startup check is enough, or do you expect failures later?

    I understood that if the first call succeed, later calls are very unlikely to fail. The first call should be enough to check for all possible errors.

    @pablogsal
    Copy link
    Member

    New changeset 8673b77 by Miss Islington (bot) in branch '3.10':
    bpo-41299: Reduce lag in Windows threading timeouts by using a higher precision time source (GH-26568) (GH-26580)
    8673b77

    @vstinner
    Copy link
    Member

    I created PR bpo-28548 to remove the Py_FatalError() code path.

    @vstinner
    Copy link
    Member

    Calling Py_FatalError() must be avoided by any mean. I spent significant time to remove Py_FatalError() at Python startup in my PEP-587 implementation: https://www.python.org/dev/peps/pep-0587/

    I also removed Py_FatalError() calls to report errors to the caller: https://bugs.python.org/issue38631

    Calling Py_FatalError() is really bad when Python is embedded in an application. It kills the whole process, the caller has no way to catch the error.

    @vstinner
    Copy link
    Member

    New changeset f35ddf2 by Victor Stinner in branch 'main':
    bpo-41299: QueryPerformanceFrequency() cannot fail (GH-28552)
    f35ddf2

    @vstinner
    Copy link
    Member

    New changeset d639e31 by Victor Stinner in branch 'main':
    bpo-41299: Fix EnterNonRecursiveMutex() (GH-28548)
    d639e31

    @vstinner
    Copy link
    Member

    New changeset 6bc8911 by Victor Stinner in branch 'main':
    bpo-41299: Mark private thread_nt.h functions as static (GH-28553)
    6bc8911

    @vstinner
    Copy link
    Member

    I close the issue.

    bpo-41299: Fix EnterNonRecursiveMutex() (GH-28548)

    I don't think that it's worth it to backport this change to 3.9 and 3.10. In practice, Py_FatalError() will never be called, and _PyTime_GetMonotonicClock() result must always be greater than 0.

    But I wanted to remove it in the main branch to reduce the number of functions calling Py_FatalError().

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-windows performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants