Message 324342 - Python tracker


Author	josh.r
Recipients	Gammaguy, josh.r, paul.moore, rhettinger, steve.dower, tim.golden, vstinner, xtreak, zach.ware
Date	2018-08-29.17:44:19
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1535564659.32.0.56676864532.issue34535@psf.upfronthosting.co.za>
In-reply-to

Content

Victor, that was a little overboard. By that logic, there doesn't need to be a Windows version of Python.

That said, Paul doesn't seem to understand that the real resolution limit isn't 1 ms; that's the lower limit on arguments to the API, but the real limit is the system clock, which has a granularity in the 10-16 ms range. It's a problem with Windows in general, and the cure is worse than the disease.

Per https://msdn.microsoft.com/en-us/library/windows/desktop/ms724411(v=vs.85).aspx , the resolution of the system timer is typically in the range of 10 milliseconds to 16 milliseconds.

Per https://docs.microsoft.com/en-us/windows/desktop/Sync/wait-functions#wait-functions-and-time-out-intervals :

> Wait Functions and Time-out Intervals

> The accuracy of the specified time-out interval depends on the resolution of the system clock. The system clock "ticks" at a constant rate. If the time-out interval is less than the resolution of the system clock, the wait may time out in less than the specified length of time. If the time-out interval is greater than one tick but less than two, the wait can be anywhere between one and two ticks, and so on.

All the Windows synchronization primitives (e.g. WaitForSingleObjectEx https://docs.microsoft.com/en-us/windows/desktop/api/synchapi/nf-synchapi-waitforsingleobjectex , which is what ultimately implements timed lock acquisition on Windows) are based on the system clock, so without drastic measures, it's impossible to get better granularity than the 10-16 ms of the default system clock configuration.
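A rough way to observe this from Python is to time a lock acquisition that is guaranteed to time out. This is only a measurement sketch: `measure_timeout` is a made-up helper name, and the numbers it prints depend entirely on the OS and the current clock configuration.

```python
import threading
import time


def measure_timeout(requested, repeats=5):
    """Time how long lock.acquire(timeout=requested) really blocks.

    The lock is already held, so every attempt runs until it times out.
    Returns the elapsed seconds for each attempt.
    """
    lock = threading.Lock()
    lock.acquire()  # hold the lock so the timed acquire cannot succeed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        lock.acquire(timeout=requested)
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    for requested in (0.001, 0.005, 0.020):
        waits = measure_timeout(requested)
        print(f"requested {requested * 1000:.0f} ms ->",
              ", ".join(f"{w * 1000:.2f} ms" for w in waits))
```

On a Windows box with the default clock configuration, the short requests would be expected to land on tick boundaries rather than near the requested value; elsewhere they should track the request much more closely.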

The link on "Wait Functions and Time-out Intervals" does mention that the clock's resolution *can* be increased (via timeBeginPeriod), but it recommends against fine-grained tuning (so you can't just tweak it before a wait and undo the tweak after; the only safe thing to do is change it on program launch and undo it on program exit). Even then, it's a bad idea for Python to use it; per timeBeginPeriod's own docs ( https://docs.microsoft.com/en-us/windows/desktop/api/timeapi/nf-timeapi-timebeginperiod ):

> This function affects a global Windows setting. Windows uses the lowest value (that is, highest resolution) requested by any process. Setting a higher resolution can improve the accuracy of time-out intervals in wait functions. However, it can also reduce overall system performance, because the thread scheduler switches tasks more often. High resolutions can also prevent the CPU power management system from entering power-saving modes. Setting a higher resolution does not improve the accuracy of the high-resolution performance counter.

Basically, to improve the resolution of timed lock acquisition, we'd have to change the performance profile of the entire OS while Python was running, likely increasing power usage and possibly reducing performance. Global solutions to local problems are a bad idea.

The most reasonable solution to the problem is to simply document it (maybe not for queue.Queue, but for the threading module). Possibly even provide an attribute in the threading module, similar to threading.TIMEOUT_MAX, that reports the system clock's granularity for informational purposes (it might need to be a function so that it reports the potentially changing granularity).
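As a sketch of what such an informational function might look like: `system_clock_granularity` is a hypothetical name, not an existing threading API. On Windows it asks GetSystemTimeAdjustment for the clock interrupt interval (reported in 100-nanosecond units); I'm assuming, without having verified it, that this value may not track changes made through timeBeginPeriod. Elsewhere it just falls back to what time.get_clock_info reports.

```python
import sys
import time


def system_clock_granularity():
    """Return an estimate of the system clock tick length, in seconds.

    Hypothetical helper for illustration only.
    """
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        adjustment = wintypes.DWORD()
        increment = wintypes.DWORD()   # interval between clock interrupts
        disabled = wintypes.BOOL()
        ok = ctypes.windll.kernel32.GetSystemTimeAdjustment(
            ctypes.byref(adjustment),
            ctypes.byref(increment),
            ctypes.byref(disabled),
        )
        if ok:
            return increment.value * 100e-9  # 100 ns units -> seconds
    # Fallback: the resolution Python reports for its monotonic clock.
    return time.get_clock_info("monotonic").resolution


if __name__ == "__main__":
    print(f"{system_clock_granularity() * 1000:.4f} ms")
```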

Other, less reasonable solutions would be:

1. Expose a function (with prominent warnings about not using it in a fine grained manner, and the effects on power management and performance) that would increase the system clock resolution as far as timeGetDevCaps reports is possible (possibly limited to a user provided suggestion, so while the clock could go to 1 ms resolution, the user could request only 5 ms resolution to reduce the costs of doing so). Requires some additional state (whether timeBeginPeriod has been called, and with what values) so timeEndPeriod can be called properly before each adjustment and when Python exits. Pro is the code is *relatively* simple and would mostly fix the problem. Cons are that it wouldn't be super discoverable (unless we put notes in every place that uses timeouts, not just in threading docs), it encourages bad behavior (one application deciding its needs are more important than conserving power), and we'd have to be *really* careful to pair our calls universally (timeEndPeriod must be called, even when other cleanup is skipped, such as when calling os._exit; AFAICT, the docs imply that per-process adjustments to the clock aren't undone even when the process completes, which means failure to pair all calls would leave the system with a suboptimal system clock resolution that would remain in effect until rebooted).
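To make the bookkeeping in option 1 concrete, here is a minimal sketch done from Python with ctypes rather than inside the interpreter. `TimerResolutionRequest` and `windows_timer_resolution` are hypothetical names; only timeGetDevCaps, timeBeginPeriod and timeEndPeriod are real winmm calls. The begin/end functions are passed in so the pairing logic can be exercised without touching the real clock.

```python
import atexit
import sys


class TimerResolutionRequest:
    """Tracks one outstanding timeBeginPeriod call so it is always paired."""

    def __init__(self, begin, end, period_min, period_max):
        self._begin = begin            # callable(ms) -> 0 on success
        self._end = end                # callable(ms) -> 0 on success
        self.period_min = period_min   # finest period the device supports
        self.period_max = period_max   # coarsest period the device supports
        self._active = None            # period currently requested, if any

    def request(self, period_ms):
        """Request a period in ms, clamped to the supported range."""
        period_ms = max(self.period_min, min(period_ms, self.period_max))
        self.release()                 # undo any previous request first
        if self._begin(period_ms) != 0:
            raise OSError(f"timer period {period_ms} ms was rejected")
        self._active = period_ms
        return period_ms

    def release(self):
        """Undo the outstanding request, using the same value it was made with."""
        if self._active is not None:
            self._end(self._active)
            self._active = None


def windows_timer_resolution():
    """Build a tracker wired to winmm (Windows only)."""
    if sys.platform != "win32":
        raise OSError("timeBeginPeriod is only available on Windows")
    import ctypes

    class TIMECAPS(ctypes.Structure):
        _fields_ = [("wPeriodMin", ctypes.c_uint),
                    ("wPeriodMax", ctypes.c_uint)]

    winmm = ctypes.WinDLL("winmm")
    caps = TIMECAPS()
    if winmm.timeGetDevCaps(ctypes.byref(caps), ctypes.sizeof(caps)) != 0:
        raise OSError("timeGetDevCaps failed")
    tracker = TimerResolutionRequest(
        winmm.timeBeginPeriod, winmm.timeEndPeriod,
        caps.wPeriodMin, caps.wPeriodMax,
    )
    # atexit handlers do not run on os._exit, which is exactly the
    # pairing hazard described above.
    atexit.register(tracker.release)
    return tracker
```

Usage would be something like `windows_timer_resolution().request(5)` once at startup; the atexit hook only covers normal interpreter shutdown.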

2. (Likely a terrible idea, and like option 1, should be explicitly opt-in, not enabled by default) Offer the option to have Python lock timeouts use WaitForSingleObjectEx only to sleep to within one system clock tick of the target time (and not at all if the timeout is less than the clock resolution), then, before reacquiring the GIL, perform a time slice yielding busy loop until you pass the target time (as determined by a higher resolution clock than the system clock). Bad for power management, bad for single core machines (where even with time slice yielding, you're still constantly getting scheduled), etc.
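For illustration only, option 2 can be approximated at the Python level. This is not how the interpreter would do it (the real thing would live in C, before the GIL is reacquired); `acquire_with_spin` and the 16 ms default `tick` are assumptions made for the sketch, and `time.sleep(0)` stands in for the time slice yield.

```python
import threading
import time


def acquire_with_spin(lock, timeout, tick=0.016):
    """Try to acquire `lock` within `timeout` seconds.

    Blocks normally until about one clock tick before the deadline,
    then polls in a yielding busy loop, timed with perf_counter.
    """
    deadline = time.perf_counter() + timeout
    coarse = timeout - tick
    # Coarse phase: skipped entirely if the timeout is shorter than a tick.
    if coarse > 0 and lock.acquire(timeout=coarse):
        return True
    # Fine phase: poll without blocking, yielding between attempts.
    while True:
        if lock.acquire(blocking=False):
            return True
        if time.perf_counter() >= deadline:
            return False
        time.sleep(0)  # give up the rest of the time slice


if __name__ == "__main__":
    held = threading.Lock()
    held.acquire()
    start = time.perf_counter()
    acquire_with_spin(held, 0.005)
    print(f"waited {(time.perf_counter() - start) * 1000:.2f} ms")
```

The spin phase keeps a core busy for up to a full tick per timed-out wait, which is the power and single-core cost described above.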

History
Date	User	Action	Args
2018-08-29 17:44:19	josh.r	set	recipients: + josh.r, rhettinger, paul.moore, vstinner, tim.golden, zach.ware, steve.dower, xtreak, Gammaguy
2018-08-29 17:44:19	josh.r	set	messageid: <1535564659.32.0.56676864532.issue34535@psf.upfronthosting.co.za>
2018-08-29 17:44:19	josh.r	link	issue34535 messages
2018-08-29 17:44:19	josh.r	create