msg211966 - (view) |
Author: (raruler) |
Date: 2014-02-22 22:20 |
I've tried this with both the 32-bit and 64-bit versions of 3.3.4 on two Windows 7 x64 machines.
threading.Event().wait(2148) and a lock obtained from _thread used as lock.acquire(True, 2148) will never return. Anything under 2148 seems to work just fine, but anything 2148 or larger never wakes up.
The same call works on 3.3.4/x64 on OS X.
|
msg218004 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2014-05-06 19:04 |
In thread_nt.h:EnterNonRecursiveMutex the millisecond version (here: 2148000) of the seconds you passed in are converted to microseconds (so: 2148000000).
This is then passed to condvar.h:PyCOND_TIMEDWAIT which expects a long, whose 32-bit limit is 2147483647. So it wraps around and becomes -2146967296.
That value is then divided by 1000 once again to become milliseconds (-2146967) and passed to condvar.h:_PyCOND_WAIT_MS which expects a DWORD, which is a synonym for an unsigned long. Thus the signed value becomes an unsigned 4292820329. Which then passed in as the millisecond timeout to WaitForSingleObjectEx.
So that's what's happening; but I'm really not sure at what stage a change should be made. It looks to me as if PyCOND_TIMEDWAIT should be accepting something more than a long, but I'm really not competent to assess the impact of such a change.
|
msg218045 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2014-05-07 10:20 |
The attached patch uses DWORD (essentially: unsigned long) in condvar.h:PyCOND_TIMEDWAIT.
Adding Kristjan as it was his code.
|
msg218047 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-07 11:06 |
changing long to DWORD doesn't really fix the overflow issue.
The fundamental problem is that some of the apis, e.g. WaitForSingleObject have a DWORD maximum. so, we cannot support sleep times longer than some particular time.
Microseconds was chosen in the api because that is the resolution of the api in pthreads.
IMHO, I think it is okay to have an implicit ceiling on the timeout, e.g. some 4000 seconds. We can add a caveat somewhere that anyone intending to sleep for extended periods of time should be prepared for a timeout occurring early, and should have his own timing logic to deal with that.
My suggestion then is to
a) change the apis to DWORD
b) add a macro something like PyCOND_MAX_WAIT set to 2^32-1
c) properly clip the argument where we call this cunfion, e.g. in lock.acquire.
|
msg218055 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2014-05-07 14:14 |
Thanks for the feedback, Kristjan. You're obviously correct in that we
can't account for timeouts greater than DWORD-size milliseconds and your
proposed solution looks reasonable.
However, I'd like to close off *this* particular issue which turns on
the implicit and, presumably unintended, conversion between unsigned and
signed long in the condition variable code. Can you see any adverse
effect from moving to DWORD parameters per my patch?
|
msg218056 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2014-05-07 14:24 |
An implicit ceiling of 4000 seconds on the timeout? I routinely use timeouts of approximately 24 hours in calls to Event().wait(). What am I misunderstanding? If I'm not misunderstanding, then no, I don't think that change would be acceptable.
|
msg218058 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-07 15:15 |
Tim, how about changing the variable to "unsigned long"? I'd like the signature of the function to be the same for all platforms.
This will change the code and allow waits for up to 4000 seconds.
There is still an overflow problem present, though.+
David, in general the maximum wait times of these primitives are platform specific. If you don't want any ceiling, then we would have to add code all over the place (in C) to do looping timeouts. Not sure which is better, to do it in c, or to accept in python that waits may timeout earlier than specified.
|
msg218059 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2014-05-07 16:02 |
I have production code on Windows using python2.7 that calls Event().wait() with a timeout of approximately 24 hours, and it works just fine. Having that no longer work is, IMO, an unacceptable regression. (I'm ready to move this code to python3 as soon as cx_Freeze supports Windows services under python3).
|
msg218060 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2014-05-07 16:20 |
Just to be clear: the change *I'm* proposing for this issue has nothing
to do with limiting the wait, artificially or otherwise. It's simply
undoing an unintended conversion from unsigned to signed and back again,
whicih currently causes any wait of more than 2147 seconds to hang
pretty much indefinitely.
It looks to me as though Kristjan's "4000" is off by an order of
magnitude: the parameter to WaitFor... is a DWORD number of
milliseconds. That allows for 2**31 -1 milliseconds which is something
short of 50 days, I believe.
|
msg218071 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2014-05-07 19:11 |
Updated patch with unsigned long applied throughout
|
msg218077 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-07 21:11 |
Hi there.
When I said "4000", that was because of the conversion to microseconds which happens early on.
I'm not trying to be difficult here Tim, it's just that you've pointed out a problem and I'd like us to have a comprehensive fix.
"unsigned long", I realized, is also not super, because on unix that can be either 32 or 64 bits :)
The reason 24 hour waits work on 2.7 is that the conversion to microseconds is never done, rather it uses a DWORD of milliseconds. I agree that this is a regression that needs fixing. Even if there is a theroetical maximum, it should be higher than that :)
My latest suggestion? Let's just go ahead and use a "double" for the argument in PyCOND_TIMEDWAIT().
We then have two conversion cases:
1) to a DWORD of milliseconds for both windows apis. Here we should truncate to the max size of a DWORD
2) to the timeval used on pthreads.
for 1, that can be done like:
if (ds*1e3 > (double)DWORD_MAX)
ms = DWORD_MAX;
else
ms = (DWORD)(ds*1e3)
for 2, modifying the PyCOND_ADD_MICROSECONDS macro into something like:
#define PyCOND_ADD_MICROSECONDS(tv, ds)
do {
long oldsec, sec, usec;
assert(ds >= 0.0);
// truncate ds into theoretical maximum
if (ds > (double)long_max)
ds = (double)long_max; // whatever that may be
sec = (long)ds;
usec = (long)((ds - (double)sec) * 1e6))
oldsec = tv.tv_sec;
tv.tv_usec += usec;
tv.tv_sec += sec;
if (usec >= 1000000) {
tv.tv_usec -= 1000000;
tv.tv_sec += 1;
}
if (tv.tv_sec < oldsec)
/* detect overflow */
tv.sec = max_long;
I'm not super experienced with integer arithmetic like this or the pitfalls of overflow, so this might need some pondering. Perhaps it is better to do the tv_sec and tv_usec arithmetic in doubles before converting them back.
Does this sound ok?
Let me see if I can cook up an alternative patch.
|
msg218078 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2014-05-07 21:15 |
I'm adding Mark Dickinson to nosy to see if our math expert has a comment on the arithmetic :)
|
msg218079 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-07 21:40 |
Ah, I saw this code here in thread_nt.h:
if ((DWORD) milliseconds != milliseconds)
Py_FatalError("Timeout too large for a DWORD, "
"please check PY_TIMEOUT_MAX");
the PyCOND_TIMEDWAIT is currently only used by the GIL code and by the locks on NT. The GIL code assumes microsecond resolution. So we need to stick to that, at least. But the locking code assumes at least DWORD worth of milliseconds.
|
msg218080 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-07 21:41 |
(cont.)
so, I suggest that we modify the API to use "Py_LONG_LONG usec"
Does that sound reasonable?
|
msg218081 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-07 21:49 |
Here is a proposed alternative patch.
No additional checks, just a wider "Py_LONG_LONG us" wide enough to accommodate 32 bits of milliseconds as before.
|
msg218082 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-07 21:51 |
fix patch, was using git format....
|
msg218096 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2014-05-08 07:29 |
+1 for Kristjan's latest patch. And thanks for working this through, Kristjan: I'd missed the fact that the microseconds conversion could itself overflow even an unsigned long.
|
msg218098 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-08 09:12 |
Thanks. Can you confirm that it resolves the issue? I'll get it checked in once I get the regrtest suite run.
|
msg218099 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2014-05-08 09:32 |
s/Py_LONG_LONG/PY_LONG_LONG/
|
msg218100 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-08 09:33 |
I see, I wasn't able to compile it yesterday when I did it :)
|
msg218104 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2014-05-08 10:15 |
I can confirm that the attached test.py times out after 2150 seconds (ie
30+ minutes) with your (tweaked) patch applied:
python test.py 2150
Running Debug|Win32 interpreter...
2014-05-08 10:33:53.670091
Expected to time out by 2014-05-08 11:09:43.670091
Timed Out
2014-05-08 11:09:43.765079
I don't see us adding a test which would lengthen the test suite run by
more than 30 minutes!
|
msg218105 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2014-05-08 10:16 |
nope, let's not do that :)
|
msg218106 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2014-05-08 10:37 |
New changeset 80b76c6fad44 by Kristján Valur Jónsson in branch 'default':
The PyCOND_TIMEDWAIT must use microseconds for the timeout argument
http://hg.python.org/cpython/rev/80b76c6fad44
|
msg218108 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2014-05-08 11:19 |
New changeset ab5e2b0fba15 by Kristján Valur Jónsson in branch '3.3':
The PyCOND_TIMEDWAIT must use microseconds for the timeout argument
http://hg.python.org/cpython/rev/ab5e2b0fba15
New changeset 7764bb7f2983 by Kristján Valur Jónsson in branch '3.4':
Merging from 3.3: The PyCOND_TIMEDWAIT must use microseconds for the timeout argument
http://hg.python.org/cpython/rev/7764bb7f2983
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:59 | admin | set | github: 64936 |
2014-05-08 11:21:27 | kristjan.jonsson | set | status: open -> closed resolution: fixed |
2014-05-08 11:19:31 | python-dev | set | messages:
+ msg218108 |
2014-05-08 10:37:49 | python-dev | set | nosy:
+ python-dev messages:
+ msg218106
|
2014-05-08 10:16:51 | kristjan.jonsson | set | messages:
+ msg218105 |
2014-05-08 10:15:38 | tim.golden | set | files:
+ test.py
messages:
+ msg218104 |
2014-05-08 09:33:38 | kristjan.jonsson | set | messages:
+ msg218100 |
2014-05-08 09:32:12 | tim.golden | set | messages:
+ msg218099 |
2014-05-08 09:12:08 | kristjan.jonsson | set | messages:
+ msg218098 |
2014-05-08 07:29:01 | tim.golden | set | messages:
+ msg218096 |
2014-05-07 21:51:03 | kristjan.jonsson | set | files:
+ condwait.patch
messages:
+ msg218082 |
2014-05-07 21:49:37 | kristjan.jonsson | set | files:
+ condwait.patch
messages:
+ msg218081 |
2014-05-07 21:41:23 | kristjan.jonsson | set | messages:
+ msg218080 |
2014-05-07 21:40:21 | kristjan.jonsson | set | messages:
+ msg218079 |
2014-05-07 21:15:05 | r.david.murray | set | nosy:
+ mark.dickinson messages:
+ msg218078
|
2014-05-07 21:11:51 | kristjan.jonsson | set | messages:
+ msg218077 |
2014-05-07 19:11:24 | tim.golden | set | files:
+ issue20737.condvar.2.patch
messages:
+ msg218071 |
2014-05-07 16:20:54 | tim.golden | set | messages:
+ msg218060 |
2014-05-07 16:02:48 | r.david.murray | set | messages:
+ msg218059 |
2014-05-07 15:15:06 | kristjan.jonsson | set | messages:
+ msg218058 |
2014-05-07 14:24:31 | r.david.murray | set | messages:
+ msg218056 |
2014-05-07 14:14:34 | tim.golden | set | messages:
+ msg218055 |
2014-05-07 11:06:20 | kristjan.jonsson | set | messages:
+ msg218047 |
2014-05-07 10:20:31 | tim.golden | set | stage: patch review |
2014-05-07 10:20:18 | tim.golden | set | files:
+ issue20737.condvar.patch
nosy:
+ kristjan.jonsson messages:
+ msg218045
keywords:
+ patch |
2014-05-06 19:04:20 | tim.golden | set | messages:
+ msg218004 |
2014-02-24 22:51:18 | alex.gronholm | set | nosy:
+ alex.gronholm
|
2014-02-23 06:19:35 | r.david.murray | set | nosy:
+ r.david.murray
|
2014-02-22 22:24:48 | pitrou | set | nosy:
+ tim.golden, brian.curtin
versions:
+ Python 3.4 |
2014-02-22 22:20:26 | raruler | create | |