msg264742 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-03 15:37 |
test_functools hangs on an android emulator running an armv7 system image (but not on x86) at API level 21. The test suite hangs at
test_lru_cache_threaded (test.test_functools.TestLRUC) ...
A cause of this problem may be related to the fact that the android arm emulator runs very, very slowly.
|
msg264756 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-03 19:07 |
The attached test_output.txt file is the corresponding gdb backtrace[1].
Interestingly, all the tests (issues #26938, #26939, #26940 and #26941) that hang on arm use sys.setswitchinterval() aggressively.
[1] The backtrace is interleaved with:
* Python Exception <type 'exceptions.UnicodeDecodeError'> 'utf8' codec can't decode byte 0xb8 in posit ion 0: invalid start byte:
that seem to come from libpython, the python that is embeded in gdb to pretty-print gdb output
* ---Type <return> to continue, or q <return> to quit---
sorry, had to resort to copy/paste, gdb 'set logging' does not seem to work
|
msg264922 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2016-05-05 21:28 |
Can you post the backtraces of all threads (at least the innermost frames)?
Also, can you reproduce on an actual armv7 machine? Just wondering if there may be something peculiar in the emulator.
|
msg264957 - (view) |
Author: (yan12125) * |
Date: 2016-05-06 10:20 |
FWIW, test_functools passes on ASUS Zenfone 2 Laser ZE500KL with stock ROM (rooted) and my patchset. [1] The CPU is ARM64 and the Python is built as 32-bit ARM binaries. test_threading (#26941) and test_importlib (#26940) pass, too. test_concurrent_futures (#26938) fails with 27 errors, all with "OSError: [Errno 38] Function not implemented", which seems the same as #26924.
The attached file contains logs of the four tests mentioned here.
[1] https://github.com/yan12125/python3-android
|
msg264963 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2016-05-06 11:11 |
I suspect this bug is caused by memory ordering differences between x86 and ARM (see https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory_ordering ), which is why it may not reproduce easily (may depend on the particular CPU implementation and/or various timing subtleties in the tests themselves).
Xavier / Chi, can you tell if Android has stdatomic.h? See Include/pyatomic.h for the difference it makes.
|
msg264965 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2016-05-06 11:30 |
Xavier, as a strating point, can you try the following patch? (atomic_explicit.patch)
If it solves the issue, can you also measure the performance drop compared to unpatched (on a non-debug build)? `python -m test.pystone` should be sufficient.
|
msg264974 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-06 14:53 |
Android has stdatomic.h and HAVE_STD_ATOMIC is defined.
The build fails with the patch, see the errors in the attached file. This is the native compilation of python, the patches in my build system are also applied to the native build so that they can be checked. I can prevent that if needed.
|
msg264976 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2016-05-06 14:58 |
Oops, sorry, hadn't seen the patch didn't compile. Can you tried with this new patch? (atomic_explicit2.patch)
|
msg264981 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-06 15:28 |
Still hangs at test_lru_cache_threaded with this last patch :(
|
msg264988 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-06 15:55 |
> Can you post the backtraces of all threads (at least the innermost frames)?
There are two threads, gdb does not print the backtrace of the other thread. My current gdb setup with a connection to a gdb-server misses some stuff (access to the loader or some libraries I guess) and needs to be completed with the same setup as the one used by the ndk-gdb script (released by google for android applications).
> Also, can you reproduce on an actual armv7 machine? Just wondering if there may be something peculiar in the emulator.
I don't have an armv7 device that can be used for testing.
|
msg264995 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-06 16:55 |
The strace of test_lru_cache_threaded:
...
gettimeofday({1462553238, 580860}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 581276772}) = 0
gettimeofday({1462553238, 581672}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 582078767}) = 0
gettimeofday({1462553238, 583956}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 585865959}) = 0
gettimeofday({1462553238, 587597}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 589381724}) = 0
gettimeofday({1462553238, 589973}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 591858839}) = 0
gettimeofday({1462553238, 593003}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 593452750}) = 0
gettimeofday({1462553238, 593886}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 594364345}) = 0
gettimeofday({1462553238, 594764}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 595227771}) = 0
gettimeofday({1462553238, 595626}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 596027590}) = 0
gettimeofday({1462553238, 596448}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 596848736}) = 0
gettimeofday({1462553238, 597259}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 597665484}) = 0
gettimeofday({1462553238, 598060}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 598869963}) = 0
gettimeofday({1462553238, 599547}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 601053152}) = 0
gettimeofday({1462553238, 601593}, NULL) = 0
clock_gettime(CLOCK_REALTIME, ^CProcess 908 detached
<detached ...>
|
msg265003 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2016-05-06 17:40 |
> Still hangs at test_lru_cache_threaded with this last patch :(
What about the other tests (test_importlib, test_threading)?
Do you know in which precise test it hangs? (use "-v" to print test names)
> There are two threads, gdb does not print the backtrace of the other thread.
Even with "thread apply all backtrace" (Google tells me this is the incantation)?
|
msg265020 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-06 20:11 |
> Do you know in which precise test it hangs?
They are named in the first msg of each issue, the issues are:
issue #26938: android: test_concurrent_futures hangs on armv7
issue #26939: android: test_functools hangs on armv7
issue #26940: android: test_importlib hangs on armv7
issue #26941: android: test_threading hangs on armv7
The test_threading and test_importlib issues have also now a gdb backtrace, so we know not only the test name but also the full python frame stack with line numbers thanks to python-gdb.py magic. It seems that test_functools, test_threading and test_importlib all hang within take_gil().
test_concurrent_futures is different and hangs in a futex().
With "thread apply all backtrace", nothing changes and gdb still prints:
Cannot access memory at address 0x0
#0 0xb6efae6c in ?? ()
#1 0xb6efd830 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
The problem seems to be that gdb does not have enough information to unwind the stack for the other threads and for thread #1 it only works after an interrupted continue command, and not on gdb startup.
|
msg265027 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-06 20:45 |
Forgot to say that the backtraces for test_threading and test_importlib were obtained with the patch applied while they were hanging.
|
msg265701 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-16 15:52 |
Got now a full fledged gdb. Thread #1 loops infinitely in the "while (_Py_atomic_load_relaxed(&gil_locked))" loop in take_gil(). Thread #2 is stuck in the MUTEX_LOCK(gil_mutex) statement in drop_gil() as can be seen by setting a breakpoint at the following line and seeing that this breakpoint is never hit.
Setting gil_interval to 10 (from the initial value of 1) from gdb, allows the test to pass.
With this patch of PyCOND_TIMEDWAIT():
diff -r eee959fee5f5 Python/condvar.h
--- a/Python/condvar.h Sat May 07 21:13:50 2016 +0300
+++ b/Python/condvar.h Mon May 16 15:52:49 2016 +0200
@@ -99,6 +99,7 @@
PyCOND_ADD_MICROSECONDS(deadline, us);
ts.tv_sec = deadline.tv_sec;
ts.tv_nsec = deadline.tv_usec * 1000;
+ PyCOND_GETTIMEOFDAY(&deadline);
and a breakpoint set on pthread_cond_timedwait(). When this breakpoint is hit, there is 20 usec elapsed time instead of the value of 1 (as requested by 'us') between ts and deadline as shown here:
(gdb) p ts
$1 = {tv_sec = 1463407197, tv_nsec = 468546000}
(gdb) p deadline
$2 = {tv_sec = 1463407197, tv_usec = 468566}
And when iterating the loop, the measured elapsed times are: 20, 12, 12, 11, 11, 11.
Clearly, the ts date is in the past when pthread_cond_timedwait() is called, and the function returns immediately without releasing the mutex. This is the expected behavior[1].
[1] http://linux.die.net/man/3/pthread_cond_wait
|
msg265704 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-05-16 16:02 |
I don't know if it can fix the issue, but you may see my issue #23428: "Use the monotonic clock for thread conditions on POSIX platforms".
|
msg265707 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-16 16:09 |
On a heavily loaded system, there is no guarantee that the date obtained from a call to gettimeofday() in PyCOND_TIMEDWAIT(), is not in the past when pthread_cond_timedwait() is called, as there may be a context switch in between. When this happens, a Python thread cannot take the gil even if it is available.
This tentative patch monitors the load of the system and dynamically increases the value used for the switch interval when needed. With this patch on the android-21-armv7 emulator:
python -m test -m test_lru_cache_threaded -v test_functools
Ok
python -m test -m test_deadlock -v test_importlib
Ok
python -m test -m test_is_alive_after_fork -v test_threading
Ok
The measured elapsed time is not correct in this patch when it is above 1 second.
|
msg265721 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-05-16 19:07 |
> I don't know if it can fix the issue, but you may see my issue #23428: "Use the monotonic clock for thread conditions on POSIX platforms".
pthread_condattr_setclock() is declared in the Android API 21 headers.
When a monotonic clock is set for pthread_cond_timedwait(), one could measure the monotonic time elapsed to execute the instructions from _PyTime_monotonic() to the setting of the ts fields in PyCOND_TIMEDWAIT(), i.e the time to execute those statements taken from your patch:
#ifdef MONOTONIC
_PyTime_monotonic(&deadline);
#else
_PyTime_gettimeofday(&deadline);
#endif
/* TODO: add overflow and truncation checks */
assert(us <= LONG_MAX);
deadline.tv_usec += (long)us;
deadline.tv_sec += deadline.tv_usec / 1000000;
deadline.tv_usec %= 1000000;
and prevent gil_interval to be set below that measured value. This would solve the problem for this soooo slow emulator. _PyTime_monotonic() does not exist, is this _PyTime_GetMonotonicClock() ? So I did not try to test it, not knowing the status of your patch.
IMHO PyCOND_TIMEDWAIT() should use a monotonic clock when available.
|
msg267250 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-06-04 09:20 |
pycond_timedwait_2.patch prints the delta time in microseconds when deadline is in the past. Running the patch with the following command:
python -m test -m test_lru_cache_threaded -v test_functools
gives these results:
a) natively:
deadline is never in the past and the test run prints only a bunch of 'Ok's.
b) on the x86 emulator:
Although the test does not hang when python is not patched, the output of the test run, test_lru_cache_threaded-x86-printf.txt, shows that deadline is almost always in the past.
c) on the armv7 emulator:
The output of the test run, test_lru_cache_threaded-armv7-printf.txt (this is a truncated version, the original file is half a Mbytes long), shows that deadline is always in the past. With the patch, the test does not hang.
It is possible that this problem could be a quirk in the implementation of the qemu emulator, given the results of b), and the fact that the python test suite runs at about the same speed when run natively and on the x86 emulator.
|
msg267410 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-06-05 11:56 |
The problem can be fixed by changing the switch interval to a higher value when the test is run on an Android emulator. Issue 26855 can provide a way to know when this is the case through the value of the ro.kernel.qemu property.
|
msg279588 - (view) |
Author: Xavier de Gaye (xdegaye) * |
Date: 2016-10-28 07:15 |
Patch attached.
|
msg282703 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-12-08 10:10 |
New changeset fd1718badb67 by Xavier de Gaye in branch '3.6':
Issue #26939: Add the support.setswitchinterval() function to fix
https://hg.python.org/cpython/rev/fd1718badb67
New changeset c5d7e46926ac by Xavier de Gaye in branch 'default':
Issue #26939: Merge 3.6.
https://hg.python.org/cpython/rev/c5d7e46926ac
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:30 | admin | set | github: 71126 |
2017-03-31 16:36:11 | dstufft | set | pull_requests:
+ pull_request871 |
2016-12-08 11:02:40 | xdegaye | set | status: open -> closed resolution: fixed stage: commit review -> resolved |
2016-12-08 10:10:36 | python-dev | set | nosy:
+ python-dev messages:
+ msg282703
|
2016-11-17 16:28:25 | xdegaye | set | stage: patch review -> commit review |
2016-10-28 07:29:02 | xdegaye | link | issue26941 dependencies |
2016-10-28 07:25:09 | xdegaye | link | issue26940 dependencies |
2016-10-28 07:15:16 | xdegaye | set | files:
+ setswitchinterval.patch
dependencies:
- android: add platform.android_ver() assignee: xdegaye components:
+ Tests, - Library (Lib) versions:
+ Python 3.7 messages:
+ msg279588 stage: patch review |
2016-06-05 11:56:32 | xdegaye | set | dependencies:
+ android: add platform.android_ver() messages:
+ msg267410 |
2016-06-04 09:20:20 | xdegaye | set | files:
+ pycond_timedwait_2.patch
messages:
+ msg267250 |
2016-06-04 09:16:56 | xdegaye | set | files:
+ test_lru_cache_threaded-armv7-printf.txt |
2016-06-04 09:16:42 | xdegaye | set | files:
+ test_lru_cache_threaded-x86-printf.txt |
2016-05-21 07:06:39 | xdegaye | link | issue26865 dependencies |
2016-05-16 19:07:19 | xdegaye | set | messages:
+ msg265721 |
2016-05-16 16:09:21 | xdegaye | set | files:
+ pycond_timedwait.patch
messages:
+ msg265707 |
2016-05-16 16:02:31 | vstinner | set | nosy:
+ vstinner messages:
+ msg265704
|
2016-05-16 15:52:27 | xdegaye | set | messages:
+ msg265701 |
2016-05-06 20:45:56 | xdegaye | set | messages:
+ msg265027 |
2016-05-06 20:11:19 | xdegaye | set | messages:
+ msg265020 |
2016-05-06 17:40:20 | pitrou | set | messages:
+ msg265003 |
2016-05-06 16:55:47 | xdegaye | set | messages:
+ msg264995 |
2016-05-06 15:55:47 | xdegaye | set | messages:
+ msg264988 |
2016-05-06 15:28:15 | xdegaye | set | messages:
+ msg264981 |
2016-05-06 14:58:38 | pitrou | set | files:
+ atomic_explicit2.patch
messages:
+ msg264976 |
2016-05-06 14:53:04 | xdegaye | set | files:
+ build-failure.txt
messages:
+ msg264974 |
2016-05-06 11:30:24 | pitrou | set | files:
+ atomic_explicit.patch keywords:
+ patch messages:
+ msg264965
|
2016-05-06 11:12:00 | pitrou | set | nosy:
+ neologix
|
2016-05-06 11:11:51 | pitrou | set | messages:
+ msg264963 |
2016-05-06 10:20:55 | yan12125 | set | files:
+ python_tests_on_arm.txt nosy:
+ yan12125 messages:
+ msg264957
|
2016-05-05 21:28:32 | pitrou | set | nosy:
+ pitrou messages:
+ msg264922
|
2016-05-03 19:07:46 | xdegaye | set | files:
+ test_output.txt
messages:
+ msg264756 |
2016-05-03 15:37:46 | xdegaye | create | |