classification
Title: [2.7] test_weakref hangs on Python 2.7 on Windows
Type: behavior Stage: resolved
Components: Tests, Windows Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: paul.moore, pitrou, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2017-03-12 01:01 by zach.ware, last changed 2020-04-27 12:17 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 2553 merged vstinner, 2017-07-03 22:03
Messages (12)
msg289461 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2017-03-12 01:01
See PR493 (https://ci.appveyor.com/project/python/cpython/build/2.7.13+.184) for an example.  I'd rather not merge PR493, which adds AppVeyor to 2.7, until this is resolved.
msg297614 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-03 21:22
Another example (I checked, all tests are run except of test_weakref):

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%202.7/builds/148/steps/test/logs/stdio

...
0:05:30 [400/403] test_file_eintr passed
0:05:30 [401/403] test_glob passed
0:05:54 [402/403] test_mmap passed (41 sec)

command timed out: 1200 seconds without output, attempting to kill
program finished with exit code 1
elapsedTime=1556.477000
msg297615 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-03 21:42
The problem are the 3 tests of test_weakref which uses collect_in_thread():

* test_threaded_weak_valued_setdefault(): 50k iterations
* test_threaded_weak_valued_pop(): 50k iterations
* test_threaded_weak_valued_consistency(): 200k iterations

On my Linux with Python 2.7 compiled in debug mode, the two first tests take 1 second, but the last test takes 10 seconds.
msg297616 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-03 22:03
On Python 3, time.sleep(seconds) rounds seconds towards +Infinity: see bpo-22117, commot cb29f0177c91ebb739b89d8cc4ba223785c94d61. time.sleep(0.0001) sleeps 15.6 ms on Windows.

On Python 2.7, time.sleep(seconds) rounds towards zero. time.sleep(0.0001) sleeps 0 ms on Windows, and so the  collect_in_thread() thread calls gc.collect() in a sleep.
msg297617 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-03 22:05
@Antoine: Another example of performance issue caused by rounding ;-)

See also my article on pytime and timestamp rounding, https://haypo.github.io/pytime.html
msg297632 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-04 09:31
Another option is to change time.sleep() rounding, but it would impact all Python2 code using sleep()... I dislike this option. I expect many complains from users about such backward incompatible change in a minor
release.
msg297634 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-04 09:36
New changeset 1fef0154d97bd7f01c5bc59954a278bacf1a3f0e by Victor Stinner in branch '2.7':
bpo-29796: test_weakref: Fix collect_in_thread() on Windows (#2553)
https://github.com/python/cpython/commit/1fef0154d97bd7f01c5bc59954a278bacf1a3f0e
msg297635 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-04 09:40
I looked at test_weakref duration on buildbots before my change:

* AMD64 Windows7 SP1 2.7: (less than 30 seconds)
* AMD64 Windows8 2.7: 36 sec
* AMD64 Windows8.1 Non-Debug 2.7: 450.6 sec

Hum. That's strange. I don't understand why sometimes test_weakref hangs, whereaas sometimes it pass in less than 60 seconds?
msg297638 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-04 10:00
Oh wow! The commit 1fef0154d97bd7f01c5bc59954a278bacf1a3f0e fixed PGO compilation on Python 2.7: http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%202.7/builds/227 is the first successful build since June 20. I didn't notice that the PGO buildbot was also stuck in running test_weakref in the compile step: job killed after 30 min.
---
...
0:06:01 [380/400] test_warnings
0:06:02 [381/400] test_wave
0:06:02 [382/400] test_weakref

command timed out: 1200 seconds without output running ['make', 'all'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1699.111100
---

While build 206 was green, the buildbot started to fail in build 207:

http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%202.7/builds/207

The strange thing is that this change only contains two small changes:
* commit b39a7481ee7e6166d6d2b252a7a514b1f6553dfa
* commit d4324baca4c03eb8d55446cd1b74b32ec5633af5

I fail to understand how these changes make test_weakref hangs.

Well, at least the good news is that it's now fixed ;-)
msg297647 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-04 12:03
With commit 1fef0154d97bd7f01c5bc59954a278bacf1a3f0e, test_weakref takes 11 seconds on my Windows VM.

Without this fix, test_weakref runs for longer than 2 hour and 19 min on my Windows VM. I decided to kill the test, I don't want to burn my CPU too long :-)

So at least, on my Windows VM, the bug is trivial to reproduce!
msg367292 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-04-26 04:32
Looks like we forgot to close this one; as far as I can tell it was fixed.  Being 2.7-only, it's time to be closed anyway :)
msg367411 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-27 12:17
Yep, I didn't see any test_weakref hang anymore since this commit.
History
Date User Action Args
2020-04-27 12:17:59vstinnersetmessages: + msg367411
2020-04-26 04:32:57zach.waresetstatus: open -> closed
resolution: fixed
messages: + msg367292

stage: test needed -> resolved
2017-07-04 12:03:51vstinnersetmessages: + msg297647
2017-07-04 10:00:02vstinnersetmessages: + msg297638
2017-07-04 09:40:33vstinnersetmessages: + msg297635
2017-07-04 09:36:18vstinnersetmessages: + msg297634
2017-07-04 09:31:32vstinnersetmessages: + msg297632
2017-07-03 22:05:00vstinnersetmessages: + msg297617
2017-07-03 22:03:59vstinnersetpull_requests: + pull_request2622
2017-07-03 22:03:47vstinnersetmessages: + msg297616
2017-07-03 21:42:54vstinnersetmessages: + msg297615
2017-07-03 21:22:37vstinnersettitle: test_weakref hangs on AppVeyor (2.7) -> [2.7] test_weakref hangs on Python 2.7 on Windows
nosy: + pitrou, vstinner

messages: + msg297614

components: + Tests
2017-03-12 01:01:35zach.warecreate