classification
Title: test_mmap.test_large_offset() timeout (1 hour) on "AMD64 Snow Leopard 3.x" buildbot
Type: Stage:
Components: Tests Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ixokai, ned.deily, pitrou, rosslagerwall, sdaoden, vstinner
Priority: normal Keywords:

Created on 2011-04-06 07:59 by vstinner, last changed 2011-05-17 22:25 by vstinner. This issue is now closed.

Messages (16)
msg133115 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-06 07:59
The creation of a file of 5.25 GB took more than 30 min on "AMD64 Snow Leopard 3.x" buildbot, and so regrtest exited:
-----------------
[ 27/354] test_mmap
Thread 0x00007fff70439ca0:
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/test_mmap.py", line 685 in test_large_offset
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 387 in _executeTestPart
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 442 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 494 in __call__
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 105 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 67 in __call__
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 105 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 67 in __call__
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1078 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1166 in _run_suite
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1192 in run_unittest
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/test_mmap.py", line 706 in test_main
  File "./Lib/test/regrtest.py", line 1032 in runtest_inner
  File "./Lib/test/regrtest.py", line 826 in runtest
  File "./Lib/test/regrtest.py", line 650 in main
  File "./Lib/test/regrtest.py", line 1607 in <module>
make: *** [buildbottest] Error 1
program finished with exit code 2
elapsedTime=1909.711111
-----------------
http://www.python.org/dev/buildbot/all/builders/AMD64%20Snow%20Leopard%203.x/builds/46/steps/test/logs/stdio
msg133116 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-06 08:02
The test step was interrupted after "38 mins, 45 secs" (including 30 min of timeout) at the test 27, whereas the previous (success) test step took "46 mins, 55 secs" to execute all (354) tests.
msg133152 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-06 18:38
I can't confirm that.
On my cheap MacBook it takes some five minutes:


20:20 ~ $ time python3 -E -Wd -m test -r -w -uall test_mmap
Using random seed 1490092
[1/1] test_mmap
1 test OK.
[91067 refs]

real	4m50.301s
user	0m0.301s
sys	0m13.232s

...

20:21 ~ $ ll tmp/test_python_478/
total 6291456
 6291456 -rw-r-----  1 steffen  staff  6442450944  6 Apr 20:21 @test_478_tmp

...

Processes: 63 total, 2 running, 2 stuck, 59 sleeping, 251 threads                20:23:00
Load Avg: 0.85, 0.51, 0.24  CPU usage: 9.13% user, 15.38% sys, 75.48% idle
SharedLibs: 8260K resident, 9972K data, 0B linkedit.
MemRegions: 5563 total, 184M resident, 12M private, 391M shared.
PhysMem: 437M wired, 267M active, 113M inactive, 818M used, 1230M free.
VM: 140G vsize, 1042M framework vsize, 24419(0) pageins, 0(0) pageouts.
Networks: packets: 91/17K in, 101/17K out. Disks: 11128/551M read, 11089/8215M written.

PID   COMMAND      %CPU TIME     #TH  #WQ  #POR #MRE RPRVT  RSHRD  RSIZE  VPRVT  VSIZE
478   python3      4.0  00:07.22 2    0    37   137  13M    244K   15M    30M    2402M

...

20:24 ~ $ ll tmp/test_python_478/
total 5505024
 5505024 -rw-r-----  1 steffen  staff  5637144576  6 Apr 20:24 @test_478_tmp

...

Processes: 60 total, 2 running, 1 stuck, 57 sleeping, 246 threads                20:24:00
Load Avg: 1.28, 0.69, 0.33  CPU usage: 8.29% user, 16.9% sys, 75.60% idle
SharedLibs: 8260K resident, 9972K data, 0B linkedit.
MemRegions: 5444 total, 181M resident, 11M private, 535M shared.
PhysMem: 437M wired, 259M active, 124M inactive, 820M used, 1228M free.
VM: 133G vsize, 1042M framework vsize, 24421(0) pageins, 0(0) pageouts.
Networks: packets: 91/17K in, 101/17K out. Disks: 11128/551M read, 14697/11G written.

PID   COMMAND      %CPU TIME     #TH  #WQ  #POR #MRE RPRVT  RSHRD  RSIZE  VPRVT  VSIZE
478   python3      4.4  00:10.04 2    0    37   137  13M    244K   15M    30M    2402M


Is this the bot for which you've tracked those random failures?
Maybe this is a hardware failure, then?
(In the past i have had problems with my old PC and it's Elitegroup
K7S5A motherboard, running on FreeBSD 5.3.
That thing randomly produced "Stray IRQ 11" (about ~handful a day),
and it's ATA controller (i think) randomly caused complete hard disk
regions to become unreadable until after a restart.)
msg133402 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-04-09 17:02
OS X filesystem does not support seeking ahead to create sparse files.
The test is supposed to skip the LargeMmapTests on OS X and Windows with (line 679 of test_mmap.py):

if sys.platform[:3] == 'win' or sys.platform == 'darwin':
   requires('largefile', 'test requires %s bytes and a long time to run' % str(0x180000000))

Perhaps on the Snow Leopard buildbot something causes this line not to become true (what does sys.platform give on the buildbot)?

As for why its creating the file slowly (encrypted fs?), I don't know but the test shouldn't be running anyway...
msg133404 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-09 17:48
Well, buildbots run tests with "-uall", so the "largefile" resource gets enabled.
msg133617 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-12 23:10
Antoine changed regrtest default timeout to 60 minutes. It should workaround test_mmap timeout, so can we close this issue?
msg133912 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-16 23:44
Let's close this issue. I will reopen it if a timeout of 60 min was not the right choice to workaround this failure.
msg133991 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-18 20:20
Hum, it does still fail with a timeout of 1 hour:
-------------------------------------------
[ 41/354] test_mmap
Timeout (1:00:00)!
Thread 0x00007fff70439ca0:
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/test_mmap.py", line 685 in test_large_offset
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 387 in _executeTestPart
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 442 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 494 in __call__
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 105 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 67 in __call__
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 105 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 67 in __call__
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1078 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1166 in _run_suite
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1192 in run_unittest
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/test_mmap.py", line 706 in test_main
  File "./Lib/test/regrtest.py", line 1041 in runtest_inner
  File "./Lib/test/regrtest.py", line 835 in runtest
  File "./Lib/test/regrtest.py", line 659 in main
  File "./Lib/test/regrtest.py", line 1619 in <module>
make: *** [buildbottest] Error 1
program finished with exit code 2
elapsedTime=4261.520698
-------------------------------------------
http://www.python.org/dev/buildbot/all/builders/AMD64%20Snow%20Leopard%203.x/builds/62/steps/test/logs/stdio
msg133996 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-04-18 22:26
Note that the various HFS variants, the default file system types on OS X, do not support sparse files at all. So any tests of large files require at some point that the system writes out all the data in the file.

http://developer.apple.com/library/mac/#documentation/MacOSX/Conceptual/BPFileSystem/Articles/Comparisons.html
msg133997 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-04-18 22:52
That said, on my iMac (2.4 Ghz Intel Core 2 Duo):

$ time ./python -m test -v -u largefile test_mmap
[...]

----------------------------------------------------------------------
Ran 24 tests in 87.673s

OK
1 test OK.

real	1m27.875s
user	0m0.186s
sys	0m8.626s
msg133998 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-18 22:57
We can use the following function to check if the filesystem does support sparse files:

def support_sparse_file(directory):
    import tempfile

    with tempfile.NamedTemporaryFile(dir=directory) as tmpfile:
        # Create a file with a size of 1 byte but without content
        # (only zeros)
        with open(tmpfile.name, "wb") as f:
            f.truncate(1)
        filestat = os.stat(tmpfile.name)
        return (filestat.st_blocks == 0)

We may skip the test if the filesystem doesn't support sparse file... But I think that it is already the purpose of the "largefile" resource.

If this issue is not a bug, but just that the timeout is too low, we should use a bigger timeout on this specific buildbot.
msg134006 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-18 23:44
Timing on the x86 Tiger buildbot: "time ./python.exe -m test -v -u largefile test_mmap" gives approx 5 minutes. The buildbot is a Mac mini, Intel Core Duo 1.8 GHz, 2 GB of memory, Mac OS X 10.4.10, HD: 232 GB (182 GB available) WDC WD2500BEVT-00A23T0.

/usr/sbin/system_profiler gives a lot of information about the OS and the hardware, great tool!
msg134007 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-04-19 00:22
Maybe that buildbot machine is just overloaded.  If I understand the buildbot waterfall output correctly, it looks like there are often 4 simultaneous test runs happening on that machine, one each for 2.7, 3.1, 3.2 and 3.x.

http://www.python.org/dev/buildbot/all/waterfall?builder=AMD64+Snow+Leopard+3.x&builder=AMD64+Snow+Leopard+3.2&builder=AMD64+Snow+Leopard+3.1&builder=AMD64+Snow+Leopard+2.7&builder=AMD64+Snow+Leopard+custom&reload=none

Maybe the buildbot config could be changed to stagger the builds a bit?  Otherwise, perhaps it will often take a long time to run.  On my system, a full run of just 3.x with a similar ./configure and regrtest parameters takes about 30 minutes by itself.
msg134034 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-19 10:11
I think that will also be healed with the patch i've just posted
to #11277.
(However, though i can't ,reproduce`, it seems the
Microkernel/IOKit interaction sometimes has a problem, because
i *really* had messed up sparse files (about 1-2 percent or so)
during the testing phase.)
msg134980 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-02 15:07
I commited Steffen's patch for mmap. We will see if it improves the situation.

New changeset d578fdc9b157 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #11277: mmap.mmap() calls fcntl(fd, F_FULLFSYNC) on Mac OS X
http://hg.python.org/cpython/rev/d578fdc9b157
msg136190 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-17 22:25
I did not see new failure recently, let's close this issue. Reopen if it is not fixed.
History
Date User Action Args
2011-05-17 22:25:58vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg136190
2011-05-02 15:07:40vstinnersetmessages: + msg134980
2011-04-20 08:50:50vstinnersettitle: test_mmap timeout (1 hour) on "AMD64 Snow Leopard 3.x" buildbot -> test_mmap.test_large_offset() timeout (1 hour) on "AMD64 Snow Leopard 3.x" buildbot
2011-04-19 10:11:40sdaodensetmessages: + msg134034
2011-04-19 00:22:09ned.deilysetmessages: + msg134007
2011-04-18 23:44:05vstinnersetmessages: + msg134006
2011-04-18 22:57:30vstinnersetmessages: + msg133998
2011-04-18 22:52:41ned.deilysetmessages: + msg133997
2011-04-18 22:26:11ned.deilysetnosy: + ned.deily
messages: + msg133996
2011-04-18 20:20:31vstinnersetstatus: closed -> open
resolution: out of date -> (no value)
messages: + msg133991

title: test_mmap timeout (30 min) on "AMD64 Snow Leopard 3.x" buildbot -> test_mmap timeout (1 hour) on "AMD64 Snow Leopard 3.x" buildbot
2011-04-16 23:44:18vstinnersetstatus: open -> closed
resolution: out of date
messages: + msg133912
2011-04-12 23:10:28vstinnersetmessages: + msg133617
2011-04-09 17:48:58pitrousetnosy: + pitrou
messages: + msg133404
2011-04-09 17:02:58rosslagerwallsetnosy: + rosslagerwall
messages: + msg133402
2011-04-06 18:38:49sdaodensetnosy: + sdaoden
messages: + msg133152
2011-04-06 08:02:47vstinnersetnosy: + ixokai
messages: + msg133116
2011-04-06 07:59:21vstinnercreate