classification
Title: sporadic failure in test_mailbox
Type: behavior Stage: resolved
Components: Library (Lib), Tests Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, python-dev, r.david.murray, sdaoden
Priority: normal Keywords: patch

Created on 2011-05-04 11:17 by haypo, last changed 2011-11-12 19:28 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
11999.1.diff sdaoden, 2011-05-04 12:09 review
mailbox_mtime.patch r.david.murray, 2011-05-06 02:46 review
Messages (19)
msg135117 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-04 11:17
test_mailbox fails sometimes on FreeBSD. Recent example on AMD FreeBSD 8.2 3.x:
---------------
[200/354] test_mailbox
test test_mailbox failed -- Traceback (most recent call last):
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_mailbox.py", line 905, in test_reread
    assert refreshed()
AssertionError
---------------
http://www.python.org/dev/buildbot/all/builders/AMD64%20FreeBSD%208.2%203.x/builds/180/steps/test/logs/stdio
msg135126 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-04 12:09
I think this relates #6896.
Maybe a two second resolution should be tried?
msg135259 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-06 02:43
Well, it turns out that this sporadic failure is not a test bug, but a real bug in the mailbox module that the test is revealing.

This issue is the same one that motivated the changes in issue 6896.  Those changes, however, merely reduced the problem, but didn't solve it.

The fundamental problem is that mailbox is relying on comparing the system clock with the mtime.  But the mtime, it turns out, is not guaranteed to follow the system clock.  You can see this most easily if you think about, say, an NSF file system.  The mtime is set by the server's clock, and if the server's clock is different, the mtime won't match the local clock.  It appears to be also true for some reason on a vserver virthost: as reported in issue 6896, the mtime is sometimes set to a value a full second before the time.time() time.

Ironically this was discussed in the original bug report that introduced the mtime checking code (issue 1607951), and I found that bug on the first page of hits while searching for mtime/system clock synchronization problems.  The solution that Andrew proposed in that issue is slightly different from the one he actually implemented, but his proposed solution was also flawed.

The actual solution involves dealing correctly with two factors: the mtime "clock" is not synchronized with the system clock, and the mtime only has a resolution of one second.  The first means that when looking for changes to the mtime we must compare it to the previous value of the mtime.  The second means that if _refresh was last called less than a second ago by our clock (the only one we can query), we had best recheck because the directory may have been updated without the mtime changing.

I also added an additional delta in case the file system clock is skewing relative to the system clock.  I made this a class attribute so that it is adjustable; perhaps it should be made public and documented.

Attached is a patch implementing the fix.  It undoes the 6896 patch, since it is no longer needed.  At this writing my buildbot has run test_mailbox 50 times without failing, where before it would fail every third run or so.

Sadly, I had to reintroduce a 1.1 second fixed sleep into the test.  No way around it.  But it is a deterministic sleep, not a "hope this is long enough" sleep.
msg135272 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-06 10:04
On Fri,  6 May 2011 04:44:00 +0200, R. David Murray wrote:
> [.] the mtime only has a resolution of one second.

You always say that!  But i'm pretty sure from somewhen long ago
that there are filesystems which have a two second time resolution.
And unless i'm mistaken that filesystem is still used widely.

> Attached is a patch implementing the fix.
> It undoes the 6896 patch

I've not yet tried your code but from looking at the patch it
seems to target towards a real approach.

> I also added an additional delta in case the file system clock
> is skewing relative to the system clock.  I made this a class
> attribute so that it is adjustable; perhaps it should be made
> public and documented.

On the other hand, if it shows up after almost five years that the
one second resolution solution doesn't work, and that simply
adjusting to a two second resolution is not smart enough to get
this fixed, then i would not go for something half-automatic which
a user needs to adjust manually, because how could a user do that?

Thus, in my view, if you are *really* looking forward to make
mailbox.py a *good* and *beautiful* thing then the timedelta in
between the filesystem and the host's time() must of course be
tracked-, and the fuzziness should be adjusted automatically.
E.g. similar http://linuxcommand.org/man_pages/adjtimex8.html.
msg135274 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-06 10:27
> I also added an additional delta in case the file system clock
> is skewing relative to the system clock. 

In fact this idea could even be made public e.g. like this

    class ClockDrifter:
        def add_snapshot(self, exttime, loctime=None):
            if loctime is None:
                loctime = time.time()
            ...
        def drift_tendency(self):
            ...
        def drift_distance(self):
            ...

I could think of usages thereof.
msg135280 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-06 12:17
Isn't this a bit out of scope for the mailbox module, though?  Mailbox is getting called at irregular intervals, and a clock monitor really wants to be a polling daemon of some sort, I think.  In which case a complete application system that used the mailbox library could adjust the _skewtime appropriately based on the output of a daemon, if it really cares that much.

I think we're well into YAGNI territory, though.  We are only concerned about how long one second is to the filesystem, and a filesystem that skews or jitters more than a tenth of a second over one second is probably pathological anyway.

As for the two second filesystem, my understanding is that is the FAT file system, and I say we just declare using maildir on FAT to be unsupported.  (But I suppose that should be documented).

(As a side note, atime on FAT has a resolution of 1 day, which kind of screws up your "new mail" signal....)
msg135282 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-06 12:27
Oh, and to be clear: the problem wasn't that the one second interval was too short, the problem was that the times stamps of the files were being compared to the wrong check value (the system clock instead of their previous state).  The extra one tenth second is a "belt and suspenders" addition.
msg135283 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-06 12:28
I am open to the argument that we should make it two seconds and a bit to support FAT, but how many people are going to use maildir on FAT?
msg135286 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-06 12:35
I like all kind of single-screw solutions.
And i feel a bit uncomfortable with your usual tightness.
(0.1 seconds - i mean, come on)
msg135287 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-06 12:40
> how many people are going to use maildir on FAT?

Dunno.
But it's maybe the lowest common denominator of mountable
readwrite filesystems.  Except for that MacBook i always had
a shared FAT partition on my private PCs.
msg135289 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-06 12:56
All right, let's make it two seconds and a bit, then.

Why do you think a tenth of second is too small?  A clock with a skew of half that much would be badly broken.
msg135291 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-06 13:15
I agree that it's fun to see pieces of code interacting like gears
in a transmission, but often it gets ugly due to the noise from
the outside which requires to add ugly fuzziness or honour stupid
design decisions.
Maybe an environment variable like PYMAILBOX_SKEW and then one
could run test_mailbox.py and see if it works for 0.01?? :)
msg135398 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-07 02:27
New changeset a0147a1f1776 by R David Murray in branch '2.7':
#11999: sync based on comparing mtimes, not mtime to system clock
http://hg.python.org/cpython/rev/a0147a1f1776

New changeset 252451fef13f by R David Murray in branch '3.1':
#11999: sync based on comparing mtimes, not mtime to system clock
http://hg.python.org/cpython/rev/252451fef13f

New changeset 6907c43d1ee7 by R David Murray in branch '3.2':
#11999: sync based on comparing mtimes, not mtime to system clock
http://hg.python.org/cpython/rev/6907c43d1ee7

New changeset 214a34b26e1b by R David Murray in branch 'default':
#11999: sync based on comparing mtimes, not mtime to system clock
http://hg.python.org/cpython/rev/214a34b26e1b
msg135399 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-07 02:28
Now we see if the buildbots agree with me that this is fixed.
msg135416 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-07 08:20
I don't understand anything, but thanks for fixing it :-) (Or at least trying to fix it, buildbots will tell us.)
msg135454 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-07 12:23
That's really a good one.
(In my eyes.)
msg135459 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-07 12:57
This tracker - sorry!
(I surely will try to write and offer a patch for the tracker,
but first i need to understand that mailbox.py jungle for
#11935, next week.)
msg139526 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-30 21:28
> Now we see if the buildbots agree with me that this is fixed.

There is a new failure today on a Windows Seven buildbot:

======================================================================
FAIL: test_reread (test.test_mailbox.TestMaildir)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_mailbox.py", line 900, in test_reread
    self.assertFalse(refreshed())
AssertionError: True is not false

http://www.python.org/dev/buildbot/all/builders/x86%20Windows7%203.x/builds/3353/steps/test/logs/stdio
msg147527 - (view) Author: Roundup Robot (python-dev) Date: 2011-11-12 19:28
New changeset d2b0751174f6 by Petri Lehtinen in branch '2.7':
Update mailbox.Maildir tests
http://hg.python.org/cpython/rev/d2b0751174f6

New changeset b3c5e1c62839 by Petri Lehtinen in branch '3.2':
Update mailbox.Maildir tests
http://hg.python.org/cpython/rev/b3c5e1c62839
History
Date User Action Args
2011-11-12 19:28:43python-devsetstatus: open -> closed
resolution: fixed
messages: + msg147527
2011-06-30 21:28:11hayposetresolution: fixed -> (no value)
messages: + msg139526
2011-05-07 12:57:35sdaodensetmessages: + msg135459
title: sporadic failure in test_mailbox on FreeBSD -> sporadic failure in test_mailbox
2011-05-07 12:23:18sdaodensetmessages: + msg135454
title: sporadic failure in test_mailbox -> sporadic failure in test_mailbox on FreeBSD
2011-05-07 08:20:45hayposetstatus: pending -> open

messages: + msg135416
2011-05-07 02:28:57r.david.murraysetstatus: open -> pending

title: sporadic failure in test_mailbox on FreeBSD -> sporadic failure in test_mailbox
messages: + msg135399
versions: + Python 3.1, Python 2.7, Python 3.2
type: behavior
resolution: fixed
stage: resolved
2011-05-07 02:27:23python-devsetnosy: + python-dev
messages: + msg135398
2011-05-06 13:15:00sdaodensetmessages: + msg135291
2011-05-06 12:56:19r.david.murraysetmessages: + msg135289
2011-05-06 12:41:00sdaodensetmessages: + msg135287
2011-05-06 12:35:17sdaodensetmessages: + msg135286
2011-05-06 12:28:41r.david.murraysetmessages: + msg135283
2011-05-06 12:27:42r.david.murraysetmessages: + msg135282
2011-05-06 12:17:04r.david.murraysetmessages: + msg135280
2011-05-06 10:27:06sdaodensetmessages: + msg135274
2011-05-06 10:04:48sdaodensetmessages: + msg135272
2011-05-06 02:46:35r.david.murraysetfiles: + mailbox_mtime.patch
2011-05-06 02:46:17r.david.murraysetfiles: - mailbox_mtime.patch
2011-05-06 02:43:57r.david.murraysetfiles: + mailbox_mtime.patch

messages: + msg135259
2011-05-04 12:30:59r.david.murraysetnosy: + r.david.murray
2011-05-04 12:09:26sdaodensetfiles: + 11999.1.diff

nosy: + sdaoden
messages: + msg135126

keywords: + patch
2011-05-04 11:17:21haypocreate