classification
Title: MMDF/MBOX mailbox need utime
Type: Stage:
Components: Library (Lib) Versions: Python 3.3, Python 3.4
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, r.david.murray, sdaoden
Priority: normal Keywords: patch

Created on 2011-04-27 11:39 by sdaoden, last changed 2011-09-17 16:14 by sdaoden. This issue is now closed.

Files
File name Uploaded Description
11935.2.diff sdaoden, 2011-05-02 11:03
Messages (20)
msg134552 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-27 11:39
According to the de-facto MBOX standard [1] and the MMDF
description [2], mtime and atime are used to detect whether
a mailbox has new mail:

   If the mtime on a nonempty mbox file is greater than the
   atime, the file has new mail.

For [1], though, this is only documented under "UNSPECIFIED DETAILS".
The attached patch enables MUAs like mutt(1) to show the
new-mail-has-arrived status in their mailbox overview, too.
Note that I've arbitrarily chosen to date the atime back by 3 seconds;
it could just as well have been 5.
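For illustration, here is a minimal sketch of the quoted rule together with the timestamp change the patch makes after writing a box. Both helper names (has_new_mail, mark_new_mail) are hypothetical and not part of the mailbox module; the 3-second offset is the arbitrary value mentioned above.

```python
import os
import time


def has_new_mail(path):
    """mbox(5)/mmdf(5) rule: a nonempty box with mtime > atime has new mail."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    # Compare the integer nanosecond fields to avoid float rounding.
    return st.st_size > 0 and st.st_mtime_ns > st.st_atime_ns


def mark_new_mail(path, offset=3.0):
    """Date atime back by `offset` seconds and set mtime to now, as the
    patch does after writing a box, so that mtime > atime holds."""
    now = time.time()
    os.utime(path, (now - offset, now))  # (atime, mtime)
```

Dating atime back rather than pushing mtime into the future keeps both timestamps at or before the current time.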

[1] http://qmail.org/man/man5/mbox.html
[2] http://linux.die.net/man/5/mmdf
msg134888 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-30 22:15
The problem with this patch is that it would also show 'new mail' if what had in fact happened was that a message had been *deleted* (see the comments at the beginning of the flush method).  So actually fixing this is a bit more complicated.

A proper fix for this should also consider fixing issue 7359.
msg134958 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-02 10:58
On Sun,  1 May 2011 00:15:11 +0200, R. David Murray <report@bugs.python.org> wrote:
> The problem with this patch is that it would also show 'new
> mail' if what had in fact happened was that a message had been
> *deleted* (see the comments at the beginning of the flush
> method).  So actually fixing this is a bit more complicated.

Well, I don't think so, because MUAs do some further checks,
like checking the size and, of course, the status of each mail;
some indeed use the mtime as an entry gate for further inspection.
And deleting an entry should surely pass that gate, too.

Please do see the file mbox.c of the mutt(1) source repository,
which in fact seems to have been used as an almost copy-and-paste
template for the implementation of large parts of mailbox.py.

But note that I spent less than five minutes searching mailbox.py
for a place to add the code of the patch (after I had added an
identical workaround to my S-Postman), so of course it may not
catch all cases.  One error is obvious: it also sets the mtime for
the Babyl format.  I don't use Emacs and I'm a Buddhist, so I don't
care about that Babylon mess anyway.
Right?

> A proper fix for this should also consider fixing issue 7359.

Hm.  #7359 refers to a misconfiguration, not to Python or
mailbox.py.  Maybe Doc/library/mailbox.rst should be adjusted to
give users who are new to UNIX a hint about the 'mail' group and
the set-group-ID bit on directories?  I think this would really be
a good thing!  Should I open an issue on that?

But again, mailbox.py mirrors mutt(1)'s mbox.c almost one-to-one
(except, AFAIK, for its comparatively naive file lock handling),
and I think that if mutt(1) does create-temp-work-work-work-rename,
then this should be OK for mailbox.py, too.

Did you know that 'bin Laden' means 'am loading' in German?
msg134959 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-02 11:03
I'll attach a patch with a clearer comment ("entry gate" instead of
"new mail"), i.e. the comment now reflects what MUAs really do.
msg134999 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-02 19:09
On Sun,  1 May 2011 00:15:11 +0200, R. David Murray <report@bugs.python.org> wrote:
> So actually fixing this is a bit more complicated.

I like Pear OS X!
I've just experimented a bit with the other of the Pear OS X
file systems (it claims to "support" two: HFS+ and "HFS, case
sensitive"; the latter may indeed be UFS).
The following (reduced) session on "HFS, case sensitive" (UFS?)
brings out the problem:

    Access: Mon May  2 20:49:51 2011
    Modify: Mon May  2 20:49:30 2011
    Change: Mon May  2 20:49:50 2011
    >>> os.utime('org.python', (t-3,t))
    Access: Mon May  2 20:50:19 2011
    Modify: Mon May  2 20:49:30 2011
    Change: Mon May  2 20:50:17 2011
    >>> os.utime('org.python', (t-3,t))
    Access: Mon May  2 20:51:12 2011
    Modify: Mon May  2 20:49:30 2011
    Change: Mon May  2 20:51:11 2011

Thus the "HFS, case sensitive" (UFS?) implementation of f?utimes(2)
updates ctime.
It also sets atime to one second or more *after* ctime.
Well, I'm lucky it doesn't update mtime too, because 'man 2 stat'
*does* document it!!!
So, to get around yet another Pear OS X bug (unless it's really
UFS, which then may have been taken as-is from FreeBSD??), I've
changed the S-Postman to do

    os.utime(self._path, (currtime, currtime+2.42))

instead, i.e. I'm using a future date.  B-/
I don't think this approach can be used by Python, however.
Non-believers:

    posix.stat_result(...st_atime=1304363003, st_mtime=1304363005, st_ctime=1304363005)
    >>> print(os.stat('org.python'))
    posix.stat_result(...st_atime=1304363011, st_mtime=1304363005, st_ctime=1304363005)

So it seems as if the atime is updated once more, asynchronously, by
someone (I don't know who), even though this file system is actually
non-journaled.  Pear OS X is a microkernel, AFAIK.
msg135003 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-02 19:43
Sorry, the last message was truncated;
I've opened http://psf.upfronthosting.co.za/roundup/meta/issue397.
Forget the first line, but for non-believers:

    PYP$ t=time.time(); os.utime('org.python', (t-2.42,t)); print(os.stat('org.python'))
    posix.stat_result(...st_atime=1304363003, st_mtime=1304363005, st_ctime=1304363005)
    PYP$ print(os.stat('org.python'))
    posix.stat_result(...st_atime=1304363011, st_mtime=1304363005, st_ctime=1304363005)
msg135169 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-05 01:52
Not all system mail spools are mode 1777.  Mutt needs to be setgid mail on systems that aren't, if I understand correctly.  Making a Python program setgid mail is a bit more of a security issue than making a well-tested C program setgid, since it is easier to break out of the box in a Python program.

I'm pretty sure that the shell does not parse the mbox when it produces its 'you have new mail' message.  I believe it just looks at the mtime/atime.

mailbox is an mbox manipulation program, not a mail delivery agent.  If you are using it to write a mail delivery agent, I think perhaps the mtime setting code belongs in your application, not the mailbox module.
msg135184 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-05 10:54
On Thu,  5 May 2011 03:52:29 +0200, R. David Murray wrote:
> [..] the shell [..] I believe it just looks at the mtime/atime.

/* check_mail () is useful for more than just checking mail.  Since it has
   the paranoids dream ability of telling you when someone has read your
   mail, it can just as easily be used to tell you when someones .profile
   file has been read, thus letting one know when someone else has logged
   in.  Pretty good, huh? */

          /* If the user has just run a program which manipulates the
             mail file, then don't bother explaining that the mail
             file has been manipulated.  Since some systems don't change
             the access time to be equal to the modification time when
             the mail in the file is manipulated, check the size also.  If
             the file has not grown, continue. */

         /* If the mod time is later than the access time and the file
             has grown, note the fact that this is *new* mail. */
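The two conditions in the quoted comments can be sketched in Python roughly as follows. MailChecker is a hypothetical class written for illustration, not a translation of the shell's actual source: it remembers the last size seen and reports new mail only when mtime > atime *and* the file has grown, so a rewrite that deletes messages does not trigger a notice.

```python
import os


class MailChecker:
    """Hypothetical checker: new mail = file grew AND mtime > atime."""

    def __init__(self, path):
        self.path = path
        self.last_size = self._size()  # baseline for the "has grown" test

    def _size(self):
        try:
            return os.stat(self.path).st_size
        except FileNotFoundError:
            return 0

    def check(self):
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            self.last_size = 0
            return False
        grown = st.st_size > self.last_size
        self.last_size = st.st_size  # remember the size for the next check
        # A box that shrank or stayed the same is "manipulated", not "new".
        return grown and st.st_mtime_ns > st.st_atime_ns
```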

> Not all system mail spools are mode 1777.  Mutt needs to be
> setgid mail on systems that aren't, if I understand correctly.
> Making a python program setgid mail is a bit more of security
> issue than making a well-tested C program setgid, since it is
> easier to break out of the box in a python program.

OK, maybe set-group-ID on /var/mail isn't even necessary;
    0 drwxrwxr-x    3 root      mail       102  5 May 11:30 mail
is enough as long as
    $ groups $USER
states that you are a member of group mail.  On my system mailbox.py
doesn't have any problems modifying the mail directory.
If this is not true on your box, go and stress your admin; he's not
worth his money, is he?
I.e., whereas it is possible to rewrite mailbox.py to handle issue
#7359, I would not do so, because it is unnecessary on correctly
set-up systems.  Maybe mailbox.py has used so much copy-and-paste
from mutt(1)'s mbox.c because that code has worked well for many years.
And Jason seems to work as root all the time.
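A quick way to check the setup described above from Python might look like this. spool_access is a hypothetical helper; it only inspects permissions and group membership, and os.access() checks with the real, not the effective, uid/gid.

```python
import os


def spool_access(directory):
    """Hypothetical helper: can we create files (temporary files, dotlocks)
    in `directory`, and are we in the group that owns it?"""
    st = os.stat(directory)
    # Supplementary groups plus the effective primary group.
    groups = set(os.getgroups()) | {os.getegid()}
    return {
        "in_group": st.st_gid in groups,
        # Creating an entry needs write and search permission on the directory.
        "writable": os.access(directory, os.W_OK | os.X_OK),
    }
```

For example, spool_access('/var/mail') on a box laid out as above should report both as True for a member of group mail.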

> mailbox is an mbox manipulation program, not a mail delivery
> agent.  If you are using it to write a mail delivery agent,
> I think perhaps the mtime setting code belongs in your
> application, not the mailbox module.

I really don't understand your point now.
Of course the standard is soft as butter, in that it seems to
assume that the spool mailbox is then processed locally and
truncated to zero length, so that "mailbox has grown == new mail
arrived", whereas it is also possible to use that spool file as
a real local mailbox, including re-sorting, partial deletion, etc.

This issue is about fixing mailbox.py to adhere to the MMDF and MBOX
standards, which is what the patch implements.
This patch works for me locally, in that mutt(1) will mention that
new mail has arrived in the boxes.

The patch uses a safe approach by dating back the access time
instead of pointing the modification time into the future.  This,
however, will make the patch fail on Pear OS X if the mailbox is on
"HFS, case sensitive", because that is buggy and *always* updates
atime.  Maybe this is because Apple only provides a shallow wrapper
around UFS to integrate it into the microkernel/IOKit structure,
in case "HFS, case sensitive" really is UFS, but I'm guessing
in all directions here.  I would not adjust the patch to fix this,
but the problem exists and it has been noted in this issue.

--
Steffen
sdaoden@gmail.com
msg135185 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-05 10:58
On Thu,  5 May 2011 03:52:29 +0200, R. David Murray wrote:
> [..] the shell [..] I believe it just looks at the mtime/atime.

   Pretty good, huh?

Mr. Mojo says:

    Proud to be a part of this number.
    Successful hills are here to stay.
    Everything must be this way.
msg135190 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-05 11:42
Yes if you are a member of group mail you would not need setgid mail, obviously.

The problem report in question was submitted by one of the Debian maintainers, so I have to believe that the system in question was not misconfigured.  This part of the discussion should move to that issue, I think.  I guess I was wrong to link them :)

So, if the mailbox code is imitating mutt (and it may well be, the bulk of it was written as a summer of code project in 2005), what does mutt do in the case you are talking about?
msg135207 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-05 14:28
On Thu,  5 May 2011 13:42:29 +0200, R. David Murray wrote:
> what does mutt do in the case you are talking about?

    16 -rwxr-s---  1 steffen  mail  14832 23 Jan 19:13 usr/bin/mutt_bitlock
    set bitlock_program="~/usr/bin/mutt_bitlock -p"

I see.  Unfortunately the world is not even almost perfect.
So should f?truncate(2) be used if the resulting file is empty?

> what does mutt do in the case you are talking about?

Otherwise there is only one solution: a mailbox-is-read-only policy
has to be introduced.
That will surely drive insane those users who see that they in fact
have write access to the file.
Python has been dealt bad cards.
msg135209 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-05 14:42
> The problem report in question was submitted by one of the
> Debian maintainers.

Yeah, a documentainer at least.
I've used Debian (Woody, I think; that was 3.1).
It was actually great, because Lehmanns in Heidelberg, Germany, did
not include the sources, but they sent me the sources (on seven CDs,
as far as I remember) for free after I complained.

Linux is really great.  You don't need internet access at all
because of that fantastic documentation, everywhere.  You look
into /dev and /sys and /proc and it's all so translucent!!  And
the GNU tools and libraries - they are so nicely designed.
The source code is so clean.  It's really an enlightened system.

Then I discovered FreeBSD 4.8, which released me from all that.
\|/
_ .
 |
 -
 |
(I still had hairs at that time.  But that was long ago.)
msg135213 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-05 15:47
Steffen, your sense of humor is great, but oftentimes I have no clue what you are talking about.  Where does ftruncate factor in?

I was asking what mutt does when it modifies a file in the hopes that it had some pithy algorithm for making sure the mailbox atime and mtime conform to the semi-standard you are talking about, so we could steal it.

I'd like to see a solution to this issue.  My two problems with your patch are (1) it feels wrong to set the atime earlier than the last actual atime and (2) unconditionally doing the work in flush means it might get set even when there wasn't an intended "new mail" condition.

In other words, I think the fix is ugly :).  However, neither of those concerns are necessarily blockers.  Practicality beats purity in many cases, and this may be one of them.
msg135221 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-05 16:38
After half an hour of shallow inspection:

mutt really modifies mailbox files in place (mbox_sync_mailbox())
after writing all of the new tail into a temporary file, followed by
seek()/write()/truncate() etc.  However, it has mutt_dotlock(1),
it blocks signals, and it is a standalone program; thus
I don't think this behaviour can be used by Python.

With respect to our issue here, I must admit that mutt really does:

    prepare new tail
    stat box
    modify box to incorporate tail
    close box
    utime box with stat result times
    reopen box
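The sequence above can be sketched roughly as follows. This is not mutt's code: rewrite_tail_preserving_times and its offset/new_tail parameters are hypothetical, and the locking and signal handling that mutt also does are left out.

```python
import os


def rewrite_tail_preserving_times(path, offset, new_tail):
    """Hypothetical sketch: overwrite the box from byte `offset` on with
    `new_tail` in place, then restore the original atime/mtime."""
    st = os.stat(path)                       # stat box
    with open(path, "r+b") as f:             # modify box to incorporate tail
        f.seek(offset)
        f.write(new_tail)
        f.truncate()                         # cut off whatever the old tail left
        f.flush()
        os.fsync(f.fileno())
    # The with-block closed the box; now utime it with the stat result times.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Because the old times come back unchanged, a box that had no new mail before the rewrite still shows none afterwards.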

So actually the result looks as if it has never been modified.
But maybe that is deliberate, because this way it is in sync with
the standard: strictly speaking, there is no *new* mail in the box.

Unless you vote against it, I'll write a patch tomorrow that
uses a state machine which only triggers the utime if some kind of
setitem has occurred.  I can't help you overcome your malaise
about soiling an atime's purity.
Do you really want a future date??
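A rough sketch of such a state machine as a subclass of mailbox.mbox. NewMailMbox is hypothetical and relies only on the public add(), __setitem__() and flush() methods; the 3-second offset is the arbitrary one from the original patch.

```python
import mailbox
import os
import time


class NewMailMbox(mailbox.mbox):
    """Hypothetical subclass: date atime back after flush() only if a
    message was added or replaced since the last flush."""

    def __init__(self, path, factory=None, create=True):
        super().__init__(path, factory, create)
        self._mark_path = os.path.abspath(path)
        self._new_mail = False  # state: has anything been added/replaced?

    def add(self, message):
        key = super().add(message)
        self._new_mail = True
        return key

    def __setitem__(self, key, message):
        super().__setitem__(key, message)
        self._new_mail = True

    def flush(self):
        super().flush()
        if self._new_mail:
            now = time.time()
            os.utime(self._mark_path, (now - 3, now))  # (atime, mtime)
            self._new_mail = False
```

Since mbox.close() calls flush(), closing the box also applies the timestamps.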
msg135224 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-05 17:04
"prepare new tail" means all of the text from the first modified line to the end?  (As opposed to "just the new mail"?)

mailbox does locking.  I see no reason in principle it couldn't stat/restore, it would just be setting the times on the new file rather than on a truncated/rewritten old file.  How hard that would be to incorporate into the existing logic I have no idea.  Of course there may be issues about this I haven't thought of.
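Setting the times on the new file before it replaces the old one might look like this. It is a sketch only, assuming the caller already holds the lock and may write to the directory; replace_preserving_times is a hypothetical name. It copies the permission bits (not owner or group) and the old atime/mtime onto the temporary file, then renames it over the original with os.replace().

```python
import os
import stat
import tempfile


def replace_preserving_times(path, data):
    """Hypothetical sketch: write `data` to a temp file next to `path`,
    give it the old mode and times, then atomically rename it over `path`."""
    st = os.stat(path)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same dir => same file system
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, stat.S_IMODE(st.st_mode))           # mkstemp uses 0600
        os.utime(tmp, ns=(st.st_atime_ns, st.st_mtime_ns))
        os.replace(tmp, path)                             # rename keeps times
    except BaseException:
        os.unlink(tmp)
        raise
```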

From what you said, if mutt is the model for what mailbox should do, it shouldn't set the mtime later than the atime itself, it should only preserve it if it already was.  Which was my point about using mailbox as a delivery agent: if you *are* using it as a delivery agent, then the application using it as a delivery agent would be the one to set the mtime greater than the atime.  mailbox itself would (following the mutt model) just be preserving the existing relationships.

Do you think the mutt model is a good one to follow?
msg135225 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-05-05 17:04
Oh, and does mutt's behavior apply to any mbox, or only the one in the system spool?
msg135248 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-05 21:21
On Thu,  5 May 2011 19:04:16 +0200, R. David Murray wrote:
> "prepare new tail" means all of the text from the first modified
> line to the end?  (As opposed to "just the new mail"?) mailbox
> does locking.  I see no reason in principle it couldn't
> stat/restore, it would just be setting the times on the new file
> rather than on a truncated/rewritten old file.  How hard that
> would be to incorporate into the existing logic I have no idea.
> Of course there may be issues about this I haven't thought of.

Me too, and even more.
Clearly mailbox.py cannot do any dotlocking due to missing
permissions, so dotlocking is silently skipped in order to proceed at
all.  Therefore only fcntl/flock locking is used for
a /var/{spool/}mail box by mailbox.py.  This is fine as long as
all programs agree to lock such a file in the usual way, that
is, use both dotlocking *and* fcntl/flock locking, and restart from the
beginning if one of them fails because another program holds that
very lock.  mutt does that, but I wouldn't bet on the others.
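A sketch of that protocol on POSIX systems, assuming fcntl record locks on the open mailbox file. lock_mailbox and unlock_mailbox are hypothetical helpers, and the plain O_EXCL dotlock is a simplification (mailbox.py itself uses a link()-based scheme). Like mailbox.py, the sketch skips dotlocking when the directory is not writable.

```python
import errno
import fcntl
import os
import time


def lock_mailbox(f, path, attempts=50, delay=0.1):
    """Hypothetical: take an fcntl lock AND a dotlock; if either is busy,
    release everything and start over. `f` must be open for writing.
    Returns the dotlock path, or None if dotlocking was skipped."""
    dotlock = path + ".lock"
    for _ in range(attempts):
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            if e.errno not in (errno.EACCES, errno.EAGAIN):
                raise
            time.sleep(delay)               # fcntl lock busy: retry from scratch
            continue
        try:
            fd = os.open(dotlock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            fcntl.lockf(f, fcntl.LOCK_UN)   # dotlock busy: drop fcntl lock too
            time.sleep(delay)
            continue
        except OSError as e:
            if e.errno in (errno.EACCES, errno.EROFS):
                return None                 # no write access: fcntl lock only
            raise
        os.close(fd)
        return dotlock
    raise TimeoutError("could not lock " + path)


def unlock_mailbox(f, dotlock):
    if dotlock is not None:
        os.unlink(dotlock)
    fcntl.lockf(f, fcntl.LOCK_UN)
```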

And then the signal handling, and Python even supports threading,
and it is embeddable and there may be third-party modules also
involved.  This is the Death Valley of programming.

    $PYP mb.flush()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 659, in flush
        new_file = _create_temporary(self._path)
      File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 2061, in _create_temporary
        os.getpid()))
      File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 2051, in _create_carefully
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o666)
    OSError: [Errno 13] Permission denied: '/var/mail/steffen.1304628960.sherwood.local.37135'

So this (temporary file plus rename) seems to be the safest and most
useful approach in this context, because I do not want to imagine what
happens otherwise if something weird occurs in the middle of writing
"the tail".  So I'll stop thinking about issue #7359.

> Do you think the mutt model is a good one to follow?

You mean resetting atime/mtime back to their values before the rename?
I don't like that, and I don't understand it, because the file has
been modified; so I think I would use (now, now) in that case
instead (because of the MMDF/MBOX "newer == new mail" case),
and (now-2.42, now) in case new mail has been inserted.
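The proposed policy boils down to a small decision. times_after_flush is a hypothetical helper whose result would be passed to os.utime(path, ...) after a flush.

```python
import time


def times_after_flush(new_mail_inserted, now=None, offset=2.42):
    """Hypothetical: return the (atime, mtime) pair to set after a flush."""
    if now is None:
        now = time.time()
    if new_mail_inserted:
        return (now - offset, now)  # mtime > atime: reported as new mail
    return (now, now)               # modified, but no *new* mail
```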
msg135288 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-06 12:46
@david: note that I got stuck updating my patch for mailbox.py and
switched to test_mmap.py instead, so I don't know whether
I will be able to finish it today.  Is it really true that
mailbox.py writes mailboxes without any locking in the case of an
appending write?  I really have to look at that before I
proceed and write the patch.
msg135791 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-11 18:39
For the record:
On Mac OS X 10.6.7, "HFS, case sensitive" updates st_atime by
itself *once only*.  It does so ~0.75 seconds after os.utime() (+)
was called.  A time.sleep(0.8) can be used to detect this automatic
update reliably (about 50 tests under varying load all succeeded).
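The detection described above could be sketched like this. atime_drifts_after_utime is hypothetical, and the default 0.8-second wait is simply the value reported here for Mac OS X 10.6.7.

```python
import os
import time


def atime_drifts_after_utime(path, wait=0.8):
    """Hypothetical: set (atime, mtime) explicitly, wait, and report whether
    the file system changed atime on its own in the meantime."""
    now = time.time()
    os.utime(path, (now - 3, now))
    before = os.stat(path).st_atime_ns
    time.sleep(wait)  # give an asynchronous atime update time to happen
    after = os.stat(path).st_atime_ns
    return after != before
```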
msg144203 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-09-17 16:14
Let me close this!
I've just recently removed the real patch from my postman's "next"
branch, because even that real implementation doesn't work reliably.
I.e., please forget msg135791.  It was true, but in the long run
mutt(1) sometimes sees all, sometimes only some (really nuts), but
most of the time simply does not see any box with new mail at all.
That is, that "plugged-in filesystem" is simply handled as a pendant.

Remarks: because that stdlib MBOX whispered
  "Where Are Tho{u}, Brother"
to me all the {time}, I've written my own, also just recently:

== postman:
  - test: 321 messages (5083760 bytes) [action=hunky-dory]
  = Dispatched 321 tickets to 1 box.
  [69853 refs] real 0m35.538s user 0m6.760s sys 0m0.904s
..
  = Dispatched 1963 tickets to 1 box.
  [93552 refs] real 0m38.860s user 0m8.697s sys 0m0.985s
== stdlib:
  [83010 refs] real 1m3.862s user 0m10.151s sys 0m7.500s
  [93217 refs] real 7m24.958s user 2m0.174s sys 1m35.163s

Was worth it.
Have a good time!
History
Date                 User            Action  Args
2011-09-17 16:14:16  sdaoden         set     status: open -> closed; messages: + msg144203
2011-05-11 18:39:56  sdaoden         set     messages: + msg135791
2011-05-06 12:46:48  sdaoden         set     messages: + msg135288
2011-05-05 21:21:13  sdaoden         set     messages: + msg135248
2011-05-05 17:04:51  r.david.murray  set     messages: + msg135225
2011-05-05 17:04:15  r.david.murray  set     messages: + msg135224
2011-05-05 16:38:38  sdaoden         set     messages: + msg135221
2011-05-05 15:51:10  belopolsky      set     nosy: + belopolsky
2011-05-05 15:47:03  r.david.murray  set     messages: + msg135213
2011-05-05 14:42:16  sdaoden         set     messages: + msg135209
2011-05-05 14:28:17  sdaoden         set     messages: + msg135207
2011-05-05 11:42:22  r.david.murray  set     messages: + msg135190
2011-05-05 10:58:00  sdaoden         set     messages: + msg135185
2011-05-05 10:54:12  sdaoden         set     messages: + msg135184
2011-05-05 01:52:28  r.david.murray  set     messages: + msg135169
2011-05-02 19:43:22  sdaoden         set     messages: + msg135003; versions: + Python 3.4
2011-05-02 19:09:43  sdaoden         set     messages: + msg134999
2011-05-02 11:05:53  sdaoden         set     files: - mailbox.diff
2011-05-02 11:03:29  sdaoden         set     files: + 11935.2.diff; messages: + msg134959
2011-05-02 10:58:27  sdaoden         set     messages: + msg134958
2011-04-30 22:15:10  r.david.murray  set     messages: + msg134888
2011-04-27 12:54:13  pitrou          set     nosy: + r.david.murray
2011-04-27 11:39:36  sdaoden         create